Artificial Intelligence (AI) is the study and practice of enabling machines to solve problems the way a human would (i.e. intelligently). AI is a broad field, and machine learning is one part of it.

In contrast to traditional programming, where the programmer provides explicit steps for achieving a task, **machine learning** enables a machine to learn how to perform a task from examples (i.e. through analysis of input data and its relationship with the desired output). Using cake baking as an analogy, we could either give a machine a recipe for baking a sponge cake (*traditional programming*), or give it the cake ingredients and an already baked cake and let it learn through trial and error how to combine the ingredients to get the desired cake (*machine learning*).

*How do machines learn from examples?*

Many phenomena can be modelled, analyzed or explained using mathematical formulas, and that is exactly the tool machine learning relies on: mathematical functions. Suppose we are given an input, say cake ingredients, and a nicely baked cake as the desired output. Our mathematical function accepts the ingredients as input, transforms them according to certain mathematical rules, and returns the desired result, a cake, as output. Learning means adjusting the function until its output matches the desired output as closely as possible.
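To make the idea of "learning a function from examples" concrete, here is a minimal sketch. It assumes the simplest possible case, a straight line fitted by ordinary least squares, and the example data is made up for illustration (it follows the hidden rule y = 2x + 1). Real machine learning models are usually far more complex, but the principle is the same.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b to example pairs using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    m = cov / var
    b = mean_y - m * mean_x
    return m, b


# Hypothetical examples: inputs and their desired outputs
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)

# Use the learned function on an input it has never seen
prediction = m * 10.0 + b
print(m, b, prediction)
```

Nobody told the program the rule; it recovered the slope and intercept from the examples alone and can then apply them to new inputs.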

Many mathematical functions (also called algorithms or models) are applied in machine learning, including Random Forests, Linear Regression, Logistic Regression, Support Vector Machines and Neural Networks.

The use of large, many-layered neural networks for machine learning is referred to as **deep learning**. Deep learning is therefore a subset of machine learning, contrary to the common misconception that machine learning and deep learning are two separate subsets of artificial intelligence.

Deep learning has become so popular over the years that AI is commonly described as consisting of *‘deep learning and other machine learning algorithms.’*

In order to understand the reason for deep learning’s popularity, let us compare a common machine learning algorithm like logistic regression with a deep learning approach.

The problem at hand is email spam classification: we want our machine to correctly distinguish relevant emails from spam.

We have a collection of emails, both relevant and spam, and that is our data. Our desired result is a label that says whether a particular email is spam or not.

For most machine learning models, such as logistic regression, gathering data and the corresponding labels is not enough. We also need **features**, which must be chosen carefully through a tedious process called **feature selection**.

Feature selection involves detailed analysis and manipulation of all available data attributes in order to arrive at the most useful features for a good model.

Features are attributes that can be extracted from data. For example, if we are gathering data for house price prediction, possible features include the size of the house, its age, number of rooms, location, etc. If we have pictures of women of different nationalities, possible features could include face shape, hair color, skin color, etc.

It is important to note that the features chosen must be relevant to the problem to be solved. It is not wise to choose the color of paint for the house price prediction problem, or the presence of eyes as a feature for the women’s pictures. If irrelevant features are chosen, the accuracy of your model will suffer. Basically, an ideal feature is a characteristic of the data that a human expert would consider when trying to solve the problem at hand.

In our case, we have emails. Features that a human would look for when deciding whether an email is spam include the presence of certain words like ‘deal’, ‘free’, ‘offer’ and ‘buy’, the presence or absence of an email subject, the sender’s email address, etc.
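The features listed above can be sketched as a small extraction function. This is only an illustration: the function name `extract_features`, the feature names and the sample email are all hypothetical, and a real spam filter would use many more features than this.

```python
import re

# Words mentioned in the text as possible spam indicators
SPAM_WORDS = ("deal", "free", "offer", "buy")


def extract_features(subject, body, sender):
    """Turn one raw email into a dictionary of hand-picked features."""
    # Lowercase the text and split it into a set of distinct words
    words = set(re.findall(r"[a-z']+", f"{subject} {body}".lower()))
    features = {f"has_{w}": int(w in words) for w in SPAM_WORDS}
    # 1 if the email has a non-empty subject line, 0 otherwise
    features["has_subject"] = int(bool(subject.strip()))
    # Keep only the domain part of the sender's address
    features["sender_domain"] = sender.rsplit("@", 1)[-1].lower()
    return features


# Hypothetical email with no subject line
example = extract_features("", "Buy now and get a FREE offer!", "promo@example.com")
print(example)
```

Each email, whatever its length, is reduced to the same fixed set of attributes, which is what a model like logistic regression expects as input.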

Note that some features are numeric, such as age, length and size, while others are categorical, such as email addresses, words, or categories like female vs male. Numerical features can be passed directly into mathematical functions, while categorical features require a process called encoding, in which a specific numeric value is assigned to represent each category. For example, a feature like presence of email subject could use the number ‘1’ to represent email subject present, and ‘0’ to represent email subject absent.
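As a rough sketch of encoding, the snippet below shows the 1/0 scheme described above, plus one-hot encoding, a common option when a feature has more than two categories. The helper names and the list of sender domains are assumptions made up for this example.

```python
def encode_binary(present):
    """Encode a yes/no attribute as 1 (present) or 0 (absent)."""
    return 1 if present else 0


def one_hot(value, categories):
    """Encode a category as a list with a single 1 at its position."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical set of categories for the sender's email domain
DOMAINS = ["gmail.com", "yahoo.com", "other"]

subject_feature = encode_binary(True)
domain_feature = one_hot("yahoo.com", DOMAINS)
print(subject_feature, domain_feature)
```

One-hot encoding avoids implying an order between categories, which would happen if we simply numbered the domains 0, 1, 2.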

For logistic regression, we pass these features as input to a function, usually a linear function such as y = mx + b (where x represents the input), or even a polynomial one. The result is squashed into a value between 0 and 1 by the sigmoid (logistic) function, and that value is compared against a threshold: for values above the threshold, say 0.5, we predict that the email is spam, while for values below it we predict that the email is not spam.
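The prediction step can be sketched as follows. In its usual formulation, logistic regression passes the linear score through the sigmoid function to obtain a value between 0 and 1 before applying the threshold. The weights and bias below are invented for illustration; in practice they are learned from labelled emails.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam(features, weights, bias, threshold=0.5):
    """Return (probability, is_spam) for one encoded email."""
    # Linear part: a weighted sum of the features plus a bias
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return p, p > threshold


# Feature order (assumed): has_deal, has_free, has_offer, has_buy, has_subject
WEIGHTS = [1.2, 1.5, 0.8, 0.9, -2.0]  # hypothetical, not learned
BIAS = -1.0                           # hypothetical, not learned

p_spam, is_spam = predict_spam([0, 1, 1, 1, 0], WEIGHTS, BIAS)
p_ham, is_ham_spam = predict_spam([0, 0, 0, 0, 1], WEIGHTS, BIAS)
print(p_spam, is_spam)
print(p_ham, is_ham_spam)
```

The first email contains several suspicious words and no subject, so its score lands well above 0.5. The second has a subject and none of the words, so it falls below the threshold.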

In deep learning we do not need feature selection. We usually pass in the data in its raw form and allow the neural network to extract the important attributes by itself. So, the first layer of neurons in our neural network receives the emails as they are, or rather direct numerical representations of the contents of each email (all non-numerical data must be converted to a numerical representation before the model can work with it).
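One simple way to obtain such a numerical representation is to map every word to an integer ID. The sketch below assumes a word-level vocabulary built from a tiny, made-up collection of emails; the function names are hypothetical, and real systems typically use more sophisticated tokenizers.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Assign an integer ID to every distinct token; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert raw text into a list of integer IDs."""
    return [vocab.get(tok, 0) for tok in tokenize(text)]


# Hypothetical training emails
emails = ["Free offer inside", "Meeting moved to Monday"]
vocab = build_vocab(emails)

encoded = encode("Free meeting tomorrow", vocab)
print(encoded)
```

Note that no one decided which words matter here: the whole email is converted as it is, and it is left to the network to work out which parts are useful.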

Several approaches for understanding and visualizing convolutional networks have been developed in the literature. They suggest that the layers of a deep convolutional neural network learn on their own to detect particular details or features in the input data, typically simple patterns in the early layers and more complex ones in the deeper layers.

This means the expertise usually needed for good feature selection with a model like logistic regression is not required. In a nutshell, deep learning does not require structured data (data with appropriate features), unlike other machine learning models.

And that is awesome news, because I do not need to be a medical doctor to train a deep learning model to detect cancer from images of patients; all I need is labelled data! More importantly, we save the time and effort spent on feature selection, extraction and engineering. Please note that feature selection and feature engineering for most real-world problems can be very taxing, and the success of your models depends greatly on the features you choose.

A second reason for deep learning’s popularity is that it has achieved state-of-the-art results on many of our most challenging real-world problems. Problems like image classification, object detection, image segmentation, visual relationship identification, natural language processing and speech-to-text processing have been solved to an astonishing degree using deep neural networks. We can also use deep learning for tabular data classification and regression problems, like the well-known Titanic survival prediction and housing price prediction problems.
