Machine Learning – What is it?

Dec 13, 2017

You have probably already heard the term “Machine Learning”. But do you know exactly what it is?

Let’s imagine that you have several e-mails and need to decide whether each one is spam or not.

Looking at old e-mails you have received, you can list some of their characteristics, as shown in the table below. The table uses 3 made-up characteristics to keep the explanation simple.

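[Table: sample e-mails described by 3 characteristics (contains the words “Lose weight”, sent to more than 10 recipients, has images in the body), each labeled SPAM or NOT SPAM]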

Analyzing this data set, we notice that most e-mails containing the words “Lose weight” were SPAM. E-mails sent to more than 10 recipients were usually SPAM as well. Having images in the e-mail body did not influence the result.
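The analysis above can be sketched in a few lines of Python. This is only an illustration: the eight e-mails, the field names (`lose_weight`, `many_recipients`, `has_images`) and the helper `spam_rate` are all made up here, chosen so that the numbers show the same pattern the text describes.

```python
# Hypothetical historical e-mails. Every value is made up for illustration.
emails = [
    {"lose_weight": True,  "many_recipients": True,  "has_images": True,  "spam": True},
    {"lose_weight": True,  "many_recipients": False, "has_images": False, "spam": True},
    {"lose_weight": True,  "many_recipients": True,  "has_images": False, "spam": True},
    {"lose_weight": False, "many_recipients": True,  "has_images": True,  "spam": True},
    {"lose_weight": False, "many_recipients": False, "has_images": True,  "spam": False},
    {"lose_weight": False, "many_recipients": False, "has_images": False, "spam": False},
    {"lose_weight": True,  "many_recipients": False, "has_images": True,  "spam": False},
    {"lose_weight": False, "many_recipients": True,  "has_images": False, "spam": False},
]


def spam_rate(rows, feature, value):
    """Fraction of the rows where rows[feature] == value that are labeled spam."""
    matching = [row for row in rows if row[feature] == value]
    if not matching:
        return None  # no e-mail has this value, so there is no rate to report
    return sum(row["spam"] for row in matching) / len(matching)


for feature in ("lose_weight", "many_recipients", "has_images"):
    with_it = spam_rate(emails, feature, True)
    without_it = spam_rate(emails, feature, False)
    print(f"{feature}: spam rate {with_it:.2f} with, {without_it:.2f} without")
```

In this made-up sample, the spam rate is 0.75 with “Lose weight” and 0.25 without it, and the same holds for many recipients. For images the rate is 0.50 either way, so that characteristic tells us nothing.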

With these patterns in hand, we can try to predict whether a new e-mail is spam or not.

This is exactly what Machine Learning algorithms do.

Based on historical data (several received e-mails that were classified as SPAM or NOT SPAM), it is possible to apply an algorithm that extracts rules from these data, creating a model. We are TRAINING the model.

If we receive a new e-mail, we can apply the model and generate a probable classification. In this step, we are PREDICTING results.
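To make TRAINING and PREDICTING concrete, here is a minimal sketch using a tiny naive Bayes classifier written in plain Python. The article does not name a specific algorithm, so naive Bayes is our choice for illustration. The data, the feature names and the functions `train`, `predict_proba` and `predict` are hypothetical.

```python
from collections import defaultdict

FEATURES = ("lose_weight", "many_recipients", "has_images")

# Hypothetical history: (lose_weight, many_recipients, has_images, spam).
RAW = [
    (True, True, True, True), (True, False, False, True),
    (True, True, False, True), (False, True, True, True),
    (False, False, True, False), (False, False, False, False),
    (True, False, True, False), (False, True, False, False),
]
history = [dict(zip(FEATURES + ("spam",), row)) for row in RAW]


def train(rows):
    """TRAINING: count how often each feature value occurs in each class."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # key: (label, feature, value)
    for row in rows:
        label = row["spam"]
        class_counts[label] += 1
        for feature in FEATURES:
            value_counts[(label, feature, row[feature])] += 1
    return {"class_counts": dict(class_counts), "value_counts": dict(value_counts)}


def predict_proba(model, email):
    """Estimated probability that the e-mail is spam (naive Bayes, add-one smoothing)."""
    total = sum(model["class_counts"].values())
    scores = {}
    for label in (True, False):
        n = model["class_counts"].get(label, 0)
        score = n / total  # prior: share of this class in the history
        for feature in FEATURES:
            seen = model["value_counts"].get((label, feature, email[feature]), 0)
            score *= (seen + 1) / (n + 2)  # each feature has 2 possible values
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


def predict(model, email):
    """PREDICTING: classify as spam when the estimated probability is at least 0.5."""
    return predict_proba(model, email) >= 0.5


model = train(history)
new_email = {"lose_weight": True, "many_recipients": True, "has_images": False}
print(predict_proba(model, new_email), predict(model, new_email))
```

For this made-up history, the new e-mail gets a spam probability of about 0.8, so the model classifies it as SPAM. A real project would normally rely on an established library rather than hand-written counting.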

Usually, we do not use all historical data to train the model.

A common approach is to split the data into 2 groups, for example 70% and 30% (or 90% and 10%, depending on what is best for the business rules).

70% of the data is used to train the model and the remaining 30% is applied to the generated model, in order to verify if the classification predicted by it is really correct.

This way we can verify if the model tends to work for unknown data. We call this TESTING the model.

If 30% of the data correspond to 10 e-mails and the model correctly predicted 9 of them, we can say that the accuracy of our model is 90%.
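The split and the accuracy calculation can be sketched as follows. The helper names `split_rows` and `accuracy`, the fixed random seed and the sample values are assumptions made for this example.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into a training part and a testing part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Share of predictions that match the known labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# 100 hypothetical e-mails, represented here only by their index.
train_rows, test_rows = split_rows(range(100))
print(len(train_rows), len(test_rows))  # 70 30

# 10 test e-mails where the model got exactly one wrong.
actual = [True] * 5 + [False] * 5
predicted = list(actual)
predicted[0] = False
print(accuracy(predicted, actual))  # 0.9
```

Shuffling before cutting matters: if the history is sorted (for example, all spam first), an unshuffled split would train and test on very different kinds of e-mails.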

In summary:

  • first, we need to TRAIN the model based on 70% of the historical data;
  • second, we need to TEST if the model can predict correct results, comparing the predicted results to the remaining 30% of the data;
  • lastly, the model can be used in production to PREDICT new data.

Classifying an e-mail as SPAM or NOT SPAM is a classic case of Machine Learning. Obviously, it is a very simple case and Machine Learning algorithms make a lot of other kinds of predictions. For instance:

  • what is the next movie the customer is likely to want to watch?
  • what product should I suggest based on the probability of the customer buying it?
  • do these patient symptoms indicate any type of cancer?
  • could a specific financial transaction be fraudulent?

Many Machine Learning algorithms are available out of the box, and they are commonly divided into 2 main categories:

  • supervised learning: learn from historical data that is already labeled with the correct answer in order to predict the label of new data (as in the example above);
  • unsupervised learning: find patterns or groups in data that has no labels, so there is no known correct answer to learn from.
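As a contrast to the spam example, here is a small sketch of unsupervised learning. It uses k-means clustering, a common algorithm that the article does not mention, to group e-mails by number of recipients without any SPAM or NOT SPAM labels. The function `k_means_1d`, the recipient counts and the starting centers are all made up for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then move each center
    to the mean of the values assigned to it."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical number of recipients for seven e-mails; no labels are given.
recipients = [1, 2, 1, 3, 40, 55, 48]
centers, groups = k_means_1d(recipients, centers=[0, 10])
print(groups)  # [[1, 2, 1, 3], [40, 55, 48]]
```

The algorithm is never told which e-mails are spam. It only discovers that the data falls into two groups, and it is up to a person to decide what those groups mean.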

 

Author
Ronaldo Chicareli
Software Architect

 
