Supervised learning is a type of machine learning that uses labeled sets of data to train artificial intelligence (AI). Here's what supervised learning is all about, how it works, and its applications.
In supervised learning, an AI algorithm is fed training data in which each input is paired with a clear label (the desired output). Based on the training set, the AI learns how to label new, unlabeled inputs. Ideally, the algorithm's accuracy improves as it learns from more labeled examples.
If you wanted to train an AI algorithm to classify shapes, you would show it examples of accurately labeled shapes. From those examples, the algorithm learns which features go with each label (for example, a shape that has three sides is a triangle, and a shape with four equal sides and four right angles is a square).
Once you've provided the training data, you would test the algorithm by showing it shapes without labels. The AI will then use its knowledge from the training set to assign the appropriate labels/outputs to each shape.
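The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the features (number of sides, average corner angle in degrees), the sample measurements, and the nearest-centroid method are all hypothetical choices made for this example, not a method the article prescribes.

```python
from collections import defaultdict

# Hypothetical labeled training set: (sides, average corner angle) -> label.
training_data = [
    ((3, 60.0), "triangle"),
    ((3, 58.5), "triangle"),
    ((3, 61.2), "triangle"),
    ((4, 90.0), "square"),
    ((4, 89.1), "square"),
    ((4, 91.3), "square"),
]

def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# "Training": summarize the labeled examples.
centroids = train(training_data)

# "Testing": shapes without labels (made-up measurements).
unlabeled_shapes = [(3, 59.0), (4, 88.0)]
predictions = [predict(centroids, shape) for shape in unlabeled_shapes]
print(predictions)
```

The model never receives a written rule such as "three sides means triangle"; it only sees labeled measurements and assigns each new shape the label of the closest group.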
Supervised learning is used to train AI algorithms to perform many tasks, including:
To adequately train a supervised learning algorithm, you need a lot of accurately labeled data. The training data set must also be diverse enough for the algorithm to identify slight pattern variances.
One of the benefits of supervised learning is that it can be highly accurate, but high accuracy on the training data isn't always a good sign. It could indicate overfitting, which is when the algorithm fits the training data so closely, noise included, that it performs poorly on new data. Overfitting can go unnoticed if the test data is too similar to the training data, so the test set should be kept separate and be different enough from the training set to show whether the algorithm will work in real-world settings.
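One rough way to see why training accuracy alone can mislead is to compare a model that merely memorizes its training examples with one that learns a general rule. The sketch below uses a made-up task (label a number 1 if it is large, 0 if it is small) and hypothetical data; both models and their names are assumptions for illustration only.

```python
def accuracy(model, examples):
    """Fraction of examples for which the model predicts the true label."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)

# Hypothetical data: small numbers are labeled 0, large numbers 1.
train_set = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
test_set = [(0, 0), (4, 0), (6, 1), (10, 1)]  # inputs not seen in training

# Model A memorizes the training set and guesses 0 for anything unseen.
memory = dict(train_set)

def memorizer(x):
    return memory.get(x, 0)

# Model B learns a single threshold halfway between the two classes.
largest_zero = max(x for x, y in train_set if y == 0)
smallest_one = min(x for x, y in train_set if y == 1)
threshold = (largest_zero + smallest_one) / 2

def threshold_model(x):
    return 1 if x >= threshold else 0

for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(name, accuracy(model, train_set), accuracy(model, test_set))
```

Both models are perfect on the training set, but only the threshold model keeps that accuracy on the held-out test set. A large gap between training and test accuracy is the usual symptom of overfitting.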
The training data is unlabeled in unsupervised learning, so the AI must identify patterns and create its own labels. In semi-supervised learning, part of the input data is already labeled.
Supervised learning can be time-consuming since it requires a human, or supervisor, to label all the data in the training set. A supervisor must also test the algorithm for accuracy. This introduces the possibility of human error, so the person labeling the training data must be a data expert.
Unlike unsupervised learning algorithms, supervised learning algorithms can't create categories of their own. So, if a supervised learning algorithm trained to identify triangles and squares is presented with a hexagon, it can't label it correctly; it can only choose among the labels it was trained on. An unsupervised algorithm could instead group the hexagon apart from the triangles and squares, in effect creating a new category.
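The hexagon scenario can be illustrated with a small sketch. It assumes each shape is described only by its number of sides, uses a nearest-neighbor rule as a stand-in for the supervised model, and uses a simple distance-based grouping routine as a stand-in for an unsupervised one. The function names and the `tolerance` parameter are hypothetical.

```python
# Supervised: the model only knows the labels it was trained on.
labeled_examples = [(3, "triangle"), (4, "square")]

def nearest_label(sides):
    """Return the label of the training example with the closest side count."""
    closest = min(labeled_examples, key=lambda example: abs(example[0] - sides))
    return closest[1]

# Unsupervised: no labels; values that sit close together form a group.
def group_unlabeled(values, tolerance=0.5):
    """Add each value to the first group whose mean is within `tolerance`,
    or start a new group if none is close enough."""
    groups = []
    for value in values:
        for group in groups:
            center = sum(group) / len(group)
            if abs(value - center) <= tolerance:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups

hexagon = 6
print(nearest_label(hexagon))              # forced into a known label
print(group_unlabeled([3, 4, 3, 6, 4]))    # the hexagon gets its own group
```

The supervised model has to answer with "triangle" or "square", so the hexagon is mislabeled as the nearest known class. The grouping routine has no labels to fall back on and simply opens a third, unnamed group.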
Supervised learning algorithms can be divided into two types: classification, which assigns inputs to discrete categories, and regression, which predicts continuous numerical values.
Within these two categories are several popular supervised learning algorithms like linear regression, logistic regression, and naive Bayes classifiers. Some algorithms, such as support vector machines (SVM) and random forests, can be used for both classification and regression.
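To make linear regression concrete, here is a minimal sketch of fitting a straight line to labeled data with ordinary least squares. The data (hours studied versus test score) and the function names are invented for illustration; a real project would more likely rely on an established library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: input = hours studied, label = test score.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]

slope, intercept = fit_line(hours, scores)

def predict_score(hours_studied):
    """Predict a continuous value for a new, unlabeled input."""
    return slope * hours_studied + intercept

print(slope, intercept, predict_score(5))
```

Unlike a classifier, which picks one label from a fixed set, the fitted line returns a number on a continuous scale, so it can produce predictions for inputs that fall between or beyond the training examples.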
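Neural networks can also be trained with supervised learning. During training, the network compares its outputs with the correct labels and adjusts its internal weights to reduce the error, fine-tuning itself over many passes through the data.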
Self-supervised learning is similar to supervised learning in that an algorithm uses past examples to identify new data. The difference is that in self-supervised learning, humans don't provide labels. It's also distinct from unsupervised learning, however, in that later stages of a self-supervised training program can include some supervised tasks.
Supervised learning is most useful when you know in advance exactly what you want the program to identify. For example, autonomous car programmers need vehicles to recognize a stop sign when they see one. Unsupervised learning is better suited to exploratory work, where the goal is to discover structure in the data of a particular field (e.g., physics) rather than to recognize predefined categories.