Supervised learning is a common approach in machine learning where models are trained on labeled data. However, obtaining large datasets with clean, human-annotated labels can be expensive and time-consuming. As a result, there has been increasing interest in techniques that reduce the amount of labeled data needed.
Here are four different types of supervision:
1. Weak Supervision: Training with labels that may include mistakes.
- Example: You want to teach a computer to classify fruits like apples, bananas, and oranges based on images. You have a dataset, but some of the labels might be incorrect due to occasional labeling mistakes. For instance, a banana image is labeled as an apple or vice versa.
- Learning Approach: In weak supervision, you would train the computer to recognize fruits using this dataset, even though you know some labels might be wrong. The computer learns from the data with the understanding that it’s not always reliable.
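The idea above can be sketched with a tiny toy experiment. This is a minimal, hypothetical illustration, not a production recipe: each "image" is reduced to a single made-up feature, some labels are deliberately flipped to simulate labeling mistakes, and a simple nearest-centroid classifier is trained directly on the noisy labels to show that it can still recover the underlying classes reasonably well.

```python
import random

random.seed(0)

# Hypothetical toy data: each "image" is a single feature (e.g. an average
# color value), and the true class is 0 (apple) or 1 (banana).
def make_dataset(n=200, noise_rate=0.1):
    data = []
    for _ in range(n):
        true_label = random.randint(0, 1)
        feature = random.gauss(0.0 if true_label == 0 else 2.0, 0.5)
        # Weak supervision: a fraction of the labels are flipped by mistake.
        label = 1 - true_label if random.random() < noise_rate else true_label
        data.append((feature, label, true_label))
    return data

def train_centroids(data):
    # Average the feature per (possibly noisy) class label.
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y, _ in data:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}

def predict(centroids, x):
    # Assign the class whose centroid is closest to the feature.
    return min(centroids, key=lambda c: abs(x - centroids[c]))

data = make_dataset()
centroids = train_centroids(data)
accuracy = sum(predict(centroids, x) == t for x, _, t in data) / len(data)
```

Because the mislabeled examples are a minority, they only nudge each centroid slightly, so accuracy against the true labels stays well above chance even though training never saw a clean label.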
2. Distant Supervision: Using external sources to label data indirectly.
- Example: You’re trying to build a fruit classification model, but you don’t have the time or resources to manually label each fruit image. Instead, you decide to use external sources like fruit websites to label your images. If a fruit image appears on a fruit-selling website, you label it as the corresponding fruit.
- Learning Approach: Distant supervision means your computer learns from these indirectly assigned labels, understanding that they might not always be perfectly accurate since they depend on external sources.
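The labeling step described above can be sketched in a few lines. The catalog and filenames here are entirely hypothetical stand-ins for data scraped from a fruit-selling website; the point is only the mechanism: labels come from an external source's records, not from human annotators, and images the source does not cover remain unlabeled.

```python
# Hypothetical external source: a catalog scraped from a fruit-selling site,
# mapping image filenames to the fruit name shown on the product page.
catalog = {
    "img_001.jpg": "apple",
    "img_002.jpg": "banana",
    "img_003.jpg": "apple",  # the page itself may be wrong: labels are only
                             # as reliable as the external source
}

unlabeled_images = ["img_001.jpg", "img_002.jpg", "img_003.jpg", "img_004.jpg"]

# Distant supervision: assign a label whenever the external source mentions
# the image; anything the source does not cover stays unlabeled.
labeled = [(img, catalog[img]) for img in unlabeled_images if img in catalog]
skipped = [img for img in unlabeled_images if img not in catalog]
```

The resulting `labeled` set can then be fed into any ordinary training loop, ideally with the same tolerance for label noise discussed under weak supervision.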
3. Semi-Supervised Learning: Combining a small amount of labeled data with a larger set of unlabeled data.
- Example: In this case, you have some labeled fruit images, such as apples and bananas, and a large collection of unlabeled fruit images. The labeled dataset represents only a small portion of your entire dataset.
- Learning Approach: Semi-supervised learning would involve training your computer using the labeled images of apples and bananas, and then incorporating the information from the unlabeled images to improve the model beyond what the small labeled set alone could achieve.
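One common way to use the unlabeled data is self-training (pseudo-labeling). The sketch below is a deliberately tiny, hypothetical example: each image is again reduced to a single made-up feature, a centroid classifier is fit on the few labeled points, its own predictions are used to label the unlabeled points, and the model is then refit on the combined set.

```python
# A handful of labeled points (feature, class) and many unlabeled features.
labeled = [(0.1, "apple"), (0.3, "apple"), (1.9, "banana"), (2.2, "banana")]
unlabeled = [0.0, 0.2, 0.4, 1.8, 2.0, 2.1]

def centroids(points):
    # Average the feature per class.
    sums, counts = {}, {}
    for x, y in points:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def nearest(cents, x):
    # Predict the class whose centroid is closest.
    return min(cents, key=lambda c: abs(x - cents[c]))

cents = centroids(labeled)
# Self-training: pseudo-label the unlabeled data with the model's own
# predictions...
pseudo = [(x, nearest(cents, x)) for x in unlabeled]
# ...then retrain on the labeled and pseudo-labeled data combined.
cents = centroids(labeled + pseudo)
```

The refit centroids are estimated from ten points instead of four, which is the essential payoff of semi-supervised learning: the unlabeled data sharpens the model's picture of each class without any extra annotation effort.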