Regression models, a cornerstone of supervised learning, enable algorithms to predict numerical outcomes based on historical data. They exemplify the power of machine learning by identifying and utilizing patterns within labeled datasets to forecast values, such as estimating a person’s weight from their height.

**What is supervised learning?**

Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset. This means that the algorithm is given examples of input variables (X) along with the corresponding output variables (Y), which are the labels or targets. The goal of supervised learning is for the algorithm to learn a mapping function from the input to the output so that when it is provided with new, unseen input data (X), it can predict the appropriate output (Y). Essentially, the algorithm uses the examples from the training set to infer the underlying rules or patterns that determine the relationship between the input and output variables, which it then applies to make predictions on new data.

**Linear Regression Model**

A regression model is a statistical tool in the field of machine learning and statistics that is used to predict continuous outcomes. It works by estimating the relationships among variables. The model essentially maps the input variables (predictors) to a continuous output variable (response) by fitting a line (in linear regression) or a more complex surface (in non-linear regression) through the data points.

For example, consider a dataset that contains the heights and weights of individuals. Using this data, we can predict a person’s weight based on their height.

One approach to estimate this is to use a linear regression model constructed from a dataset of heights and weights. Your model will fit a straight line through the data, which could be represented graphically. Imagine that based on this straight line fit to the dataset, you observe that a person is 170 centimeters tall. The line intersects with the person’s height on the graph, and if you follow this intersection point horizontally to the left, you reach the vertical axis that denotes weight. Right there, the model indicates a weight, let’s say around 86 kilograms.

This approach exemplifies what’s called a supervised learning model. We term it ‘supervised learning’ because you train the model with data that already contains the ‘right answers’ — in this case, you provide the model with examples of individuals’ heights and their corresponding weights. The ‘right answers’, meaning the actual weights, are available for every individual in the dataset.

The linear regression model employed here is a specific type of supervised learning model focused on regression because it predicts a continuous numeric output, like weight in kilograms. Any supervised learning model that predicts numerical values, be it 86 kg, 0.5 kg, or even negative values, is solving a regression problem. Linear regression is just one form of a regression model, but there are various other models that can also address regression problems in the realm of health and beyond.