Explanation of the generalized additive model at university level

The Generalized Additive Model, or GAM for short, is a regression model from statistics, introduced by Trevor Hastie and Robert Tibshirani in the 1980s and now also widely used in machine learning. Its structure is similar to that of the multiple linear regression model.

(Baayen, R. H. and M. Linke (2020). “An introduction to the generalized additive model.” A practical handbook of corpus linguistics. New York: Springer: 563–591.)


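The fundamental difference between the two models is that the generalized additive model does not assume a linear relationship between the independent variables (predictors) and the dependent variable (response). It only assumes that the effects of the individual predictors add up, and lets the data determine the shape of each effect.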

(Hastie, T. and R. Tibshirani (1987). “Generalized additive models: some applications.” Journal of the American Statistical Association 82(398): 371–386.)
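To make this difference concrete, the following Python sketch (using NumPy) fits a straight line and a flexible piecewise-linear curve to the same simulated data. The sine-shaped data, the knot positions, and the hinge terms max(0, x - k) are illustrative assumptions and are not taken from the cited sources.

```python
import numpy as np

# Simulated data (illustrative only): y depends on x through a sine curve plus noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(0, 0.2, x.size)  # noise variance 0.2**2 = 0.04

def fit_least_squares(X, y):
    """Ordinary least squares fit; returns the fitted values X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Linear model: y is approximated by b0 + b1 * x, i.e. a straight line
X_linear = np.column_stack([np.ones_like(x), x])

# Flexible model: same intercept and slope, plus hinge terms max(0, x - k)
# at 10 interior knots, so the fitted curve may bend at each knot
knots = np.linspace(0, 2 * np.pi, 12)[1:-1]
X_flexible = np.column_stack([X_linear] + [np.maximum(0.0, x - k) for k in knots])

mse_linear = np.mean((y - fit_least_squares(X_linear, y)) ** 2)
mse_flexible = np.mean((y - fit_least_squares(X_flexible, y)) ** 2)
print(f"straight line MSE: {mse_linear:.3f}")
print(f"flexible curve MSE: {mse_flexible:.3f}")
```

The straight line cannot follow the curvature, so its mean squared error stays well above the noise variance of 0.04, whereas the flexible curve comes close to it.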

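The regression equation of a GAM is:

$$g\big(\mathrm{E}[Y]\big) = \beta_0 + f_1(x_1) + f_2(x_2) + f_3(x_3) + \dots + f_n(x_n)$$

Here Y is the dependent variable, β0 is the intercept, and g is a link function that connects the expected value of Y to the additive predictor on the right-hand side (for a normally distributed response, g is simply the identity).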

f1, f2, f3, …, fn are so-called smooth functions: each fi describes the relationship between the dependent variable and the corresponding independent variable xi as a continuous curve.

(Baayen, R. H. and M. Linke (2020). “An introduction to the generalized additive model.” A practical handbook of corpus linguistics. New York: Springer: 563–591.)
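The additive structure can be illustrated with a small Python sketch, assuming the usual GAM form in which a link function g of the expected response equals an intercept plus the sum of the smooth functions. The two smooth functions, the intercept value, and the choice of a logit link (as used for a binary response) are hypothetical and serve only as an illustration; in a real GAM the smooth functions are estimated from data.

```python
import numpy as np

# Hypothetical smooth functions (in a fitted GAM these would be estimated splines)
def f1(x1):
    return np.sin(x1)

def f2(x2):
    return 0.5 * x2 ** 2

beta0 = -1.0  # hypothetical intercept

def linear_predictor(x1, x2):
    """Additive predictor: eta = beta0 + f1(x1) + f2(x2)."""
    return beta0 + f1(x1) + f2(x2)

def inverse_logit(eta):
    """Inverse of the logit link g(mu) = log(mu / (1 - mu))."""
    return 1.0 / (1.0 + np.exp(-eta))

x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.5, 0.5, 0.5])

eta = linear_predictor(x1, x2)
mu = inverse_logit(eta)  # expected response E[Y], here a probability between 0 and 1

# Additivity: raising x2 from 0.5 to 1.5 shifts eta by f2(1.5) - f2(0.5) = 1.0,
# whatever the value of x1 is
shift = linear_predictor(x1, x2 + 1.0) - eta
print("eta:", eta)
print("mu:", mu)
print("shift:", shift)
```

Because the terms simply add up, the effect of one variable can be read off its own curve, which is what keeps a GAM interpretable despite its non-linear components.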

These smooth functions are estimated using splines, a flexible method for modeling non-linear relationships between variables. Splines are curves composed of piecewise defined polynomials that are joined smoothly at points called knots; in mathematics and engineering they are used for interpolation and approximation. When a GAM is fitted, the splines are chosen so that they represent the data as closely as possible while limiting the complexity of the model, for example by restricting the number of knots or by penalizing how strongly each curve bends.

(Ma, S. (2012). “Two-step spline estimating equations for generalized additive partially linear models with large cluster sizes.” The Annals of Statistics 40(6): 2943–2972.)
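The trade-off between fitting the data and limiting complexity can be sketched with a penalized regression spline in Python (NumPy). This is a minimal sketch assuming a cubic truncated power basis with a ridge penalty on the knot coefficients, which is only one of several spline constructions; GAM software such as R's mgcv uses other bases by default (thin plate regression splines). The function names, the penalty weight lam, the knot positions, and the simulated data are illustrative assumptions.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Spline basis: 1, x, ..., x^degree plus (x - k)_+^degree for each knot k."""
    poly = [x ** p for p in range(degree + 1)]
    trunc = [np.maximum(0.0, x - k) ** degree for k in knots]
    return np.column_stack(poly + trunc)

def fit_penalized_spline(x, y, knots, lam, degree=3):
    """Minimise ||y - X b||^2 + lam * (sum of squared knot coefficients).

    Returns the fitted values and the effective degrees of freedom (EDF).
    """
    X = truncated_power_basis(x, knots, degree)
    # Penalise only the knot (truncated power) coefficients, not the global cubic part
    D = np.diag([0.0] * (degree + 1) + [1.0] * len(knots))
    A = X.T @ X + lam * D
    beta = np.linalg.solve(A, X.T @ y)
    # EDF = trace of the hat matrix X (X'X + lam D)^-1 X'
    edf = np.trace(X @ np.linalg.solve(A, X.T))
    return X @ beta, edf

# Simulated data (illustrative only)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
knots = np.linspace(0, 1, 10)[1:-1]  # 8 interior knots -> 12 basis functions

lambdas = [1e-6, 1e-3, 1.0, 1e6]
edfs = []
for lam in lambdas:
    fitted, edf = fit_penalized_spline(x, y, knots, lam)
    edfs.append(edf)
    print(f"lam={lam:g}: effective degrees of freedom = {edf:.2f}")
```

As lam grows, the effective degrees of freedom shrink from at most 12 (one per basis function) toward 4, the plain cubic polynomial that the penalty leaves untouched. In practice, GAM software usually chooses the penalty weight automatically, for example by generalized cross-validation or REML.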