In machine learning, KNN (K-Nearest Neighbors) plays an important role in both classification and regression tasks.

The major challenge when using KNN is choosing the right (best) value for *k*, which is the number of neighbor instances considered when classifying a new instance.

In technical terms, *k* is a hyperparameter of the KNN algorithm. The user needs to define its best value, as the algorithm can't learn it from the input data.

```python
from sklearn.neighbors import NearestNeighbors

KNN = NearestNeighbors(n_neighbors=???)  # which value should we choose?
```

In the Scikit-learn KNN class, *k* is specified as a hyperparameter using the `n_neighbors` argument. Scikit-learn provides a default value of 5, but this default is rarely optimal, as the best *k* value depends on many other factors.
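As a quick sanity check, the default of 5 can be verified directly on a freshly constructed classifier (a minimal sketch using scikit-learn's `KNeighborsClassifier`):

```python
from sklearn.neighbors import KNeighborsClassifier

# Without an explicit n_neighbors argument, scikit-learn uses k = 5.
knn = KNeighborsClassifier()
print(knn.n_neighbors)  # 5
```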

The theoretical maximum for *k* is the total number of observations in the dataset; the minimum is 1. In practice, we never use these two extremes: the best value lies somewhere in between.

Today, we will discuss six effective methods of choosing the right *k* value. We will also discuss the effect of the *k* value on KNN model performance by plotting decision boundaries.

Both regression and classification tasks can be performed with KNN. But, here, we will only consider building classification models.

In machine learning terms, KNN is a supervised learning algorithm. It requires labeled data. When fitting the model, we need to provide both a data matrix and a label (target) vector as X, y. More technically, KNN falls under the category of **instance-based learning** which is also called **lazy learning**.
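The X, y fitting pattern described above can be sketched as follows. This is a minimal, hypothetical example using scikit-learn's `KNeighborsClassifier` on the built-in Iris dataset, with an assumed *k* of 5:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# X is the data matrix, y is the label (target) vector.
X, y = load_iris(return_X_y=True)

# k = 5 is an assumption here, not a recommendation.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# Predict classes for a few samples.
print(knn.predict(X[:3]))
```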

In the training phase, an instance-based model such as KNN does not learn anything from the data; it simply stores the data, and nothing else happens. No parameters are learned from the data. That's why instance-based methods are also known as **non-parametric** methods.
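The lazy, instance-based idea can be illustrated with a toy 1-nearest-neighbor classifier written from scratch (a pedagogical sketch, not scikit-learn's implementation): `fit` only stores the training data, and all the work happens at prediction time.

```python
import numpy as np

class Lazy1NN:
    """Minimal lazy learner: fit() stores data, predict() does the work."""

    def fit(self, X, y):
        # No parameters are estimated -- the instances themselves are kept.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from each query point to each stored point.
        dists = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        # Label of the single closest stored instance (k = 1).
        return self.y_[dists.argmin(axis=1)]

model = Lazy1NN().fit([[0, 0], [5, 5]], [0, 1])
print(model.predict([[1, 1], [4, 6]]))  # → [0 1]
```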

In the testing phase (in the case of predicting the class of a new instance), the algorithm…