Topic to be included:

RBF Kernel

Advantages of RBF Kernel

Local Decision Boundary

Effect of Gamma

Relationship Between RBF and Polynomial Kernel

The RBF kernel shares significant similarities with the **“Normal Distribution”**. In Scikit-Learn it is the **“Default Kernel”**, which means that when we are not sure which kernel will perform well on our data, we can use **“RBF”** as the **“out of the box”** kernel. It can work with many types of datasets.

The RBF kernel, also known as the **“Gaussian Kernel”**, is a measure of how alike or closely related two data points in a dataset are. It is often used to compute the similarity between data points based on their Euclidean distance in a high-dimensional space. It is a powerful kernel because it can create **whatever type of decision boundary the data requires, which is difficult with a polynomial kernel**.

**It can be rewritten as:** K(xi, xj) = e^(−||xi − xj||² / 2σ²) = e^(−γ ||xi − xj||²), where γ = 1 / (2σ²). Both formulas are the same; we can write it either way. In sklearn, gamma **“γ”** is a **“Hyperparameter”**.

**How calculating the Kernel is the same as calculating similarity:**

The quantity **||xi − xj||** is basically the **“Euclidean Distance”** between the **vectors xi and xj**, so its square will be ( ||xi − xj||² ). If the **distance increases**, the entire quantity **( e^(−dist² / 2σ²) )** will **decrease**, because we have **e to the power of a negative number**. If this decreases, **K(xi, xj) will also decrease**. So **K is inversely proportional to distance**, and in a way we can say that **K can be treated as a similarity**.

If we have two points and we decrease the distance between them, the similarity will increase. And if we increase the distance between them, the similarity will reduce. So by this logic, we can understand that the kernel function is a different flavor of calculating similarity.
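A minimal sketch of this idea in code (the helper name `rbf_kernel_value` is illustrative, not a library function):

```python
import numpy as np

def rbf_kernel_value(xi, xj, gamma=1.0):
    """RBF similarity: exp(-gamma * squared Euclidean distance)."""
    sq_dist = np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)
    return np.exp(-gamma * sq_dist)

# Identical points -> maximum similarity of 1
print(rbf_kernel_value([1.0, 2.0], [1.0, 2.0]))  # 1.0

# As distance grows, similarity shrinks toward 0
print(rbf_kernel_value([0.0], [0.5]))  # ~0.7788
print(rbf_kernel_value([0.0], [2.0]))  # ~0.0183
```

This matches the intuition above: smaller distance, higher similarity; larger distance, similarity decaying exponentially toward zero.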

The **RBF kernel** assigns a higher similarity value **(closer to 1)** when data points are close to each other and a lower similarity value **(closer to 0)** when they are farther apart. *This means that data points with smaller Euclidean distances are considered more similar, and data points with larger distances are considered less similar.*

In summary, similarity in the context of RBF Kernel is a measure of how close or similar two data points are, and it is calculated using the formula that takes into account the Euclidean distance between these points. The RBF kernel quantifies this similarity, which is useful in various machine-learning tasks, including clustering, classification, and regression.

## 1. Non-Linear Transformation-

It means that by using RBF we can create non-linear boundaries to classify data.

## 2. Local Decisions-

Unlike some other kernels, the **RBF kernel makes “Local Decisions”**. That is, the effect of each data point is limited to a certain region around that point. This can make the model more robust to outliers and create complex decision boundaries.

## 3. Flexibility-

The RBF Kernel has a parameter **gamma “γ”** that determines the complexity of the decision boundary. **By tuning this parameter, we can adjust the trade-off between bias and variance**, allowing for a flexible range of decision boundaries.

## 4. Universal Approximation Property-

The RBF Kernel has a property known as the universal approximation property, meaning it can approximate any continuous function.

**Whatever the relationship between X and Y (Linear, Non-Linear, Polynomial, Exponential), RBF can roughly approximate the function**, just like Neural Networks; not quite as good, but it gives good results. Basically, RBF is applicable to any kind of data.
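As a small illustration of this approximation power (the sine data here is synthetic, chosen only for demonstration), an RBF-kernel SVR can fit a clearly non-linear relationship:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic non-linear relationship: y = sin(x)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel()

# Fit a support vector regressor with the RBF kernel
model = SVR(kernel='rbf', gamma=1.0, C=10.0)
model.fit(X, y)

# The RBF kernel recovers the sinusoidal shape fairly closely
pred = model.predict(X)
print("max absolute error:", np.max(np.abs(pred - y)))
```

The hyperparameter values (`gamma=1.0`, `C=10.0`) are illustrative choices for this toy data, not recommendations.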

## 5. General Purpose-

The RBF Kernel does not make any strong assumptions about the data and can therefore be a good choice in many different situations, making it a versatile, general-purpose kernel.

**RBF does not have any pre-assumptions, so it can work easily with any type of data**, and this is why it is very robust and useful in many different types of scenarios.

When graphing this function, we can see how the distance between two points xi and xj impacts their similarity. If we have data points xi and xj and the distance between them is, say, **“0.5”, then the similarity between both points will be high**. If the **distance increases to “1”, we can see on the graph the similarity decreasing exponentially**. **If the distance is “2” between the two points, the similarity decreases drastically and approaches “Zero”**. The **maximum similarity can be “One”: if the distance is zero, the similarity becomes maximum and it will be “1”**.

*So it means that if the distance between the points is within (−2 to 2), then some similarity exists; and if the distance is outside this range, we can say there is no similarity between the two points, and both points are different.*

*Because of the function e^−x², the relationship between **“Similarity” and “Distance” decreases exponentially**. This means that **if the distance is within some range, similarity exists; beyond that range, there is no similarity**. So every point has a region: within that region, if one point finds another point, it will have some similarity to it; outside the region, it won’t find any similarity. That is the reason there is a **“Local Decision Boundary”**: RBF has the power to **“create many small regions”**, which helps in creating a **“complex decision boundary”**.*
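To make these regions concrete, here is a plain evaluation of e^(−d²) at the distances mentioned above (taking 2σ² = 1 for simplicity):

```python
import numpy as np

# Similarity e^(-d^2) at a few distances; note how quickly it
# collapses toward zero once the distance leaves the local region
for d in [0.0, 0.5, 1.0, 2.0, 3.0]:
    print(f"distance {d}: similarity {np.exp(-d**2):.4f}")
```

At distance 0 the similarity is 1; at 0.5 it is still high (~0.78); by distance 2 it has effectively vanished (~0.018), which is the “local region” behavior described above.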

The parameter γ in the “Radial Basis Function” (RBF) kernel of a Support Vector Machine (SVM) is a hyperparameter that determines the spread of the kernel and therefore the decision region.

Suppose we have different sigma square “σ²” values, “1”, “100” and “0.01”, and we create a graph for all these sigma square values,

where the “Black Curve” represents “σ² = 100”, the “Purple Curve” represents “σ² = 1”, and the “Red Curve” represents “σ² = 0.01”.

As “**sigma square**” is in the **“denominator”**, its impact is the same as that of **“sigma”** in the **“Normal Distribution”**. In a normal distribution, we can control the width of the bell curve with the help of sigma.

When the “**sigma is very big**”, the “**curve starts becoming flat**”, so the **“local region”** of “**every point also increases**”, and every point becomes more accommodating of other points. It means that even if the distance is high, data points are still counted in the same region.

So if we **increase sigma “σ”, the “range” also increases**, points become “accommodating”, the locality of points increases, and even more distant data points are taken into the same region.

Whereas, if we **decrease sigma “σ”, the bell curve shrinks**, which means the local region also decreases and very few points will fall in the same region.

In essence, we can view **“sigma σ”** as a crucial **“hyperparameter”** in this context, and **“Gamma γ” serves as the inverse of sigma**. When we adjust gamma:

**Increasing gamma enhances “locality,”** meaning data points are considered similar only over a smaller range.

**Decreasing gamma reduces “locality,”** causing data points to be seen as similar over a broader range.

Now, if we significantly reduce sigma σ to a very small value, such as 0.1, we start to count data points as similar within an extremely narrow range. This results in very tight decision boundaries, where each data point essentially only accepts other points in its immediate vicinity. The consequence is “Overfitting” because our model tailors the decision boundaries too precisely to the training data.
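A quick sketch of this effect (the helper name here is illustrative): evaluating the kernel at a fixed distance for the three σ² values used in the curves earlier shows how sigma controls the reach of each point.

```python
import numpy as np

def rbf_similarity(dist, sigma_sq):
    # K = exp(-dist^2 / (2 * sigma^2)); gamma = 1 / (2 * sigma^2)
    return np.exp(-dist**2 / (2 * sigma_sq))

# Same distance, very different similarities depending on sigma^2
d = 2.0
for sigma_sq in [100, 1, 0.01]:
    print(f"sigma^2 = {sigma_sq}: similarity = {rbf_similarity(d, sigma_sq):.4f}")
```

With σ² = 100 (tiny gamma), points a distance 2 apart are still almost fully similar; with σ² = 0.01 (huge gamma), the same points have essentially zero similarity, which is exactly the overfitting-prone “narrow range” regime described above.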

So, to sum it up:

## Small sigma or increased gamma = “Overfitting” (too specific decision boundaries)

## Large sigma or decreased gamma = “Underfitting” (too general and inaccurate decision boundaries)

These adjustments help us strike the right balance between model complexity and generalization in machine learning.

In a sense, γ in the RBF kernel plays a role similar to that of the inverse of the regularization parameter: it controls the trade-off between Bias (Underfitting) and variance (Overfitting).

Tuning the **γ parameters using “Cross Validation”** or a similar technique is typically a **crucial step when training SVMs** with an **RBF kernel**.
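As a sketch of this tuning step (the grid values here are illustrative choices, not prescriptions), scikit-learn’s `GridSearchCV` can search γ and C jointly with cross-validation:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Search gamma (and C) over a log-spaced grid with 5-fold cross-validation
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10], 'C': [0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

`best_params_` reports the γ/C combination with the highest mean cross-validated accuracy, which is typically what you would carry forward to a final model.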

## Code:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split

# Load iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features for visualization
y = iris.target

# Create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Titles for the plots
titles = ['Low gamma value', 'Medium gamma value', 'High gamma value']

# Different gamma values to experiment with
gamma_values = [0.1, 1, 10]

# SVM regularization parameter
C = 1.0

plt.figure(figsize=(10, 8))
for i, gamma in enumerate(gamma_values):
    # Fit the model
    clf = svm.SVC(kernel='rbf', gamma=gamma, C=C)
    clf.fit(X_train, y_train)

    plt.subplot(2, 2, i + 1)
    plt.subplots_adjust(wspace=0.4, hspace=0.4)

    # Predict on the mesh and reshape for contour plotting
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision regions and data points
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.title(titles[i])

plt.show()
```

When selecting a kernel function for an industrial classification problem, there is no one-size-fits-all answer. **It depends on the data characteristics, such as its size, dimensionality, distribution, noise, and outliers.** Generally, a **linear kernel should be used if the data is linearly separable or has many features**, a **polynomial kernel if it has nonlinear patterns or interactions between features**, an **RBF kernel if it has complex and nonlinear patterns or clusters**, and a **sigmoid kernel if it is binary or looks like a logistic function**. You can also experiment with different combinations of kernels and compare their performance using the methods discussed previously. Additionally, domain knowledge and expert judgment can be employed to select the most appropriate kernel for your industrial problem.
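One way to run such a comparison (a minimal sketch on the iris dataset with default kernel parameters; a real comparison should also tune hyperparameters per kernel):

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Compare candidate kernels by mean cross-validated accuracy
results = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    scores = cross_val_score(svm.SVC(kernel=kernel), X, y, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")
```

On an easy, nearly linearly separable dataset like iris, the linear and RBF kernels both score highly; the gap between kernels usually becomes meaningful only on data with genuinely non-linear structure.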

**Infinite Dimensional Mapping:** The RBF kernel implicitly maps input data to an infinite-dimensional feature space, which allows for even greater flexibility in forming decision Boundaries.
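A standard way to see this, sketched for one-dimensional inputs with γ = 1: expanding the exponential term of the kernel produces an inner product over infinitely many polynomial features.

```latex
K(x, y) = e^{-(x - y)^2}
        = e^{-x^2}\, e^{-y^2}\, e^{2xy}
        = e^{-x^2}\, e^{-y^2} \sum_{k=0}^{\infty} \frac{(2xy)^k}{k!}
        = \sum_{k=0}^{\infty} \phi_k(x)\, \phi_k(y),
\qquad \phi_k(t) = \sqrt{\frac{2^k}{k!}}\; t^k e^{-t^2}
```

Each φ_k is a polynomial feature of degree k damped by a Gaussian factor, and since there is one for every k ≥ 0, the implicit feature map has infinitely many dimensions.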