*This article is part of the series **Demystifying Machine Learning Models**.*

Machine learning (ML) is a field that has seen explosive growth and interest in the past few decades, transforming industries and becoming a staple of modern technology. At the heart of many ML breakthroughs is a concept both profoundly powerful and often mystifying: the neural network. This post aims to demystify neural networks, offering a clear understanding of what they are, how they work, their applications, and when it might be prudent to consider alternatives.

A neural network is a computational model inspired by the human brain’s structure and function. It consists of layers of interconnected nodes or “neurons,” where each connection represents a synapse in the biological brain, capable of transmitting signals between neurons. Neural networks are designed to recognize patterns in data, learning and making decisions without being explicitly programmed for specific tasks.

Neural networks work through a process involving forward propagation and backpropagation. Let’s break down these steps with some technical details:

## Forward Propagation

1. Input Layer: Data is fed into the network through the input layer. Each feature of the input data corresponds to one neuron in this layer.

2. Hidden Layers: After the input layer, data passes through one or more hidden layers. Each neuron in these layers processes the inputs by applying weights (indicative of the importance of each input) and biases (an extra input that allows the neuron to adjust its output), followed by an activation function that determines whether the neuron fires.

3. Output Layer: The final layer produces the network’s prediction. The form of the output depends on the problem type (regression, classification, etc.).
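The three steps above can be sketched in a few lines of NumPy. This is a minimal toy example, not the MNIST network built later in this post; the layer sizes and input values are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    """Squash values into (0, 1) so each neuron's output is bounded."""
    return 1 / (1 + np.exp(-z))

# Toy network: 3 input features, 4 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)) * 0.01, np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(4, 2)) * 0.01, np.zeros(2)  # hidden -> output

x = np.array([0.5, -1.2, 0.3])      # one input sample (input layer)
hidden = sigmoid(x @ W1 + b1)       # hidden layer: weights, bias, activation
output = sigmoid(hidden @ W2 + b2)  # output layer: the network's prediction
print(output.shape)                 # one value per output neuron: (2,)
```

Each layer is just a matrix multiplication, a bias addition, and an element-wise activation; stacking more hidden layers repeats the middle step.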

## Backpropagation

Backpropagation is the process through which the network learns. After a forward pass, the difference between the network output and the actual target values (error) is calculated. This error is then propagated back through the network, adjusting the weights and biases to minimize the error.
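For a single sigmoid neuron trained with a squared-error loss, the chain rule behind backpropagation can be written out by hand. The numbers below are arbitrary toy values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One neuron: prediction = sigmoid(w * x + b).
# With squared error, the chain rule gives
#   dL/dw = (prediction - target) * prediction * (1 - prediction) * x
x, w, b = 0.5, 0.8, 0.1   # toy input, weight, and bias
target = 1.0              # desired output

pred = sigmoid(w * x + b)               # forward pass
error = pred - target                   # error at the output
grad_w = error * pred * (1 - pred) * x  # gradient w.r.t. the weight
grad_b = error * pred * (1 - pred)      # gradient w.r.t. the bias
```

Because the prediction is below the target here, both gradients come out negative, so a gradient step increases the weight and bias, nudging the prediction toward the target. Backpropagation applies this same chain-rule bookkeeping layer by layer.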

## Training

Training involves iteratively adjusting the weights and biases using an optimization algorithm (like Gradient Descent) and a learning rate, which determines the size of the steps taken towards minimizing the error.
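Stripped of the network, the gradient descent update is just "step against the gradient, scaled by the learning rate." A minimal sketch on a one-dimensional loss (a hypothetical function chosen only for illustration):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
w = 0.0             # initial guess
learning_rate = 0.1

for step in range(100):
    grad = 2 * (w - 3)         # gradient of the loss at the current w
    w -= learning_rate * grad  # step against the gradient

print(w)  # converges toward the minimum at w = 3
```

Too large a learning rate overshoots the minimum and can diverge; too small a rate converges slowly. The same trade-off governs the `learning_rate` used to train the MNIST network below.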

## When to Use Neural Networks

Neural networks excel in handling and making predictions or classifications from complex, high-dimensional data such as images, audio, and text. They are particularly useful when:

- The relationship between input and output is too complex for traditional statistical methods.
- The task involves pattern recognition, such as image or speech recognition.
- Thereâ€™s a large amount of data available for training.

## When to Consider Alternatives

Despite their versatility, neural networks are not always the best choice. Consider alternatives when:

- Data is scarce: Neural networks require large datasets to learn effectively. In cases of limited data, simpler models or techniques like transfer learning might be more appropriate.
- Interpretability is crucial: Neural networks are often described as “black boxes” because it’s challenging to understand how they make decisions. If interpretability is essential, models like decision trees or linear regression might be preferable.
- Computational resources are limited: Neural networks, especially deep ones, can be computationally intensive and slow to train. Simpler models may offer a more practical solution in resource-constrained environments.

Now let’s implement a neural network from scratch to solve the MNIST digit classification problem. This involves several steps: building the network, training it on the MNIST dataset, evaluating its performance, and visualizing the results. Given the complexity and length of a full implementation, we’ll focus on a simple yet effective model: a feedforward neural network with one hidden layer.

This code will cover:

- Data Preparation: Loading and preparing the MNIST dataset.
- Network Implementation: Implementing a simple neural network.
- Training: Training the network on the dataset.
- Evaluation: Evaluating the network’s performance.
- Visualization: Visualizing the weights and example predictions.

The code is available in this Colab notebook:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score

# Activation function and its derivative
def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed in terms of its output."""
    return x * (1 - x)

# Initialize weights and biases
def initialize_weights(input_size, hidden_size, output_size):
    """Initialize weights with small random values and biases with zeros."""
    weights_input_hidden = np.random.randn(input_size, hidden_size) * 0.01
    weights_hidden_output = np.random.randn(hidden_size, output_size) * 0.01
    bias_hidden = np.zeros((1, hidden_size))
    bias_output = np.zeros((1, output_size))
    return weights_input_hidden, weights_hidden_output, bias_hidden, bias_output

# Forward propagation
def forward_pass(X, weights_input_hidden, bias_hidden, weights_hidden_output, bias_output):
    """Perform a forward pass through the network."""
    hidden_layer_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)
    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    predicted_output = sigmoid(output_layer_input)
    return hidden_layer_output, predicted_output

# Backpropagation
def backpropagation(X, y, hidden_layer_output, predicted_output, weights_hidden_output):
    """Backpropagate the error and calculate the gradients."""
    error = predicted_output - y
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)
    return d_hidden_layer, d_predicted_output, error

# Update weights and biases
def update_weights_biases(X, hidden_layer_output, d_hidden_layer, d_predicted_output,
                          weights_input_hidden, weights_hidden_output,
                          bias_hidden, bias_output, learning_rate):
    """Update the network's weights and biases in place."""
    weights_hidden_output -= hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    bias_output -= np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden -= X.T.dot(d_hidden_layer) * learning_rate
    bias_hidden -= np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

# Prediction
def predict(X, weights_input_hidden, bias_hidden, weights_hidden_output, bias_output):
    """Make predictions with the trained network."""
    _, predicted_output = forward_pass(X, weights_input_hidden, bias_hidden,
                                       weights_hidden_output, bias_output)
    return np.argmax(predicted_output, axis=1)

# Load and prepare the MNIST dataset (as_frame=False returns NumPy arrays)
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # Normalize pixel values to [0, 1]
y = y.astype(int).reshape(-1, 1)
y = OneHotEncoder(sparse_output=False).fit_transform(y)  # use sparse=False on scikit-learn < 1.2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Neural network parameters
input_size = X_train.shape[1]  # 784 pixels per MNIST image
hidden_size = 64               # Nodes in the hidden layer
output_size = 10               # Output classes (digits 0-9)
learning_rate = 0.1
epochs = 10
batch_size = 64

# Initialize network
weights_input_hidden, weights_hidden_output, bias_hidden, bias_output = initialize_weights(
    input_size, hidden_size, output_size)

# Training loop
for epoch in range(epochs):
    for i in range(0, X_train.shape[0], batch_size):
        X_batch = X_train[i:i+batch_size]
        y_batch = y_train[i:i+batch_size]

        # Forward pass
        hidden_layer_output, predicted_output = forward_pass(
            X_batch, weights_input_hidden, bias_hidden, weights_hidden_output, bias_output)

        # Backpropagation
        d_hidden_layer, d_predicted_output, error = backpropagation(
            X_batch, y_batch, hidden_layer_output, predicted_output, weights_hidden_output)

        # Update weights and biases
        update_weights_biases(X_batch, hidden_layer_output, d_hidden_layer, d_predicted_output,
                              weights_input_hidden, weights_hidden_output,
                              bias_hidden, bias_output, learning_rate)

    print(f"Epoch {epoch+1}, Loss: {np.mean(np.square(error))}")

# Evaluation
y_pred = predict(X_test, weights_input_hidden, bias_hidden, weights_hidden_output, bias_output)
y_true = np.argmax(y_test, axis=1)
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Visualization of predictions for 4 test images
fig, axes = plt.subplots(1, 4, figsize=(9, 3))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    prediction = predict(X_test[i:i+1], weights_input_hidden, bias_hidden,
                         weights_hidden_output, bias_output)
    ax.set_title(f"Pred: {prediction[0]}")
    ax.axis('off')
plt.tight_layout()
plt.show()
```

Output:

`Accuracy: 94.75%`

Neural networks are a cornerstone of modern machine learning, offering unparalleled power in recognizing complex patterns and making predictions from vast datasets. However, they are not a one-size-fits-all solution. Understanding when and how to use neural networks, as well as when to seek alternatives, is crucial for leveraging the full potential of machine learning technologies. By demystifying how neural networks function, we hope to have made this powerful tool more accessible and understandable, empowering more individuals and organizations to harness its potential wisely.