Understanding gradient descent for a neural network is essential for understanding the concept of backpropagation.
Let's consider a neural network with one hidden layer for now.
We are now slowly moving into the programming part of the lessons, and understanding these basics will be vital for implementing a good Python program.
Formulas for computing derivatives
dW^[1], db^[1], dW^[2], db^[2]
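For a network with one hidden layer, these are the standard vectorized formulas (assuming m training examples stacked as columns of X, a sigmoid output unit, a hidden activation g, and labels Y):

dZ^[2] = A^[2] - Y
dW^[2] = (1/m) dZ^[2] A^[1]^T
db^[2] = (1/m) * (sum of the columns of dZ^[2])
dZ^[1] = W^[2]^T dZ^[2] * g'(Z^[1])   (element-wise product)
dW^[1] = (1/m) dZ^[1] X^T
db^[1] = (1/m) * (sum of the columns of dZ^[1])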
Always good to remember:
Forward propagation calculates the output of the network.
The cost function sits in between, measuring how far that output is from the labels.
Backpropagation computes the derivatives (gradients) that gradient descent needs.
Then we have everything we need to update the parameters W and b.
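As a rough sketch of how these pieces fit together in NumPy (the variable names, the tanh hidden activation, and the toy data are my own assumptions for illustration, not code from the lesson):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: n_x input features, m examples, binary labels.
n_x, n_h, n_y, m = 2, 4, 1, 200
X = np.random.randn(n_x, m)
Y = (np.random.rand(n_y, m) > 0.5).astype(float)

# Random initialization (discussed below).
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(n_y, n_h) * 0.01, np.zeros((n_y, 1))
alpha = 1.0                                   # learning rate

for i in range(1000):
    # Forward propagation: compute the network output A2.
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    # Cost function: cross-entropy between A2 and the labels Y.
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

    # Backward propagation: compute the derivatives.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # Gradient descent: update the parameters.
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

    if i % 200 == 0:
        print(f"iteration {i}: cost = {cost:.4f}")
```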
As you may have noticed when I described the forward-prop and backward-prop algorithms earlier, I initialized the parameters to 0. However, the best approach is to initialize the weights randomly, and here I will explain why.
In a neural network, initializing the weights randomly is very important for gradient descent.
What happens if we initialize the weights to zeros?
Every hidden unit then receives the same inputs through identical weights and gets exactly the same gradient update, so every hidden unit computes the same values and outputs the same values.
Therefore, there is no point in keeping more than one hidden unit.
In a neural network we need the different hidden units to compute different functions, and this can be achieved by initializing the weights randomly.
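To see the problem concretely, here is a small, hypothetical check (sigmoid activations and a learning rate of 1 are assumed): with every weight initialized to zero, the two hidden units stay identical no matter how many gradient descent steps we take.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical toy problem: 2 inputs, 2 hidden units, 1 output, 5 examples.
np.random.seed(1)
X = np.random.randn(2, 5)
Y = (np.random.rand(1, 5) > 0.5).astype(float)
m = X.shape[1]

# All weights (and biases) initialized to zero.
W1, b1 = np.zeros((2, 2)), np.zeros((2, 1))
W2, b2 = np.zeros((1, 2)), np.zeros((1, 1))

for _ in range(100):
    A1 = sigmoid(W1 @ X + b1)                                     # forward propagation
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y                                                  # backward propagation
    dW2, db2 = dZ2 @ A1.T / m, dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1, db1 = dZ1 @ X.T / m, dZ1.sum(axis=1, keepdims=True) / m
    W1, b1 = W1 - dW1, b1 - db1                                   # gradient descent step
    W2, b2 = W2 - dW2, b2 - db2

print(W1)   # both rows of W1 are identical: the two hidden units compute the same function
```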
In Python, we can initialize W and b as follows.
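A minimal sketch, assuming layer sizes n_x (inputs), n_h (hidden units), and n_y (outputs); the weights are random while the biases start at zero, which is the usual convention, since random weights alone are enough to break the symmetry:

```python
import numpy as np

n_x, n_h, n_y = 2, 4, 1                    # example layer sizes (assumed)

W1 = np.random.randn(n_h, n_x) * 0.01      # small random values break the symmetry
b1 = np.zeros((n_h, 1))                    # biases can safely start at zero
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
```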
The 0.01 factor keeps W^[1] and W^[2] small. When we use a sigmoid/tanh activation function, z needs to be small so that we stay in the region where the slope of the activation is large, which makes learning faster.