This is a walkthrough of the basic concepts behind the Back-Propagation (BP) process and the AutoGrad project, referring to the following resources. Great thanks to Andrej Karpathy.
[1] Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0
[2] Andrej Karpathy, ‘The spelled-out intro to neural networks and backpropagation: building micrograd’ (YouTube), ‘micrograd’ (GitHub).
Personally, I see building a neural network as fitting a function that minimizes the loss on the training data. The hidden layers of the neural network do the job of feature engineering.
A neuron — a very rough model of a human neuron — takes in some inputs, computes a linear combination of them, adds a bias, passes the result through an activation function, and outputs the value. The goal of activation functions is mainly to enable non-linear modelling or to suit certain tasks (e.g. sigmoid for binary classification).
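As a minimal sketch of this (assuming plain Python and tanh as the activation; this is not Karpathy's actual micrograd code), a single neuron might look like:

```python
import math
import random

class Neuron:
    def __init__(self, n_inputs):
        # one weight per input plus a bias, randomly initialized
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.bias = random.uniform(-1, 1)

    def __call__(self, x):
        # linear combination of the inputs, plus the bias
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        # non-linear activation (tanh here; sigmoid, ReLU, etc. suit other tasks)
        return math.tanh(z)

n = Neuron(3)
print(n([1.0, -2.0, 0.5]))  # a single scalar output
```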
A Multi-Layer Perceptron (MLP) is a neural network (nn) structure in which each neuron in a hidden layer is fully connected to all activations from the previous layer.
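Building on the Neuron sketch above, a hypothetical Layer and MLP (class names are my own, echoing but not copying micrograd's structure, and with no gradient tracking yet) could be composed as:

```python
class Layer:
    def __init__(self, n_inputs, n_outputs):
        # each neuron here is connected to every activation of the previous layer
        self.neurons = [Neuron(n_inputs) for _ in range(n_outputs)]

    def __call__(self, x):
        return [n(x) for n in self.neurons]

class MLP:
    def __init__(self, n_inputs, layer_sizes):
        sizes = [n_inputs] + layer_sizes
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(layer_sizes))]

    def __call__(self, x):
        # forward pass: feed each layer's activations into the next layer
        for layer in self.layers:
            x = layer(x)
        return x

mlp = MLP(3, [4, 4, 1])       # 3 inputs, two hidden layers of 4 neurons, 1 output
print(mlp([2.0, 3.0, -1.0]))  # forward pass only; no training yet
```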
Back-Propagation (BP) is a method for obtaining the effect of changing the internal parameters on the loss of the nn. Combining BP with gradient descent gives a method for updating the parameters inside a neural network (nn) to reduce the loss (i.e. to train the nn).
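To see what BP plus gradient descent actually computes, here is a hand-derived sketch for a single tanh neuron with one input and a squared-error loss (all numbers are made up for illustration); micrograd automates exactly this chain-rule bookkeeping:

```python
import math

x, y_target = 2.0, 1.0   # one training example (assumed values)
w, b = 0.5, 0.1          # the parameters we want to learn
lr = 0.1                 # learning rate

for step in range(20):
    # forward pass: z = w*x + b, y = tanh(z), loss = (y - y_target)^2
    z = w * x + b
    y = math.tanh(z)
    loss = (y - y_target) ** 2

    # backward pass (chain rule):
    dloss_dy = 2 * (y - y_target)    # d(loss)/dy
    dy_dz = 1 - y ** 2               # d(tanh(z))/dz
    dloss_dw = dloss_dy * dy_dz * x  # dz/dw = x
    dloss_db = dloss_dy * dy_dz      # dz/db = 1

    # gradient descent: nudge the parameters against the gradient
    w -= lr * dloss_dw
    b -= lr * dloss_db

print(w, b, loss)  # the loss shrinks as w and b are updated
```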
<aside> 📢
In calculus, the way to find the minimum point of a function is to take the partial derivatives with respect to the variables and find the variable values that make those partial derivatives zero. In a nn, we use the same idea to find a local minimum of the loss function, since our goal is to minimize the loss for a specific task; in practice we iterate toward it with gradient descent (see the update rule after this aside).
</aside>
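Concretely, rather than solving ∂L/∂w = 0 in closed form, training repeats the gradient-descent update (η is the learning rate, a symbol introduced here for illustration):

```latex
w_i \leftarrow w_i - \eta \, \frac{\partial L}{\partial w_i}
```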
<aside> 🖇️
Notably, something usually ignored is that we are working in the weight space of the loss function (i.e. the weights inside the nn are the variables). The input data (training set) act as the coefficients for the weights (a concrete example follows this aside).
</aside>
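To make the weight-space view concrete, take a single training pair (x, y) for the tanh neuron above with a squared-error loss; x and y are fixed numbers from the data, and the loss is a function of the weights w and b alone:

```latex
L(w, b) = \bigl(\tanh(w x + b) - y\bigr)^2
```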