Acknowledgement

This is a walkthrough of the basic concepts behind the Back-Propagation (BP) process and the AutoGrad project, referring to the following resources. Many thanks to Andrej Karpathy.

[1] Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0

[2] Andrej Karpathy, ‘The spelled-out intro to neural networks and backpropagation: building micrograd’ (YouTube); ‘micrograd’ (GitHub).

https://github.com/karpathy/micrograd (a tiny scalar-valued autograd engine and a neural net library on top of it with a PyTorch-like API)

https://youtu.be/VMj-3S1tku0


Simple Neural Network Structure

Personally, I see building a neural network as fitting a function that minimizes the loss on the training data. The hidden layers of the network are doing the job of feature engineering.

A Neuron

A neuron, a very rough model of a human neuron, takes in some inputs, computes a linear combination of them, adds a bias, passes the result through an activation function, and outputs the value. Activation functions mainly serve to enable non-linear modelling or to suit the needs of a particular task (e.g. sigmoid for binary classification).

image.png
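To make this concrete, here is a minimal sketch of a single neuron in plain Python; the input values, weights, bias, and the tanh activation are illustrative choices, not taken from the project code.

```python
import math

def neuron(x, w, b):
    """A single neuron: linear combination of the inputs plus a bias, then an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum + bias
    return math.tanh(z)                           # non-linear activation

# Illustrative values for a neuron with three inputs
x = [1.0, -2.0, 3.0]    # inputs
w = [0.5, -0.5, 0.25]   # weights
b = 0.1                 # bias
print(neuron(x, w, b))  # the neuron's output activation
```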

Multi-Layer Perceptrons

A Multi-Layer Perceptron (MLP) is a neural network (nn) structure in which each neuron in a hidden layer is fully connected to all the activations from the previous layer.

image.png
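Below is a minimal sketch of an MLP forward pass in plain Python, just to show the fully connected structure; the layer sizes (3 -> 4 -> 1) and the random initialization are illustrative assumptions.

```python
import math
import random

def layer(x, weights, biases):
    """One fully connected layer: every neuron sees all activations from the previous layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]

def mlp(x, params):
    """Chain the layers: each layer's outputs become the next layer's inputs."""
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x

# Illustrative 3 -> 4 -> 1 MLP with random weights and zero biases
random.seed(0)
sizes = [3, 4, 1]
params = [([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]
print(mlp([1.0, -2.0, 3.0], params))  # a single output activation
```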

Ideas in Back-Propagation

Back-Propagation (BP) is a method for computing how changes to the internal parameters affect the loss of the nn, i.e. the gradient of the loss with respect to each parameter. Combining BP with gradient descent gives a way to update the parameters inside the nn to reduce the loss (i.e. to train the nn).
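As a deliberately tiny illustration of this combination, the sketch below runs gradient descent on a single weight of a linear model, with the gradient written out by hand via the chain rule; an autograd engine such as micrograd computes the same gradient automatically through back-propagation. The data point, learning rate, and number of steps are illustrative assumptions.

```python
# Minimal sketch: gradient descent on one weight of the model y_hat = w * x.
x, y = 2.0, 6.0   # one training example (illustrative)
w = 0.0           # the parameter to learn
lr = 0.1          # learning rate (illustrative)

for step in range(20):
    y_hat = w * x                  # forward pass
    loss = (y_hat - y) ** 2        # squared-error loss
    grad_w = 2 * (y_hat - y) * x   # dL/dw by the chain rule (what BP computes)
    w -= lr * grad_w               # gradient descent update

print(w)  # converges towards y / x = 3.0
```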

<aside> 📢

In calculus, the way to find the minimum of a function is to take the partial derivatives with respect to its variables and find the variable values that make those partial derivatives zero. In a nn we follow the same idea to find a local minimum of the loss function, since our goal is to minimize the loss for a specific task.

</aside>
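As a concrete instance of the aside above, consider a single weight $w$ and one training example $(x, y)$ with a squared-error loss; these symbols are illustrative and not from the original notes.

$$
L(w) = (wx - y)^2, \qquad
\frac{\partial L}{\partial w} = 2x\,(wx - y), \qquad
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \frac{y}{x}
$$

For a deep nn this condition cannot be solved in closed form, so gradient descent instead takes repeated steps $w \leftarrow w - \eta\,\partial L / \partial w$ towards a (local) minimum.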

<aside> 🖇️

Notably, and often overlooked, we are working in the weight space of the loss function: the weights inside the nn are the variables, while the input data (the training set) act as fixed coefficients of those weights. In the single-weight example above, x and y come from the data and w is the variable.

</aside>