Learn to make your first Neural Network Now

I ff’ed up

Rick
6 min read · Apr 15, 2023

I’m an electrical engineering student specializing in computer engineering, and let me tell you, I fucked up bad. I chose a thesis topic related to Transfer Learning, and it’s not going well because I’ve been procrastinating on learning about AI and ML. So I challenged myself to create, in a week, a neural network that recognizes numbers. It’s dead simple, it’s been done before, nothing new, except I’ll try to do it as fast as possible.

Where I started

I watched the playlist that the awesome 3Blue1Brown has on neural networks.

What is a Neural network

You may already know some things about ML and how neural networks behave. If you don’t, I find it a bit easier to start by understanding how the human eye works with the brain to recognize a number. Here’s a video of Michael from Vsauce running an experiment for the series Mind Field.

So, long story short, neural networks are built to “simulate” actual neurons. We take an input, in this case a drawing of a number, and our model, consisting of a network of neurons, should be able to give us the desired number.

From a previous blog on digital image processing, we know that an image can be viewed as a function f(x, y) = z, where x and y are coordinates and z is the intensity of the pixel: 255 (or 1, when normalized) corresponds to white and 0 to black.

I’ll be using the MNIST database of handwritten digits, which is very common. And to keep this doable in a short time, I’ll be using Google Colab and Python.

The input

The images to be analyzed are simple 28x28-pixel grayscale images of handwritten numbers, so the input layer will have 28x28 = 784 neurons. That’s our first layer.
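Just to make that concrete, here’s a minimal sketch of loading MNIST and flattening each image into a 784-entry vector, assuming the TensorFlow/Keras loader that Colab ships with:

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Load the MNIST handwritten digits (60,000 train / 10,000 test images)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28) — grayscale images
# Flatten each 28x28 image into a 784-entry vector, scale pixels to [0, 1]
x_train = x_train.reshape(-1, 28 * 28).astype(np.float32) / 255.0
print(x_train.shape)  # (60000, 784) — one row per input layer
```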

The hidden layers

The hidden layers are everything in between the input and the last layer. This is where the magic happens. An oversimplification: after the input layer, we expect each layer that comes after to light up the right set of neurons, so that the next one also gets closer to lighting up the right neurons, so that the last layer can output the correct number.

Everything boils down to choosing the right weights, or parameters, for each of the connections, setting the right path for each neuron.

With this idea we can write the following equation. For every pixel value, let’s call it ai, with i being the pixel’s index, let’s attach a weight to every connection into the next layer.

so we would have: a1·w1 + a2·w2 + … + an·wn = c

where the a’s are the pixel values, the w’s are the weights, and c is the value passed to the next neuron in the following layer.
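In code, that weighted sum is just a dot product. A toy sketch with made-up values (these aren’t real trained weights):

```python
import numpy as np

a = np.array([0.0, 0.8, 1.0, 0.5])   # pixel values a1..a4 (toy example)
w = np.array([0.2, -0.5, 0.9, 0.1])  # one weight per incoming connection
c = np.dot(a, w)                     # a1*w1 + a2*w2 + ... + an*wn
print(c)                             # raw input to the next neuron
```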

However, this introduces another problem: the weighted sum can be any real number, while our original goal was to keep every neuron’s activation between 0 and 1. To fix this we use a sigmoid function, which takes any input and spits out a value between 0 and 1.
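The sigmoid is a one-liner. A quick sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-10))  # ~0.00005 — strongly "off"
print(sigmoid(0))    # 0.5      — undecided
print(sigmoid(10))   # ~0.99995 — strongly "on"
```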

Now, while we want the output to be between 0 and 1, we don’t want a neuron to count as active just because its weighted sum happens to be positive. To solve this we introduce a new variable called a “bias”, which shifts the activation threshold and gives us a bit more control over when each neuron activates.

Adding the sigmoid function and the bias, our equation now looks like this:

σ(a1·w1 + a2·w2 + … + an·wn + b) = c

Where σ is the sigmoid function and b is the bias, which we’d have to tune for every neuron to find a good value.
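Putting the weighted sum, the bias, and the sigmoid together, a single neuron’s activation looks like this (same toy values as before; the bias is made up too):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = np.array([0.0, 0.8, 1.0, 0.5])   # activations from the previous layer
w = np.array([0.2, -0.5, 0.9, 0.1])  # weights into this neuron
b = -0.4                             # bias: shifts the activation threshold

c = sigmoid(np.dot(a, w) + b)        # σ(a1·w1 + ... + an·wn + b)
print(c)
```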

Now, this next part really makes me wish I’d had a better teacher for linear algebra. We can transform these equations into matrix form, which is computationally faster, since many libraries and programming languages support optimized matrix operations.
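As a sketch of what that looks like: stack each neuron’s weights as a row of a matrix W, and computing a whole layer collapses into one matrix-vector product. The layer sizes and random values here are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
a0 = rng.random(784)                # input layer: the 784 pixel values
W = rng.standard_normal((16, 784))  # 16 neurons, each with 784 weights
b = rng.standard_normal(16)         # one bias per neuron

a1 = sigmoid(W @ a0 + b)            # all 16 activations computed at once
print(a1.shape)                     # (16,)
```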

For a later project I’ll try to dig into ReLU, which stands for Rectified Linear Unit. It serves a similar purpose to the sigmoid function, but ReLU has proven to work better in practice for training models.
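For reference, ReLU is even simpler than the sigmoid. A quick sketch:

```python
import numpy as np

def relu(z):
    """Pass positive inputs through unchanged; clamp negatives to 0."""
    return np.maximum(0.0, z)

print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```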

How do we start

At least for the first training round we’ll set all the weights and biases to random numbers, just to get started. However, we need a way of “telling” the network whether it’s on the right track. We need a “cost” function: a way to give feedback to the network.

In math terms, for a given input we sum the squared differences between the last layer’s output and the desired output; that sum is the cost of that example.
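Here’s that cost for a single training example as a sketch, assuming the last layer is a length-10 vector (one neuron per digit) and the desired output is one-hot:

```python
import numpy as np

# The network's guess for one image (made-up activations)
output = np.array([0.1, 0.9, 0.2, 0.1, 0.0, 0.1, 0.0, 0.3, 0.1, 0.0])
desired = np.zeros(10)
desired[1] = 1.0                        # the image was a "1"

cost = np.sum((output - desired) ** 2)  # sum of squared differences
print(cost)                             # fairly small: a confident, correct guess
```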

You can notice that the cost is high when the output is trash and small when the network confidently guesses correctly. However, we have now turned ourselves into a negligent, angry parent, and our baby doesn’t know how to grow. What we want is for this cost to be as small as possible, and a tool we can use to find the minima of this function is the derivative.

But wait, there’s more. Our function isn’t a simple parabola; we have copious amounts of weights and biases to set. The idea is this: we start at a random point, and from that random point we use the slope to guide us one step downhill, getting one step closer to the desired minimum of our cost function.

Gradient Descent

Gradient descent is a multivariable calculus tool that helps us find the “quickest route”: the direction in which the function decreases fastest. The gradient of the cost, denoted ∇C, points in the direction of steepest increase, so stepping along −∇C decreases the cost fastest. This is what nudges the weights and biases in the right direction.

The gradient is useful because it not only tells us whether each weight should be tuned up or down, but its components also indicate how strongly each weight affects the result.
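One gradient descent step, then, just nudges every parameter against its gradient component. In this sketch, grad_W and grad_b are stand-ins for whatever ∇C gives us for this layer (in a real network they come out of backpropagation, covered next), and the learning rate is a made-up hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 784))  # current weights for one layer
b = rng.standard_normal(16)         # current biases

# Stand-ins for this layer's components of ∇C
grad_W = rng.standard_normal((16, 784))
grad_b = rng.standard_normal(16)

learning_rate = 0.1                 # how big a step to take downhill

# One gradient descent step: move every parameter against its gradient
W -= learning_rate * grad_W
b -= learning_rate * grad_b
```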

Backpropagation

To be honest I’m still a bit shaky on how this works, but we need to continue. As I understand it, backpropagation works backwards from what you’d like the second-to-last layer to be. To see this, let’s imagine the network just processed an image of a 2 and produced some result on the last layer.

For an input of 2, we know the last layer should increase the activation of the “2” neuron and lower it for the rest. We’d have to keep track of the “direction” for every neuron in terms of whether we want its activation to increase or decrease. I’m sure I am butchering this explanation, but I hope the sketch below makes it a bit clearer.
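A sketch of that bookkeeping for the last layer: compare the output against the one-hot target and record, per neuron, which way its activation should move. The output values here are made up:

```python
import numpy as np

# The last layer's activations for an image of a "2" (made-up values)
output = np.array([0.3, 0.1, 0.6, 0.2, 0.1, 0.1, 0.0, 0.2, 0.1, 0.1])
target = np.zeros(10)
target[2] = 1.0                  # the input image was a "2"

nudge = target - output          # positive: push activation up; negative: down
print(nudge)                     # +0.4 for neuron "2", small negatives elsewhere
```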

For now, this is it. I’ll try to share the full code in the next blog.


Written by Rick

I blog about everything I learn, Digital Image Processing, Data Science, IoT, Videogame design and much more :)
