Hey there. I hope you are doing well. In my last post, we set up the required environment. Let us get started with today's post about the perceptron.

A perceptron is the fundamental unit of a neural network. It is an algorithm for supervised learning (we say what is right and wrong) for binary classification (either zero or one).

The perceptron is also a basic mathematical model of a neuron. It tries to replicate the work of a neuron. So, to implement a perceptron, let us first study a little bit about the neuron.


We know neurons are the basic building blocks of the brain (a massive and complex neural network).

Image Source - Wikipedia

So this is the guy I was talking about. Let us look at what it does at a very high level of abstraction.

  • It takes in signals from other neurons through its dendrites.
  • It processes the signals in the cell body, which contains the nucleus.
  • It passes the processed information to other cells via the axon terminals.

So let us try to come up with a model that replicates the neuron's activity to some extent.

Birth of the perceptron

The perceptron tries to do the above-mentioned activities. Let us try to model one.

So what should it do?

  • It takes \(n\) inputs.
  • It does some computation on the inputs.
  • It passes the result through a non-linear function (very important).
  • It produces a single output.

Perceptron (Image Source: Wikibooks)

\(x_i\) is the \(i^{th}\) input.
\(W_i\) is the weight value corresponding to the \(i^{th}\) input.
\(\sigma\) is the activation function.

We multiply each \(x_i\) by its weight \(W_i\) and sum the products to get a single value, which is then fed into the activation function \(\sigma\) to produce the output. This is our model of the neuron.
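To make the weighted sum concrete, here is a minimal sketch in NumPy. The input and weight values are made up purely for illustration, and `weighted_sum` is a hypothetical helper name, not something used later in this post.

```python
import numpy as np

def weighted_sum(x, W):
    """Multiply every input x_i by its weight W_i and add the products up."""
    return float(np.dot(x, W))

# Illustrative values only
x = np.array([1.0, 0.0, 1.0])
W = np.array([0.5, -0.25, 0.25])

z = weighted_sum(x, W)  # 1*0.5 + 0*(-0.25) + 1*0.25
print(z)
```

The value `z` is what gets passed to the activation function in the next step.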

Why Activation?

Activation functions are non-linear, meaning their graph is not a straight line. Non-linearity is what helps a network approximate almost any function, given a sufficient number of units.
Let us consider one activation function called the step function.

Step function

\( S(x) = \begin{cases}
1,& \text{if } x\geq 0 \\
0,& \text{if } x\lt 0
\end{cases} \)

As we see in the graph below:
\( x \) can take any value from \( {-\infty} \) to \( {\infty} \)
\( S(x) \) is always either \( 0 \) or \( 1 \)

Step Function (Image Source: Wikibooks)

Here the graph is not a straight line: it stays flat at \( 0 \) for negative \( x \) and jumps to \( 1 \) at \( x = 0 \).
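The definition of \( S(x) \) above translates almost word for word into Python. This is a small sketch; the function name `step` and the sample values are my own choices for illustration.

```python
def step(x):
    """Step activation: 1 when x is zero or positive, 0 otherwise."""
    return 1 if x >= 0 else 0

# Try a few values on both sides of zero
samples = (-2.0, -0.1, 0.0, 0.1, 2.0)
outputs = [step(v) for v in samples]

for v, out in zip(samples, outputs):
    print(v, "->", out)
```

No matter how large or small the input is, the output is always exactly 0 or 1.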

Enough theory, let us start coding.

We have modelled our perceptron, now let us see what it can do.

Fire up your Jupyter notebook using this command:

$ jupyter notebook

Create a new notebook named Exercise-2

Import the following modules

   In [ 1 ] :   
import numpy as np
import math
import plotly.offline as py
import plotly.graph_objs as go

Let us create a perceptron

Here we model a perceptron using a class. It has some important methods which mimic the action of a real neuron to some extent. The complete code is given below for you to try, followed by a breakdown of its components.

   In [ 2 ] :   

class Perceptron():
    """ The implementation of the perceptron model. """

    def __init__(self, num_inputs, lr):
        """
        num_inputs: number of inputs to the perceptron
        lr : Learning rate for the perceptron
        """
        # Here we use num_inputs + 1, because this would
        # take the bias into account if we pad the input with one.
        # So the dot product of x and W gives us Wx + b
        self.W = np.random.randn(num_inputs + 1)
        self.lr = lr

    def step_function(self, x):
        """ x : the input on which the step_function should be applied """
        # 1 for x >= 0 and 0 otherwise, matching the definition of S(x)
        if x >= 0:
            return 1
        else:
            return 0

    def forward(self, x):
        """ x : the input numpy array, already padded with one for the bias """
        output = x.T.dot(self.W)
        return self.step_function(output)

    def loss(self, predict, label):
        """
        predict : value predicted by the perceptron
        label : original value to be predicted
        """
        l = label - predict
        return l

    def back_propagate(self, loss_value, x):
        """
        loss_value : the calculated loss for a label and predicted value
        x : the input values that produced this loss
        """
        self.W += (self.lr * loss_value * x)

    def batch_train(self, x, label, epochs=2):
        """
        x : an array of data for training
        label : the original labels for the set of training data
        epochs : total number of times the data is used to train
        """
        x = np.array(x)
        n = x.shape[0]
        # Pad every input row with a one so that the last weight acts as the bias
        bias_axis = np.ones([n, 1])
        x = np.concatenate((x, bias_axis), axis=1)
        loss_hist = []
        assert x.shape[1] == (self.W.shape[0])
        for I in range(epochs):
            avg_loss = 0
            for J in range(x.shape[0]):
                pred = self.forward(x[J])
                l = self.loss(pred, label[J])
                # Adjust the weights using the error on this sample
                self.back_propagate(l, x[J])
                avg_loss += abs(l)
            # Record the average absolute loss of this epoch
            loss_hist.append(avg_loss / n)
        return loss_hist

Let us break down the code.

We have an internal state \( W \) for our perceptron, which has to be learned. The size of the state is the number of inputs plus one. The extra entry lets us store the bias inside the \( W \) vector itself, as long as we pad the input with a one. This gives us an easy implementation of
$$ W*x + b $$
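Here is a quick check that padding the input with a one really gives the same result as adding the bias separately. The weights, bias and input below are illustrative numbers I picked, not values from the trained perceptron.

```python
import numpy as np

w = np.array([0.4, -0.3])  # illustrative weights
b = 0.1                    # illustrative bias
x = np.array([1.0, 1.0])   # illustrative input

# The usual form: weighted sum plus a separate bias term
explicit = np.dot(w, x) + b

# The padded form: bias stored as the last weight, input padded with a one
W = np.append(w, b)           # [w_1, w_2, b]
x_padded = np.append(x, 1.0)  # [x_1, x_2, 1]
folded = np.dot(W, x_padded)

print(explicit, folded)
```

Both numbers agree, because the last term of the padded dot product is simply \( b \cdot 1 \).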

So the perceptron has been initialized. Now we can know more about the other methods in the perceptron.


step_function takes a single input and applies the step function to it, which gives us a binary output.


forward takes in an array of inputs, multiplies it element-wise with \( W \) and sums the result to give a single value (a dot product). That value is then passed through the non-linear activation, which yields an output of either 1 or 0. This is where the decisions about the data are made.


The loss function acts as a critic. It has both the predicted value and the original label. It compares the two and tells the perceptron what it got right and what it got wrong.


back_propagate takes the value from the loss function, together with the input, and adjusts the weights \( W \) accordingly.
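To see what a single weight adjustment looks like, here is one update step worked out by hand. It assumes the same rule as the class above (add learning rate times loss times input to \( W \)); the starting weights are made-up numbers chosen so that the first prediction is wrong.

```python
import numpy as np

lr = 0.5
W = np.array([0.2, -0.4, 0.1])  # illustrative weights [w_1, w_2, b]
x = np.array([1.0, 1.0, 1.0])   # input (1, 1) padded with a one for the bias
label = 1                       # AND of 1 and 1 is 1

# Forward pass: the weighted sum is negative, so the prediction is 0
z = np.dot(x, W)
predict = 1 if z >= 0 else 0

# Loss is label minus prediction: here 1 - 0 = 1
loss_value = label - predict

# Update: push every weight in the direction that reduces the error
W_new = W + lr * loss_value * x

# With the new weights the same input is classified correctly
new_predict = 1 if np.dot(x, W_new) >= 0 else 0
print(predict, loss_value, W_new, new_predict)
```

When the prediction is already correct the loss is 0, so the weights stay exactly as they are.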


batch_train takes the training data, feeds it to the perceptron one sample at a time and trains it.


Let us think of the perceptron as a student who is going to school.

  • Let \( W \) represent his actions in a test.
  • So forward is the action of taking the test.
  • The loss function is like a teacher who corrects his paper and tells him what is wrong and what is correct.
  • back_propagate is like the parents of that student. They take input from the teacher and correct his actions so that he can earn more marks in the next exam.

Now our perceptron is ready. Let us use it to approximate the AND function.

   In [ 3 ] :   
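p = Perceptron(2,0.5)

# AND gate inputs: every combination of two binary values
x = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])

# Output of AND gate for each input row
y = np.array([0,0,0,1])

# Train for 12 epochs and keep the recorded loss values
hist = p.batch_train(x,y,12)

# Inputs (1, 1) padded with a trailing 1 for the bias
print (p.forward(np.array([1,1,1])))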

Here the perceptron takes two inputs and its learning rate is 0.5. You can change the learning rate and see what happens. The last line prints the perceptron's output for the AND gate when both inputs are 1; the third 1 in the array is the padding for the bias. Now try with different inputs.

Do you have some burning questions about this?

Because I had some questions when I first heard about this.

How do we know when the perceptron is ready?

Since we know all the inputs that this perceptron will ever get and luckily it is small, we can use all the data to train the perceptron. So in this case when the loss value goes to zero, we know that our perceptron is ready for action.
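The idea of training until the loss reaches zero can be sketched as a loop that stops at the first epoch with no mistakes. This is a standalone sketch rather than the class above: `train_until_zero_loss`, the seed and the `max_epochs` cap are all my own assumptions for illustration.

```python
import numpy as np

def step(z):
    """Step activation: 1 when z is zero or positive, 0 otherwise."""
    return 1 if z >= 0 else 0

def train_until_zero_loss(x, y, lr=0.5, max_epochs=1000, seed=0):
    """Train until one full epoch produces no errors (hypothetical helper).

    Returns the learned weights and the average absolute loss per epoch.
    """
    rng = np.random.default_rng(seed)
    # Pad every input row with a one so the last weight acts as the bias
    x = np.concatenate((x, np.ones((x.shape[0], 1))), axis=1)
    W = rng.standard_normal(x.shape[1])
    loss_hist = []
    for _ in range(max_epochs):
        total = 0
        for xi, yi in zip(x, y):
            error = yi - step(np.dot(xi, W))
            W += lr * error * xi
            total += abs(error)
        loss_hist.append(total / len(y))
        if total == 0:
            break  # every input was classified correctly in this epoch
    return W, loss_hist

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
W, loss_hist = train_until_zero_loss(x, y)

predictions = [step(np.dot(np.append(xi, 1), W)) for xi in x]
print("epochs used:", len(loss_hist))
print("predictions:", predictions)
```

An epoch with zero loss means no weight was changed during that epoch, so the same weights classify every one of the four inputs correctly.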

Loss values recorded during the training

In this run, after the 5th epoch the perceptron classifies all the possible inputs correctly. Since the weights start out random, the exact epoch can differ from run to run.

What is the correct number for learning rate and epochs?

The answer is "we don't know". They are called hyperparameters. We can use trial and error to find good values.
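Trial and error can be as simple as trying a few learning rates and counting how many epochs each one needs. The sketch below does that for the AND data; `epochs_to_converge`, the list of learning rates, the seed and the epoch cap are assumptions of mine, so treat the numbers it prints as one experiment and not as a rule.

```python
import numpy as np

def epochs_to_converge(lr, seed=0, max_epochs=10000):
    """Count epochs until the AND perceptron has an error-free epoch.

    Hypothetical helper; returns None if max_epochs is not enough.
    """
    rng = np.random.default_rng(seed)
    # AND inputs, already padded with a one for the bias
    x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1])
    W = rng.standard_normal(3)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for xi, yi in zip(x, y):
            error = yi - (1 if np.dot(xi, W) >= 0 else 0)
            W += lr * error * xi
            errors += abs(error)
        if errors == 0:
            return epoch
    return None

results = {lr: epochs_to_converge(lr) for lr in (0.1, 0.5, 1.0)}
for lr, epochs in results.items():
    print("lr =", lr, "-> epochs needed:", epochs)
```

Changing the seed changes the starting weights, so it is worth repeating the experiment with a few different seeds before settling on a value.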

We know that it works, but how does it work?

This is a classification problem, where we should separate the given data into \( 0 \) or \( 1 \). Every input yields either \( 0 \) or \( 1 \), and the perceptron classified them all correctly. It somehow knows that the input with two ones should be classified as one and all the others as zero. So how does it know this? The answer is that it tries to create a boundary between the data points, which works because they are linearly separable. In this case, it creates a plane which separates the input points that yield zero from the one that yields one.

Here the four points are separated by the plane. Three points lie under the plane and one point lies above it. That plane is the decision plane. The inputs to the AND gate are mapped to X and Y in the graph. The XY points below the plane belong to one class and the points above the plane belong to the other class.
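We can check the "above or below the plane" idea with numbers. The weights below are picked by hand so that they separate the AND inputs; a trained perceptron will generally end up with different values, so this is only an illustration.

```python
import numpy as np

# Hand-picked weights [w_1, w_2, b], chosen for illustration only
W = np.array([1.0, 1.0, -1.5])

points = [[0, 0], [0, 1], [1, 0], [1, 1]]

heights = []
classes = []
for x1, x2 in points:
    # Height of the plane z = w_1*x1 + w_2*x2 + b at the point (x1, x2)
    z = W[0] * x1 + W[1] * x2 + W[2]
    heights.append(z)
    # Zero or above the XY plane means class 1, below it means class 0
    classes.append(1 if z >= 0 else 0)
    print((x1, x2), "z =", z, "class =", classes[-1])
```

Only the point (1, 1) ends up with a positive height, which is exactly the behaviour of the AND gate.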

Okay. How does the perceptron know the plane?

The perceptron starts with a random plane in 3 dimensions and moves it until it separates the inputs based on their class. This is where the loss function and the backpropagation come into the picture. The loss function evaluates the current plane and finds where it goes wrong, and backpropagation uses that to correct the perceptron.

Where is the code for the graphs and 3D plots?

It is in this git repo. Try out the code for yourself and tinker with it. Use this link to view the executed outputs. Feel free to ask your questions in the comments.

In my next blog...

  • Sneak peek at a Neural Network.
  • The relation between a perceptron and a Neural Network.
  • Training a single layer perceptron network.
  • Evaluation and Testing it.