![](https://crypto4nerd.com/wp-content/uploads/2024/04/1m6Pu1xTlT_XQ3SGjLUoPKg.png)
Dr. Emily sat in a busy lab, searching for ideas to make computers as intelligent as people. Inspiration came from the birds soaring elegantly outside: what if she imitated the genius of nature? And so her adventure with neural networks began. These computer systems learn in ways loosely similar to the human brain. Just as birds inspired flight, neural networks are driving technological innovation: they solve problems, identify patterns, and analyze data in ways that are revolutionizing industries around the world. Dr. Emily’s journey, drawing inspiration from the efficiency and beauty of the natural world, reflects humanity’s never-ending search for invention.
So, the question is: how can we define neural networks? I will break it down as follows:
- Mimics the human brain to process information
- Series of algorithms recognizing data patterns
- Foundation of deep learning and machine learning
Neural networks, sometimes referred to as simulated neural networks (SNNs) or artificial neural networks (ANNs), are the foundation of deep learning techniques and a subset of machine learning. Because they resemble the way brain neurons communicate with one another, they are referred to as “neural.”
Many human inventions come from nature; airplanes, for instance, were inspired by birds. A neural network is one kind of machine-learning system that draws its inspiration from the human brain: an artificial network of interconnected nodes that can identify patterns in data.
The architecture and operation of biological neural networks in the human brain served as the paradigm for Artificial Neural Networks (ANNs), which are computer models. ANNs are made up of networked nodes, sometimes known as “neurons,” which process and send data.
Application: This method has been used to handle difficult machine-learning problems such as image classification, recommendation engines, and language-to-language translation.
So you might be asking yourself: what is deep learning?
History of Deep Learning
1943: Warren McCulloch and Walter Pitts create a computational model for neural networks based on mathematics and algorithms called threshold logic.
1958: Frank Rosenblatt creates the perceptron, an algorithm for pattern recognition based on a two-layer computer neural network using simple addition and subtraction. He also proposed additional layers with mathematical notations, but these wouldn’t be realized until 1975.
1980: Kunihiko Fukushima proposes the Neocognitron, a hierarchical, multilayered artificial neural network that has been used for handwriting recognition and other pattern recognition problems.
1989: Scientists were able to create algorithms that used deep neural networks, but training times for the systems were measured in days, making them impractical for real-world use.
1992: Juyang Weng publishes Cresceptron, a method for performing 3-D object recognition automatically from cluttered scenes.
Mid-2000s: The term “deep learning” begins to gain popularity after a paper by Geoffrey Hinton and Ruslan Salakhutdinov showed how a many-layered neural network could be pre-trained one layer at a time.
2009: NIPS Workshop on Deep Learning for Speech Recognition discovers that with a large enough data set, the neural networks don’t need pre-training, and the error rates drop significantly.
2012: Artificial pattern-recognition algorithms achieve human-level performance on certain tasks. And Google’s deep learning algorithm discovers cats.
2014: Google buys UK artificial intelligence startup DeepMind for £400m.
2015: Facebook puts deep learning technology — called DeepFace — into operation to automatically tag and identify Facebook users in photographs. Algorithms perform superior face recognition tasks using deep networks that take into account 120 million parameters.
2016: Google DeepMind’s algorithm AlphaGo masters the art of the complex board game Go and beats the professional go player Lee Sedol at a highly publicized tournament in Seoul.
**Deep learning does not promise that computers will begin to think like people; that would be like asking an apple to turn orange. Instead, it shows that, with enough data, fast processors, and sophisticated algorithms, computers can begin to perform tasks that were previously limited to human perception, such as identifying cat videos on the internet (among other, possibly more practical uses).**
So what is deep learning?
Deep Learning is a form of Machine Learning. It is known as ‘Deep’ Learning because it contains many layers of neurons. A neuron within a Deep Learning network is similar to a neuron of the human brain — another name for Deep Learning is ‘Artificial Neural Networks’.
A Deep Learning model is trained by first defining its learning objective and then fine-tuning its parameters to optimize that objective. This is not the same as traditional machine learning, which takes time and isn’t always effective because it involves manually creating and selecting features, the properties of the data that the system should look at.
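To make “fine-tuning parameters against an objective” concrete, here is a minimal sketch of the idea, not a method from this article: it fits a single weight with gradient descent on invented toy data (the data, learning rate, and loss function are all assumptions chosen for illustration).

```python
import numpy as np

# Toy data generated from y = 3x; training should recover the weight 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0              # start with an arbitrary parameter value
learning_rate = 0.01

for step in range(200):
    y_pred = w * x                     # model prediction
    error = y_pred - y                 # how far off we are
    grad = 2 * np.mean(error * x)      # gradient of mean squared error w.r.t. w
    w -= learning_rate * grad          # nudge w to reduce the error

print("Learned weight:", round(w, 3))  # should be close to 3.0
```

Each pass nudges the parameter in the direction that reduces the error; deep learning does the same thing, just with millions of weights at once.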
Okay, I hope you are now mastering what deep learning is. Pat yourself on the back for the milestone. Let us move on.
Perceptron
A perceptron is an algorithm that lets neurons learn from the information they are given. A perceptron takes several binary inputs, x1, x2, …, and produces a single binary output.
In the example shown, the perceptron has three inputs, x1, x2, x3, and one output; its decision rule is written out below.
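Since the original figure is not reproduced here, the following is the standard decision rule for such a perceptron (the threshold is a parameter of the neuron, and the w_j are the weights):

```latex
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\
1 & \text{if } \sum_j w_j x_j > \text{threshold}
\end{cases}
```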
There are two types of perceptrons:
A single-layer perceptron contains no hidden layers and is the simplest form of an artificial neural network (ANN). It has just two layers, input and output, hence the name. Input nodes connect fully to one or more nodes in the next layer, and each node in the next layer takes a weighted sum of all its inputs.
Multi-layer Perceptron contains one or more hidden layers.
A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons so that the outputs of some neurons are the inputs of other neurons.
How Do Neural Networks Work?
- Layers of neurons (nodes) processing data
- Includes input, hidden, and output layers
- Utilizes weights, biases, and activation functions
Neurons work like this:
- They receive one or more input signals. These input signals can come from either the raw data set or from neurons positioned at a previous layer of the neural net.
- They perform some calculations.
- They send some output signals to neurons deeper in the neural net through a synapse.
Here is a diagram of the functionality of a neuron in a deep learning neural net:
The process involves:
- Input Layer: Receives the input signal (data)
- Hidden Layers: Perform computations with weights and activation functions
- Output Layer: Produces the final prediction or classification (see the sketch after this list)
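As a concrete illustration of these three layers, here is a minimal NumPy sketch of one forward pass through a tiny network; the layer sizes, random weights, and choice of ReLU are illustrative assumptions, not values from this article:

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values become 0
    return np.maximum(0, z)

# Input layer: a single example with 3 features
x = np.array([0.2, 0.4, 0.6])

# Hidden layer: 4 neurons, each with one weight per input plus a bias
W_hidden = np.random.randn(4, 3)
b_hidden = np.zeros(4)
hidden = relu(W_hidden @ x + b_hidden)  # weighted sums, then activation

# Output layer: a single neuron producing the final prediction
W_out = np.random.randn(1, 4)
b_out = np.zeros(1)
prediction = W_out @ hidden + b_out

print("Prediction:", prediction)
```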
Understanding the Terminology
Neuron/Node: At the heart of every neural network lies the neuron, also known as a node. Much like the neurons in our brains, these computational units process information. They receive input signals, perform computations, and produce an output signal.
Weights: Imagine each connection between neurons as a pathway carrying information. Weights represent the significance or importance of the input values flowing through these pathways. Adjusting these weights during the training process fine-tunes the network’s ability to recognize patterns and make accurate predictions.
Bias: Just as our perspectives might skew our judgment, biases in neural networks adjust the output along with the weighted inputs. Think of bias as an additional parameter that allows the network to account for variations and make more nuanced decisions.
Activation Function: Without activation functions, neural networks would be linear machines, limited to simple tasks. Activation functions introduce non-linearity, enabling the network to learn complex patterns and relationships within data. They determine whether a neuron should be activated or not based on its input, adding flexibility and depth to the network’s capabilities (e.g., sigmoid, tanh, ReLU).
Understanding these foundational concepts lays the groundwork for delving deeper into the world of neural networks. As you navigate through tutorials, research papers, and practical applications, remember that behind the intricate algorithms and complex architectures lie these simple yet powerful building blocks. So, embrace the terminology, for it holds the key to unraveling the mysteries of artificial intelligence.
Next, neurons in a deep learning model are capable of having synapses that connect to more than one neuron in the preceding layer. Each synapse has an associated weight, which impacts the preceding neuron’s importance in the overall neural network.
Weights are a very important topic in the field of deep learning because adjusting a model’s weights is the primary way through which deep learning models are trained.
Once a neuron receives its inputs from the neurons in the preceding layer of the model, it adds up each signal multiplied by its corresponding weight and passes them on to an activation function, like this:
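The figure is not reproduced here, but in symbols the computation just described is (with inputs x_i, weights w_i, the bias b introduced earlier, and activation function φ):

```latex
z = \sum_i w_i x_i + b, \qquad \text{output} = \varphi(z)
```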
The activation function calculates the output value for the neuron. This output value is then passed on to the next layer of the neural network through another synapse.
A video illustrating what happens is available here.
Hurray! I hope you are now getting a better understanding.
Activation functions serve as the backbone of deep learning, enabling communication among neurons within a neural network. Acting as gatekeepers, these functions determine whether a neuron should be activated or not based on the input it receives. By introducing non-linearity to the network, activation functions enable complex pattern learning, making them vital components in the realm of artificial intelligence.
Understanding the significance of activation functions in deep learning is paramount. They facilitate the network’s ability to model and comprehend intricate relationships within data, paving the way for enhanced performance in tasks such as image recognition, natural language processing, and predictive analytics.
Now let us delve deeper into the mechanics of activation functions, exploring their various types, functionalities, and implications in the realm of deep learning. By grasping the essence of activation functions, you’ll gain a profound insight into the inner workings of neural networks and their transformative potential in the world of technology and beyond.
There are four main types of activation functions that we’ll discuss in this tutorial:
- Threshold functions
- Sigmoid functions
- Rectifier functions, or ReLUs
- Hyperbolic Tangent functions
Threshold Functions
Threshold functions, also known as step functions, are a type of activation function used in neural networks. These functions compute a different output signal depending on whether the input value exceeds a certain threshold or not.
Imagine a scenario where you have a threshold set at a specific value. If the input value to the threshold function is greater than this threshold, the function outputs a certain value (often 1 or a similar value representing activation). Conversely, if the input value is below the threshold, the function outputs another value (often 0 or a value representing inactivation).
In the context of neural networks, the input value to a threshold function is typically the weighted sum of input values from the preceding layer. Each input value is multiplied by a corresponding weight, and these weighted inputs are summed together. If this sum exceeds the threshold, the neuron is activated and produces a specific output; otherwise, it remains inactive.
Threshold functions were among the earliest activation functions used in neural networks, but they have limitations, particularly in the context of gradient-based learning algorithms like backpropagation. Their output is discontinuous, which makes it challenging to compute gradients for training purposes. Despite this, threshold functions laid the groundwork for more sophisticated activation functions used in modern neural networks.
As the image above suggests, the threshold function is sometimes also called a unit step function.
Threshold functions are similar to boolean variables in computer programming. Their computed value is either 1 (similar to `True`) or 0 (equivalent to `False`).
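The article gives no code for this one, so here is a minimal sketch of a threshold activation; the threshold value of 0 is a common convention assumed here for illustration:

```python
import numpy as np

def threshold(x, thresh=0.0):
    # Unit step: output 1 where the input exceeds the threshold, else 0
    return np.where(x > thresh, 1, 0)

# Example usage: only inputs above the threshold activate the neuron
print(threshold(np.array([-2.0, -0.5, 0.3, 4.0])))  # prints [0 0 1 1]
```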
The sigmoid function is a pivotal tool in data science, prominently featured in logistic regression, a foundational technique for tackling classification problems. Unlike the binary output of the threshold function, the sigmoid function produces values between 0 and 1, making it ideal for estimating probabilities.
Mathematically, the sigmoid function is defined as:

σ(x) = 1 / (1 + e^(−x))
Imagine plotting this function on a graph. As the input varies, the sigmoid curve smoothly transitions from 0 to 1. This smoothness is a significant advantage over the threshold function, which has a sharp, discontinuous transition.
Let’s illustrate this with an example: consider a binary classification task where we’re determining whether an email is spam or not based on features like the sender, subject, and content. The sigmoid function can predict the probability that an email is spam. For instance, if the sigmoid output is 0.8, it suggests an 80% chance that the email is spam.
Moreover, the sigmoid function’s smooth curve enables the calculation of derivatives at any point, facilitating gradient-based optimization techniques like gradient descent. This property is crucial for training neural networks efficiently, as it allows us to adjust model parameters iteratively to minimize prediction errors.
In essence, the sigmoid function’s versatility and smoothness make it indispensable in various machine learning applications, providing a reliable tool for estimating probabilities and optimizing models for accurate predictions.
Below is a sigmoid graph illustration
Okay, let us see how this works in code:
```python
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid function for the given input x.

    Arguments:
    x -- A scalar or numpy array of any size.

    Returns:
    s -- The sigmoid of x.
    """
    s = 1 / (1 + np.exp(-x))
    return s

# Example usage:
x_scalar = 0
sigmoid_scalar = sigmoid(x_scalar)
print("Sigmoid of", x_scalar, ":", sigmoid_scalar)

x_array = np.array([1, 2, 3])
sigmoid_array = sigmoid(x_array)
print("Sigmoid of", x_array, ":", sigmoid_array)
```
This is my output:
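If you run the snippet yourself, you should see approximately the following (the sigmoid of 0 is exactly 0.5):

```
Sigmoid of 0 : 0.5
Sigmoid of [1 2 3] : [0.73105858 0.88079708 0.95257413]
```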
One benefit of the sigmoid function over the threshold function is that its curve is smooth. This means it is possible to calculate derivatives at any point along the curve.
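In fact, the sigmoid’s derivative has a convenient closed form, σ′(x) = σ(x)(1 − σ(x)), which is one reason it pairs so well with gradient-based training.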
The Rectified Linear Unit (ReLU) function is a simple yet powerful activation function widely used in deep learning. Unlike the sigmoid function, which has a smooth curve, ReLU has a piecewise linear nature, making it computationally efficient and easier to train.
Here’s how the ReLU function is defined:
- If the input value is less than 0, the function outputs 0.
- If the input value is non-negative, the function outputs the input value itself.
Mathematically, the ReLU function can be expressed as:

f(x) = max(0, x)
Let’s illustrate using a code example. Are you ready?
```python
import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

# Generate input values
x_values = np.linspace(-5, 5, 100)

# Compute ReLU values
relu_values = relu(x_values)

# Plot the ReLU function
plt.plot(x_values, relu_values, label='ReLU Function', color='blue')
plt.xlabel('Input Values')
plt.ylabel('Output Values')
plt.title('Rectified Linear Unit (ReLU) Function')
plt.axhline(0, color='black', linewidth=0.5, linestyle='--')
plt.axvline(0, color='black', linewidth=0.5, linestyle='--')
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend()
plt.show()
```
This is the output from the code:
- I defined a Python function called `relu(x)` that implements the ReLU function using NumPy’s `np.maximum()` function, which returns the element-wise maximum of two arrays (or scalar values).
- Next I generated a range of input values `x_values` using NumPy’s `np.linspace()` function.
- Then I computed the corresponding ReLU values `relu_values` by passing the input values through the ReLU function.
- Finally, I plotted the ReLU function using Matplotlib, labeling the axes and adding grid lines for clarity.
This code generated a plot illustrating the ReLU function, showing how it behaves for different input values, with a sharp turn at 0 where the function transitions from outputting 0 to outputting the input value itself.
Let us look at another illustration:
Python code example: a basic neuron calculation
```python
import numpy as np

def neuron_output(weights, inputs, bias):
    '''Simple neuron activation using ReLU'''
    # Weighted sum of the inputs plus the bias term
    weighted_sum = np.dot(weights, inputs) + bias
    return np.maximum(0, weighted_sum)  # ReLU activation

# Example weights and inputs
weights = np.array([0.5, -0.5])
inputs = np.array([0.7, 0.3])
bias = 1.0

output = neuron_output(weights, inputs, bias)
print("Neuron output", output)
```
This is my output:
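For these example values the weighted sum is 0.5 × 0.7 + (−0.5) × 0.3 + 1.0 = 1.2; that is positive, so ReLU leaves it unchanged and the script prints roughly `Neuron output 1.2`.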
The provided code snippet demonstrates the implementation of a neuron activation function using the Rectified Linear Unit (ReLU) activation function, a popular choice in neural networks.
The function `neuron_output` takes three parameters: weights, inputs, and bias. Inside the function, it computes the weighted sum of the inputs by performing the dot product of the weights and inputs arrays and then adding the bias term.
The ReLU activation function is applied to the weighted sum using NumPy’s `np.maximum()` function, ensuring that negative values are replaced with zeros. This function effectively introduces non-linearity into the neural network.
An example of this activation function is illustrated by computing the output of a neuron given example weights, inputs, and bias values.
Finally, the computed neuron output is printed for inspection. Through this code, we gain insight into the process of activating a neuron within a neural network using the ReLU activation function, a fundamental concept in deep learning.
The hyperbolic tangent function, often denoted as tanh, is a widely used activation function in neural networks. Unlike the sigmoid function, which outputs values between 0 and 1, the tanh function produces output values between -1 and 1. It resembles the sigmoid function in shape but is symmetric around the origin, with its output centered at zero instead of 0.5.
Mathematically, the hyperbolic tangent function is defined as:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
In fact, tanh is just a rescaled and shifted sigmoid, tanh(x) = 2σ(2x) − 1, which stretches the sigmoid’s (0, 1) output range to (−1, 1).
Let us see an example illustrated through code:
```python
import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    return np.tanh(x)

# Generate input values
x_values = np.linspace(-5, 5, 100)

# Compute tanh values
tanh_values = tanh(x_values)

# Plot the tanh function
plt.plot(x_values, tanh_values, label='Hyperbolic Tangent (tanh) Function', color='blue')
plt.xlabel('Input Values')
plt.ylabel('Output Values')
plt.title('Hyperbolic Tangent (tanh) Function')
plt.axhline(0, color='black', linewidth=0.5, linestyle='--')
plt.axvline(0, color='black', linewidth=0.5, linestyle='--')
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend()
plt.show()
```
This is my output:
Explanation of the code:
- We define a Python function called `tanh(x)` that computes the hyperbolic tangent of the input `x` using NumPy’s `np.tanh()` function.
- We generate a range of input values `x_values` using NumPy’s `np.linspace()` function.
- We compute the corresponding tanh values `tanh_values` by passing the input values through the `tanh` function.
- Finally, we plot the tanh function using Matplotlib, labeling the axes and adding grid lines for clarity.
This code will generate a plot illustrating the hyperbolic tangent function, showing how it behaves for different input values, ranging from -5 to 5. The output values will be between -1 and 1, demonstrating the characteristics of the tanh function as described.
See you in the next article!