![](https://crypto4nerd.com/wp-content/uploads/2024/03/14tTqCpxY4ueQ0EHR1Y6R5Q.png)
In this story, we’ll explore handwritten digit recognition using deep learning. It’s a journey that will take us from the raw pixels of digital images to the intricate workings of neural networks that can classify these images with remarkable accuracy.
Before diving into the neural networks, let’s lay the groundwork by setting up our computational toolbox. Here’s a peek into the Python packages that make this magic happen:
- NumPy: Our foundational block for scientific computing, NumPy gives us the power to handle arrays and matrices with ease.
- TensorFlow: The cornerstone of our neural network, TensorFlow provides the tools to build and train models that can learn from data.
- Keras: Living within TensorFlow, Keras simplifies neural network construction with high-level building blocks like layers and activation functions.
- Matplotlib: The artist of our story, Matplotlib will help us visualize data and neural network performance in full color and style.
- Utility Scripts: Tucked behind the scenes are custom utility scripts designed to streamline our workflow and visualization.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import linear, relu, sigmoid
%matplotlib widget
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')

import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)
from public_tests import *
from autils import *
from lab_utils_softmax import plt_softmax
np.set_printoptions(precision=2)
In the neural network’s symphony, activation functions play a crucial role: they dictate how signals transform and flow. The first one we’ll meet is ReLU (Rectified Linear Unit), and what it does is straightforward: it takes a number, and if that number is less than zero, it outputs zero; otherwise, it outputs the number itself. It’s the epitome of a ‘keep it positive’ mindset!
a = max(0, z)
plt_act_trio()
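As a quick illustration (my own snippet, not part of the lab code), here is ReLU applied element-wise with NumPy:

z = np.array([-2., -0.5, 0., 1.5, 3.])
a = np.maximum(0, z)  # ReLU: negatives are clamped to 0, positives pass through unchanged
print(a)              # [0.  0.  0.  1.5 3. ]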
In a neural network designed for classifying into multiple categories, the final layer generates a vector z of raw scores (also known as logits), one for each category. The softmax function takes these logits and converts them into a probability distribution. The formula for the softmax function is as follows:

a_j = e^(z_j) / Σ_k e^(z_k),   for j = 1, …, N

This formulation gives the outputs three key properties:
- Each output is constrained between 0 and 1, mirroring the properties of probabilities.
- The sum of all outputs equals 1, ensuring a proper probabilistic distribution.
- Outputs with larger input values (logits) result in larger probabilities, allowing the network’s confidence to be directly interpreted from the output values.
# UNQ_C1
# GRADED CELL: my_softmax

def my_softmax(z):
    """ Softmax converts a vector of values to a probability distribution.
    Args:
      z (ndarray (N,)) : input data, N features
    Returns:
      a (ndarray (N,)) : softmax of z
    """
    ### START CODE HERE ###
    e_z = np.exp(z - np.max(z))  # subtract max(z) for numerical stability; softmax is unchanged by a constant shift
    sum_e_z = e_z.sum()
    a = e_z / sum_e_z
    ### END CODE HERE ###
    return a
z = np.array([1., 2., 3., 4.])
a = my_softmax(z)
atf = tf.nn.softmax(z)
print(f"my_softmax(z): {a}")
print(f"tensorflow softmax(z): {atf}")# BEGIN UNIT TEST
test_my_softmax(my_softmax)
# END UNIT TEST
Embarking on the journey from last week’s exploration of binary classification, this week, we elevate our challenge to the realm of multiclass classification. Armed with the softmax activation function, our neural network will now learn to recognize the ten handwritten digits, 0 through 9, a pivotal step in the evolution of machine learning applications.
The objective of this exercise is straightforward yet profoundly impactful: to employ a neural network in recognizing ten handwritten digits.
Our endeavor is fueled by a meticulously curated dataset, pivotal in training our neural network. Here’s an overview of the dataset characteristics and how it’s structured:
- Origin: This dataset is a subset of the renowned MNIST handwritten digit dataset, a benchmark in the field of machine learning for digit recognition tasks.
- Composition: It consists of 5,000 training examples, each a 20 x 20 pixel grayscale image of a single digit.
- Preprocessing: To facilitate the learning process, each image is unrolled into a 400-dimensional vector, transforming our dataset into a 5000 x 400 matrix, X, where each row is a training example.
- Labels: Accompanying the images is a vector, y, containing the labels for each training example. These labels range from 0 to 9, corresponding to the digit depicted in the image.
X, y = load_data()
# X now contains 5000 training examples, each a 400-dimensional vector
# y contains the labels for each example
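Before building anything, it’s worth a quick sanity check on the data (a small snippet of my own; load_data comes from the course’s autils helpers):

print('The shape of X is:', X.shape)  # should be (5000, 400): 5000 examples, 400 pixels each
print('The shape of y is:', y.shape)  # one label (0-9) per example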
As we embark on constructing a neural network capable of recognizing handwritten digits, it’s essential to visualize and understand the architecture we’re about to build. Our network, designed specifically for this task, comprises an input layer, two hidden layers, and an output layer, each serving a unique role in the learning process.
Here’s a high-level overview of the neural network architecture:
- Input Layer: Since we’re dealing with 20×20 pixel images, our input layer must accommodate 400 input features, each representing a pixel’s grayscale intensity.
- First Hidden Layer (Dense): This layer consists of 25 units and utilizes the ReLU (Rectified Linear Unit) activation function. ReLU introduces non-linearity to the network, allowing it to learn complex patterns.
- Second Hidden Layer (Dense): Following the first hidden layer, this layer contains 15 units, also with ReLU activation. It further abstracts the representations learned by the previous layer.
- Output Layer: Culminating in the output layer, we have 10 units, corresponding to the ten digits (0–9). This layer employs a linear activation function, preparing the network’s output for the softmax activation that will convert logits to probabilities.
Understanding the dimensions of the network’s parameters is crucial for grasping how data flows and transforms through it:
- Layer 1 Parameters: The first hidden layer’s weights, W1, are shaped (400, 25), reflecting the transition from 400 input features to 25 units. Its bias vector, b1, has a shape of (25,).
- Layer 2 Parameters: For the second hidden layer, W2 is shaped (25, 15), mapping the 25 inputs from the previous layer to 15 units. The bias vector, b2, is (15,).
- Layer 3 Parameters: Finally, the output layer’s weights, W3, have dimensions (15, 10), linking the 15 inputs to the 10 output units. The corresponding bias vector, b3, has a dimension of (10,).
This structured approach in designing the network ensures that each layer’s output seamlessly becomes the subsequent layer’s input, facilitating a smooth forward propagation of data.
# UNQ_C2
# GRADED CELL: Sequential model

tf.random.set_seed(1234)  # for consistent results
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),               # specify the input shape: 400 pixel features
        Dense(25, activation='relu',   name="L1"),
        Dense(15, activation='relu',   name="L2"),
        Dense(10, activation='linear', name="L3"),  # linear output: raw logits for the 10 digits
        ### END CODE HERE ###
    ], name="my_model"
)
model.build(input_shape=(None, 400))
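To verify the parameter shapes described above, we can pull the weights out of each layer. This is a small check of my own using Keras’s get_weights, not part of the graded cell:

[layer1, layer2, layer3] = model.layers
W1, b1 = layer1.get_weights()
W2, b2 = layer2.get_weights()
W3, b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")  # (400, 25), (25,)
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")  # (25, 15), (15,)
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")  # (15, 10), (10,)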
By calling model.summary(), we can get more specific information about each layer:
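model.summary()

For this architecture, the summary should report 10,025 parameters for L1 (400 x 25 weights + 25 biases), 390 for L2 (25 x 15 + 15), and 160 for L3 (15 x 10 + 10), for a total of 10,575 trainable parameters, matching the shapes we worked out above.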
With our neural network architecture laid out, the next critical step is training it to recognize handwritten digits. This process involves defining a loss function, selecting an optimizer, and setting the training parameters, such as epochs and batch sizes. Let’s break down each of these components:
To evaluate how well our model is performing, we use a loss function called SparseCategoricalCrossentropy. This choice is particularly suited for multiclass classification problems where each class is exclusive. Importantly, we incorporate the softmax activation directly into the loss calculation by setting from_logits=True. This approach is both numerically stable and efficient:
loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
In the training phase, we define the number of epochs — the number of times the entire dataset is passed forward and backward through the neural network. For our purposes, we’ve set it to 40 epochs:
epochs = 40
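Putting the pieces together, a minimal compile-and-fit sketch looks like the following. Note that the Adam optimizer with a learning rate of 0.001 is an assumption on my part; the article itself only pins down the loss function and the epoch count:

model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # assumed optimizer and learning rate
)
history = model.fit(X, y, epochs=epochs)  # default batch size of 32 gives 5000/32 ≈ 157 steps per epoch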
As the model trains, TensorFlow provides real-time feedback for each epoch, indicating the loss for the current batch of data. This feedback loop is crucial for understanding how well the training is progressing and whether the loss is decreasing over time, as expected.
Epoch 1/40
157/157 [==============================] - 0s 1ms/step - loss: 2.2770
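Because the output layer is linear, the trained model emits logits rather than probabilities. Here is a minimal prediction sketch of my own, mirroring the softmax discussion above:

image = X[0].reshape(1, 400)           # take one example; Keras expects a batch dimension
logits = model.predict(image)          # raw scores, shape (1, 10)
probs = tf.nn.softmax(logits).numpy()  # convert the logits into probabilities
print('predicted digit:', np.argmax(probs))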
And there we have it — a journey from pixels to possibilities, where each step brought us closer to bridging the gap between human creativity and computer understanding. We’ve navigated through the essentials of neural networks, dived into the depths of activation functions, and emerged with a model that can recognize the scribbles we call numbers with surprising accuracy.
But this isn’t just about teaching a machine to recognize numbers. It’s a glimpse into a future where technology understands us a little better, where the barrier between the digital and the personal blurs. The magic of neural networks doesn’t end here; it’s just the beginning. As we explore further, who knows what mysteries we’ll unravel next?
Thank you for joining me on this adventure. Here’s to many more discoveries, learning, and, most importantly, to the countless ways we can make our world a little more understandable, one pixel at a time.