![](https://crypto4nerd.com/wp-content/uploads/2023/02/1l6sqzVdOeuqU3l3DofFvBQ-1024x853.png)
In this series, I will share my summary of the Dive into Deep Learning book. I started reading it as a review of what I already know about DL and also to explore new concepts I might have missed.
The book is available for free here along with the code examples: https://d2l.ai
I will start with Chapter 3: Linear Regression, as the first two chapters cover the mathematics needed to understand the concepts in the book.
I also recommend this specialization on Coursera; it is very well illustrated and covers everything you need to start learning about ML and DL.
In this piece, I will summarize Section 3.1 (pages 82 to 86). The book's content is very rich, so I don't want to make the article too long.
1- When to use regression?
When we want to predict a numerical value like the price of an item, or its length.
The book walks us through an example of predicting house prices.
2- There are some terms that we need to be aware of while reading:
- training set: dataset where each row corresponds to one observation
- an example/data point/instance/sample: a single observation or row
- label/target: the thing we are trying to predict
- features/covariates: the variables upon which predictions are made
- n: the number of examples in the dataset, where i indexes the i-th sample
3- Linear Regression assumptions
Given a model of the form y = wx + b:
- The relationship between the target y and the features x is approximately linear, meaning that the conditional mean E[Y | X = x] can be expressed as a weighted sum of the features x.
- Any noise follows a Gaussian (normal) distribution.
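These two assumptions can be made concrete by generating a synthetic dataset the way the book does: a target that is linear in the features plus Gaussian noise. This is a minimal NumPy sketch; the parameter values below are arbitrary choices for illustration, not from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Arbitrary ground-truth parameters for a synthetic dataset.
true_w = np.array([2.0, -3.4])
true_b = 4.2

n = 1000
X = rng.normal(size=(n, 2))              # features
noise = rng.normal(scale=0.01, size=n)   # Gaussian noise, per the assumption
y = X @ true_w + true_b + noise          # target is linear in the features

print(y.shape)  # (1000,)
```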
4- What is a model?
A model is a solution that describes how features can be transformed into an estimate of the target.
As we said, the target is a weighted sum of the features. In the book's house-price example, price = w_area · area + w_age · age + b, where w_area and w_age are called weights and b is the bias/offset/intercept.
weights: determine the effect of each feature on the target.
bias: determines the value of the target when all features are zero.
The goal of regression is to find w and b that make our model's predictions as close to the real values (prices) as possible, producing the lowest error.
We will use the vectorized formula ŷ = Xw + b, where adding the scalar b to the vector Xw relies on broadcasting; refer to Section 2.1.4 in the book to understand broadcasting.
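The vectorized prediction is a single matrix-vector product. Here is a minimal NumPy sketch (the `predict` helper and the toy values are mine, for illustration) showing how the scalar bias broadcasts across all rows of X:

```python
import numpy as np

def predict(X, w, b):
    """Vectorized linear model: y_hat = Xw + b (b broadcasts over rows)."""
    return X @ w + b

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
w = np.array([0.5, -1.0])
b = 2.0
print(predict(X, w, b))  # [ 0.5 -0.5]
```

Row by row: 0.5·1 − 1·2 + 2 = 0.5 and 0.5·3 − 1·4 + 2 = −0.5, computed in one call with no Python loop.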
5- Loss Function
The loss function is a quality measure for the model: it quantifies the distance between the real and predicted values of the target.
The loss value is a non-negative number where smaller values are better and perfect predictions give a loss of 0.
The most common loss function for regression is the squared error, given for example i by l⁽ⁱ⁾(w, b) = ½(ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)².
- Squaring the differences between ŷ and y makes the loss very sensitive to anomalies (outliers), but it also encourages the model to avoid large errors.
The average of the losses over the training set is then L(w, b) = (1/n) Σᵢ l⁽ⁱ⁾(w, b).
As we said before, the goal of the training process is to find w and b that minimize the total loss over all training samples.
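The per-example squared error and its average over the training set translate directly into code. A minimal NumPy sketch (the function names and toy values are mine):

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-example squared error: 1/2 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

def mean_loss(y_hat, y):
    """Average loss L(w, b) over the training set."""
    return squared_loss(y_hat, y).mean()

y_hat = np.array([2.0, 3.0, 5.0])  # predictions
y     = np.array([2.0, 4.0, 4.0])  # true targets
print(mean_loss(y_hat, y))  # (0 + 0.5 + 0.5) / 3 ≈ 0.3333
```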
6- Analytic Solution
We need a method to update the model to improve its quality. Unlike most models, linear regression poses an easy optimization problem with a closed-form solution, obtained through the following steps:
- Absorb b into w by appending a column of all ones to the design matrix X and appending b as an extra entry of w.
- The goal is then to minimize ||y − Xw||². X must be of full rank, meaning no feature is linearly dependent on the others.
- There will be just one critical point on the loss surface and it corresponds to the minimum loss over the entire domain.
- Take the derivative of the loss with respect to w and set it equal to zero
- Solving for w gives the optimal solution w* = (XᵀX)⁻¹Xᵀy.
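The steps above can be sketched in a few lines of NumPy. This is my own illustration on synthetic data (the parameter values are arbitrary); note that solving the normal equations with `np.linalg.solve` is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: y is linear in X plus small Gaussian noise.
true_w = np.array([2.0, -3.4])
true_b = 4.2
n = 100
X = rng.normal(size=(n, 2))
y = X @ true_w + true_b + rng.normal(scale=0.01, size=n)

# Step 1: append a column of ones so the bias is absorbed into w.
X_aug = np.hstack([X, np.ones((n, 1))])

# Remaining steps: set the gradient of ||y - Xw||^2 to zero
# and solve the normal equations (X^T X) w = X^T y.
w_star = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

print(w_star)  # close to [2.0, -3.4, 4.2]
```

The recovered parameters match the ground truth up to the noise level, which is exactly what the single critical point on the loss surface guarantees here.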
Note that the requirement of an analytic solution is so restrictive that it would exclude almost all the exciting aspects of deep learning.