Machine learning first appeared back in the 1950s, when Arthur Samuel wrote a checkers-playing program that estimated each side's chance of winning. That was the beginning of a revolution, with much more to come.

ML is part of everyday life for almost everyone. Some of the most common applications are:

**Recommendation Systems**: used by platforms like Amazon, Netflix, and Spotify to suggest products, movies, TV shows, or songs based on users' past interactions and preferences.

**Predictive Text and Autocorrect**: integrated into smartphones, keyboards, and messaging apps, these ML models predict and suggest words or phrases as users type, improving typing speed and accuracy.

**Virtual Personal Assistants**: Voice-activated assistants like Siri, Google Assistant, and Alexa utilize machine learning to understand and respond to user voice commands, perform tasks, provide information, and control smart devices.

As we all know, since 2022, the year ChatGPT was released, there has been a boom in Artificial Intelligence. So much has changed with the arrival of these technologies. If you are still a student, think of how your life was three years ago… probably a lot different (and harder). But it doesn't just apply to “Q&A” AIs: machine learning can help change the world for the better, with applications in **Healthcare**, **Finance**, **Education**, and so on…

In machine learning, there are many models we can work with. But, starting from the beginning, models can be classified as follows:

**Supervised Learning**: the model is fed with features (inputs) and targets (outputs) for training and testing. The objective is to predict an outcome from a set of one or more inputs. Examples of supervised learning tasks:

– Regression

– Classification

**Unsupervised Learning**: the model is fed only with features, and no targets. The objective is to find structure in the dataset without any prior guidance. Examples of unsupervised learning tasks:

– Clustering

– Anomaly detection

– Dimensionality reduction

**Reinforcement Learning**: the model learns by receiving rewards when it gets the outcome right, and penalties when it doesn't.

Starting from scratch, regression models are easy to understand if you have at least a little math background. You can think of a regression model as a function of one or more variables. Beginning with **univariate linear regression**, we can write the model as this function:
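In the usual notation:

```latex
f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b
```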

Where w and x are vectors. Vectorization is applied in order to reduce processing cost. Once we have a structure for our function, we need to identify the ideal parameters: the ones that minimize the prediction error. For that, we establish a **cost function** (J), computed from the squared differences between predicted and real values, divided by the number of samples.

**Cost Function**
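Using the common 1/(2m) convention (the extra factor of 2 simply makes the derivative cleaner), the cost over m samples is:

```latex
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```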

To find the optimal parameters w and b, we apply **Gradient Descent**.

## Gradient Descent

Gradient descent is a method used to minimize a function by iteratively adjusting its parameters in the direction that reduces the function's value the most, effectively ‘descending’ towards the function's minimum. It takes small steps, going deeper and deeper, until it reaches a local minimum (low error).

The optimal w is obtained over n iterations (until w converges), at each step subtracting a portion of the cost's derivative to move toward a local minimum. The mathematical equations are as follows:
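With learning rate α, both parameters are updated simultaneously at each iteration:

```latex
\begin{aligned}
w &:= w - \alpha \frac{\partial J(w,b)}{\partial w} = w - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}\!\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)} \\
b &:= b - \alpha \frac{\partial J(w,b)}{\partial b} = b - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}\!\left(x^{(i)}\right) - y^{(i)} \right)
\end{aligned}
```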

**Practical example**: Let's assume you are looking to buy a house, but you don't know how to price one correctly. To solve that, you collected data from 5 houses, with different sizes and prices. With this information, a simple linear regression can be built to predict how much a house would cost based on its size. For simplicity, we start with a training dataset of two samples:

Then, we can create our training subset:
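A minimal sketch in Python with NumPy. The original values aren't shown here, so the numbers below are illustrative: size in thousands of square feet, price in thousands of dollars.

```python
import numpy as np

# Illustrative training subset: house size (1000 sqft) and price ($1000s)
x_train = np.array([1.0, 2.0])      # features (input): size
y_train = np.array([300.0, 500.0])  # targets (output): price

print(f"x_train = {x_train}")  # x_train = [1. 2.]
print(f"y_train = {y_train}")  # y_train = [300. 500.]
```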

As we have only one feature, there's no need for vectorization. So, our base function will be:
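With a single scalar feature, the model reduces to:

```latex
f_{w,b}(x) = wx + b
```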

Now, let's create an algorithm to compute our cost function (J):
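A sketch of the cost computation, assuming the illustrative two-sample dataset above and the 1/(2m) convention:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) over m samples, with the 1/(2m) convention."""
    m = x.shape[0]
    f_wb = w * x + b                      # model predictions for every sample
    return np.sum((f_wb - y) ** 2) / (2 * m)

# Illustrative training data: size (1000 sqft) -> price ($1000s)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

print(compute_cost(x_train, y_train, w=200.0, b=100.0))  # 0.0: a perfect fit
```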

Next, we need to create an algorithm to compute the gradients:
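The two partial derivatives from the update equations can be sketched like this (same illustrative data as above):

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Partial derivatives of the cost J(w, b) with respect to w and b."""
    err = w * x + b - y           # prediction error for every sample
    dj_dw = np.mean(err * x)      # dJ/dw
    dj_db = np.mean(err)          # dJ/db
    return dj_dw, dj_db

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(compute_gradient(x_train, y_train, w=0.0, b=0.0))  # (-650.0, -400.0)
```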

Now that the gradients can be computed, gradient descent can be implemented in `gradient_descent`. This function will be applied to find the optimal values of w and b on the training data.
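One possible shape for `gradient_descent`, repeating compact versions of the cost and gradient helpers so the snippet runs standalone:

```python
import numpy as np

def compute_cost(x, y, w, b):
    # 1/(2m) squared-error cost, repeated here so this snippet is self-contained
    return np.mean((w * x + b - y) ** 2) / 2

def compute_gradient(x, y, w, b):
    # dJ/dw and dJ/db, as derived above
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

def gradient_descent(x, y, w_init, b_init, alpha, num_iters):
    """Step (w, b) against the gradient num_iters times; track the cost."""
    w, b = w_init, b_init
    J_history = []
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw   # simultaneous update: both gradients use the old w, b
        b -= alpha * dj_db
        J_history.append(compute_cost(x, y, w, b))
    return w, b, J_history
```

Returning the cost history alongside the parameters makes it easy to plot cost versus iterations later.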

It's necessary to define initial values for w and b; usually, zero is chosen for both. Using **Gradient Descent**, optimal values for w and b are found.
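Putting it together on the illustrative dataset, starting from w = b = 0 (the learning rate and iteration count here are assumptions that happen to converge well on this data):

```python
import numpy as np

# Illustrative data again: size (1000 sqft) -> price ($1000s)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

w, b = 0.0, 0.0       # the usual starting point: both parameters at zero
alpha = 1.0e-2        # learning rate
for _ in range(10000):
    err = w * x_train + b - y_train           # errors with the current w, b
    dj_dw, dj_db = np.mean(err * x_train), np.mean(err)
    w -= alpha * dj_dw
    b -= alpha * dj_db

print(f"w = {w:.2f}, b = {b:.2f}")  # converges toward w ≈ 200, b ≈ 100
```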

**Cost vs. iterations of gradient descent**. In successful runs, the cost should decrease consistently. The reduction is particularly rapid at the outset, so it is helpful to plot the initial decline on a separate scale from the final descent.

**Predictions**

We got pretty accurate predictions. Testing for sizes of [1.0, 1.5, 1.75, 2.0, 2.5, 3.0] (1000 sqft):
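With the parameters learned from the illustrative dataset above (w ≈ 200, b ≈ 100), the predictions for those sizes look like this:

```python
import numpy as np

# Parameters learned from the illustrative dataset (rounded for clarity)
w, b = 200.0, 100.0

sizes = np.array([1.0, 1.5, 1.75, 2.0, 2.5, 3.0])  # in 1000 sqft
prices = w * sizes + b                              # predicted price in $1000s
for s, p in zip(sizes, prices):
    print(f"{s:.2f} thousand sqft -> ${p:.0f} thousand")
```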

We can also visualize the gradient descent steps taken until the optimal w and b are found.