![](https://crypto4nerd.com/wp-content/uploads/2023/02/1zCY2y_hLvXYP684VYvmcNA.png)
Engineers frequently develop mathematical models by empirically fitting a set of x values to an observed y with various curves and picking the curve with the lowest error or the highest R² value. The following dataset is an example:
In the real context, x is the rate at which water flows through a filter medium, and y is the observed head loss through the filter column. Intuitively, the plot shows that the higher the filtration rate (i.e. the x), the higher the head loss (i.e. the y). Engineers can therefore develop a mathematical model that correlates the filtration rate (the x) to the head loss (the y), so that in the future, given any x, they can predict the corresponding y.
In this article, I’m not going to propose the simplest model, y = mx + b, which any spreadsheet can fit easily. In this case, we can also see that a linear relationship cannot really describe the relationship between the two parameters, because the correlation is slightly parabolic. What we are trying to do is develop a model of the form y = ax² + bx + c:
The parabolic function makes more sense than the linear one because, according to fluid mechanics, as the flow rate increases the flow tends to become turbulent, which is quite different from the laminar flow seen at low flow rates. Common spreadsheets cannot do this, at least not with such accurate parameters. Let’s do it in Python using the gradient descent and normal equation algorithms.
First, let’s import relevant libraries and download the dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

link = 'https://raw.githubusercontent.com/waterprofessor/wre/main/data/crf.csv'
df = pd.read_csv(link)
For this demo analysis, we take the first 19 rows only, as they correspond to a complete run of an experiment. For x, we take the filtration rate column, while for y, we take the actual loss column. Plotting x vs. y returns the first figure above.
x = df.loc[0:18,'Filt rate(m3/m2h)'].values
y = df.loc[0:18,'Actual loss (m)'].values
plt.scatter(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Now, we need to construct a matrix X with dimensions 19×3. The 19 rows are easy to understand: that’s the number of tests. The 3 columns mean we need 3 features, which map to the coefficients c, b, and a, respectively, in the formula:

y = ax² + bx + c
To construct the 3 feature columns, we name them x0, x1, and x2. The formula becomes:

y = a·x2 + b·x1 + c·x0
We can see x0 is just a column of ones, x1 is a column of the original x (i.e. the filtration rate), and x2 is a newly constructed column obtained by squaring x. Let’s construct x0, x1, and x2 and then stack them into the matrix X we desire. X will have a shape of 19×3.
x0 = np.ones((19,1))
x1 = x.reshape((19,1))
x2 = (x**2).reshape((19,1))
x1_norm = x1/(np.max(x1)-np.min(x1))
x2_norm = x2/(np.max(x2)-np.min(x2))
X = np.hstack((x0,x1_norm,x2_norm))
y = y.reshape((19,1))
As you can see, I did one more step while constructing X: I scaled each feature column by its range (max − min). This step makes sure there is no huge difference between the magnitudes of the different columns, as each column now spans a range of exactly 1. Otherwise, the gradient descent algorithm may converge very slowly or not at all.
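To see the effect of this scaling, here is a tiny sketch with made-up numbers (the array values are hypothetical, not from the dataset). Dividing each column by its range makes every column span exactly 1, even though the individual values are not necessarily confined to [0, 1]:

```python
import numpy as np

# Hypothetical feature columns with very different magnitudes:
# x1 on the order of 10, x2 = x1**2 on the order of 100.
x1 = np.array([2.0, 5.0, 8.0, 11.0]).reshape((-1, 1))
x2 = x1 ** 2

# Scale each column by its range (max - min), as in the article.
x1_norm = x1 / (np.max(x1) - np.min(x1))
x2_norm = x2 / (np.max(x2) - np.min(x2))

# Both scaled columns now span a range of exactly 1.
print(np.max(x1_norm) - np.min(x1_norm))  # 1.0
print(np.max(x2_norm) - np.min(x2_norm))  # 1.0
```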
After X and y are defined, we can initialize the parameters c, b, and a. We use a 3×1 matrix named theta to contain all of them: theta[0,0] is c, theta[1,0] is b, and theta[2,0] is a. We initialize theta with random values, which will be tweaked by the gradient descent algorithm later.
theta = np.random.randn(3).reshape((3,1))
Now we can start writing the gradient descent algorithm. The cost function we use is the mean squared error (MSE). Given h as the hypothesis, i.e. the function of x with parameters theta, the MSE over the m = 19 samples is:

MSE = (1/m) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
The MSE can also be denoted as J, the symbol for the cost function. To find the direction in which to tweak the parameters, we need to compute the derivative of J with respect to each theta. Then we figure out the length of the step (alpha) in that direction. Our goal is to reach the bottom of the valley as fast as possible. The alpha cannot be too small; otherwise it would take a very long time to reach the bottom, and the descent may get stuck in a local minimum. The alpha cannot be too big either; otherwise the cost function J cannot converge and may even diverge. The alpha is also called the learning rate in the realm of machine learning. Typical alpha values to try are 0.0001, 0.001, 0.01, and 0.1. You could also try values between them, such as 0.0003, 0.003, and 0.03, for fine tuning.
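As a quick illustration of how the learning rate affects convergence, here is a sketch on a made-up one-dimensional dataset (the data, noise level, and iteration count are all hypothetical, chosen only for the demo). A very small alpha barely moves in 500 iterations, while a larger, still-stable alpha drives the MSE down much further:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data with a known slope of 3, plus a little noise.
x = np.linspace(0.0, 1.0, 20).reshape((-1, 1))
y = 3.0 * x + rng.normal(0.0, 0.01, x.shape)
X = np.hstack((np.ones_like(x), x))

def final_mse(alpha, n_iter=500):
    """Run plain gradient descent with the given learning rate; return final MSE."""
    theta = np.zeros((2, 1))
    for _ in range(n_iter):
        h = X @ theta
        theta -= alpha * (X.T @ (h - y)) / len(h)
    return float(np.mean((X @ theta - y) ** 2))

for alpha in (0.0001, 0.001, 0.01, 0.1):
    print(f"alpha={alpha}: final MSE = {final_mse(alpha):.6f}")
```

The larger rates reach a far lower MSE within the same iteration budget; pushing alpha well beyond the stable range would instead make the MSE blow up.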
The logic is simple, except that the partial derivatives with respect to each theta seem difficult to compute. Fortunately, we don’t have to work through the calculus to derive the formulas ourselves, because they are well known and we can just use them:

theta_0 := theta_0 − α · (1/m) · Σ (h − y)
theta_j := theta_j − α · (1/m) · Σ (h − y) · x_j,  for j = 1, 2

As you can see, the update for theta_0 differs from the updates for the other theta values (its feature x0 is a constant 1, so no feature multiplier appears). Make sure you don’t mix them up. Here is the code implementing the algorithm.
# Gradient Descent Method
i_li, mse_li = [], []
alpha = 0.01
best_theta, best_mse = None, 1e6
for i in range(2000):
    h = X @ theta
    mse = np.sum((h - y)**2) / len(h)
    if mse < best_mse:
        # copy() matters: theta is updated in place below, so storing a
        # reference would make best_theta silently track later updates
        best_mse, best_theta = mse, theta.copy()
    i_li.append(i)
    mse_li.append(mse)
    theta[0,0] = theta[0,0] - alpha * (np.sum(h - y) / len(h))
    theta[1,0] = theta[1,0] - alpha * (np.sum((h - y) * X[:,[1]]) / len(h))
    theta[2,0] = theta[2,0] - alpha * (np.sum((h - y) * X[:,[2]]) / len(h))
plt.plot(i_li,mse_li,'k-')
plt.xlabel('No. of Iterations')
plt.ylabel('MSE')
plt.show()
We did 2000 iterations with an alpha value of 0.01. We also track the cost function MSE after each iteration and keep the best theta values. The change of the cost function with the number of iterations is plotted as follows, so that we can make sure the gradient descent works as it should.
The cost function MSE decreased nicely. This is what we were expecting, and the code works well. Now let’s visualize our results. Since we scaled the two input columns x1 and x2 by their ranges before implementing the gradient descent algorithm, the derived theta values are based on the scaled columns. We need to convert the theta values back, by dividing them by their corresponding ranges, to make them match the original data.
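To see why dividing by the ranges recovers coefficients on the original scale, write r₁ = max(x1) − min(x1) and r₂ = max(x2) − min(x2). The model was fitted to the scaled features, so:

```latex
h(x) = \theta_0 + \theta_1 \frac{x}{r_1} + \theta_2 \frac{x^2}{r_2}
     = \underbrace{\theta_0}_{c} + \underbrace{\frac{\theta_1}{r_1}}_{b}\,x + \underbrace{\frac{\theta_2}{r_2}}_{a}\,x^2
```

Matching coefficients term by term gives c = theta[0,0], b = theta[1,0]/r₁, and a = theta[2,0]/r₂, which is exactly what the code below computes.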
c = best_theta[0,0]
b = best_theta[1,0]/(np.max(x1)-np.min(x1))
a = best_theta[2,0]/(np.max(x2)-np.min(x2))

plt.scatter(x,y)
ax = plt.gca()
plt.plot(x,a*x**2+b*x+c,'k-')
plt.text(0.01,0.95,'$y = ax^2+bx+c$',transform=ax.transAxes)
plt.text(0.01,0.9,'Gradient Descent Solution:',transform=ax.transAxes)
plt.text(0.01,0.85,f'$y = ({a:.5e})x^2+({b:.5f})x+({c:.5f})$',transform=ax.transAxes)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
After implementing the gradient descent algorithm, let’s try the normal equation method and see whether the two methods return the same result. Note that the normal equation method is an analytical method: the theta values are computed in closed form with the following formula:

theta = (XᵀX)⁻¹ · Xᵀ · y
Converting it to code, we have the following:
# Normal Equation method
theta_ne = np.linalg.pinv(X.T@X)@(X.T)@y
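As a sanity check on this one-liner, we can fit made-up data generated from a known quadratic and compare against np.polyfit, which solves the same least squares problem (the data here are hypothetical, not the filtration dataset):

```python
import numpy as np

# Hypothetical data following a known quadratic: y = 0.5x^2 + 2x + 1.
x = np.linspace(1.0, 10.0, 19)
y = 0.5 * x**2 + 2.0 * x + 1.0

# Normal equation on the raw (unscaled) design matrix [1, x, x^2].
X = np.column_stack((np.ones_like(x), x, x**2))
theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

# np.polyfit returns coefficients highest power first: [a, b, c].
coeffs = np.polyfit(x, y, 2)

print(theta_ne)      # ~ [1.0, 2.0, 0.5]  (c, b, a)
print(coeffs[::-1])  # ~ [1.0, 2.0, 0.5]
```

Both recover the known coefficients, which gives some confidence that the normal equation line does what we expect.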
Isn’t it much simpler than gradient descent? Now let’s visualize the results:
c1 = theta_ne[0,0]
b1 = theta_ne[1,0]/(np.max(x1)-np.min(x1))
a1 = theta_ne[2,0]/(np.max(x2)-np.min(x2))

plt.scatter(x,y)
ax = plt.gca()
plt.plot(x,a1*x**2+b1*x+c1,'r-')
plt.text(0.01,0.95,'$y = ax^2+bx+c$',transform=ax.transAxes)
plt.text(0.01,0.9,'Normal Equation Solution:',transform=ax.transAxes)
plt.text(0.01,0.85,f'$y = ({a1:.5e})x^2+({b1:.5f})x+({c1:.5f})$',transform=ax.transAxes)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
The theta values are obviously different. Stacking this result on top of the gradient descent result, we can see the difference easily:
# Comparison of the two methods
plt.scatter(x,y)
ax = plt.gca()
plt.plot(x,a*x**2+b*x+c,'k-',label='Gradient Descent')
plt.plot(x,a1*x**2+b1*x+c1,'r-',label='Normal Equation')
plt.text(0.01,0.95,'$y = ax^2+bx+c$',transform=ax.transAxes)
plt.text(0.01,0.9,'Gradient Descent Solution:',transform=ax.transAxes)
plt.text(0.01,0.85,f'$y = ({a:.5e})x^2+({b:.5f})x+({c:.5f})$',transform=ax.transAxes)
plt.text(0.01,0.8,'Normal Equation Solution:',transform=ax.transAxes)
plt.text(0.01,0.75,f'$y = ({a1:.5e})x^2+({b1:.5f})x+({c1:.5f})$',transform=ax.transAxes)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower right')
plt.show()
Since both methods work just fine, which one would you prefer to use? Typically, it depends on how large your dataset is. If your dataset is very large, say 1 million rows by 1 million columns, gradient descent is significantly better, because computing the inverse of such a large matrix is computationally expensive. Also, if you have fewer rows than columns, the normal equation is not the best choice, because XᵀX would be non-invertible.
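There is also a middle ground worth knowing about: np.linalg.lstsq solves the least squares problem directly, without explicitly inverting XᵀX, and copes gracefully with rank-deficient design matrices. A minimal sketch on made-up data (the matrix sizes, coefficients, and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tall design matrix: many more rows than columns.
X = rng.normal(size=(1000, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + rng.normal(0.0, 0.01, 1000)

# lstsq minimizes ||X @ theta - y|| via an SVD-based solver,
# avoiding the explicit (X^T X)^-1 of the normal equation.
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1.0, -2.0, 0.5]
```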
I hope you enjoyed the article. I periodically publish original articles on the applications of machine learning and deep learning in the realms of quantitative trading, finance, and engineering. If you’ve read this far and would like to see more of my writing, please follow me on Medium (https://medium.com/@decent_trade) or Twitter (https://twitter.com/decent_trade).