Supervised Learning in R: Regression Diagnosis | by Fatih Emre Ozturk

The Tech Guy, June 30, 2023

For a linear regression model to be trustworthy, there must be a straight-line relationship between the predictor(s) and the response variable. If this relationship does not hold, our predictions become suspect. To identify non-linearity, we can check a residual plot: a plot with the fitted (predicted) values on the x-axis and the residuals on the y-axis.

In a residual plot, what we look for is any trend. For instance, a “U” shape in the residuals means that the relationship is not linear, which is something we do not want. The following plot may be a good example of it:

[Figure: residual plot with a “U”-shaped pattern]
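A “U”-shaped residual plot can be reproduced with simulated data. The sketch below is only an illustration: the variables `x` and `y` and the name `linear_fit` are assumptions, not part of the original analysis. It fits a straight line to data whose true relationship is quadratic.

```r
# Simulated data (an assumption for illustration, not from the article):
# the true relationship is quadratic, but we fit a straight line to it.
set.seed(1)
x <- runif(200, min = 0, max = 10)
y <- 1 + 0.5 * x^2 + rnorm(200, sd = 2)

linear_fit <- lm(y ~ x)

# Residuals vs. fitted values: the points trace a "U" instead of
# scattering randomly around zero.
plot(fitted(linear_fit), resid(linear_fit),
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero
```

The residuals are positive at both ends of the fitted range and negative in the middle, which is the trend to watch for.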

Assume that we observe non-linearity in our model. What should we do? One common remedy is to use non-linear transformations of the predictor(s), such as X², X³, sqrt(X), or log(X), which may resolve the non-linearity problem.
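As a rough sketch of such a transformation, the example below adds a squared term to a model fitted on simulated data. The data and the names `linear_fit` and `quadratic_fit` are assumptions made for illustration. Inside a formula, `I()` is needed so that `^` is treated as arithmetic rather than as a formula operator.

```r
# Simulated data (an assumption for illustration): quadratic relationship.
set.seed(1)
x <- runif(200, min = 0, max = 10)
y <- 1 + 0.5 * x^2 + rnorm(200, sd = 2)

linear_fit    <- lm(y ~ x)            # straight line only
quadratic_fit <- lm(y ~ x + I(x^2))   # adds the transformed predictor X^2

# Compare the residual plots side by side.
par(mfrow = c(1, 2))
plot(fitted(linear_fit), resid(linear_fit),
     xlab = "Fitted Values", ylab = "Residuals", main = "y ~ x")
abline(h = 0, lty = 2)
plot(fitted(quadratic_fit), resid(quadratic_fit),
     xlab = "Fitted Values", ylab = "Residuals", main = "y ~ x + I(x^2)")
abline(h = 0, lty = 2)
par(mfrow = c(1, 1))  # restore the default single-panel layout
```

Other transformations can be written directly in the formula in the same way, for example `lm(y ~ log(x))` or `lm(y ~ sqrt(x))`, provided the predictor takes values for which the function is defined.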

As an example for non-linearity and all the other problems, I will use the Carseats data set from the ISLR package. I built the following model with these features:

library(ISLR)  # provides the Carseats data set
df <- Carseats
model <- lm(Sales ~ Advertising + Price + Age + CompPrice + Income + Population, data = df)

Now we check the residual plot with the following code:

plot(model$fitted.values, model$residuals, xlab = "Fitted Values", ylab = "Residuals")

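It is also possible to draw the residual plot as follows. Calling plot() on the model produces four diagnostic plots, the first of which is the residuals vs. fitted plot: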

par(mfrow = c(2,2))
plot(model)

As both plots show, there is no apparent pattern in the residuals. We can therefore conclude that this model does not have a non-linearity problem.
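Judging "no pattern" by eye can be easier with a smooth curve drawn through the residuals. The sketch below defines a hypothetical helper, `plot_residuals_with_smoother()`, which is not part of the original article. It overlays a `lowess()` curve on the residual plot and is demonstrated on simulated, genuinely linear data.

```r
# Hypothetical helper (not from the article): residual plot plus a lowess
# smoother, which makes any trend in the residuals easier to see.
plot_residuals_with_smoother <- function(fit) {
  fitted_vals <- fitted(fit)
  residual_vals <- resid(fit)
  plot(fitted_vals, residual_vals,
       xlab = "Fitted Values", ylab = "Residuals")
  abline(h = 0, lty = 2)                      # reference line at zero
  smooth <- lowess(fitted_vals, residual_vals)
  lines(smooth, col = "red")                  # smoothed trend of the residuals
  invisible(smooth)                           # return the curve for inspection
}

# Simulated linear data (an assumption for illustration).
set.seed(1)
x <- runif(200, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(200)
smooth <- plot_residuals_with_smoother(lm(y ~ x))
```

When the linearity assumption holds, the red curve stays close to the horizontal zero line. The same helper could be applied to the Carseats model with `plot_residuals_with_smoother(model)`. The first panel of `plot(model)` draws a similar smooth curve by default.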
