![](https://crypto4nerd.com/wp-content/uploads/2023/10/1pLoXOKqZWxKdgjZGfnWd3g-1024x576.png)
From our initial data we generated a representative sample based on the percentage of Josh Allen's passes thrown to Stefon Diggs (28%) and the percentage of those targets caught by Stefon Diggs (70%). Through this lens we will extrapolate the initial forecast to illuminate the outcomes of future passes, using the historical catch rate of 40%.
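For reference, a sample of this shape could be simulated with NumPy's binomial draws. This is a minimal sketch rather than the original ingestion: the `hist_data_samp` name, the `Outcome` column, and the Bernoulli draw at the 40% catch rate are assumptions made to line up with the kernel script below.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical ten-throw sample: each throw is caught with probability 0.4
hist_data_samp = pd.DataFrame({"Outcome": rng.binomial(n=1, p=0.40, size=10)})
print(hist_data_samp)
```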
We can follow the Scikit-learn example again, utilizing the same kernel processes for forecasting our data; the model in the example is based on a yearly recurring frequency, which we may have to adapt later.
Forecasting the data with the frequency kernels means we need to install Scikit-learn with `pip install scikit-learn` in addition to NumPy with `pip install numpy`, and then import the necessary kernels for running the projections of our data across the existing indices, along with dimensionalizing the data into arrays.
For reference, a projection of this fitted model is based on the total number of passes thrown from the original data ingestion of Josh Allen's ATT (Passing Attempts) column. The size of this data sample, if you recall, is based on the years in which Josh Allen and Stefon Diggs were both playing and hypothetically starting for the Buffalo Bills, seasons 2020–2023. Our multiplier for the ten (10) throw sample set is one hundred and ninety (190), which scales the sample index out toward the 1,990-pass total and likewise bounds the deviation NumPy computes across the array.
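Before wiring that multiplier into the model, a quick hypothetical check (assuming pandas' default zero-based index on the ten-row sample) shows what it does to the x-axis: the scaled index tops out at 1,710, so the 1,990-pass grid used later genuinely extrapolates past the observed range.

```python
import pandas as pd

# Hypothetical ten-row sample with pandas' default 0-9 index
idx = pd.RangeIndex(10)
print((idx * 190).to_numpy())
# [   0  190  380  570  760  950 1140 1330 1520 1710]
```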
Add the following code block to the existing script to continue the development of this model.
Kernel Dependency Script — Addition:
```python
# Import Kernel Dependencies For Extrapolated Data
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, RBF, RationalQuadratic, WhiteKernel

# Scale the ten-throw sample index out toward the 1,990-pass total
X = (hist_data_samp.index * 190).to_numpy().reshape(-1, 1)
y = hist_data_samp["Outcome"].to_numpy()

# Long-term trend: a broad RBF kernel
long_term_trend_kernel = 50.0**2 * RBF(length_scale=50.0)

# Seasonal component with a fixed periodicity
seasonal_kernel = (
    2.0**2
    * RBF(length_scale=100.0)
    * ExpSineSquared(length_scale=1.0, periodicity=1.0, periodicity_bounds="fixed")
)

# Small-scale irregularities in the data
irregularities_kernel = 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)

# Observation noise: short-range RBF plus white noise
noise_kernel = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(
    noise_level=0.1**2, noise_level_bounds=(1e-5, 1e5)
)

# Compose the full kernel and fit the mean-centered outcomes
hist_data_samp_kernel = (
    long_term_trend_kernel + seasonal_kernel + irregularities_kernel + noise_kernel
)
y_mean = y.mean()
gaussian_process = GaussianProcessRegressor(kernel=hist_data_samp_kernel, normalize_y=False)
gaussian_process.fit(X, y - y_mean)
```
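Once the fit completes, it can be worth confirming what the optimizer actually converged on. This optional check is not part of the original script, but both attributes are standard on a fitted scikit-learn GaussianProcessRegressor:

```python
# Optional: inspect the kernel hyperparameters tuned during fitting
print(gaussian_process.kernel_)
print(gaussian_process.log_marginal_likelihood_value_)
```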
The process of extrapolation goes hand-in-hand with fitting the data over the forecasted model, and for this we lean on NumPy (installed earlier with `pip install numpy`), which allows for the deviation of the numbers through the use of dimensional arrays.
Fitting Data Script — Addition:
```python
# Fit Model With Extrapolated Data
# Forecast grid: one point per pass, from 1 out to the 1,990-pass total
X_test = np.linspace(start=1, stop=1990, num=1_990).reshape(-1, 1)

# Predict the mean outcome and its standard deviation at each point,
# then add the sample mean back in to undo the earlier centering
mean_y_pred, std_y_pred = gaussian_process.predict(X_test, return_std=True)
mean_y_pred += y_mean
```
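Beyond the mean and standard deviation, a fitted GaussianProcessRegressor can also draw whole trajectories from the posterior via its sample_y method. This optional sketch is not part of the original script; note the samples come from the mean-centered process, so y_mean is added back just as it was for the mean prediction:

```python
# Optional: draw a few posterior sample paths from the fitted model
y_samples = gaussian_process.sample_y(X_test, n_samples=3, random_state=0)
y_samples += y_mean  # undo the earlier centering
print(y_samples.shape)  # (1990, 3): one column per sampled trajectory
```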
Once the data has been fitted and the kernels have run their distribution for extrapolation, we can again graph the outcome of these theoretical forecasts to illustrate how the data may unfold.
Add the following code block to the existing script to continue the development of this model.
Extrapolated Data Graph — Addition:
```python
# Graph Extrapolated Data
import matplotlib.pyplot as plt  # already imported if continuing the earlier script

# Plot the measured sample against the Gaussian process forecast
plt.plot(X, y, color="black", linestyle="dashed", label="Measurements")
plt.plot(X_test, mean_y_pred, color="tab:blue", alpha=0.4, label="Gaussian process")

# Shade one standard deviation around the forecast mean
plt.fill_between(
    X_test.ravel(),
    mean_y_pred - std_y_pred,
    mean_y_pred + std_y_pred,
    color="tab:blue",
    alpha=0.2,
)
plt.legend()
plt.xlabel("Passes Thrown")
plt.ylabel("Passes Caught")
_ = plt.title("Fantasy Football QB/WR - Forecasted Data")
plt.show()
```
Finally, we can run `python model.py` again to graph our forecasted data across our fitted model, completing the extrapolation that leads to our forecast. In this run, the data is enumerated through NumPy and the Scikit-learn kernels, surfacing the noise and divergence around the initial statistical probability.
We can assume from this, as the measurement and outcome lines in the chart exemplify, that a distribution of 1,990 passes from Josh Allen should land Stefon Diggs around 950 catches. Our initial mathematical hypothesis was that 40% of passes would be caught; this is the sample with which we started, where 1,990 total passes multiplied by 0.4 would yield 796 expected catches.
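That baseline figure is easy to verify with a quick arithmetic check:

```python
# Baseline hypothesis: a 40% catch rate over the full pass total
total_passes = 1990
catch_rate = 0.40
print(total_passes * catch_rate)  # 796.0 expected catches, versus ~950 forecast
```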
Our forecasted outcome is inflated by our model's extrapolation, but hypothetically, just as we experienced with our fandom-like simulations of mutually exclusive events, we see that the model still meets our secondary estimations.
Continue Exploring This Machine Learning Model on GitHub and Previous Blogs
The blog that began this data science deep dive can shed more light on Ingesting Data Sets and Prediction Algorithms. Primer code for this machine learning model and forecast can be found on GitHub.
Looking for more Application Development advice? Follow along on Twitter, GitHub, and LinkedIn. Visit online for the latest updates, news, and information at heyitsjoealongi.com.