A New Framework for Valuing Player Actions in Football Games | by Alex Marin Felices

Going Beyond Goals and Shots: Introducing a Comprehensive Approach to Analyzing Soccer Player Performance

Are you a soccer enthusiast who’s curious about how player actions on the pitch are assessed? Do you want to learn about a new language for describing individual player actions and a framework for valuing any type of player action? If so, you won’t want to miss the research paper “Actions Speak Louder than Goals: Valuing Player Actions in Soccer”. The paper addresses the shortcomings of traditional metrics in evaluating player performance by introducing a new language for describing individual player actions on the pitch and a framework for valuing any type of player action based on its impact on the game outcome while accounting for the context in which the action happened. By aggregating soccer players’ action values, their total offensive and defensive contributions to their team can be quantified. The paper demonstrates how their approach considers relevant contextual information that traditional player evaluation metrics ignore and presents a number of use cases related to scouting and playing style characterization in the 2016/2017 and 2017/2018 seasons in Europe’s top competitions.

This research paper seeks to answer the question of how a football player’s actions impact their team’s performance during games. While this question is important for player acquisition, evaluation, scouting, media coverage, and fan engagement, it is challenging to quantify the impact of individual actions due to the low-scoring and dynamic nature of football games. The paper argues that most existing approaches to valuing actions in football suffer from three limitations: they focus on goals and shots, assign fixed values to actions regardless of circumstances, and only consider immediate effects rather than longer-term effects.

To address these limitations, the paper proposes a novel data-driven framework for valuing all types of player actions and accounting for the circumstances under which they happen, as well as their possible longer-term effects. The approach assigns a value to each action that reflects its expected influence on the scoreline, with positive values indicating a contribution to the team’s goals and negative values indicating a contribution to the opponent’s goals.

The paper presents five contributions:

A language for representing player actions
A framework for valuing player actions and rating players based on their impact on the game
A model for predicting short-term scoring and conceding probabilities at any moment in a game
Use cases showcasing interesting results and insights
A Python package that converts existing event stream data to the proposed language, implements the framework, and constructs a model that estimates scoring and conceding probabilities.

Overall, the paper aims to objectively quantify player performances and provide valuable insights for football clubs, media, and fans.

This section discusses the data sources available for football games and explains that two primary data sources exist: event stream data and optical tracking data. While tracking data is expensive and only available in wealthy leagues or clubs, event stream data is more widely and cheaply available. Therefore, this paper focuses on event stream data, which annotates the times and locations of specific events such as passes, shots, and cards that occur in a game. However, analyzing event stream data poses several challenges from a data science perspective, and the section discusses these challenges in detail. To address these challenges, the paper proposes the use of a new language called SPADL, which stands for “Soccer Player Action Description Language”. The language provides a standardized way of describing player actions in football games and enables researchers to extract more detailed insights from the data. The section concludes by noting that while the proposed language was developed for event stream data, it could also be applied to tracking data with some minor extensions.

Five data science challenges posed by current event stream data

The paper describes five data science challenges posed by current event stream data. The first challenge is that event stream data is designed for multiple objectives, which means that some important information may be missing, and irrelevant information may hinder data analysis. The second challenge is that each vendor uses their own unique terminology and definitions to describe events, making it difficult to analyze data from different vendors. The third challenge is that vendors’ event stream formats typically remain backward compatible with previous formats, which can lead to suboptimal design choices and cumbersome data querying. The fourth challenge is that vendors offer optional information snippets per event, which makes it hard to apply automatic analysis tools. Finally, most machine learning algorithms require fixed-length feature vectors and cannot handle variable-sized vectors, so analysts have to write complicated event preprocessors tailored to one specific vendor’s current event stream format.

Language description

The authors propose SPADL (Soccer Player Action Description Language), a language for describing player actions, to unify the existing event stream formats into a common vocabulary that enables subsequent data analysis. SPADL is designed to be human-interpretable, simple, and complete to accurately define and describe actions on the pitch. It distinguishes between actions and events and represents a game as a sequence of on-the-ball actions with nine attributes, including start and end times and locations, player, team, action type, body part used, and result. The authors also release a Python package that automatically converts event streams to SPADL, which currently supports event streams provided by Opta, Wyscout, and StatsBomb. The language distinguishes between 21 possible types of actions, including passes, shots, and tackles, and considers up to four different body parts and up to six possible results.
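To make the nine attributes concrete, here is a minimal sketch of how a single action could be stored. The class name, field names, units, and example values are assumptions chosen for illustration; this is not the schema of the authors' Python package.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Action:
    """One on-the-ball action with the nine attributes described above (hypothetical names)."""
    start_time: float               # assumed unit: seconds since kick-off
    end_time: float
    start_loc: Tuple[float, float]  # (x, y) position on the pitch
    end_loc: Tuple[float, float]
    player: str
    team: str
    action_type: str                # e.g. "pass", "shot", "tackle"
    body_part: str                  # e.g. "foot"
    result: str                     # e.g. "success"


# A game is then an ordered list of such records (made-up example data).
game = [
    Action(12.0, 13.5, (40.0, 30.0), (62.0, 18.0), "Player A", "Home", "pass", "foot", "success"),
    Action(15.0, 15.4, (88.0, 36.0), (105.0, 34.0), "Player B", "Home", "shot", "foot", "success"),
]

print(len(fields(Action)), "attributes per action")
```

Because every action has the same fields, the same analysis code can run on data from any vendor once it has been converted.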

In this section, the authors introduce the VAEP (Valuing Actions by Estimating Probabilities) framework for valuing actions performed by football players. The framework is used to compute objective action values and convert them into a player rating that represents the player’s total offensive and defensive contribution to their team.

Converting scoring and conceding probabilities to action values

The authors explain that most actions in a football game are performed with the intention of increasing the chance of scoring a goal or decreasing the chance of conceding a goal. To assess an action’s effect, the authors propose calculating how much it alters the chances of both scoring and conceding a goal in the near future.

The authors define an action’s offensive value as the change in the probability of the team scoring a goal as a result of that action. The defensive value is based on the change in the probability of the team conceding a goal. Since all actions should aim to decrease the probability of conceding, the defensive value is defined as the negation of that change, so that an action which lowers the chance of conceding receives a positive defensive value.

The authors combine the offensive and defensive values to derive an action’s total VAEP value. The total VAEP value of an action is the sum of its offensive and defensive values. The VAEP framework is independent of the representation used to describe the actions and transforms the subjective task of valuing an action into the objective task of predicting the likelihood of a future event in a natural way.
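The definitions above can be sketched in a few lines. The function name and the example probabilities are hypothetical; the arithmetic simply follows the description of offensive, defensive, and total values.

```python
def action_value(p_score_before, p_score_after, p_concede_before, p_concede_after):
    """Return (offensive, defensive, total) value of one action.

    The four inputs are the team's estimated probabilities of scoring and
    conceding in the near future, before and after the action.
    """
    offensive = p_score_after - p_score_before
    # Negated so that lowering the chance of conceding counts as positive.
    defensive = -(p_concede_after - p_concede_before)
    return offensive, defensive, offensive + defensive


# Hypothetical numbers: the action raises the scoring chance from 3% to 8%
# and lowers the conceding chance from 2% to 1%.
offensive, defensive, total = action_value(0.03, 0.08, 0.02, 0.01)
print(round(offensive, 2), round(defensive, 2), round(total, 2))
```

Note that the valuation itself is plain arithmetic; all the modelling effort sits in estimating the probabilities.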

Converting action values to player ratings

The authors then describe how to convert the previously calculated action values into a player rating that represents a player’s overall offensive and defensive contribution to their team. The authors suggest that player ratings could be derived for any given time frame, such as a time window within a game, a full game, or a full season.

To compute a player’s rating, the authors aggregate the individual action values into a player rating per 90 minutes of game time, which captures the average net goal difference contributed to the player’s team per 90 minutes. Given a time frame T and player p, the player’s rating is computed by summing the offensive and defensive values of all actions the player performed during T, dividing by the number of minutes the player played during T, and multiplying by 90.

In addition to computing a player’s rating based on all actions, a player’s rating can also be computed per action type, allowing for the construction of a player profile that may help identify different playing styles. Overall, player ratings can be computed along different dimensions, depending on the use case.
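As a rough sketch of this aggregation, the snippet below computes a rating per 90 minutes, both overall and per action type. The dictionary keys, function names, and the example values are assumptions made for illustration.

```python
from collections import defaultdict


def rating_per_90(actions, minutes_played):
    """Sum of action values, normalized to 90 minutes of playing time."""
    total = sum(a["value"] for a in actions)
    return 90.0 * total / minutes_played


def rating_per_90_by_type(actions, minutes_played):
    """Same normalization, but split by action type to build a player profile."""
    totals = defaultdict(float)
    for a in actions:
        totals[a["type"]] += a["value"]
    return {t: 90.0 * v / minutes_played for t, v in totals.items()}


# Hypothetical action values for one player who played 45 minutes.
actions = [
    {"type": "dribble", "value": 0.05},
    {"type": "pass", "value": 0.09},
    {"type": "pass", "value": -0.01},
]
print(round(rating_per_90(actions, 45), 2))
print({t: round(v, 2) for t, v in rating_per_90_by_type(actions, 45).items()})
```

The per-type ratings add up to the overall rating, which is what makes them usable as a breakdown of a player's style.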

Next, the authors discuss their method for estimating the probabilities of a team scoring or conceding in the near future. They denote a goal scored by the home team as goal(h) and a goal scored by the visiting team as goal(v). The task is to estimate the probability of scoring and conceding for both the home and visiting teams based on the current game state. The authors simplify the problem by only estimating the probabilities for the team currently in possession of the ball. This results in two separate binary probabilistic classification problems with identical inputs but different labels.

To estimate the probabilities, the authors train a probabilistic classifier. Any machine learning model that outputs probabilities could be used, such as Logistic Regression, Random Forest, or a Neural Network. Because they need the probability estimates to be well-calibrated, the authors choose CatBoost. The sequence of actions describing an entire game is converted into examples in feature-vector format, one for each game state. The authors then compute the labels and features for each game state.

Constructing labels

This section explains how labels are constructed for the two binary classification problems of estimating the probability of scoring and conceding in the near future for the team possessing the ball after an action. The first problem is to estimate the probability of scoring a goal, while the second is to estimate the probability of conceding a goal.

To construct the labels, each game state receives one label per problem. For the scoring problem, a game state is assigned a positive label (= 1) if the team possessing the ball after the action scored a goal in the subsequent k actions. For the conceding problem, it is assigned a positive label if that team conceded a goal in the subsequent k actions. In all other cases the label is negative (= 0). Here, k is a user-defined parameter that represents how far ahead in the future we look to determine the effect of an action. In this paper, the authors chose k = 10 based on domain knowledge and preliminary experiments.
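The labeling rule can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the team that performed an action is the team in possession afterwards, treats a successful shot as a goal, and ignores own goals. The dictionary keys and function names are hypothetical.

```python
def is_goal(action):
    """Simplifying assumption: a goal is a shot with a successful result."""
    return action["type"] == "shot" and action["result"] == "success"


def construct_labels(actions, k=10):
    """Return (scores, concedes) label lists, one entry per game state."""
    scores, concedes = [], []
    for i, action in enumerate(actions):
        team = action["team"]             # assumed to be in possession afterwards
        window = actions[i + 1 : i + 1 + k]  # the subsequent k actions
        scores.append(int(any(is_goal(a) and a["team"] == team for a in window)))
        concedes.append(int(any(is_goal(a) and a["team"] != team for a in window)))
    return scores, concedes


# Made-up sequence: team A builds up, B briefly touches the ball, A scores.
actions = [
    {"team": "A", "type": "pass", "result": "success"},
    {"team": "A", "type": "pass", "result": "fail"},
    {"team": "B", "type": "pass", "result": "fail"},
    {"team": "A", "type": "shot", "result": "success"},
]
scores, concedes = construct_labels(actions, k=10)
print(scores, concedes)
```

Both label lists are built from the same game states, which matches the description of two classification problems with identical inputs but different labels.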

Constructing features

The features are constructed for each example. Instead of defining features based on the entire game state, the previous three actions are considered. This approach has advantages such as allowing for a fixed number of features, which is necessary for most machine learning techniques. Additionally, considering a small window focuses on the most relevant aspects of the current context.

Three categories of features are considered based on the SPADL representation of the game. First, SPADL features are defined for each of the three actions, including categorical and real-valued features for the action type, result, body part used, start and end location coordinates, and time elapsed. Second, complex features combine information within an action and across consecutive actions. For each action, distance and angle to the goal for both the start and end locations, as well as the distance covered in both x and y directions, are included. Between two consecutive actions, distance and elapsed time are computed, along with whether the ball changed possession. These features provide intuition about the current speed of play. Finally, game context features are included, such as the number of goals scored by the team possessing the ball after the action, the number of goals scored by the defending team after the action, and the goal difference. These features are included to capture how teams adapt their playing style to the current scoreline.
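A few of the complex features can be sketched as below. The pitch dimensions (105 by 68 meters), the goal position, the angle definition, and all names are assumptions for illustration; the paper's exact feature definitions may differ.

```python
import math

# Assumed pitch size in meters, with the attacked goal centered on the far end line.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
GOAL = (PITCH_LENGTH, PITCH_WIDTH / 2)


def distance_to_goal(x, y):
    return math.hypot(GOAL[0] - x, GOAL[1] - y)


def angle_to_goal(x, y):
    """One possible definition: angle between the line to the goal center and the pitch's long axis."""
    return math.atan2(abs(GOAL[1] - y), GOAL[0] - x)


def within_action_features(a):
    (sx, sy), (ex, ey) = a["start"], a["end"]
    return [distance_to_goal(sx, sy), angle_to_goal(sx, sy),
            distance_to_goal(ex, ey), angle_to_goal(ex, ey),
            ex - sx, ey - sy]              # distance covered in x and y


def between_action_features(prev, curr):
    (px, py), (cx, cy) = prev["end"], curr["start"]
    return [math.hypot(cx - px, cy - py),        # distance between the actions
            curr["time"] - prev["time"],         # elapsed time
            float(prev["team"] != curr["team"])]  # did possession change?


def state_features(last_three):
    """Fixed-length vector built from the previous three actions."""
    a, b, c = last_three
    feats = []
    for action in (a, b, c):
        feats += within_action_features(action)
    return feats + between_action_features(a, b) + between_action_features(b, c)


# Made-up game state consisting of three consecutive actions.
state = [
    {"team": "A", "time": 10.0, "start": (50.0, 34.0), "end": (70.0, 20.0)},
    {"team": "A", "time": 12.0, "start": (70.0, 20.0), "end": (90.0, 30.0)},
    {"team": "A", "time": 14.5, "start": (90.0, 30.0), "end": (102.0, 30.0)},
]
print(len(state_features(state)))
```

Whatever the length of the game so far, the vector always has the same size, which is the property most learning algorithms need.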

In this section, the authors describe the experiments they conducted to evaluate their VAEP framework. They begin by stating that there are no objective ground truth action values or player ratings, making the evaluation challenging. Therefore, they focus on three main goals:

Providing intuitions into how the framework behaves and compares to other metrics
Presenting use cases revolving around player acquisition and characterization
Evaluating several of the design decisions.

The authors use Wyscout data for the English, Spanish, German, Italian, French, Dutch, and Belgian top divisions and apply the VAEP framework to 11,565 games played in the 2012/2013 through 2017/2018 seasons, considering only league games and ignoring friendly, cup, and European games.

To produce scoring and conceding probabilities, action values, and player ratings, the authors train two classification models using the CatBoost algorithm and the feature set detailed in the previous section. They train the first model on the 2012/2013 through 2015/2016 seasons to produce the outcomes for the 2016/2017 season. Similarly, they train the second model on the 2012/2013 through 2016/2017 seasons to produce the outcomes for the 2017/2018 season.

Intuition behind the action values

In this section, the authors provide an intuition behind the action values generated by their framework by analyzing the actions leading to Barcelona’s goal against Real Madrid on December 23, 2017. The authors use Figure 1 to illustrate the attack leading up to the goal, which consists of six actions.

The first action is a pass from Sergio Busquets towards the right flank, which receives a neutral action value of 0.00. The second action is Lionel Messi’s pass back to Busquets, which is penalized with an action value of -0.01 since it moves the ball backward to a less favorable position. Busquets then makes an excellent through ball to Messi, which moves the ball closer to the goal and receives an action value of +0.01.

“**Figure 1:** The attack leading up to Barcelona’s final goal in their 3–0 win against Real Madrid on December 23, 2017.”

Messi then receives the ball and dribbles past a Real Madrid defender into the box, receiving an action value of +0.05 for significantly raising the scoring odds from 0.03 to 0.08. His next action is passing the ball backwards and away from the crowded six-yard box, which receives an action value of +0.09 for raising the scoring odds from 0.08 to 0.17. This action shows the power of the framework in rewarding players for moving the ball away from the opponent’s goal in a data-driven way.

Finally, Aleix Vidal shoots the ball into the goal, receiving an action value of +0.83 for converting a 0.17 scoring chance to a goal. If Vidal had missed his shot, he would have been penalized with an action value of -0.17. This example provides an insight into how the framework calculates the action values based on the potential impact of each action on the scoring chances.
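The last three action values can be reproduced from the scoring odds quoted above. This sketch assumes the change in the conceding probability is negligible for these actions, since only scoring odds are reported; the variable names are illustrative.

```python
# Scoring probability before Messi's dribble, after the dribble, and after his pass.
p_score = [0.03, 0.08, 0.17]
labels = ["Messi dribble", "Messi pass", "Vidal shot (goal)"]

# After a goal the scoring probability is 1.0; after a miss it would drop to 0.0.
p_after = p_score[1:] + [1.0]
values = [round(after - before, 2) for before, after in zip(p_score, p_after)]

for label, value in zip(labels, values):
    print(f"{label}: {value:+.2f}")

missed_shot_value = round(0.0 - p_score[-1], 2)
print(f"Vidal shot (miss): {missed_shot_value:+.2f}")
```

The same shot is worth +0.83 or -0.17 depending on its outcome, and the difference between the two is exactly one goal.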

Comparing VAEP player ratings to traditional player performance metrics

In this section, the authors compare their VAEP player ratings with traditional player performance metrics currently used, such as counting goals and assists. To evaluate the capabilities of these metrics, the authors produce each metric’s top-10 list for the 2017/2018 English Premier League season and compare them to the top-10 list produced by their VAEP framework. The top 10 players in terms of goals per 90 minutes consist of strikers who focus on finishing, while the top 10 in terms of assists per 90 minutes mostly consist of midfielders who specialize in setting up chances for their teammates. The ranking in terms of goals + assists per 90 minutes aims to strike a balance between both archetypes.

“**Table 1:** The top-10 players who played at least 900 minutes in the 2017/2018 English Premier League season in terms of (g) goals, (a) assists, (g+a) goals + assists, and (vaep) our VAEP player ratings. Rm denotes the rank of the player out of 305 players for metric m. The market value denotes the player’s market value on February 1, 2019 according to Transfermarkt.de.”

However, the authors note that their VAEP framework identifies impactful players who do not rate high on these traditional metrics, such as Kevin De Bruyne, Eden Hazard, and Riyad Mahrez. These players are considered Premier League stars but do not appear in any of the traditional top-10s. Additionally, the combined market value for the players in the VAEP top-10 list is considerably higher than that for goals, assists, and goals + assists. These observations suggest that the VAEP framework captures players’ contributions to their teams’ performances better than the traditional player performance metrics currently in use.

Identifying promising young players and minor league talent

In this section, the authors focus on identifying promising young players and minor league talent using their VAEP framework. The authors note that the English and Spanish leagues are the toughest and wealthiest, making it difficult for young players to earn playing time. Therefore, clubs often sign promising young players from smaller leagues such as the French, Dutch, and Belgian leagues. The authors investigate the top-ranked young talents who played at least 900 minutes in the 2017/2018 season in the English and Spanish leagues separately (Table 2a) and the French, Dutch, and Belgian leagues (Table 2b).

“**Table 2:** The top-5 players born after January 1, 1997 in terms of our VAEP player ratings during the 2017/2018 season in (a) the tougher English and Spanish leagues, and (b) the smaller French, Dutch, and Belgian leagues.”

The authors note that their framework identifies promising young talents who may be undervalued by traditional metrics. For example, Jonjoe Kenny is ranked fourth in Table 2a, despite having a much lower estimated market value than more well-known players such as Marcus Rashford and Ousmane Dembélé. The authors explain that Kenny is a defensive player, and defensive players are typically valued lower than offensive players by clubs and fans. Additionally, he plays for a mid-table club, which means he is surrounded by fewer world-class players. Nevertheless, the authors argue that Kenny’s player ratings point to a much higher valuation than his current estimated market value of €5 million.

In Table 2b, the authors note that David Neres tops the rankings, followed by Mason Mount and Kylian Mbappé, who won the Best Young Player Award at the 2018 World Cup. The authors note that their framework can serve as a useful tool for talent scouts as it can generate rankings for each league in the world, provided that the required event stream data is available.

Characterizing playing style

In the world of football, clubs are increasingly interested in identifying players who best fit their team’s preferred style of play during the recruitment process. However, scouts’ time is often limited, which makes it difficult to consider the entire pool of candidate reinforcements. Metrics that assess a player’s ability to perform different types of actions can help select a relevant set of players who are worth extra attention.

The VAEP framework can be used to address this task by computing a player’s rating per 90 minutes for each type of action. As an example, the paper looks at Barcelona’s attempt to offset the loss of Neymar in the summer of 2017 by acquiring Borussia Dortmund’s Ousmane Dembélé and Liverpool’s Philippe Coutinho, and compares the three players’ total ratings per 90 minutes for four action types. According to the metric, both Dembélé’s and Coutinho’s passes receive a higher value than Neymar’s, while Neymar is a superior dribbler. From a stylistic perspective, this suggests that both Dembélé and Coutinho were reasonable targets, as they are decent dribblers and better passers than Neymar.

“**Figure 2:** Overview of the total contribution per 90 minutes for different types of actions for (a) Neymar, Ousmane Dembélé, and Philippe Coutinho during the 2016/2017 season, and (b) Cristiano Ronaldo, Marcus Rashford, and Eden Hazard during the 2017/2018 season.”

Similarly, when Real Madrid lost their all-time top scorer Cristiano Ronaldo in the summer of 2018, the struggling club appeared in desperate need of a suitable replacement. Manchester United’s Marcus Rashford and Chelsea’s Eden Hazard were both linked with moves to Madrid. However, neither comes close to replicating Ronaldo’s incredible finishing skill: Ronaldo exhibits a higher total shot value per 90 minutes than Rashford and Hazard combined. While Hazard outperforms Rashford in every aspect, Rashford is closer to Ronaldo in terms of style, as both rate similarly for passing and dribbling. If Real Madrid wanted to stick to their current playing style, the analysis suggests that the 21-year-old Rashford would be the better choice. However, if their aim were to immediately strengthen their team, then the 28-year-old Hazard would be the preferred choice, as he is a better player regardless of his specific playing style.

Trading off action quality and quantity

The trade-off between the quality and quantity of actions in football players is discussed next. The authors note that a high number of actions may result in a lower average value per action. They use their VAEP framework to plot the average number of actions (quantity) against the average value of these actions (quality) for players in the Spanish and English leagues during the 2017/2018 season. The plot shows that Lionel Messi stands out from the rest of the players in terms of both action quality and quantity.

“**Figure 3:** Scatter plots of players in the 2017/2018 season who played at least 900 minutes in the Spanish or English league. The plots contrast the average number of actions performed per 90 minutes with the average value of the actions of the player. As shown by the grey-dotted isoline in (a) and (c), Lionel Messi is clearly in a class of his own.”

The authors then zoom in on the top 10 players in the English Premier League and Spanish league, respectively. They identify two archetypes: strikers who perform a low number of highly valuable actions, and midfielders who perform more actions that are less valuable. Philippe Coutinho falls in between these two archetypes, hitting a “sweet spot” between action quality and quantity. The authors note that Harry Kane and Mohamed Salah perform a relatively low number of actions but their actions are highly valued on average. Kevin De Bruyne and Paul Pogba perform more actions with a lower average value per action. Finally, Lionel Messi is an outlier, rating high on both action quality and quantity at the same time.

Evaluating design choices

In this section, the authors discuss the challenge of evaluating a data science system when no ground truth is available. Standard evaluation metrics such as accuracy, precision, and recall cannot be applied to the action values or player ratings themselves, which raises the question of how to evaluate the design choices made in creating the VAEP framework. The authors’ answer is to evaluate the components the system consists of.

To evaluate the performance of their system, the authors train classification models using the features described previously with the CatBoost algorithm to generate scoring and conceding probabilities. They then compare the performance of their approach to alternative approaches that use either a different feature set or a different algorithm.

To evaluate the performance of each approach, the authors use two metrics that are often used for evaluating probabilistic predictions: the Brier score and ROC AUC. The Brier score measures the accuracy and calibration of the predictions and is minimized when the true underlying probability distribution of the data is reported. This property is important because the predicted probabilities are summed and subtracted to generate action values. The area under the receiver operating characteristic curve (ROC AUC) evaluates how well the approaches can discern positive examples from negative examples. An important advantage of ROC AUC is that the metric is unaffected by unbalanced data sets, which matters here because in their data only 1.5% (0.5%) of all game states lead to a scored (conceded) goal.
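Both metrics are simple enough to write out by hand. The sketch below uses plain Python and made-up labels and predictions; the quadratic pairwise AUC is chosen for readability and would be too slow for millions of game states, where a library implementation is the practical choice.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: four game states, two of which led to a goal.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_true, y_prob), roc_auc(y_true, y_prob))

# A model that always predicts the base rate cannot rank examples at all.
constant = [0.5] * 4
print(brier_score(y_true, constant), roc_auc(y_true, constant))
```

The two metrics answer different questions: the Brier score asks whether the probabilities themselves are right, while ROC AUC only asks whether positives are ranked above negatives.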

The authors also evaluate the usefulness of their comprehensive feature set by comparing it to four baseline feature sets: no features, location, action type, and location + action type. They evaluate each feature set using the CatBoost algorithm. For estimating both scoring and conceding probabilities, their feature set outperforms the baseline feature sets on both evaluation metrics. This result suggests that their features capture important game state context that is missing from the baseline feature sets.

“**Table 3:** Different design choices evaluated on both scoring and conceding probabilities using the Brier score and ROC AUC. For the Brier score lower values are better, whereas for ROC AUC higher values are better.”

Finally, the authors compare the performances of four learning algorithms (Logistic Regression, Random Forest, XGBoost, and CatBoost) using the designed features. CatBoost performs best in all cases, with XGBoost a close second. This narrow victory can be attributed to CatBoost’s intelligent handling of categorical features compared to XGBoost’s more naive one-hot encoding.

Overall, the authors’ evaluation methodology provides a comprehensive and rigorous approach for evaluating their system’s design choices, from feature selection to choice of learning algorithm.

Discussion of remaining challenges

One limitation of the framework is that it only values on-the-ball actions, so an important part of defending is not captured by the model. Defending is often more about preventing the opponent from gaining possession of the ball through clever positioning and anticipation, which happens off the ball and which the VAEP framework therefore cannot evaluate.

Another challenge is accurately comparing players across leagues. Players in minor leagues such as the French, Dutch, and Belgian leagues might receive a higher VAEP rating than players in tougher leagues such as the English and Spanish leagues. Similarly, it can be challenging to compare players across clubs in the same league, since players in top clubs with strong teammates might perform more valuable actions than players in mid-table clubs with weaker teammates.

The authors also mention the challenge of building trust in the VAEP ratings among traditional scouts who might be unfamiliar with this method of rating football players. The VAEP ratings might be less intuitive than traditional metrics such as goals per 90 minutes, which could make it challenging for analytically less inclined scouts to understand what the VAEP ratings precisely measure.

Overall, while the VAEP framework is a useful tool for evaluating football players, it has some limitations that need to be addressed. Nevertheless, the VAEP framework provides valuable insights into the performance of football players, and its usefulness can be improved with further research and development.
