Tuesday 16 August 2016

Offensive Contribution in La Liga: Usual Suspects and One Ugly Duckling

In this post, we look forward to the new season of La Liga by looking back at the previous one. More specifically, we'll zoom in on the forwards who put in stellar performances in terms of offensive contribution. A player's offensive contribution is typically determined "quick and dirty" by considering goals and assists, often by simply adding the two. Naturally, there are much more sophisticated (and more accurate) ways to gauge the offensive contribution a player makes to his team. The method I will employ here was proposed by Thomas Severini, Professor of Statistics at Northwestern University, and is a regression model with multiple predictor variables.

The intuition and rationale behind the statistical model are as follows: for all teams in the Primera División, over recent seasons, we know how many goals they scored along with a bunch of other statistics, including how many shots on goal, how many attempts at goal from outside the box and from inside the box, and how many dribbles, passes, etc. they executed. What we want to find out is which variables actually relate (most strongly) to the number of goals scored by a given team. To this end, our model considers each team's performance over the last five seasons, leading to 100 observations (i.e. 5 seasons times 20 teams). The statistical model allows us to filter out those variables that yield important additional information about the dependent variable (here: goals scored).

A concise model that explains no less than 85 percent of the variation in goals scored over the past five seasons of La Liga (i.e. an R² of 0.85) turns out to be the following:

Goals scored = – 13.7 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
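As a reminder of what the R² figure expresses, here is a minimal Python sketch of how it can be computed from observed and predicted goals. The function name and the four team totals are purely hypothetical illustrations; they are not the La Liga data behind the model.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` that `predicted` accounts for."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_observed = sum(observed) / len(observed)
    # Total variation of the observed values around their mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions.
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have no variation")
    return 1.0 - ss_residual / ss_total


# Hypothetical season goal totals for four teams (illustration only).
observed_goals = [40, 50, 60, 70]
predicted_goals = [42, 48, 61, 69]
print(round(r_squared(observed_goals, predicted_goals), 2))
```

An R² of 0.85 would mean that the unexplained variation is 15 percent of the total variation in goals scored across the 100 team-seasons.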

When interpreting this model, the variables included and their respective signs matter more than the exact coefficient values, which are hard to interpret. Shots on target has a positive sign, implying that more shots on target tend to coincide with more goals. Shots outside of box has a negative sign: it adjusts downward the impact of shots on target attempted from outside the box, reflecting their lower probability of success. Successful passes turn out to be another important aspect of offensive contribution whereas, for example, successful dribbles are not. Also noteworthy is the relative size of the coefficients of shots on target and of successful passes: a shot on target has over thirty times (i.e. 0.41682/0.0132 ≈ 31.6) the impact of a successful pass on offensive contribution. If I had data on the area of the pitch where the passes took place (e.g. the final third), the model could be made even more accurate and I could also include non-forwards.
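The arithmetic behind these comparisons can be sketched in a few lines of Python. The coefficients are the ones from the model; the constant names are my own, and the net weight of a long-range shot rests on the assumption (not stated explicitly in the data description) that a shot on target from outside the box is counted in both shots_on_target and shots_outside_of_box.

```python
# Coefficients copied from the team-level model.
COEF_SHOTS_OUTSIDE_BOX = -0.10112
COEF_SHOTS_ON_TARGET = 0.41682
COEF_SUCCESSFUL_PASSES = 0.0132

# Number of successful passes that weigh as much as one shot on target.
passes_per_shot_on_target = COEF_SHOTS_ON_TARGET / COEF_SUCCESSFUL_PASSES

# Assumption: a shot on target from outside the box appears in both
# variables, so its net weight is the sum of the two coefficients.
net_weight_long_range_on_target = COEF_SHOTS_ON_TARGET + COEF_SHOTS_OUTSIDE_BOX

print(f"{passes_per_shot_on_target:.1f}")        # 31.6
print(f"{net_weight_long_range_on_target:.4f}")  # 0.3157
```

Under that assumption, a shot on target from distance still counts for roughly three quarters of one taken inside the box.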

Following Severini's Analytic Methods in Sports (2015), to apply the above team-level model at the level of the individual player, we merely need to divide the intercept by 10 (because of the ten outfield players). Thus, for an individual player,

Offensive Contribution = – 1.37 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
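The player-level formula translates directly into a small function. This is a sketch only: the function and constant names are my own, and the season totals in the example are hypothetical numbers chosen for illustration, not any actual player's statistics.

```python
TEAM_INTERCEPT = -13.7   # intercept of the team-level model
OUTFIELD_PLAYERS = 10    # the intercept is split across ten outfield players


def offensive_contribution(shots_outside_of_box, shots_on_target,
                           successful_passes,
                           intercept=TEAM_INTERCEPT / OUTFIELD_PLAYERS):
    """Player-level offensive contribution from season totals."""
    return (intercept
            - 0.10112 * shots_outside_of_box
            + 0.41682 * shots_on_target
            + 0.0132 * successful_passes)


# Hypothetical season totals for one forward (illustration only).
score = offensive_contribution(shots_outside_of_box=30,
                               shots_on_target=60,
                               successful_passes=1000)
print(round(score, 2))
```

Passing intercept=TEAM_INTERCEPT instead gives the team-level prediction of goals scored from the same formula.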

The top 10 of offensive contributors for the 2015-2016 La Liga season is as follows:

Figure 1: Top 10 offensive contributors, La Liga 2015-2016

Naturally, some players received more playing time than others, e.g. due to injury. For comparative purposes, it is therefore also helpful to consider what each player's offensive contribution would have been had he played all matches at the same level he showed when actually fielded:

Figure 2: Top 10 offensive contributors, La Liga 2015-2016,
assuming all players would have played all of their team's games
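One simple way to make such a playing-time adjustment is to scale a player's contribution proportionally to the games he missed. The sketch below assumes exactly that; the figures above may have been produced with a somewhat different adjustment (for instance by minutes rather than games), and the function name and example numbers are hypothetical.

```python
LA_LIGA_GAMES_PER_SEASON = 38  # 20 teams, home and away


def full_season_contribution(contribution, games_played,
                             team_games=LA_LIGA_GAMES_PER_SEASON):
    """Extrapolate a contribution score to a full season of games.

    Assumes the player would have kept up his per-game contribution
    in the games he missed.
    """
    if not 0 < games_played <= team_games:
        raise ValueError("games_played must be between 1 and team_games")
    return contribution / games_played * team_games


# Hypothetical player: contribution of 20 from half a season (illustration only).
print(full_season_contribution(20.0, games_played=19))  # 40.0
```

A proportional adjustment like this rewards players with few but strong appearances, which is worth keeping in mind when reading the adjusted ranking.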

Key observations:

1. The number 1 in terms of offensive contribution is Lionel Messi.
2. Although Cristiano Ronaldo came closest to Messi in terms of offensive contribution, Neymar would have overtaken Cristiano had they both had the same playing time.
3. All members of MSN (Messi, Suárez, Neymar) as well as of BBC (Bale, Benzema, Cristiano) are included in the top ten, assuming all players played the same number of games.
4. MSN's combined offensive contribution is larger than BBC's.
5. There is one "ugly duckling" in the top 7 (8), otherwise made up entirely of "usual suspect" stars: Jonathan Viera of Las Palmas. The 26-year-old Canary Islander tends to remain under the radar of more traditional measures of offensive contribution.
6. The top 10 is completed by club top scorers who had an exceptionally prolific season: Depor's Lucas Pérez, Betis' Rubén Castro, Bilbao's Aduriz and Real Sociedad's Agirretxe, who missed over half the season due to injury.
7. Relative differences are quite substantial, e.g. Messi's offensive contribution is almost double that of the player ranked tenth.

While it is difficult to perform even better in the season following an exceptionally good one (due, among other things, to a principle known in statistics as "regression to the mean"), I particularly look forward to finding out whether Las Palmas' Viera can really make a name for himself this season and whether Messi and Cristiano Ronaldo can remain at the very top, or whether youngster Neymar will make a move towards absolute supremacy.

Note that by means of our regression model, we are measuring correlations, not causation. This concretely means that, over the past five seasons, the selected variables were accurate predictors of offensive contribution in La Liga. However, it does not imply that a player’s future offensive contribution will move accordingly. Especially if players were to “act upon” the above model, e.g. by dribbling less and shooting or passing more instead, the included predictor variables may (or may not) lose part of their positive correlation with the dependent variable.

Let the new season commence! :)