Tuesday 8 November 2016

Best 11 vs. 11 "Best Bang for Your Buck"

With the opening rounds a thing of the past, the 2016-2017 season is finally well underway: an appropriate time to assess whose star has been shining. In today's post, I will assemble and compare two line-ups drawn from all leagues: the best-performing one and the one that has so far yielded the best value for money. As a performance measure, I will use the industry standard, i.e. the InStat Index, an overall quality score indicating a player's recent form.

The InStat Index is based on a detailed breakdown of a player's recent performances across about a dozen relevant performance dimensions, which depend on the position the player occupies on the field, e.g. finishing for a forward and key passing for a central midfielder. As a measure of a player's value, I use the market value provided by transfermarkt.com. Both the InStat Index scores and the market values are as of November 8, 2016.

To determine the best-in-form line-up, I checked, for each position on the field, which player in that position has the highest InStat Index score. The resulting team is as follows:
Figure 1: Best 11, based on InStat Index per position
(Top, italicized, number in players' squares is their InStat
Index score; bottom number their market value, in €m)

Observation 1: Although players from all professional leagues across the world were taken into consideration, the best starting eleven come from as few as three national leagues: La Liga, the Premier League and Ligue 1. Moreover, the Spanish league is home to no fewer than 9 of the 11. Apart from PSG's Brazilian skipper Thiago Silva, the only representative of a league other than the Spanish one is David Silva, himself a 2010 World Cup winner with Spain. In total, there are five Spanish players in the team and three Brazilians.

Observation 2: Spanish champions FC Barcelona provide more than half of the best team: the attacking line is entirely Barça's MSN, with Sergio Busquets, Pep Guardiola's incarnation on the field, forming a link between the attack and a defense featuring blaugranas Umtiti and Sergi Roberto.

Observation 3: The two players in the team who are not (yet) absolute stars are Sevilla FC's goalkeeper Sergio Rico and Las Palmas' central midfielder Jonathan Viera. With solid performances in both La Liga and the Champions League, Rico, a 23-year-old local boy from Sevilla who made his debut for Spain earlier this year, is keeping Italian international Sirigu, who arrived from PSG, on the bench. Readers to whom the name of virtuoso Jonathan Viera, by far the lowest-valued player to make it into the squad, does not ring a bell are kindly referred to the previous post in this blog.

Observation 4: Besides who made it into the team, it is also interesting to consider some who did not. Despite having the third- and fourth-highest InStat Index scores overall, respectively, neither Real Madrid's Gareth Bale nor Cristiano Ronaldo features in the best 11. Their misfortune is that they play in the same positions as the numbers one and two in terms of InStat Index, Messi and Neymar, respectively. Marcelo is thus the only merengue to make it into the best 11.

As money is a limited resource for most clubs and people, it is at least as worthwhile to find out which team yields the best value for money. In terms of our analysis: which players represent the most InStat Index points per euro? Simply dividing a player's InStat Index score by his market value, however, would imply that, say, a player with half of Messi's InStat Index score who costs less than half as much as Messi is "better value for money". To correct for this, I apply a natural-logarithmic transformation to the players' market values, which compresses the huge spread in prices so that a difference in cost weighs much less than an equal relative difference in performance. Thus, the measure I consider here is the InStat Index score divided by ln(market value).
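The effect of the logarithmic correction can be illustrated in a few lines of Python. The two players below are hypothetical, and market values are assumed to be expressed in euros (with values in €m, anything at or below €1m would give a zero or negative logarithm and break the measure).

```python
import math


def simple_ratio(instat: float, market_value_eur: float) -> float:
    """InStat Index points per euro of market value (no correction)."""
    return instat / market_value_eur


def log_ratio(instat: float, market_value_eur: float) -> float:
    """InStat Index points per unit of ln(market value).

    Assumes market_value_eur is in euros and greater than 1, so ln() is positive.
    """
    return instat / math.log(market_value_eur)


# Hypothetical players: the "star" scores twice as high as the "bargain"
# but costs 2.4 times as much (i.e. the bargain costs less than half).
star = (300, 120_000_000)
bargain = (150, 50_000_000)

print(simple_ratio(*star) < simple_ratio(*bargain))  # True: plain ratio favors the bargain
print(log_ratio(*star) > log_ratio(*bargain))        # True: log-adjusted ratio favors the star
```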

For a player to be selected, in addition to scoring best on the above measure in his position, he must rank in the top 10 in his position according to the InStat Index and must have featured in at least half of his team's domestic league games this season to date. The resulting "best bang for your buck" 11 are as follows:

Figure 2: "Best bang for your buck" 11,
based on InStat Index and market values
(Top, italicized, number in players' squares is their InStat
Index score; bottom number their market value, in €m)
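The three selection criteria can be expressed as a short filter-then-maximize routine. This is a minimal sketch, not the exact procedure used for the figure: the Player fields and the three goalkeepers in the example are hypothetical, and market values are again assumed to be in euros.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    name: str
    position: str
    instat: float            # InStat Index score
    market_value_eur: float  # market value in euros (assumed > 1)
    league_games: int        # domestic league games played this season
    team_league_games: int   # domestic league games played by his team


def value_score(p: Player) -> float:
    """InStat Index score divided by ln(market value)."""
    return p.instat / math.log(p.market_value_eur)


def best_value_pick(players: list[Player], position: str) -> Optional[Player]:
    """Return the best-value player for one position, or None if nobody qualifies."""
    in_position = [p for p in players if p.position == position]
    # Criterion 1: top 10 in the position by InStat Index.
    top10 = sorted(in_position, key=lambda p: p.instat, reverse=True)[:10]
    # Criterion 2: featured in at least half of his team's league games.
    eligible = [p for p in top10 if 2 * p.league_games >= p.team_league_games]
    # Criterion 3: highest InStat Index per ln(market value).
    return max(eligible, key=value_score, default=None)


# Hypothetical goalkeepers; "C" would have the best ratio but played too few games.
players = [
    Player("A", "GK", 290, 20_000_000, 10, 11),
    Player("B", "GK", 285, 2_000_000, 11, 11),
    Player("C", "GK", 280, 1_000_000, 3, 11),
]
print(best_value_pick(players, "GK").name)  # B
```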

Observation 5: The dominance of La Liga is similar in the case of the "best bang for your buck" 11: 8 players play their football in the Spanish league, and one of the "foreigners" in the team, Bayern Munich's Xabi Alonso, is actually a World Cup winner with Spain. Together with Schalke 04's Brazilian centre back Naldo, Alonso is one of two Bundesliga representatives coming into the team. With six Spaniards, more than half of the team is Spanish; again there are three Brazilians.

Observation 6: With a market value of just one tenth of even Sergio Rico's and an InStat Index score only marginally lower than that of his Sevilla FC counterpart, Espanyol's on-loan goalkeeper Diego López, the man in whose favor Mourinho famously dropped San Iker Casillas to the bench at Real Madrid, comes into the team. Sevilla FC does manage to keep a player in the squad, though, as Sergio Escudero replaces Marcelo at left back.

Observation 7: Expensive Barça options at right back and centre back, Sergi Roberto and Umtiti, are replaced by the very low-valued Míchel and David García, both from Las Palmas. Since Jonathan Viera naturally retains his spot, Las Palmas become the only club with as many as three players in the "best bang for your buck" 11! Another player coming in from one of La Liga's smaller teams is Villarreal's promising midfielder Manu Trigueros.

Observation 8: Both Messi and Neymar remain in the team, in spite of their astronomical market values. The rationale is simple: both are so much better than the other players in their position (as indicated by the InStat Index differential) that they still provide the best value for money, in this case a lot of value for a lot of money. As mentioned above, Bale and Cristiano Ronaldo come closest to the performance of Messi and Neymar, respectively, but their market values are comparable to those of their blaugrana counterparts.

Observation 9: A large part of the reduction in cost (i.e. market value) without much loss of performance has been achieved by bringing in aging stars, whose market value is low but whose performance is still up to standard. Whereas only two players in the best 11 are over thirty, with Thiago Silva (32) the oldest, five players in the "best bang for your buck" 11 are aged 34 or 35. The average ages of the two teams differ by 3 years.

Observation 10: The best 11, with a total market value of €520m, achieve a combined InStat Index score of 3,685. This compares to 3,567 for the "best bang for your buck" 11, valued at €259m. In other words, with only half the budget, and while still counting on both Messi and Neymar, the "best bang for your buck" 11 represent only a 3% loss in InStat Index score compared to the best possible 11.
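The percentages in Observation 10 follow from the reported totals alone; the snippet below simply redoes that arithmetic.

```python
# Totals as reported above (market value in €m, summed InStat Index scores).
best_11 = {"value_m": 520, "instat_sum": 3685}
bang_11 = {"value_m": 259, "instat_sum": 3567}

budget_share = bang_11["value_m"] / best_11["value_m"]
instat_loss = 1 - bang_11["instat_sum"] / best_11["instat_sum"]

print(f"budget share: {budget_share:.1%}")  # budget share: 49.8%
print(f"InStat loss:  {instat_loss:.1%}")   # InStat loss:  3.2%
```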

However, adding up players' individual performances may not provide the right basis for comparison. If football is a "weakest link" rather than a "strongest link" game, as advocated by some of its greatest managerial innovators, such as Lobanovskyi and Sacchi, multiplying individual performances may be the more accurate way to go (see Anderson and Sally's The Numbers Game). This paints quite a different picture, as the product of the individual InStat Index scores of the "best bang for your buck" 11 reaches only 69% of the best 11's: a 31% rather than a 3% loss. So one's philosophy of the game matters a great deal, even when talking "hard" numbers.
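The difference between the "strongest link" (sum) and "weakest link" (product) views can be sketched as follows. The line-ups are hypothetical, since the individual scores from the figures are not reproduced here; the product ratio is computed through sums of logarithms, one common way to avoid overflow when multiplying many scores.

```python
import math


def additive_ratio(team: list[float], reference: list[float]) -> float:
    """'Strongest link' view: ratio of summed scores."""
    return sum(team) / sum(reference)


def multiplicative_ratio(team: list[float], reference: list[float]) -> float:
    """'Weakest link' view: ratio of the products of scores, via log-sums."""
    return math.exp(sum(map(math.log, team)) - sum(map(math.log, reference)))


# Hypothetical line-ups: identical except for one weaker player.
reference = [330.0] * 11
team = [330.0] * 10 + [300.0]

print(f"{additive_ratio(team, reference):.3f}")        # 0.992
print(f"{multiplicative_ratio(team, reference):.3f}")  # 0.909
```

Here a single 9% shortfall at one position costs less than 1% in the additive view but the full 9% in the multiplicative one, because the product ratio equals the product of the per-position ratios. By the same token, the 69% reported above corresponds to a geometric-mean shortfall of only about 3.3% per position (0.69^(1/11) ≈ 0.967), compounded over eleven positions.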

Tuesday 16 August 2016

Offensive Contribution in La Liga: Usual Suspects and One Ugly Duckling

In this post, we look forward to the new season of La Liga by looking back at the previous one. More specifically, we’ll zoom in on the offensive contribution of those forwards who put in stellar performances on this front. A player’s offensive contribution is typically determined "quick and dirty" by considering goals and assists, often by simply adding the two. Naturally, there are far more sophisticated, and accurate, ways to gauge the offensive contribution a player makes to his team. One such method, which I will employ here, was proposed by Thomas Severini, Professor of Statistics at Northwestern University, and consists of a regression model with multiple predictor variables.

The intuition and rationale behind the statistical model are as follows. For all teams in the Primera División over the past seasons, we know how many goals they scored as well as a range of other statistics, including how many shots on goal, attempts at goal from outside and from inside the box, dribbles, passes, etc. they executed. What we want to find out is which variables actually relate (most) to the number of goals a team scores. To this end, our model considers each team’s performance over the last five seasons, giving 100 observations (i.e. 5 seasons times 20 teams). The statistical model allows us to single out those variables that provide important additional information about the dependent variable (here: goals scored).

A concise model that explains no less than 85 percent of the variation in goals scored over the past five seasons of La Liga (i.e. R² = 0.85) turns out to be the following:

Goals scored = – 13.7 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
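For readers who want to see the mechanics, here is a minimal sketch of fitting such a team-level model by ordinary least squares with NumPy. The data are synthetic, generated from the coefficients above plus random noise purely to show how the fit and the R² are computed; they are not real La Liga statistics, and the variable selection step described above is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # 5 seasons x 20 teams

# SYNTHETIC team-season data, for illustration only (not real La Liga numbers).
shots_outside_of_box = rng.uniform(100, 300, n)
shots_on_target = rng.uniform(100, 250, n)
successful_passes = rng.uniform(2_000, 5_000, n)
true_coefs = np.array([-13.7, -0.10112, 0.41682, 0.0132])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), shots_outside_of_box, shots_on_target, successful_passes])
goals = X @ true_coefs + rng.normal(0, 5, n)  # add random noise

# Ordinary least squares fit.
coefs, *_ = np.linalg.lstsq(X, goals, rcond=None)

# R^2: share of the variance in goals explained by the fitted model.
residuals = goals - X @ coefs
r_squared = 1 - residuals.var() / goals.var()

print(np.round(coefs, 4))
print(round(r_squared, 3))
```

The noise level here is arbitrary, so the printed R² says nothing about the 85 percent reported for the real data.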

When interpreting this model, what matters most is not the actual numbers, which are hard to interpret, but the variables included and their respective signs. Shots on target has a positive sign, implying that more shots on target tend to coincide with more goals. The coefficient of shots outside of box is negative: it adjusts the impact of shots on target downward for those attempted from outside the box, reflecting their lower probability of success. Successful passes turn out to constitute another important aspect of the offensive contribution, whereas, for example, successful dribbles do not. Also noteworthy is the relative difference between the coefficients of shots on target and of successful passes: a shot on target has over thirty times (0.41682/0.0132 ≈ 31.6) the impact on offensive contribution of a successful pass. If I had data on the area of the pitch where the passes took place (e.g. the final third), the model could be made even more accurate, and I could also include non-forwards.
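Both comparisons follow directly from the coefficients. The second line assumes that an on-target shot from outside the box counts in both shots_on_target and shots_outside_of_box, which is how a linear model with these two regressors treats such a shot.

```python
# Coefficients of the team-level model above.
B_SHOTS_OUTSIDE = -0.10112
B_SHOTS_ON_TARGET = 0.41682
B_SUCCESSFUL_PASSES = 0.0132

# A shot on target versus a successful pass.
print(round(B_SHOTS_ON_TARGET / B_SUCCESSFUL_PASSES, 1))  # 31.6

# Net weight of an on-target shot from outside the box (assumed to count in both variables).
print(round(B_SHOTS_ON_TARGET + B_SHOTS_OUTSIDE, 5))  # 0.3157
```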

Following Severini's Analytic Methods in Sports (2015), to apply the above team-level model at the level of the individual player, we merely need to divide the intercept by 10 (one share for each of the ten outfield players). Thus, for an individual player,

Offensive Contribution = – 1.37 – 0.10112*shots_outside_of_box + 0.41682*shots_on_target + 0.0132*successful_passes
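The player-level formula translates directly into a small function. The season totals in the example call are hypothetical.

```python
TEAM_INTERCEPT = -13.7
OUTFIELD_PLAYERS = 10


def offensive_contribution(shots_outside_of_box: int, shots_on_target: int,
                           successful_passes: int) -> float:
    """Player-level offensive contribution: team intercept split over ten outfield players."""
    return (TEAM_INTERCEPT / OUTFIELD_PLAYERS
            - 0.10112 * shots_outside_of_box
            + 0.41682 * shots_on_target
            + 0.0132 * successful_passes)


# Hypothetical season totals for one forward.
print(round(offensive_contribution(50, 80, 1500), 2))  # 46.72
```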

The top 10 of offensive contributors for the 2015-2016 La Liga season is as follows:

Figure 1: Top 10 offensive contributors, La Liga 2015-2016

Naturally, some players received more playing time than others, e.g. due to injury. For comparative purposes, it is therefore also helpful to consider offensive contribution assuming all players had played all of their team's matches, at the level of offensive contribution they showed when actually fielded:

Figure 2: Top 10 offensive contributors, La Liga 2015-2016,
assuming all players would have played all of their team's games
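One simple way to perform this extrapolation is to scale a player's contribution by the ratio of his team's games to the games he played. This sketch assumes per-game scaling (a per-minute version would work analogously) and is not necessarily the exact method behind Figure 2; the example numbers are hypothetical.

```python
LA_LIGA_GAMES = 38  # matches per team in a La Liga season


def full_season_contribution(contribution: float, games_played: int,
                             team_games: int = LA_LIGA_GAMES) -> float:
    """Extrapolate a player's season contribution to all of his team's games,
    assuming he keeps the per-game level he showed when actually fielded."""
    if games_played <= 0:
        raise ValueError("the player must have played at least one game")
    return contribution * team_games / games_played


# Hypothetical forward: contribution of 20 in 15 of 38 league games.
print(round(full_season_contribution(20.0, 15), 1))  # 50.7
```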

Key observations:

1. The number 1 in terms of offensive contribution is Lionel Messi.
2. Although Cristiano Ronaldo came closest to Messi in terms of offensive contribution, Neymar would have overtaken Cristiano had they both had the same playing time.
3. All members of MSN as well as of BBC are included in the top ten, assuming all players played the same number of games.
4. MSN’s combined offensive contribution is larger than BBC’s.
5. There is one “ugly duckling” in the top 7 (8), otherwise made up entirely of “usual suspect” stars: Jonathan Viera of Las Palmas. The 26-year-old Canary Islander tends to stay under the radar of more traditional measures of offensive contribution.
6. The top 10 is completed by club top scorers who had an exceptionally prolific season: Depor’s Lucas Pérez, Betis’ Rubén Castro, Bilbao’s Aduriz and Real Sociedad’s Agirretxe, who missed over half the season due to injury.
7. Relative differences are quite substantial, e.g. Messi's offensive contribution is almost double that of the ranking's number 10.

While it is difficult to perform even better in the season following an exceptionally good one, partly due to a statistical phenomenon known as “regression to the mean”, I particularly look forward to finding out whether Las Palmas’ Viera can really make a name for himself this season, whether Messi and Cristiano Ronaldo can remain at the very top, or whether youngster Neymar will make a move towards absolute supremacy.

Note that with our regression model, we are measuring correlations, not causation. Concretely, this means that over the past five seasons, the selected variables were accurate predictors of goals scored in La Liga. It does not imply, however, that a player’s future offensive contribution will move accordingly. Especially if players were to “act upon” the above model, e.g. by dribbling less and shooting or passing more instead, the included predictor variables may (or may not) lose part of their correlation with the dependent variable.

Let the new season commence!

Tuesday 24 May 2016

How Kompany’s Absence Could Prove a Blessing for Belgium’s EURO Title Hopes

In this short post, I will explain how the injury to Belgium’s captain Vincent Kompany could boost Belgium’s hopes of EURO 2016 victory, and why it probably won’t.

The key to understanding why the absence of a great player may in this case actually benefit team performance lies in automatisms. Concretely, Kompany plays in central defense and is sure to play there for Belgium when fit. This necessarily means that the Vertonghen-Alderweireld duo will be broken up and at least one of them will be played on the flanks, which is exactly what happened at the 2014 World Cup, when Van Buyten, too, was a certainty for coach Wilmots in central defense.

Today, though, it makes much less sense than at the time of the previous World Cup to consider Belgium's defenders individually, for a straightforward reason: Vertonghen and Alderweireld have been playing side by side all season as a central defensive duo. Moreover, they did so in the Premier League, the world’s most competitive domestic league, as well as in the Europa League. And, above all, no team in the Premier League conceded fewer goals this season than Tottenham Hotspur. The automatisms of a central defensive duo that has played with this rate of success in direct combination for over 30 games at the highest level can be considered a godsend for any national team.

When Spain won the 2010 World Cup, coach del Bosque had three excellent central defenders at his disposal: Puyol, Ramos and Piqué. Two of them he would field in the centre of the defense; the third he would move to right back. While both Ramos and Puyol could fill the right-back position, del Bosque always elected to field Ramos on the right, preserving the Barça tandem in the heart of the defense. Spain’s 2010 World Cup victory would go down as a prototypical example of a triumph built on automatisms developed at club level, mainly at Barça.

Source: theguardian.com

Skeptics may question the level of automatisms between Vertonghen and Alderweireld relative to that of, say, Piqué and Puyol, given that the two Belgians have only been together at Tottenham since the beginning of the season. It is important to realize, though, that Alderweireld and Vertonghen have crossed paths frequently during their careers and came through the same football academy: from 2004 to 2006, both were trained in the youth academy of Ajax Amsterdam, having joined from the same Belgian club, Germinal Beerschot. From 2008 to 2012, Vertonghen and Alderweireld made up the defensive block of Ajax’s senior squad. (Thomas Vermaelen followed a similar trajectory, but left Ajax as early as 2009.) Alderweireld and Vertonghen share as solid a common basis as you will find in contemporary football.

Source: scoopnest.com

There are two more compelling reasons to combine Alderweireld and Vertonghen at the centre of Belgium's defense. First, whereas the former is right-footed, the latter is left-footed, underlining their complementarity. Second, another Belgian who had an individually outstanding season at Tottenham, Mousa Dembélé, plays there one line above the defensive duo, opening up the possibility to leverage automatisms relating not just to the duo, but to the triangle as well.

Within the realm of economic rationality, having more options available can only lead to better or equal outcomes, never worse ones. Economic rationality, however, does not govern a football team, and certainly not the Belgian national team. Leaving the influential captain out of the team, or not playing him in his favored position, could have had serious repercussions. It is thus only through Kompany’s absence that the way has been cleared for the successful Spurs duo to be united in the heart of the Belgian defense.

Especially with Vermaelen and Lombaerts, the other contenders for the central defensive roles, having mainly been used as substitutes at their clubs this season, and youngster Engels having had to miss the Euros through injury, building on the successful central Tottenham duo would appear to be a no-brainer. Yet I suspect coach Wilmots will come up with some alternative that still breaks up one of the most effective defensive duos at the Euros.

Tuesday 29 March 2016

BBC vs MSN: But This Year’s Clearly Different, Right?

In last January’s post, I subjected the goal-scoring performances of Barça’s trident and Real Madrid’s BBC to a comparative analysis. I showed that even though Messi, Neymar and Suárez overall turned out to be more efficient than Cristiano, Benzema and Bale, the members of BBC actually make each other better (synergistic), whereas the members of Barça’s trident rather seem to score more often when some of their strike partners are absent (cannibalistic). Neymar emerged as the most pivotal member of Barça’s trident. That analysis covered data from the moment each trident was first united at its club. Now that the 2015-2016 season is nearing its conclusion, it is also important to investigate what the relations look like for the current season. Perhaps Barça’s trident is already combining much better than before? Let’s revisit the results for the current season only, including both La Liga and the Champions League, to date.

We may recall that for the aggregated data, only one member of the two trios was actually more productive when part of his threesome than when not: Cristiano Ronaldo. Looking at the current season, it turns out that all three members of BBC are more efficient when playing together than when not, whereas no member of Barça’s trident finds the net more easily when MSN is complete than when not.

Fig. 1: Goal-scoring performance as part of trident (bar) relative to when not part of trident (100%-line)

As can be seen in Fig. 1, the numbers are most extreme for Neymar and for Bale. Per 90 minutes on the field with both Messi and Suárez, Neymar scored, on average, 0.4 penalty-adjusted goals. When not part of the trident, in contrast, the captain of the Brazilian national team managed slightly more than 1 goal per 90 minutes. Neymar has already played over 1,000 minutes for Barça outside the trident this season and almost 2,000 as part of it. So even though the numbers for the current season alone are naturally more sensitive to chance than the aggregated ones, there is a solid basis for comparison.
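The per-90 rates behind Fig. 1, and the bars relative to the 100% line, are simple to compute. In this minimal sketch, goals stands for penalty-adjusted goals, whose adjustment is defined in earlier posts and not reproduced here; the example plugs in the rounded Neymar figures quoted above, treating "slightly more than 1" as 1.0, so the result is only approximate.

```python
def per_90(goals: float, minutes: float) -> float:
    """Goals per 90 minutes played."""
    return 90 * goals / minutes


def relative_to_baseline(rate_in_trident: float, rate_outside: float) -> float:
    """Bar height in Fig. 1: rate as part of the trident relative to the 100% line."""
    return rate_in_trident / rate_outside


# Rounded per-90 figures for Neymar quoted above.
print(f"{relative_to_baseline(0.4, 1.0):.0%}")  # 40%
```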

The scoring numbers for Gareth Bale are almost a mirror image of Neymar’s. When not part of BBC, the Welshman has so far scored slightly more than half a field goal per 90 minutes; when part of BBC, 1.2 goals per 90 minutes on average. This season, Bale has thus been more than twice as productive as part of BBC as outside it, and his bar in Fig. 1 actually extends beyond what is shown. Bale has been out injured quite a bit, though, so there is less performance data on him than would be desirable. Still, he has already played over 800 minutes as part of the BBC constellation and almost 700 outside it.

The synergism of BBC versus the cannibalism of MSN thus holds even more strongly for the current season. Whereas BBC appear to be setting each other up to score more and more, Barça’s trident is not (yet?) showing any signs of such improvement. We may recall from the January post, though, that the findings regarding BBC’s synergism and MSN’s cannibalism were considerably tempered by the fact that the members of MSN combined were still more efficient than those of BBC combined. Does this still hold for the current season? It turns out it does not! So far this season, the sum of the average number of goals per 90 minutes of Messi, Neymar and Suárez is 2.5, compared to 3 for the sum of BBC’s members. Furthermore, when MSN were actually fielded together, they scored 2.1 goals per 90 minutes, vs. 3.5 goals in the case of BBC.

Why then, one naturally wonders, are Barça already this season's uncrowned La Liga champions, with the merengues lagging ten points behind? Although both teams have scored and conceded about equally in the league, the scoring data support a twofold answer. First, whereas the above analysis focused only on efficiency, i.e. the average number of (penalty-adjusted) goals per 90 minutes, efficacy, i.e. the actual total number of (penalty-adjusted) goals scored, is naturally also crucial. Concretely, both Benzema and Bale have been injured for quite some time, with the result that the members of BBC have played slightly fewer than 7,000 combined minutes so far this season, in La Liga and the Champions League together. Suárez, Neymar and Messi already have more than 8,500 minutes on their combined season’s clock. The efficacy of those three combined is actually higher than that of Cristiano, Benzema and Bale combined: 78.5 penalty-adjusted goals vs. 76.
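The distinction between efficacy and efficiency can be made concrete with the rounded totals quoted above. The minute totals are approximations taken from the text ("more than 8,500" and "slightly short of 7,000"), so the resulting rates are indicative only.

```python
# Rounded season totals quoted above (La Liga + Champions League, penalty-adjusted).
tridents = {
    "MSN": {"goals": 78.5, "minutes": 8_500},
    "BBC": {"goals": 76.0, "minutes": 7_000},
}


def efficiency(t: dict) -> float:
    """Goals per 90 player-minutes, pooled over the three members."""
    return 90 * t["goals"] / t["minutes"]


for name, t in tridents.items():
    print(f"{name}: efficacy {t['goals']}, efficiency {efficiency(t):.2f} per 90")
# MSN: efficacy 78.5, efficiency 0.83 per 90
# BBC: efficacy 76.0, efficiency 0.98 per 90
```

Multiplied by three, these pooled rates (about 2.5 and 2.9) are roughly in line with the summed per-player rates of 2.5 and 3 mentioned earlier, as one would expect if the members of each trio played broadly similar minutes.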

Second, the crux of understanding the differential in team outcomes is "goal importance", a measure of how many incremental points a player's goals actually provide to his team. Cristiano, for instance, is the undisputed top scorer so far, with 37 penalty-adjusted goals. Suárez has 31, six fewer. Yet Suárez’ goals often led Barça to take the lead and/or to win a game, whereas quite a few of Cristiano's goals did not actually earn Real Madrid any additional points (e.g. neither of Cristiano’s four-goal hauls, against Malmö and against Celta, mattered). The total number of point increments for Barça as a direct result of goals scored by its trident members this season to date amounts to 48.5, compared to 37.5 for Real Madrid (Benzema’s 24 penalty-adjusted goals have actually yielded his team more points so far than Cristiano’s 37). This underlines the added value of a multi-measure analysis of top scorers, the subject of a number of earlier posts in this blog, which also detail the goal-importance measure.
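The exact goal-importance computation is detailed in earlier posts and not repeated here. As a purely hypothetical illustration of the idea, the sketch below credits a goal with the league points its team would lose if that goal were removed from the final score. This simplification is my assumption: it ignores how credit may be shared between goals (the half points in the totals above suggest some sharing) and matches without points at stake.

```python
def match_points(goals_for: int, goals_against: int) -> int:
    """League points for a result: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def marginal_points(goals_for: int, goals_against: int) -> int:
    """Points a team would lose if one of its goals in this match were removed.

    A hypothetical, simplified notion of goal importance, not the blog's own measure.
    """
    if goals_for < 1:
        raise ValueError("the team must have scored at least one goal")
    return match_points(goals_for, goals_against) - match_points(goals_for - 1, goals_against)


print(marginal_points(8, 0))  # 0: a goal in a rout changes nothing
print(marginal_points(1, 0))  # 2: a lone winner turns a draw into a win
print(marginal_points(2, 2))  # 1: an equalizer turns a loss into a draw
```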

Let us conclude the analysis by looking at which player is most pivotal to the blaugranas' scoring as a team. I propose to do this by comparing how often the team (not just the trident) scores with and without a player on the field. (A similar analysis for Real Madrid is ruled out by Cristiano having hardly missed a minute.)

Fig. 2: Avg. number of Barça goals per game with and without player having been on the pitch
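The with/without comparison behind Fig. 2 boils down to splitting matches by whether the player was on the pitch and averaging team goals in each group. A minimal sketch with a hypothetical match log; treating any appearance, however brief, as "on the pitch" is an assumption.

```python
def average(values: list[int]) -> float:
    """Mean of a list, or NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def with_without(matches: list[tuple[int, bool]]) -> tuple[float, float]:
    """Average team goals per game with and without a given player on the pitch.

    Each match is recorded as (team_goals, player_was_on_pitch).
    """
    with_player = [goals for goals, played in matches if played]
    without_player = [goals for goals, played in matches if not played]
    return average(with_player), average(without_player)


# Hypothetical five-match log for one player.
log = [(3, True), (4, True), (1, False), (2, True), (2, False)]
print(with_without(log))  # (3.0, 1.5)
```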

Although Barça turn out to score more often without Suárez on the field, the above discussion on goal importance illustrates why this should probably not be overemphasized. The main insight is that Neymar's role turns out to be even more critical in the current season than in the aggregated data: with Neymar on the field, Barça scored 2.76 goals per game, vs. 1.89 without the Brazilian. How can this be explained, given that Neymar is both the least efficacious and the least efficient of the three members of MSN? As with the aggregated data, of all possible combinations (i.e. solo, duos, trio), both Messi and Suárez have been most efficient in combination with Neymar only. The presence of the Brazilian seems to enable his strike partners to score, this season more than ever. Most probably, he is the one who destabilizes the opponent, even more than Messi does, which in turn creates goal-scoring opportunities.