Sunday, 19 October 2014

So the Ball Doesn’t Go In by Chance. But Do We Also Know Why?

Football leaders who come from a business background are generally met with great scepticism by the wider public, and everyday evidence shows this is often deserved. The take of Ferran Soriano, CEO of Manchester City FC and former Vice President of FC Barcelona, is an interesting one, though: he warns against readily translating business ideas into a football context. What he takes from his business background instead is the conviction that a thorough understanding of the logic behind a phenomenon, be it business or football, allows its management to be improved. Moreover, to better understand the logic of football, he embraces authoritative research findings in this area.

As a notable example, in the first chapter of Goal: The Ball Doesn’t Go In by Chance, the current CEO of the Citizens points to findings by researchers Szymanski and Kuypers that a club’s success is strongly positively correlated with how much the club spends on players' wages, but not with its transfer budget, a key takeaway also featured in Soccernomics, the bestseller co-authored by Szymanski. Two things are striking about Soriano’s account. First, he seems to interpret the findings more accurately than the researchers themselves. Secondly, it appears to be the “common sense” Soriano refers to that led him to the better interpretation, while the actual logic behind it remains hidden. In this post, I will spell out the logic behind a plausible interpretation of these findings. This will also provide a handy checklist for interpreting research findings more generally.

In Soccernomics, the authors mention that, based on analysis of historical data from the English Premier League, they are able to conclude that, “the correlation [of a team’s league position] with players’ pay [over a long period is] about 90 percent. In short, the more you pay your players in wages, the higher you will finish”. Even more than the content of this claim, what is interesting here, from a scientific perspective, is the leap from one step to the next: the strong positive correlation is taken to imply a causal relationship of the type “A leads to B” (with "A" being players’ pay and "B" league position). The A->B causal interpretation, however, is only one of at least six possible explanations for the strong positive correlation. I will argue that it is most probably not the applicable one here and that Soriano seems to have intuitively arrived at the most plausible one.

Possible explanations for the strong positive correlation include the following:

1. There is correlation, but no causation. Correlation does not automatically imply causation: some things simply happen at the same time without one causing the other. For instance, following the Dutch Armada’s surprisingly disappointing recent performances in the Euro 2016 “qualifiers” (let’s more accurately call them “friendlies”, as everyone is to qualify anyway), journalists started speculating, rather jokingly, that the Dutch may be suffering from the “curse of the third”. Apparently, since 1980, no country other than Germany that finished third in the preceding World Cup has been able to qualify for the subsequent European Championship. People of common sense readily understand that this is just a “funny coincidence”, similar to England on average being more successful when playing in red rather than white jerseys. Moreover, the number of data points would be too small to support any rigorous claim to the contrary. Szymanski et al.'s analysis, by contrast, appears to be based on ample data, so this explanation of mere coincidence can comfortably be ruled out.

2. The strong positive correlation is not due to coincidence and thus indeed points to a causal relationship. In this case, the causal relationship can still be one of at least four types:
  • A leads to B. In our example, higher wages directly lead to a better league position, i.e. the explanation proposed by Szymanski et al.
  • B leads to A. In our example, a better league position leads to higher wages. Winning a league naturally implies winning prize money, of which the players are to get a significant share. Not all that implausible as an alternative explanation then, perhaps. But note that the proportion of prize money in a typical footballer’s wage tends to be rather small.
  • A and B are positively correlated, not because one causes the other, but because they have a common cause (“C”). A typical example is the strong positive correlation between the use of sunglasses and temperature. Neither one causes the other. Rather, they both have a common cause, viz. intensity of sunlight. In our case, though, A and B would seem more closely connected than merely through a possible common cause.
  • A leads to B, but only indirectly, i.e. through another variable, viz. a mediator ("D"). In our case, it seems most plausible that paying higher wages (A) leads to a better league position (B), but primarily, if not only, to the extent that these wages are used to employ better players (D).
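
To make the four causal structures above concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the variable names, the noise levels, the sample size and the random seed); it uses made-up numbers, not football data. Each structure is tuned so that the correlation between A and B comes out at roughly 0.8, which shows that the correlation coefficient alone cannot tell the structures apart.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(structure, n=5000, seed=0):
    """Draw n (A, B) pairs under a hypothetical causal structure; return their correlation."""
    rng = random.Random(seed)
    a_values, b_values = [], []
    for _ in range(n):
        if structure == "A->B":            # A directly causes B
            a = rng.gauss(0, 1)
            b = a + rng.gauss(0, 0.75)
        elif structure == "B->A":          # B directly causes A
            b = rng.gauss(0, 1)
            a = b + rng.gauss(0, 0.75)
        elif structure == "common cause":  # C causes both A and B
            c = rng.gauss(0, 1)
            a = c + rng.gauss(0, 0.5)
            b = c + rng.gauss(0, 0.5)
        elif structure == "mediation":     # A causes D, and only D causes B
            a = rng.gauss(0, 1)
            d = a + rng.gauss(0, 0.5)
            b = d + rng.gauss(0, 0.56)
        else:
            raise ValueError(f"unknown structure: {structure}")
        a_values.append(a)
        b_values.append(b)
    return pearson(a_values, b_values)


results = {s: simulate(s) for s in ("A->B", "B->A", "common cause", "mediation")}
for structure, r in results.items():
    print(f"{structure:>12}: r = {r:.2f}")
```

All four runs report a similarly strong correlation, even though "raise A to raise B" would only work as advice under the first structure and, via D, under the last.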

So, which one will it be?

Not only do Szymanski et al. seem to incorrectly suggest that correlation automatically implies causation, but also that if there is causation, it must be of an “A leads to B” nature. To appreciate how unlikely this explanation for the correlation at hand would be, consider what it actually implies: that if you pay more (to whomever), you will obtain a better league position. So if you merely start paying your team’s current players as if they were Messi or Cristiano Ronaldo, your team would soon get the results of Barça or Real Madrid. Szymanski et al. seem at least to acknowledge this inescapable implication of their reading of the findings and, more puzzlingly, seem to go with it: “The question, then, is...[with] this knowledge of the relative importance of wages and unimportance of transfers, how can you win more matches? ...In general, it may be better to raise the pay of your leading players”. Agreed, if you pay peanuts, you are likely to get monkeys. But is rewarding your monkeys as if they were stars really going to turn them into a dream team?

What about Soriano? He just as readily dismisses coincidence, thus equating correlation with causation, though he seems to be ruling out coincidence already in the very title of his book. His interpretation of the findings: “So, if you want…a team with a chance of regularly winning championships, then you need to work consistently to have a big club that generates enough revenues to be able to sign the best football talent available.” So the finding that transfer expenditure does not seem to be strongly positively correlated with sporting success does not fool Soriano into believing that it is your current squad that you should simply start paying more. Interestingly, while he gives the impression that he is merely reiterating Szymanski et al.’s conclusion, Soriano is in fact endorsing the alternative explanation: a favourable league position requires that high wages are used to attract top talent.

Where Soriano does err is in stating that, through the analysis and findings of Szymanski et al., common sense has been “corroborated by mathematics”. A subtle, yet meaningful, distinction is to be made here. Szymanski and Kuypers were not using mathematics, but rather statistics and, more specifically, regression analysis. Why is this distinction important? Because mathematics is the only way by which general truths can actually be proven. For instance, that (a + b)² = a² + b² + 2ab can easily be proven by simple application of algebraic rules. Unless someone is able to spot an error in the mathematical proof, a general rule has thus been proven. Additionally, in the natural sciences, there are several general laws that are derived from observation. An apple falling from a tree, for instance, points to the existence of gravity, which will always apply on the earth’s surface.
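
For illustration, the algebraic identity mentioned above can be proven in three steps, using only the distributive law and the fact that multiplication commutes (ab = ba):

```latex
\begin{aligned}
(a + b)^2 &= (a + b)(a + b) \\
          &= a^2 + ab + ba + b^2 \\
          &= a^2 + b^2 + 2ab
\end{aligned}
```

Each step holds for any numbers a and b, which is what makes the result a general rule rather than an observed pattern.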

It is very different in the social sciences, however, where human will and behavioural traits mean that the same starting conditions will not necessarily lead to the same outcomes. Regression analysis is typically used here to discern meaningful patterns. Very importantly, though, such analysis cannot prove that something holds generally; it can only show that something occurs with a certain level of probability (cf. confidence intervals), under certain conditions and controlling for certain factors. (One can prove that something does not hold generally by means of an empirical counterexample, e.g. if a team of 8 beats a team of 11, it has been proved that this is not impossible. However, the general rule can never be proven empirically: the fact that so far a team of 11 has always beaten a team of 7 does not show, let alone prove, that this is what will always happen under such conditions.)

Consequently, in the case of regression analysis, and unlike with mathematics, it is not uncommon for different researchers to arrive at quite contradictory findings based on similar data, e.g. depending on whether or not they adequately control for a mediator or a common cause.
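
A small sketch can show how two analyses of the very same data may point in opposite directions. The data below are simulated under an assumed mediation structure (wages affect squad quality, and only squad quality affects points); the variable names, coefficients and noise levels are hypothetical and chosen purely for illustration. One regression ignores squad quality, the other controls for it.

```python
import random


def centered_sum(xs, ys):
    """Sum of products of deviations from the means (n times the covariance)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def slope_simple(x, y):
    """OLS slope of y regressed on x alone (with an intercept)."""
    return centered_sum(x, y) / centered_sum(x, x)


def slope_controlled(x, z, y):
    """OLS coefficient of x when y is regressed on both x and z (with an intercept)."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    # Closed-form solution of the two-predictor normal equations.
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


rng = random.Random(1)
n = 5000

# Assumed data-generating process: wages -> squad quality -> points.
wages = [rng.gauss(0, 1) for _ in range(n)]
quality = [w + rng.gauss(0, 0.5) for w in wages]
points = [q + rng.gauss(0, 0.5) for q in quality]

wage_effect_naive = slope_simple(wages, points)
wage_effect_controlled = slope_controlled(wages, quality, points)

print(f"wages coefficient, no controls:          {wage_effect_naive:.2f}")
print(f"wages coefficient, controlling quality:  {wage_effect_controlled:.2f}")
```

Under these assumptions, the first researcher would report that wages strongly predict points, while the second would report that wages add essentially nothing once squad quality is accounted for. Both are computed correctly from identical data; they differ only in what is controlled for.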

In short, when there is a strong positive correlation, several things can be going on: nothing at all (i.e. coincidence); A leading to B; B leading to A; C leading to both A and B; mediation; or a strong correlation that disappears once adequate controls are accounted for. The CEO of the “noisy neighbours” appears intuitively right in his interpretation of the potentially important football findings proposed by Szymanski et al. I would expect him to be the first to acknowledge that it is helpful to have made visible the hidden logic of why that is so.