This is Part 2 of the football prediction model series. In Part 1, I looked at why existing match prediction models are limited.
I don’t know how sophisticated these expected goals models can get. I am sure the best models are developed by betting companies or professional gamblers, and in both cases they are closely guarded. But I don’t think expected goals are the answer.
Football Prediction Model Based on… Relative Player Salaries
I recently read Soccernomics for the second time, and one statistic piqued my interest: there is a very strong positive correlation between a club’s relative squad payroll and its position in the league. We are talking a correlation of 0.91.
It makes sense that teams with better footballers win more matches, and better footballers earn more. The transfer market is by no means an efficient market, but it is still driven by supply and demand, so in theory a player’s salary is an indicator of his quality. This is my hypothesis. Soccernomics is 10 years old, so I decided to quickly check EPL salaries for the 2018 season against the end-of-season standings. Let’s take a look:
That’s a pretty strong correlation, I would say. But I wanted to test the hypothesis further with some historical data, so I looked at La Liga 2010/2011 match results and the corresponding player market values on Transfermarkt.com. A player’s market value is not the same as his salary, but I was too lazy to do that research. Two evenings of Excel spreadsheet manipulation revealed that teams that are four times more expensive than their opponents win 75% of the time, regardless of whether the match is played at home or away. Maybe I am onto something. Take a look at the results from La Liga 2010/2011:
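For anyone who wants to check a figure like this on their own data, here is a minimal Python sketch of the calculation, assuming one season’s club wage bills and final league positions stored in dictionaries. The function names and toy numbers are my own illustration, not data from Soccernomics, and the book’s exact method may well differ.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def payroll_vs_standing(payrolls, positions):
    """Correlate each club's relative payroll with its final league standing.

    payrolls:  club -> total wage bill for the season (any currency)
    positions: club -> final league position (1 = champion)
    """
    avg = sum(payrolls.values()) / len(payrolls)
    n_clubs = len(positions)
    clubs = sorted(payrolls)
    relative = [payrolls[c] / avg for c in clubs]
    # Flip the ranking so a better finish is a bigger number; otherwise
    # "more money, higher finish" would show up as a negative correlation.
    standing = [n_clubs + 1 - positions[c] for c in clubs]
    return pearson(relative, standing)

# Toy numbers, invented purely to exercise the code.
payrolls = {"Club A": 200.0, "Club B": 150.0, "Club C": 100.0, "Club D": 50.0}
positions = {"Club A": 1, "Club B": 2, "Club C": 3, "Club D": 4}
print(round(payroll_vs_standing(payrolls, positions), 3))  # 1.0
```

Because Pearson’s r does not change when one variable is rescaled, dividing by the league average makes no difference within a single season. It only starts to matter once several seasons are pooled, where wage inflation would otherwise distort the comparison.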
| Relative Team Value | % of Wins |
|---|---|
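The bucketing behind a table like this can be sketched in a few lines of Python. This assumes each match is recorded as (home team value, away team value, result) with results coded H/D/A, as in Football-Data’s files. The bucket edges are a hypothetical choice, and my spreadsheet may have grouped matches differently; draws count as a non-win for the favourite.

```python
from collections import defaultdict

def favourite_win_rates(matches, bins=(1.0, 1.5, 2.0, 4.0)):
    """Share of matches won by the more valuable team, grouped by value ratio.

    matches: iterable of (home_value, away_value, result), result in {"H", "D", "A"}
    bins:    lower edges of the ratio buckets (illustrative; must start at 1.0)
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [favourite wins, matches]
    for home_value, away_value, result in matches:
        if home_value == away_value:
            continue  # no clear favourite, skip the match
        home_is_favourite = home_value > away_value
        ratio = max(home_value, away_value) / min(home_value, away_value)
        bucket = max(edge for edge in bins if ratio >= edge)
        favourite_won = result == ("H" if home_is_favourite else "A")
        counts[bucket][0] += favourite_won  # True counts as 1, a draw or loss as 0
        counts[bucket][1] += 1
    return {bucket: wins / n for bucket, (wins, n) in sorted(counts.items())}

# Made-up matches: (home value, away value, full-time result)
matches = [(400, 100, "H"), (100, 500, "A"), (100, 450, "D"), (120, 100, "H"), (100, 110, "H")]
print(favourite_win_rates(matches))  # {1.0: 0.5, 4.0: 0.6666666666666666}
```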
Can This Statistical Model Be Profitable?
Being the good entrepreneur that I am, I had a natural question: could I profit from this finding? Since I know which teams win 75% of the time, would I make money by betting on the stronger team to win? I quickly pulled historical odds from betting sites but was disappointed to learn that the odds already reflected my findings. Obviously, the bookies had adjusted the odds to account for the favourites. Betting an average of $100 on each of the 72 matches would have returned a profit of only $192, or roughly 2.7% of the $7,200 staked. Not the best ROI, but the model shows some promise.
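A profit figure like this comes from a flat-stake backtest: the same stake on the favourite in every match, winning stake × (odds - 1) when the favourite wins and losing the stake otherwise. Here is a minimal sketch, assuming decimal odds and a fixed $100 stake per match; the three bets in the example are made up. ROI here means profit divided by the total amount staked.

```python
def flat_stake_backtest(bets, stake=100.0):
    """Back the favourite with the same stake in every match.

    bets: iterable of (decimal_odds, favourite_won) pairs, where decimal_odds
    are the bookmaker's odds on the favourite.
    Returns (profit, roi), with roi as a fraction of the total amount staked.
    """
    profit, n_bets = 0.0, 0
    for odds, favourite_won in bets:
        # A winning bet pays back stake * odds, i.e. a profit of stake * (odds - 1).
        profit += stake * (odds - 1) if favourite_won else -stake
        n_bets += 1
    if n_bets == 0:
        return 0.0, 0.0
    return profit, profit / (stake * n_bets)

# Three made-up bets: two favourites win, one loses.
profit, roi = flat_stake_backtest([(1.5, True), (1.4, True), (1.6, False)])
print(f"profit ${profit:.2f}, ROI {roi:.1%}")  # profit $-10.00, ROI -3.3%
```

Short-priced favourites pay little when they win, so even a 75% hit rate can end up close to break-even once the bookmaker has priced it in.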
You see, I am not really interested in making a fortune from betting, although that would be nice. I am fascinated by the world of analytics, and building a probability model seems like an interesting way to test my hypothesis.
If I can monetize this model through value betting, that would be icing on the cake.
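Value betting means staking only when your own probability for an outcome is higher than the probability implied by the bookmaker’s odds. With decimal odds, a bet has positive expected value when p × odds > 1. Below is a minimal sketch; removing the bookmaker’s margin by proportional scaling is just the simplest of several methods, the `min_edge` threshold is a hypothetical parameter, and the 0.75 and 1.50 in the example are illustrative numbers rather than real odds.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Turn decimal 1X2 odds into probabilities that sum to 1.

    The raw inverse odds sum to more than 1 because of the bookmaker's margin
    (the overround); here it is removed by simple proportional scaling.
    Returns (probabilities, margin).
    """
    raw = [1 / home_odds, 1 / draw_odds, 1 / away_odds]
    overround = sum(raw)
    return [p / overround for p in raw], overround - 1

def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked, if model_prob is the true probability."""
    return model_prob * decimal_odds - 1

def is_value_bet(model_prob, decimal_odds, min_edge=0.0):
    """Bet only when the expected profit clears a (hypothetical) safety margin."""
    return expected_value(model_prob, decimal_odds) > min_edge

probs, margin = implied_probabilities(1.90, 3.50, 4.20)
print([round(p, 3) for p in probs], round(margin, 3))
print(expected_value(0.75, 1.50))  # 0.125
print(is_value_bet(0.75, 1.50))    # True
```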
The Next Steps in Building My Football Prediction Model
The initial results show some promise, but the model is overly simplistic. I need a bigger data sample to establish the relationship between player salaries and sporting outcomes. I believe 15 years of data will be a large enough sample.
Unfortunately, individual player salaries are not publicly available, with one major exception: La Gazzetta dello Sport has published them annually for the last few years, so perhaps I can extrapolate the data for seasons where salaries are not available. Our friends at Football-Data have all of the historical Serie A results, so it is fairly easy to get match results as well as bookmaker odds.
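Joining the two sources is mostly bookkeeping. The sketch below assumes Football-Data’s CSV column names HomeTeam, AwayTeam, FTR and B365H/B365D/B365A (check them against the header of the file you actually download) and a hypothetical payroll table keyed by (club, season start year). The nearest-season fallback in `payroll_for` is a crude placeholder for the extrapolation mentioned above, not a recommendation, and the clubs and figures in the example are invented.

```python
import csv
import io

def load_matches(csv_text, season):
    """Read a Football-Data style results file into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "season": season,
            "home": row["HomeTeam"],
            "away": row["AwayTeam"],
            "result": row["FTR"],  # full-time result: H, D or A
            "odds": (float(row["B365H"]), float(row["B365D"]), float(row["B365A"])),
        }
        for row in reader
    ]

def payroll_for(payrolls, club, season):
    """Look up a club's wage bill; if that season is missing, fall back to the
    closest season we do have (a crude stand-in for real extrapolation)."""
    seasons = [s for (c, s) in payrolls if c == club]
    if not seasons:
        raise KeyError(f"no payroll data for {club}")
    nearest = min(seasons, key=lambda s: abs(s - season))
    return payrolls[(club, nearest)]

def attach_payroll_ratio(matches, payrolls):
    """Add home/away wage-bill ratio to each match (> 1 means the home side is richer)."""
    for m in matches:
        home = payroll_for(payrolls, m["home"], m["season"])
        away = payroll_for(payrolls, m["away"], m["season"])
        m["payroll_ratio"] = home / away
    return matches

csv_text = "HomeTeam,AwayTeam,FTR,B365H,B365D,B365A\nAlpha,Beta,H,1.50,4.00,7.00\n"
payrolls = {("Alpha", 2012): 200.0, ("Beta", 2012): 50.0, ("Beta", 2010): 40.0}
matches = attach_payroll_ratio(load_matches(csv_text, season=2013), payrolls)
print(matches[0]["payroll_ratio"])  # 4.0
```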
Historical probabilities will be the basis for the model. However, if I want to predict future outcomes, I must adjust the player-value baseline for:
- Differences in the motivation level between teams. This can occur when a new coach comes in or at the end of the season when a team has nothing to lose
- Player form and ratings. I would like to be as objective as possible, so I will have to find a way to account for this
- Home or away advantage
These, however, are next steps. First, I need to analyze data from nearly 5,000 matches and establish the relationship between player salaries and final scores. I will present the findings of this football prediction model in Part 3 of this series.
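To give a flavour of what establishing that relationship could look like, here is a minimal sketch of one possible approach: a logistic regression of “did the home side win?” on the log of the home/away payroll ratio, fitted by plain gradient descent. This is my own illustrative choice, not a settled method. The intercept doubles as a home-advantage term (the home edge when both wage bills are equal), and the learning rate, epoch count and toy matches are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def fit_home_win_model(matches, lr=0.1, epochs=2000):
    """Fit P(home win) = sigmoid(a + b * log(payroll_ratio)) by gradient descent.

    matches: iterable of (payroll_ratio, home_won), payroll_ratio = home / away wages.
    a acts as the home advantage at equal wage bills; b measures how much money helps.
    """
    data = [(math.log(ratio), 1.0 if won else 0.0) for ratio, won in matches]
    n = len(data)
    a = b = 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in data:
            error = sigmoid(a + b * x) - y  # gradient of the log-loss
            grad_a += error
            grad_b += error * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def predict_home_win(a, b, payroll_ratio):
    return sigmoid(a + b * math.log(payroll_ratio))

# Made-up matches: (home wage bill / away wage bill, did the home side win?)
history = ([(4.0, True)] * 3 + [(4.0, False)] + [(0.25, True)] + [(0.25, False)] * 3
           + [(1.0, True), (1.0, False)])
a, b = fit_home_win_model(history)
print(round(predict_home_win(a, b, 4.0), 2))  # roughly 0.75 for this toy data
```

To keep the sketch short, draws and away wins are lumped together as “not a home win”. A three-way model, for example an ordered logit over away win, draw and home win, would be the natural next step once the real data is in.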