Stat Analysis 2017-18 season

With the new Premier League season underway, everyone is anxious to forecast where their favorite team will end up. Although it’s still early, many people are beginning to predict the final table. One of the best ways to do this is by using the soccer pythagorean theorem, as described here.

This model essentially uses goals scored and goals conceded to give a team an expected points per game. This can then be used to forecast how a team will fare over the rest of the season. In this article, we are going to go back to the 2017-18 EPL season to analyze the effectiveness of the formula.


To do this, we are going to retrospectively perform a mid-season prediction. We take data from the first 19 games (half the season) of the Premier League and use it to calculate each team’s expected points per game. We then extrapolate that value over the final 19 games of the season and add the result to the points actually earned in the first 19 games, giving a final prediction for the 38-game season.
(If you are curious as to how this whole process works, I suggest reading the previous articles, in which the overall method was outlined.)
When this process was performed, I found that the expected points and final points differed by an average of 4.8 points. That means the model predicted each Premier League team’s final points total with an average error of just 0.126 points per game.
In fact, 9 of the teams had expected points and actual points that differed by under 2 points at the end of the season. The model accurately predicted West Ham, Crystal Palace, and Newcastle to climb out of the relegation battle, and also predicted Stoke City’s late struggles.
In this study, just 6 teams had a prediction error of 6 points or more. However, 4 of those teams changed managers during the season. This would explain the unpredictability of their results, as new staff means new playing styles and new results. When these teams are excluded from the study, the average points disparity drops to just 3.98 points over the course of the entire season.
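
For readers who want to try the mid-season method themselves, here is a minimal sketch of the two calculations described above: the final-points projection and the average gap between projected and actual totals. The function names and the sample club are my own illustrations, not data from the study.

```python
def project_final_points(points_so_far, expected_ppg, games_remaining):
    """Points already earned plus expected points over the remaining games."""
    return points_so_far + expected_ppg * games_remaining


def mean_absolute_error(projected, actual):
    """Average absolute gap between projected and actual final points."""
    gaps = [abs(p - a) for p, a in zip(projected, actual)]
    return sum(gaps) / len(gaps)


# Hypothetical club: 30 points after 19 games, expected 1.50 points per game.
projection = project_final_points(30, 1.50, 19)
print(projection)

# The article's 4.8-point average gap over 38 games, expressed per game.
print(round(4.8 / 38, 3))
```

The expected points per game value is taken as an input here; it comes from the goals-based formula described in the earlier articles.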


So, what does this mean for this season? Well, once we get close to a reasonable enough sample size (roughly 10 games, I’d say), we’ll be able to accurately predict the fates of teams in leagues around the world. We’ll be able to judge which teams can stay at the top, which teams will have a late surge, and which teams will be fighting to stay up. It’s an exciting way to track what’s sure to be an exciting season.

Author: Nikhil Mehta

Record Breaking Year for Manchester City?

What a start it’s been for Manchester City, who are currently sitting 8 points ahead of 2nd place Manchester United, despite being only 11 games in. They’ve won 10 of those matches, the only exception being a draw with Everton.

City have already been smashing records, including a 13-game winning streak in all competitions (a new club best). They’ve also beaten their previous mark with 6 consecutive away wins. Their goal difference of +31 is a Premier League record through 11 games, and they also have a perfect Champions League resume to add to the list.

However, at some point the question has to be asked: can City break the ultimate record, most points in a Premier League season? The current record is held by Chelsea, who notched 95 points in 2004-05 under Jose Mourinho.

A simple look at this says that if they have 31 points through 11 games, they’ll end with 107 points. That’s a pure linear model. However, the sporting world does not work that way. We have to take into account the idea that Manchester City will likely regress back slightly as the season wears on.

 


 

We can model this through the use of our “Pythagorean Theorem” (https://goo.gl/cUiccT). This model takes a team’s goals scored and goals allowed, and uses them to create an expected points per game for that team.

Given Manchester City’s current statistics (which of course will change over the course of the year), they have an expected 2.52 points per game. And with the 27 remaining games in the season, they are projected to obtain another 68 points, which would result in an expected 99 total points at the end of the season.
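
To make the two projections concrete, here is a small sketch comparing the pure linear pace with the expected points per game approach. The function names are my own; the inputs (31 points, 11 games, 2.52 expected points per game) are the figures quoted above.

```python
GAMES_IN_SEASON = 38


def linear_projection(points, games_played, season_games=GAMES_IN_SEASON):
    """Assume the current points-per-game pace holds for the whole season."""
    return points / games_played * season_games


def expected_ppg_projection(points, games_played, expected_ppg,
                            season_games=GAMES_IN_SEASON):
    """Keep the points already earned, then add expected points for the rest."""
    games_remaining = season_games - games_played
    return points + expected_ppg * games_remaining


print(round(linear_projection(31, 11)))              # 107
print(round(expected_ppg_projection(31, 11, 2.52)))  # 99
```

The second function only applies the expected rate to the remaining games, so the points City have already banked are never "taken back".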

 

Given that these projections typically have an RMSE (root mean square error) of 0.1226 points per game, the typical error over the remaining 27 games is about 3.3 points, and doubling that gives a margin of about +/- 6.6 points at the end of the season. This means we are roughly 95% certain that City will finish with a points total between 92.4 and 105.6. That is a wide range, but there is also about a 70% chance they will finish between 95.7 and 102.3 points.
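
The ranges above can be reproduced with a few lines of code. This is a rough sketch that assumes the errors are roughly bell-shaped, so that one RMSE covers about 70% of outcomes and two RMSEs cover about 95%; the function name is my own.

```python
def points_band(projection, rmse_per_game, games_remaining, n_rmse=2):
    """Return (low, high) points totals n_rmse RMSEs either side of the projection."""
    margin = n_rmse * rmse_per_game * games_remaining
    return projection - margin, projection + margin


# Two RMSEs (roughly 95%) and one RMSE (roughly 70%) around 99 points.
wide_low, wide_high = points_band(99, 0.1226, 27, n_rmse=2)
narrow_low, narrow_high = points_band(99, 0.1226, 27, n_rmse=1)

print(round(wide_low, 1), round(wide_high, 1))
print(round(narrow_low, 1), round(narrow_high, 1))
```

Note that the margin is scaled by the 27 games still to play, not the full 38, because the first 31 points are already known.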

 


 

Now, we can’t take transfers, injuries, and other unforeseeable events into account, so this is solely based on how they’ve begun their campaign. And so, while nothing is guaranteed, it is likely that we will see Manchester City’s 2017-18 campaign end in a Premier League record. It will be really interesting to see whether the Citizens will achieve this feat, and maybe even go on to reach triple digits.

 

Author: Nikhil Mehta

Applying the “Pythagorean Expectation” to Soccer

One of the most interesting breakthroughs in the world of sports statistics was Bill James’s creation of the “Pythagorean Expectation”. This model predicts a given baseball team’s win percentage based on their number of runs scored and runs allowed. The basic formula for this is: Predicted Win % = RS² / (RS² + RA²). Recently, Professor Abraham Wyner from the University of Pennsylvania came out with his modified version of James’s model. Wyner’s formula takes out all the exponents from the equation: Predicted Win % = (RS – RA) / (RS + RA). This simplification produces virtually the same predicted Win %, and was created to make it easier to do the calculations. With both models, one can predict a team’s win total to within about ± 10 wins most of the time.
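
As a quick illustration of James’s formula, here is a minimal sketch. The team totals are made up for the example, and the 162-game schedule is the standard Major League Baseball season length.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Bill James's Pythagorean Expectation: RS^2 / (RS^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored and 700 runs allowed.
win_pct = pythagorean_win_pct(800, 700)
print(round(win_pct, 3))      # predicted win percentage
print(round(win_pct * 162))   # predicted wins over a 162-game season
```

A team that scores exactly as many runs as it allows comes out at 0.5, which is the sanity check I would run first.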

The Pythagorean Expectation has been applied to many other sports, including basketball and hockey. However, one of the sports that it never seemed to forecast correctly was soccer. One reason was that “points” are used instead of “wins” and teams are also able to draw games, where each team receives 1 point. Another reason was that the various leagues around the world don’t all play the same number of games, which complicates making a universal forecasting model. However, while working with a friend of mine, Michael Berman, I believe I came across an extremely accurate model that predicts points for soccer. The formula I used directly mirrors that of Professor Wyner’s modification of James’s Pythagorean Expectation:

Points Per Game = 1.7 * ((Goals Scored – Goals Allowed) / (Goals Scored + Goals Allowed)) + 1.35
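
Here is the formula as a short function, in case it is easier to read that way. The function name and the sample goal totals are my own illustrations; only the constants 1.7 and 1.35 come from the formula above.

```python
def expected_points_per_game(goals_scored, goals_allowed):
    """Soccer version of the linear Pythagorean model: 1.7 * ratio + 1.35."""
    total_goals = goals_scored + goals_allowed
    if total_goals == 0:
        # The ratio is undefined before any goals have been scored or allowed.
        raise ValueError("need at least one goal scored or allowed")
    goal_ratio = (goals_scored - goals_allowed) / total_goals
    return 1.7 * goal_ratio + 1.35


# Illustrative inputs: a dominant side and a perfectly balanced one.
print(round(expected_points_per_game(38, 7), 2))   # 2.52
print(expected_points_per_game(20, 20))            # 1.35
```

A team with an even goal record lands on 1.35 points per game, and the goal ratio moves that figure up or down by at most 1.7.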

In this article, I am going to be testing this model against the top 5 leagues in Europe over the past 10 years.


Serie A

To start out, I tested the model against the Italian Serie A. Using data from the last 10 years, I ran my forecast against every team’s actual performance. Here is what I found:

This model has a correlation coefficient of 0.9648 and a root mean square error (RMSE) of 0.1137 points/game, or 4.32 points over a season. What this means is that the model can predict a Serie A team’s final points to within 8.64 points (two RMSEs) about 95% of the time.
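
For anyone unfamiliar with the two statistics, here is a small sketch of how a correlation coefficient and an RMSE can be computed from predicted and actual points per game. The sample values are made up for illustration and are not the Serie A data.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Made-up points-per-game values, for illustration only.
predicted = [2.30, 1.90, 1.40, 1.10, 0.80]
actual = [2.40, 1.80, 1.50, 1.00, 0.90]

print(round(rmse(predicted, actual), 3))
print(round(pearson_r(predicted, actual), 3))
```

Multiplying the per-game RMSE by the number of games in a season gives the per-season figure, and doubling that gives the rough 95% range.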

La Liga

We then performed the same steps upon the past 10 years of La Liga data. Here’s what the top division of Spain gave us:

La Liga had a correlation coefficient of 0.9589, and an RMSE of 0.1276 points/game, or 4.85 per season. This forecasts a La Liga team’s final points to within 9.7 points about 95% of the time. Even so, La Liga was actually the least accurate of the 5 leagues we tested.

EPL

As for the English Premier League, we were able to gather data from the past 24 years, and we once again received very encouraging results.

In this case, the correlation coefficient came out to 0.9546, and the RMSE was 0.1226 points/game, or 4.66 per season. Therefore, the model predicted final points to within 9.32 points 95% of the time.

Bundesliga

The Bundesliga was the only league we studied that had 34 games as opposed to the typical 38 played in other leagues. However, because our model operates in points per game, this was no problem.

In fact, our prediction for this league was one of the most accurate, with a correlation coefficient of 0.9547, and an RMSE of 0.1232 points/game, or 4.19 points per season. This means that 95% of the time, we predicted a German team’s final points to within just 8.38 points.

Ligue 1

The final league we looked at was Ligue 1, the French top division.

Ligue 1 produced a correlation coefficient of .9508, and an RMSE of just .1145 points/game, or 4.35 per season. This means that for 95% of the time, our predicted values were within 8.7 points of the actual results.
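
The per-season figures for all five leagues follow from the per-game RMSE and the number of games each league plays. Here is a short sketch that recomputes them, taking two RMSEs as the rough 95% range; the dictionary layout and function name are my own.

```python
# Per-game RMSE and games per season, as reported for each league above.
LEAGUES = {
    "Serie A": (0.1137, 38),
    "La Liga": (0.1276, 38),
    "EPL": (0.1226, 38),
    "Bundesliga": (0.1232, 34),
    "Ligue 1": (0.1145, 38),
}


def season_rmse(rmse_per_game, games):
    """Scale a per-game RMSE up to a full season."""
    return rmse_per_game * games


for name, (rmse_per_game, games) in LEAGUES.items():
    per_season = season_rmse(rmse_per_game, games)
    # Columns: league, RMSE per season, two-RMSE (roughly 95%) range.
    print(f"{name:<11}{per_season:6.2f}{2 * per_season:7.2f}")
```

Because the model works in points per game, the Bundesliga’s 34-game season only changes the multiplier, not the method.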


This is not only an accurate Pythagorean model, but also a very flexible one, as we saw by running it through various leagues. The model can also be used mid-season to see whether a team is underperforming or overperforming its expected points per game, which can help to predict whether it will improve or decline in the latter part of a season. As I mentioned earlier, the baseball Pythagorean expectation is usually accurate to within about 10 wins. With this model, an interval of around 8.5 points is just under 3 wins over the total season, and that comes with 95% confidence.

Up until now, the most accurate model we had seen had an RMSE of 4.7 pts/season (ours is around 4.25 on average), and that model only worked for leagues with 38 games. In addition, it could only be used after all games had been played. So, while creating the most accurate “Pythagorean model” for soccer, we also developed a tool that can be used to figure out which teams have been the “luckiest and unluckiest” given their performances, and to forecast how a team will perform for the remainder of the season (using the assumption that a team will regress towards its expected points per game value).
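
One simple way to turn the “luckiest and unluckiest” idea into numbers is to subtract a team’s expected points per game from its actual points per game. The sketch below does that for two hypothetical teams; the team names, records, and the `luck` helper are my own illustrations, and the definition of luck as this difference is an assumption on my part.

```python
def expected_ppg(goals_scored, goals_allowed):
    """Expected points per game from the goals-based formula."""
    return 1.7 * (goals_scored - goals_allowed) / (goals_scored + goals_allowed) + 1.35


def luck(points, games_played, goals_scored, goals_allowed):
    """Actual minus expected points per game; positive means overperforming."""
    return points / games_played - expected_ppg(goals_scored, goals_allowed)


# Hypothetical records: (points, games played, goals scored, goals allowed).
teams = {
    "Team A": (24, 12, 15, 15),
    "Team B": (10, 12, 18, 14),
}

# Sort from luckiest to unluckiest.
ranked = sorted(teams, key=lambda name: luck(*teams[name]), reverse=True)
for name in ranked:
    print(name, round(luck(*teams[name]), 2))
```

Under the regression assumption, a team with a large positive value would be expected to slow down, and a team with a large negative value to pick up points.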

It will be interesting to put this to the test in the upcoming 2017-18 season, and we expect to find high accuracy in leagues all around the world. While this model isn’t perfect, it’s very close to it.


Authors: Nikhil Mehta, Michael Berman

Red hot Sanchez hits goalscoring form

While Arsenal fans hadn’t started doubting Alexis Sanchez, the Chilean’s first hat-trick for the club couldn’t have come at a better time.

 

Without a goal to his name yet this season, Sanchez opened his account for the campaign with a sizeable deposit, hitting three goals in Arsenal’s impressive 5-2 victory over Leicester City, and then followed that up with a brace against Manchester United. Currently on five league goals for the season, Sanchez has seen his odds of finishing as the league’s top scorer shorten to 10/1 with some bookmakers when this article was produced.

 

Having made an electric start to life in London following his switch to the Gunners from Barcelona last summer, Sanchez’s goalscoring slipped off somewhat during the second half of last season, and the striker had gone eight games without scoring before his hat-trick against Leicester. Despite failing to continue his early season form in front of goal throughout the entire campaign, the forward still managed to finish his first year in English football with 25 goals to his name, helping the club win the FA Cup for a second successive season.

 

Now back among the goals, Arsenal fans will be hoping Sanchez can play a major role in the club challenging for this season’s Premier League title. While Manchester City might still be many punters’ favourites this season, currently 10/11 with betfair and other major bookmakers, Arsenal are very much in this year’s title race and Sanchez rediscovering his scoring boots certainly strengthens their case.

 


CC  by  Ronnie Macdonald 

 

While Sanchez isn’t Arsenal’s only goalscoring threat, the fact is that Arsenal are a much more dangerous team when the Chile international is firing on all cylinders. Olivier Giroud has certainly done well over the past three seasons, starting this campaign in solid form with three goals from his opening seven league appearances, and Theo Walcott continues to stake his claim for a role up front, scoring the opening goal against Leicester. But it’s Sanchez who could hold the key this year.

 


 

As well as his goalscoring, Sanchez also has a big role to play in terms of assists, setting up his team-mates on 12 occasions last season and proving just how important he is to Arsenal from an offensive point of view. If he can get close to 20 goals and 10 assists again, with the likes of Giroud and Walcott weighing in with 10-plus goals apiece, there is no doubt that Arsenal could push the likes of City much harder than they have managed over the past few seasons, possibly even all the way through to May.

 

With the team priced at 4/11 for a top-two finish in the league, Arsenal fans might not be satisfied with that this year, especially with Sanchez in this sort of form.

Using Football Stats to Achieve a 100% Success Rate (Guest Article)


licence by  nasmac  Caption: Manchester United’s Wayne Rooney

There might not be any certainties in life, but when it comes to betting on soccer you can get pretty close to perfection if you learn how to hedge your bets. Knowing the game well enough to understand the odds being presented to you by a betting exchange is a vital skill. However, once you’ve figured out the perennial performers and eternal losers in the soccer leagues you’re betting on, the process of picking a winner becomes a lot easier.

Of course, when it comes to soccer betting, the dynamics of a game are constantly changing and that means you need to stay on top of the numbers. Through a combination of general knowledge about the game and analysis of the hard facts (i.e. numerical data), you can improve your expected value (EV) enough that making a profit is easy.

It’s All About EV

In fact, improving your EV is the only thing you should be concerned about as a soccer gambler. Knowing that you’re able to make the best decisions in every situation is the sole aim for any professional gambler because they know that once they’ve made a +EV bet, the money will follow. While this doesn’t mean you’ll win every single soccer bet you place, it does mean you’ll make an overall profit if all your bets are +EV.
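
To show what a +EV bet looks like in numbers, here is a minimal sketch of the standard expected value calculation for a single back bet. The win probabilities are hypothetical estimates of my own; in practice, judging that probability well is the hard part.

```python
def expected_value(stake, decimal_odds, win_probability):
    """Average profit per bet: chance of winning times profit, minus chance of losing times stake."""
    profit_if_win = stake * (decimal_odds - 1)
    loss_if_lose = stake
    return win_probability * profit_if_win - (1 - win_probability) * loss_if_lose


# A £10 bet at decimal odds of 5.00 breaks even at a 20% win probability.
print(round(expected_value(10, 5.00, 0.20), 2))

# If you estimate the true chance at 25%, the same bet is +EV.
print(round(expected_value(10, 5.00, 0.25), 2))
```

The break-even probability is simply 1 divided by the decimal odds, so a bet is +EV whenever your estimated chance of winning is higher than that.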

Thanks to the rise of modern betting apps and computer software, it’s now possible to make a guaranteed profit betting on soccer. By taking away the laborious task of running the numbers, crunching the data and coming to a conclusion, online betting apps have made it possible for punters to always make +EV bets.

Going Green

One such method that’s become possible through a combination of betting exchanges and modern technology is “greening up”. Used to describe the process of simultaneously betting on both sides of a soccer fixture to ensure a profit, this technique has only really been possible in the last few years thanks to online sports betting exchanges such as betfair.

In a nutshell, greening up takes advantage of the movement of odds within a sports betting exchange. If you’re able to identify a team that you think will win a match and, importantly, one whose odds won’t drift (get worse), then you’re able to “green up”. Basically, you want the odds to shorten for the team you’ve picked to win.

If this happens, then you can back the other side of the bet (lay against the team) and secure a guaranteed profit thanks to the difference in returns between the two bets. For example, let’s say you backed Manchester United at the start of the week to beat QPR on Saturday. Given that it’s a relatively early bet you managed to stake £10 at 4/1 (5.00) on United winning the game.
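
Since the example quotes odds in both fractional and decimal form, here is a small sketch of the conversion and of the total return on a winning back bet. The function names are my own; the conversion itself is just the fractional odds plus one.

```python
from fractions import Fraction


def fractional_to_decimal(fractional_odds):
    """Convert fractional odds such as '4/1' to decimal odds such as 5.0."""
    numerator, denominator = fractional_odds.split("/")
    return float(Fraction(int(numerator), int(denominator)) + 1)


def back_return(stake, decimal_odds):
    """Total returned on a winning back bet, including the original stake."""
    return stake * decimal_odds


united_odds = fractional_to_decimal("4/1")
print(united_odds)                   # 5.0
print(back_return(10, united_odds))  # 50.0
```

Decimal odds are the easier form to work with on an exchange, because the total return is just the stake multiplied by the odds.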

Using Numbers to Your Advantage

At this point you believe United’s odds will shorten, so you plan to make a lay bet once they do. However, you need to know how much to stake. When it comes to greening up, the aim is to back high and lay low. So let’s assume the odds shorten and you lay at 3/2 (2.50). To quickly work out how much you need to stake on your lay bet, you can simply take the return if your backed bet wins (in this case £50) and divide it by the lay odds. However, if this is too much like hard work then you can use an online calculator.

Plugging in these numbers (£50 divided by 2.50), you’d need to place a lay bet of £20 in order to lock in a £10 profit on this bet, whichever way the match goes. Sound too good to be true? Well, it isn’t. Although there may be some commission to pay on winning bets, it’s possible to use your knowledge of soccer, take some stats and crunch some numbers in order to lock up a profit regardless of how a team performs on the pitch.
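
If you would rather not rely on an online calculator, the rule of thumb above (backed return divided by lay odds) can be sketched in a few lines. This version ignores exchange commission, which is a simplifying assumption of mine, and the function name is my own.

```python
def green_up(back_stake, back_odds, lay_odds):
    """Return (lay_stake, profit_if_team_wins, profit_if_team_does_not_win).

    Uses decimal odds and ignores commission.
    """
    lay_stake = back_stake * back_odds / lay_odds
    # Team wins: the back bet pays out, and the lay bet costs its liability.
    profit_if_win = back_stake * (back_odds - 1) - lay_stake * (lay_odds - 1)
    # Team does not win: the back stake is lost, and the lay stake is kept.
    profit_if_lose = lay_stake - back_stake
    return lay_stake, profit_if_win, profit_if_lose


# Back £10 at 5.00, then lay at 2.50 once the odds have shortened.
lay_stake, profit_if_win, profit_if_lose = green_up(10, 5.00, 2.50)
print(lay_stake, profit_if_win, profit_if_lose)
```

Dividing the backed return by the lay odds is what makes the two profit figures equal, so the result no longer depends on the match. If the odds drift instead of shortening, the same calculation produces an equal loss on both outcomes.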