Applying the “Pythagorean Expectation” to Soccer

One of the most interesting breakthroughs in the world of sports statistics was Bill James’s creation of the “Pythagorean Expectation”. This model predicts a given baseball team’s win percentage based on their number of runs scored (RS) and runs allowed (RA). The basic formula for this is: Predicted Win % = RS^2 / (RS^2 + RA^2). Recently, Professor Abraham Wyner from the University of Pennsylvania came out with his modified version of James’s model. Wyner’s formula takes out all the exponents from the equation: Predicted Win % = (RS – RA) / (RS + RA). This simplification produces virtually the same predicted Win %, and was created to make it easier to do the calculations. With both models, one can predict a team’s win total to within about ±10 wins most of the time.
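For readers who want to try James’s formula themselves, here is a minimal Python sketch. The function name and the sample run totals are made up for illustration and do not come from any real team.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Bill James's Pythagorean Expectation: RS^2 / (RS^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        # Assumption: the formula is undefined when no runs were scored or allowed.
        raise ValueError("runs_scored and runs_allowed cannot both be zero")
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 runs allowed, 162-game season.
win_pct = pythagorean_win_pct(800, 700)
predicted_wins = win_pct * 162
print(f"Predicted win %: {win_pct:.3f}, predicted wins: {predicted_wins:.1f}")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on the formula.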

The Pythagorean Expectation has been applied to many other sports, including basketball and hockey. However, one of the sports that it never seemed to forecast correctly was soccer. One reason was that “points” are used instead of “wins”, and teams are also able to draw games, in which case each team receives 1 point. Another reason was that the various leagues around the world don’t all play the same number of games, which complicates making a universal forecasting model. However, while working with a friend of mine, Michael Berman, I believe I came across an extremely accurate model that predicts points for soccer. The formula I used directly mirrors Professor Wyner’s modification of James’s Pythagorean Expectation:

Points Per Game = 1.7 * ((Goals Scored – Goals Allowed) / (Goals Scored + Goals Allowed)) + 1.35
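As a rough sketch, the formula can be written as a small Python function. The function names, the handling of a team with no goals at all, and the sample goal totals are assumptions added for illustration.

```python
def expected_points_per_game(goals_scored: float, goals_allowed: float) -> float:
    """Expected points per game: 1.7 * (GS - GA) / (GS + GA) + 1.35."""
    total_goals = goals_scored + goals_allowed
    if total_goals == 0:
        # Assumption: with no goals either way, treat the ratio term as 0.
        return 1.35
    return 1.7 * (goals_scored - goals_allowed) / total_goals + 1.35


def expected_season_points(goals_scored: float, goals_allowed: float,
                           games: int) -> float:
    """Scale the per-game expectation to a season of `games` matches."""
    return expected_points_per_game(goals_scored, goals_allowed) * games


# Hypothetical team: 60 goals scored, 40 allowed.
ppg = expected_points_per_game(60, 40)
print(f"{ppg:.2f} points/game")
print(f"{expected_season_points(60, 40, 38):.1f} points over 38 games")
print(f"{expected_season_points(60, 40, 34):.1f} points over 34 games")
```

Because the output is in points per game, the same function covers a 34-game and a 38-game league; only the final multiplication changes.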

In this article, I am going to be testing this model against the top 5 leagues in Europe over the past 10 years.

Serie A

To start out, I tested the model against the Italian Serie A. Using data from the last 10 years, I ran my forecast against every team’s actual performance. Here is what I found:

This model has a correlation coefficient of 0.9648 and a root mean square error (RMSE) of .1137 points/game, or 4.32 points over a season. What this means is that the model can predict a Serie A team’s final points total to within 8.64 points (two RMSEs) about 95% of the time.
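For anyone who wants to reproduce this kind of check, here is a minimal Python sketch of the two statistics and of the conversion from points/game to points/season. The helper names are our own, and reading “two RMSEs” as a 95% interval assumes the prediction errors are roughly normal and centred on zero.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Converting the Serie A figure quoted above to a 38-game season.
rmse_per_game = 0.1137
season_rmse = rmse_per_game * 38   # about 4.32 points
interval_95 = 2 * season_rmse      # about 8.64 points
print(f"Season RMSE: {season_rmse:.2f}, approx. 95% interval: {interval_95:.2f}")
```

In practice `predicted` would hold each team’s forecast points per game and `actual` the points per game they really earned.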

La Liga

We then performed the same steps on the past 10 years of La Liga data. Here’s what the top division of Spain gave us:

La Liga had a correlation coefficient of 0.9589 and an RMSE of .1276 points/game, or 4.85 per season. This forecasts a La Liga team’s final points total to within 9.7 points about 95% of the time. Even then, La Liga was actually the least accurate of the 5 leagues we tested.


Premier League

As for the English Premier League, we were able to gather data from the past 24 years, and we once again got very encouraging results.

In this case, the correlation coefficient came out to .9546, and the RMSE was .1226 points/game, or 4.66 per season. Therefore, the model predicted final points to within 9.32 points 95% of the time.


Bundesliga

The Bundesliga was the only league we studied that had 34 games as opposed to the typical 38 played in other leagues. However, because our model operates in points per game, this was no problem.

In fact, our prediction for this league was one of the most accurate, with a correlation coefficient of .9547 and an RMSE of .1232 points/game, or 4.19 points per season. This means that 95% of the time, we correctly predicted a German team’s final points to within just 8.38 points.

Ligue 1

The final league we looked at was Ligue 1, the French top division.

Ligue 1 produced a correlation coefficient of .9508 and an RMSE of just .1145 points/game, or 4.35 per season. This means that 95% of the time, our predicted values were within 8.7 points of the actual results.

This is not only an accurate Pythagorean model, but it is also very flexible, as we saw by running it through various leagues. This model can also be used mid-season to see whether a team is underperforming or overperforming their expected points per game, and that can help to predict whether they will improve or worsen in the latter part of a season. As I mentioned earlier, the baseball Pythagorean Expectation is usually accurate to within about 10 wins. With this model, an interval of around 8.5 points is just under 3 wins’ worth of points over the whole season, and that holds about 95% of the time.

Up until now, the most accurate model we saw had an RMSE of 4.7 pts/season (ours is around 4.25 on average), and that model only worked for leagues with 38 games. In addition to this, it could only be used after all games had been played. So, while creating the most accurate “Pythagorean model” for soccer, we also developed a tool that can be used to figure out which teams have been the “luckiest and unluckiest” given their performances, and also to forecast how a team will perform for the remainder of the season (using the assumption that a team will regress towards their expected points per game value).
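Here is one way the mid-season use could be sketched in Python. This is an illustration under stated assumptions rather than our exact tool: “luck” is taken as actual minus expected points per game, the rest of the season is assumed to be played exactly at the expected rate, and the function name and sample numbers are made up.

```python
def luck_and_projection(points: float, games_played: int,
                        goals_scored: float, goals_allowed: float,
                        season_games: int):
    """Return (luck in points/game, projected final points total)."""
    total_goals = goals_scored + goals_allowed
    # Assumption: with no goals either way, treat the ratio term as 0.
    ratio = (goals_scored - goals_allowed) / total_goals if total_goals else 0.0
    expected_ppg = 1.7 * ratio + 1.35

    actual_ppg = points / games_played
    luck = actual_ppg - expected_ppg  # positive means over-performing ("lucky")

    # Assumption: the team earns exactly its expected rate from here on.
    remaining_games = season_games - games_played
    projected_total = points + expected_ppg * remaining_games
    return luck, projected_total


# Hypothetical team: 30 points from 19 games, 25 goals scored and 25 allowed.
luck, projected = luck_and_projection(30, 19, 25, 25, 38)
print(f"Luck: {luck:+.3f} points/game, projected total: {projected:.2f} points")
```

In this made-up case the team is running about 0.23 points per game ahead of its goal record, so the projection has it slowing down over the second half of the season.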

It will be interesting to put this to the test in the upcoming 2017-18 seasons, and we expect to find high accuracy all around the world. While this model isn’t perfect, it’s very close to it.

Authors: Nikhil Mehta, Michael Berman

Simple Weekender (Draw No Bet, Asian Handicap or DIY)

Draw no bet often has a lot of appeal in a match as it gives you that extra safety net. After a chat today we thought Stoke looked a decent price at home to Bournemouth, and although we did fancy the win, the draw no bet just gave us the extra security we like. When placing that sort of bet, it pays to look around at the various options to get the best price.

For example, take the Tottenham v Man Utd game tomorrow. If we wanted to back Tottenham then we see the following best prices:

Draw No Bet = 1.63 (Marathon)

Asian Handicap (0) = 1.64 with Bet Victor

So we can get a better price on the Asian Handicap than on draw no bet. The final option is to do it yourself: back the home team, then back the draw with just enough to cover your total stake. A draw no bet calculator will work this out for you. The best price for the Spurs win was 2.25 and the draw was 3.75, so with £10 we would put £7.33 on Spurs and £2.67 on the draw, giving effective odds of 1.649, which beats both of the prices above. It pays to shop around, and there are some great resources on the internet to help you.
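If you would rather not rely on a calculator, the split is easy to work out yourself. Here is a minimal Python sketch; the function name is our own and it ignores any bookmaker minimum stakes or rounding rules.

```python
def diy_draw_no_bet(total_stake: float, win_odds: float, draw_odds: float):
    """Split a stake so the draw bet returns the whole stake.

    Returns (win_stake, draw_stake, effective_odds) using decimal odds.
    """
    # Stake on the draw that pays back exactly the total stake.
    draw_stake = total_stake / draw_odds
    win_stake = total_stake - draw_stake
    # Return if the team wins, expressed as decimal odds on the total stake.
    effective_odds = win_stake * win_odds / total_stake
    return win_stake, draw_stake, effective_odds


# Spurs at 2.25, the draw at 3.75, £10 total stake.
win_stake, draw_stake, odds = diy_draw_no_bet(10, 2.25, 3.75)
print(f"£{win_stake:.2f} on the win, £{draw_stake:.2f} on the draw, odds {odds:.3f}")
```

With unrounded stakes the effective odds come out at 1.650; the 1.649 quoted above is what you get after rounding the win stake down to £7.33.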

If we wanted to back Liverpool we would get

Draw No Bet = 2.52 (Marathon again)

Asian Handicap (0) = 2.47 (188 Bet or Bet Victor)

DIY = 2.536 (Liverpool at 3.46 and the draw at 3.75)

Yet again the do it yourself option is the best. There is obviously a time issue doing this, as you need to place 2 different bets with 2 different bookmakers, but if it’s a big bet once in a while, it’s worth doing.

Now we just need Stoke to win!

Simple Weekender (Watford)

Marco Silva’s Watford host Arsenal on Saturday and are available at 5.5 for the win. Watford were smashed by Man City in their last home match but drew with Liverpool before being held at home by Brighton. Arsenal themselves got smashed away to Liverpool and also lost at Stoke, with their only away point coming from the draw at Chelsea. All stats are available here.

Silva had a 41-match unbeaten home run with his teams. Hull were awful before he arrived, and he actually gave them some hope and belief. His 9 home matches at Hull included 6 wins, a draw and 2 losses, one of which was to Spurs when they were relegated.

If you had bet £10 on the home win in each of his Hull matches you would have ended up £120 better off. Not quite sure how Arsenal are so short in a tough match.