In the realm of sports mathematics, one of the most fascinating topics is the Las Vegas point spread. The purpose of the Vegas spread is simply to attract equal amounts of money on both teams, so that the folks who run the casinos make money either way. That much is certainly true.
However, there are also a lot of interesting mathematical and statistical properties of the spread that can provide insight and even predictive power. I have been studying the Vegas spread in both college basketball and football for several years, and I will share some of what I have learned here today.
“Vegas Always Knows” on Average, With High Variance
Based on my analysis of college football spread data back to 2001, Vegas is the best predictor of the outcome of a given game. If you plot the final spread versus the average margin of victory, you get a very high correlation and a slope of 1.00, as shown below in Figure 1.
However, the variance of this data is quite large. Figure 2 below shows the actual point differential of every game from the 2019 season compared to the opening Vegas spread. The correlation is very weak and the scatter is very large.
Also notice all those data points below the x-axis. Those are upsets, which account for roughly 25 percent of all games in a given season, every season. So, instead of just the average margin of victory, Figure 3 below shows the standard deviation of the point differential vs. the spread.
The standard deviation is 14-15 points, which if you think about it, is huge. That is like saying, “Team X is favored to beat Team Y by five points, plus-or-minus two touchdowns.” Also, this deviation from the spread is essentially normally distributed, as shown below in Figure 4.
So, another way to think about this is that roughly two-thirds of all games will finish within plus-or-minus two touchdowns of the spread. The crazy thing is, that implies a full one-third of games will miss the spread by more than two touchdowns. AND, for about five percent of games, you can expect the spread to be off by more than four touchdowns, in either direction! With 50-60 games a week, that means the four-touchdown miss is likely to be observed two to three times per week.
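These tail fractions follow directly from the normal model. A quick sketch, using only the Python standard library and a round σ = 14 points (an assumption consistent with the 14-15 point standard deviation above):

```python
from math import erf, sqrt

# Deviation of the actual margin from the spread, modeled as Normal(0, sigma).
SIGMA = 14.0  # assumed round value for the standard deviation

def frac_within(points: float, sigma: float = SIGMA) -> float:
    """Fraction of games whose final margin lands within +/- `points` of the spread."""
    return erf(points / (sigma * sqrt(2)))

print(f"within two TDs (14 pts):  {frac_within(14):.2f}")      # ~0.68
print(f"beyond two TDs:           {1 - frac_within(14):.2f}")  # ~0.32
print(f"beyond four TDs (28 pts): {1 - frac_within(28):.3f}")  # ~0.046
```

The ~5 percent figure for four-touchdown misses is just the familiar two-sigma tail of the normal distribution.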
Victory Probability Is Highly Correlated to the Spread
If we take all this data together, we can also plot the odds that the favored team will win any given contest, based on the spread:
The trend line in Figure 5 is not just a best-fit line. It is derived from the data shown above: for each spread, I assume the actual result (point differential) is normally distributed with a standard deviation of about 14 (14.112, to be exact, which minimizes the error). The mean of each distribution is equal to the Vegas line, and the win probability is the fraction of the normal distribution greater than zero.
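The spread-to-probability conversion described above can be written in a few lines. This is a sketch of the model, not the exact code behind the figures; it uses the 14.112-point standard deviation quoted in the text:

```python
from math import erf, sqrt

SIGMA = 14.112  # std dev that minimized the fit error, per the text

def win_probability(spread: float, sigma: float = SIGMA) -> float:
    """P(favorite wins): fraction of Normal(spread, sigma) that lies above zero.

    Equivalent to the standard normal CDF evaluated at spread/sigma,
    computed here via the error function.
    """
    return 0.5 * (1 + erf(spread / (sigma * sqrt(2))))

print(f"{win_probability(10):.2f}")  # a 10-point favorite wins ~0.76 of the time
```

A pick-'em game (spread of zero) comes out to exactly 50 percent, as it should.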
Despite having over 12,000 data points, there is still some scatter, but if I apply a seven-point boxcar smoothing function, it looks like this:
You can see how well the trend line fits the data. This correlation is one of the cornerstones of my college football and basketball analysis. If one can project the point spreads of future games using other methods (such as my power rankings or Ken Pomeroy’s basketball efficiencies), it is then possible to convert these spreads to probabilities and use tools like Monte Carlo simulations to estimate the odds of any given game or season outcome.
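The Monte Carlo step mentioned above can be sketched as follows. The six-game slate of projected spreads is purely hypothetical (positive meaning the team of interest is favored), and the simulation simply draws each game as an independent coin flip weighted by the spread-derived win probability:

```python
import random
from math import erf, sqrt

SIGMA = 14.112  # std dev from the normal model described in the text

def win_probability(spread: float) -> float:
    """P(win) for a team with the given projected spread in its favor."""
    return 0.5 * (1 + erf(spread / (SIGMA * sqrt(2))))

def simulate_season(spreads, trials=100_000, seed=42):
    """Estimate the distribution of season win totals via Monte Carlo.

    Returns a list where index k holds P(exactly k wins).
    """
    rng = random.Random(seed)
    probs = [win_probability(s) for s in spreads]
    counts = [0] * (len(spreads) + 1)
    for _ in range(trials):
        wins = sum(rng.random() < p for p in probs)
        counts[wins] += 1
    return [c / trials for c in counts]

# Hypothetical six-game slate of projected spreads
dist = simulate_season([21.0, 10.5, 3.0, -2.5, 7.0, 14.0])
print(f"P(undefeated) = {dist[-1]:.3f}")
```

With independent games, one could also multiply probabilities directly; the Monte Carlo approach earns its keep once you layer on conference tiebreakers, championship scenarios, and other outcomes that are awkward to compute in closed form.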
What Does It All Mean?
Several very interesting questions result from this data. Does Vegas adjust the lines based on the known betting habits of certain fan bases? They almost certainly do, but I have never been able to detect any clear bias in the data. Then again, it would be easy to do that for just a handful of games, and that data would get swamped by all the rest. So, I just ignore this possibility. If I can’t measure it systematically, I don’t care about it.
Practically, if a team is favored by 10 points, this translates to a ~75 percent chance of victory for that team. Does this mean that if those teams were to play 100 times, one team would win (roughly) 25 times and the other 75 times? Or does it simply mean that in any given game with a 10-point spread, the favorite will win 75 percent of the time, by an average margin of 10 points? I think that the second statement is clearly the correct one.
The first statement is a fascinating concept in itself. It is easy to fall into the trap of thinking that (for example) because MSU beat UofM in Ann Arbor back in 2017, the 2017 MSU team would beat the 2017 UofM team 100 percent of the time if they played again. I am also sure that there are Michigan fans out there who believe that both the 2018 and 2019 Wolverine squads would have beaten their contemporary Spartan opponents 100 percent of the time as well. Those statements are all certainly false. But what I think the Vegas line does (in effect if not in intent) is estimate this likelihood, based on all the information available at the time. By the end of the season, I think it is pretty likely that they get close to this reality.
For reference, here is the probability of victory data in tabular form:
Finally, I have one new piece of data to share: the likelihood of a given big upset per year. The table above shows that once the spread gets over ~28 points, the odds of the underdog winning drop below two percent. But it can be hard to grasp just how likely or unlikely such an upset actually is, especially since most of the contests in a given year have a spread between one and two touchdowns.
If you factor in the number of games typical in a given year with a given spread, you can create a kind of cumulative distribution function of the number of expected upsets observed in a given year, as a function of the spread. Those data are shown below in Figure 7.
An upset when the spread gets above 14 is a once-a-week type of occurrence (not shown). Once the spread gets over 25, we enter the realm of a “once in a season” event. A spread of 30 or higher is a once-every-three-years event, and the interval grows quickly from there. The “10-year storm” is a spread of 33.5, and the “50-year storm” is a spread of 37.5, which happens to be the opening spread of the biggest upset on record, Stanford’s 2007 upset of USC.
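The "N-year storm" framing can be reproduced from the normal model alone, given one extra input: how many games per season carry a spread that large. That count is a hypothetical assumption here (it would come from the actual schedule data behind Figure 7), but the arithmetic is simple:

```python
from math import erf, sqrt

SIGMA = 14.112  # std dev from the normal model described in the text

def upset_probability(spread: float) -> float:
    """P(underdog wins) for a given spread, under the normal model."""
    return 0.5 * (1 + erf(-spread / (SIGMA * sqrt(2))))

def recurrence_years(spread: float, games_per_year: float) -> float:
    """Average years between upsets at this spread level.

    `games_per_year` is how many games per season carry roughly this
    spread -- a hypothetical input for illustration, not measured data.
    """
    return 1.0 / (upset_probability(spread) * games_per_year)

# Hypothetical: suppose ~5 games per season open with a spread near 37.5.
years = recurrence_years(37.5, 5)
print(f"roughly a {years:.0f}-year storm")
```

With that (assumed) game count, a 37.5-point upset indeed lands in the once-every-few-decades range, consistent with the "50-year storm" label above.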