For college basketball junkies, this is the best week of the year. Selection Sunday through the end of the first weekend of the NCAA Tournament is the greatest eight-day span on the sports calendar, and 2022 marks the first time since 2019 that the NCAA Tournament will be “back to normal” with full crowds in the stands.
But even casual fans of college basketball go through the annual ritual of filling out a bracket in an attempt to predict which crazy upsets will occur during the first two days, which teams will make the Final Four, who will finally cut down the nets on that first Monday of April and everything in between.
While everyone has their own methods and strategies for picking which teams will advance, I have developed my own system over the years that uses a combination of math and historical probabilities. My method is certainly not foolproof, but it does provide some useful tips that have led to some office pool success over the years.
It helped me to virtually nail the Final Four in 2019 and it correctly predicted that No. 3 Texas would not survive its first-round test last year against No. 14 Abilene Christian. This year, I have crunched the numbers once again and I am happy to share the results with the class.
Last year, I presented a more detailed overview of my methodology. Briefly, I made a simple observation several years ago which forms the foundation for the analysis that I am about to present. That observation is:
When it comes to NCAA Tournament upsets, the behavior is exactly the same as in regular-season games. The odds are largely predictable based on Las Vegas point spreads and by tools that can predict point spreads, such as Kenpom efficiency margin data.
All of my analysis of college basketball odds is based on this same premise. Kenpom efficiency data can be used to assign probabilities to any arbitrary basketball matchup. Knowing this, the full season and any tournament can be mathematically modeled and its odds can be calculated.
My favorite plot to highlight this fact is shown below.
The figure summarizes the upset frequencies of some of the most common seed pairings that occur in the NCAA Tournament. As we can see, the actual frequencies correlate extremely well with what we would expect based on the win probabilities derived from either the actual Vegas point spreads or Kenpom efficiency data.
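For readers curious how an efficiency gap turns into a spread and then into a win probability, here is a minimal sketch. It is not necessarily the exact formula behind the figures in this article. The tempo of 68 possessions per game, the 11-point standard deviation on the final margin, and the function names are all assumptions chosen for illustration.

```python
import math

# Assumed constants (illustrative only, not taken from the article).
POSSESSIONS_PER_GAME = 68.0  # assumed typical college tempo
MARGIN_STD_DEV = 11.0        # assumed scatter of the final margin around the spread


def projected_spread(margin_a, margin_b, possessions=POSSESSIONS_PER_GAME):
    """Projected points by which team A beats team B on a neutral court.

    Efficiency margins are expressed per 100 possessions, so the difference
    is scaled down to the expected number of possessions in a single game.
    """
    return (margin_a - margin_b) * possessions / 100.0


def win_probability(spread, std_dev=MARGIN_STD_DEV):
    """Probability that a team favored by `spread` points wins the game.

    Models the final margin as normally distributed around the spread.
    """
    return 0.5 * (1.0 + math.erf(spread / (std_dev * math.sqrt(2.0))))


# Hypothetical efficiency margins, not real team data.
spread = projected_spread(20.0, 12.0)
print(f"projected spread: {spread:.1f}, win probability: {win_probability(spread):.2f}")
```

The same `win_probability` function works on an actual Vegas line, which is why the two sources can be compared on the same plot.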
Just in case there is still some doubt about the value of using Kenpom data to project point spreads, Figure 2 shows the current correlation for all of the first-round games, based on Tuesday’s lines as published on DraftKings. Note that the left-hand panel is the full set of data, while the right-hand panel is an enlarged view of the data for the games where the spread is 10 points or less.
As we can see, the correlation is very strong, with only a handful of games differing by more than a point or two.
Upset Odds and Trends
A careful analysis of Figure 2 will already start to give some hints as to where some of the more likely upsets will occur. Are there any pairings above that look to have a tighter spread than one might expect for that seed pairing? Naturally, those are the games to put on upset alert.
A better way to visualize these upset odds is to plot them as a group relative to each other and to the historical odds of an upset for that particular pairing. Figure 3 shows this analysis for the full set of first-round games.
Using this figure, it is easy to see where the most likely upsets will occur. If a game falls below the blue line (the historical odds that the higher-seeded team wins), an upset is more probable than usual for that seed pairing. If a game is above the blue line, an upset is less probable than usual. That said, the odds shown at the right are still the “true” odds for the upset.
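The comparison against the blue line can be sketched in a few lines of code. The `Game` type, the team labels, and every probability below are hypothetical placeholders, not the values plotted in Figure 3.

```python
from typing import NamedTuple


class Game(NamedTuple):
    favorite: str             # label for the higher-seeded team
    seed: int                 # seed of the higher-seeded team
    favorite_win_prob: float  # from the spread or from efficiency data


def upset_alerts(games, historical_win_rate):
    """Return (favorite, shortfall) pairs for games below the historical line.

    A game is flagged when the favorite's win probability is lower than the
    historical win rate for its seed. Biggest shortfall comes first.
    """
    flagged = []
    for game in games:
        shortfall = historical_win_rate[game.seed] - game.favorite_win_prob
        if shortfall > 0:
            flagged.append((game.favorite, round(shortfall, 3)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Placeholder historical rates and matchups, for illustration only.
historical = {4: 0.80, 7: 0.60}
games = [
    Game("Team A", 4, 0.70),
    Game("Team B", 4, 0.85),
    Game("Team C", 7, 0.48),
]
print(upset_alerts(games, historical))
```

Sorting by the shortfall rather than by the raw odds mirrors the point above: a game can sit well below the line and still have the favorite winning more often than not.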
The left side of Figure 3 shows the data for the top-four seeds in each region. As a general rule, there are only one or two “major” upsets of this nature in any given tournament. There were a total of four of these upsets in 2021, but that was the most in NCAA Tournament history. That said, there have been only five tournaments since 1985 where all of the top-four seeds have survived the first round (1994, 2000, 2004, 2007 and 2017). Therefore, it is quite likely that at least one of those teams will fall.
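To put a number on "quite likely": five clean years out of 36 tournaments means 31 of 36 had at least one such upset. The sketch below computes that frequency and shows how the same quantity could be estimated for a single year from per-game odds. The independence assumption and the flat 90 percent figure are mine, chosen only for illustration.

```python
import math


def prob_at_least_one_upset(favorite_win_probs):
    """P(at least one favorite loses), treating the games as independent."""
    return 1.0 - math.prod(favorite_win_probs)


# Historical frequency from the counts above: 5 of 36 tournaments had no such upset.
historical_rate = 1 - 5 / 36
print(f"historical: {historical_rate:.1%}")

# Hypothetical case: all 16 top-four seeds are 90 percent favorites.
flat_example = prob_at_least_one_upset([0.90] * 16)
print(f"flat 90% example: {flat_example:.1%}")
```

In practice the list would hold a different probability for each of the 16 games, taken from the spreads or the efficiency data.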
Upsets of No. 1 or No. 2 seeds are quite rare and unpredictable. Based on Figure 3, I would not expect one in 2022. But there are a total of four possible upsets on the No. 3 and No. 4 lines that stick out from Figure 3. No. 3-seeded Wisconsin, No. 4-seeded Illinois, No. 4-seeded Arkansas and, especially, No. 4-seeded Providence are all more likely to be upset than a typical team of their seed, based on historical trends.
If we dig into the numbers, the reasons are clear. In the 2022 tournament, the No. 4-seeded teams appear to be weaker than normal, historically, while the No. 13 seeds are stronger. This is a classic recipe for an upset.
As for Wisconsin, it is not that Colgate is a particularly strong No. 14 seed. In fact, the Raiders are a below-average No. 14 seed. The problem is that Wisconsin is a historically poor No. 3 seed, based on its current Kenpom efficiency margin. In my analysis of the Big Ten season, Wisconsin consistently graded out as the luckiest team in the conference. Do not be surprised if the Badgers do not stay for long in the Big Dance.
The right side of Figure 3 shows the data for the first-round games involving teams seeded No. 5 to No. 12. This is where the bulk of the upsets (relative to seeding) occur in any given tournament. Once again, the Figure gives insight into which upsets are more probable.
Interestingly, while No. 4 seeds seem to be in quite a bit of danger in the first round, the No. 5 seeds look fairly safe. Only St. Mary’s College, which will play the winner of the Indiana/Wyoming play-in game, looks somewhat vulnerable. Considering that the winner of the play-in game tends to have success in the first round and that at least one No. 5 seed has lost in the first round in 31 of the past 36 tournaments, this might be a good bet.
As the blue line in Figure 3 shows, teams seeded No. 6 or No. 7 tend to get upset 40 to 45 percent of the time. These games are typically close to toss-ups. In 2022, five of those eight contests look riper than usual for an upset. Warning: the conclusions here are not great for Michigan State fans.
Based on this analysis, No. 7 Michigan State is officially on upset alert versus No. 10 Davidson. Furthermore, No. 11 Michigan also has a very good shot to “upset” No. 6 Colorado State. As Figure 2 shows, the Wolverines are actually favored in this game. My metrics suggest an upset pick is in order in both cases. You, dear reader, will simply need to pick with your conscience.
As for the other potential upsets, Kenpom has No. 10 Loyola-Chicago favored over No. 7 Ohio State and No. 10 San Francisco favored over No. 7 Murray State. Those are both good bets. No. 6 Texas also looks vulnerable against No. 11 Virginia Tech.
The No. 8 versus No. 9 games are historically true toss-ups, as Figure 3 suggests. In this case, it is best to consult the Vegas line, which currently has No. 9 Memphis favored over No. 8 Boise State and which has No. 9 TCU as a pick’em versus No. 8 Seton Hall. Those are the two most likely “upsets” in that group of four games.
Vegas spreads are a useful tool for the first-round games. However, they are not available for any games in subsequent rounds. Fortunately, Kenpom data can be used to project these lines and win probabilities, which still allows for further analysis as shown below in Figure 4.
The left side of Figure 4 compares the odds for the higher seeds to win in the second round of the tournament.
No. 1 seeds get bounced in the second round roughly every other year, on average. Illinois experienced that in 2021, but all the No. 1 seeds advanced in 2019. Based on Figure 4, the most vulnerable No. 1 seed is Kansas, but there is still a 70 percent chance that the Jayhawks survive until the second weekend.
In general, about two-thirds of all No. 2 seeds advance to the Sweet 16 and it is quite rare that all four survive the first weekend in any given tournament. That said, there is no clear No. 2 seed that appears vulnerable in 2022. I can think of one No. 2 seed that I would like to see lose to a certain Green and White-clad No. 7 seed (if they make it that far) but I will save that analysis for later in the week.
As for the No. 3 and No. 4 seeds, history tells us that about half of them will likely advance to the Sweet 16. Figure 4 gives some strong hints as to which of these seeds are more likely to be upset, and the news is not great for some Big Ten fans.
Interestingly, there are four potential second-round games involving No. 3 or No. 4 seeds where the lower-seeded team is projected to be favored or even. No. 6 LSU would be favored over No. 3 Wisconsin, No. 5 Houston is projected to be favored over No. 4 Illinois, and No. 5 UCONN is a likely pick’em versus No. 4 Arkansas. Furthermore, No. 5 Iowa is projected to be a big favorite over No. 4 Providence.
Naturally, this assumes that all of the No. 4 seeds win their first-round matchups, which Figure 3 suggests is not likely. Also note that No. 3 seed Purdue may be slightly vulnerable to No. 6 Texas, if the Longhorns can beat Virginia Tech in the first round.
Finally, the right side of Figure 4 presents the same data for the potential matchups in the Sweet 16 and Elite Eight. This analysis does assume that the top seeds all advance, which is unlikely, but it does provide some hints as to which teams are more or less likely to advance to the Final Four.
For example, No. 1 Baylor looks potentially vulnerable to No. 4 UCLA in the Sweet 16, and even more vulnerable in a potential Regional Final showdown with No. 2 Kentucky. Figure 4 also suggests that No. 3 Tennessee and No. 3 Texas Tech both might be favored over No. 2 Villanova and No. 2 Duke in their respective regions. No. 1 Kansas would also be only a slight favorite over No. 2 Auburn in a potential Midwest Regional Final.
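The assumption that the top seeds all advance can be relaxed by weighting each possible opponent by its chance of reaching the game. The sketch below shows the arithmetic for a single round. The function name and all of the probabilities are hypothetical, and this is not necessarily how the numbers in Figure 4 were produced.

```python
def prob_win_next_round(p_reach, opponents):
    """Chance that a team reaches and then wins its next-round game.

    p_reach:   probability the team itself gets to that game.
    opponents: list of (p_opponent_reaches, p_team_beats_opponent) pairs;
               the p_opponent_reaches values must sum to 1.
    """
    total = sum(p_opp for p_opp, _ in opponents)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("opponent arrival probabilities must sum to 1")
    return p_reach * sum(p_opp * p_beat for p_opp, p_beat in opponents)


# Hypothetical No. 1 seed: 98 percent to win its opener, then a coin-flip
# between two possible opponents it would beat 80 and 84 percent of the time.
sweet_16_odds = prob_win_next_round(0.98, [(0.5, 0.80), (0.5, 0.84)])
print(f"chance to reach the Sweet 16: {sweet_16_odds:.3f}")
```

Feeding each round's output back in as the next round's `p_reach`, with one opponent entry per team that could arrive, extends the same idea through the Sweet 16 and Elite Eight.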
The analysis above will hopefully provide a good start in filling out a bracket. But exactly how many upsets should we expect? How is each individual region likely to shake out? Stay tuned for Part Two, coming soon.