
March Madness Metrics: Performance versus Expectation

Wins and losses in March are important, but not all NCAA Tournament games are created equal. Can we use math to clarify which coaches truly rise to the top?


As spring blooms in the state of Michigan, The Only Colors is taking a look back at the 2022 men’s basketball season and the NCAA Tournament. In the previous installment of this series, we counted up NCAA Tournament wins, tabulated win percentages, and analyzed some round-by-round data.

What we found is that Michigan State head coach Tom Izzo is solidly in the top 10 among coaches in every category in the modern era (since seeding began in 1979). Furthermore, when it comes to Sweet 16, Elite Eight, and Final Four appearances, he personally has achieved more as a head coach than any other Big Ten school has since 1979.

However, not all games and paths in the NCAA Tournament are created equal. For example, recently retired Duke head coach Mike Krzyzewski won a total of 101 NCAA Tournament games in his career. Of those wins, 25 came against No. 15 and No. 16 seeds. Only 25 coaches since 1979 have won 25 games in the Big Dance at all. For comparison, only six of Tom Izzo’s 53 career Tournament wins came against such low seeds.

So, while Duke and Coach K earned the right to play that many No. 15 and No. 16 seeds over his 36 total tournament appearances, those matchups clearly gave him the opportunity to pad his numbers when it comes to the simple accounting of March wins and losses. Fortunately, there are more advanced ways to level the playing field by looking at metrics that measure performance compared to expectations.

Performance Metrics Summary

In total, there are five performance-versus-expectation metrics that I tabulate for the NCAA Tournament. Two of these metrics are commonly used by others, two of them I created myself and one is another fairly simple accounting stat. I have explained each of these metrics before in detail, so I will only briefly introduce them here:

PASE (performance against seed expectation):

PASE is the “original” advanced NCAA Tournament metric. It measures the number of wins for each coach or team relative to the historical total number of wins per tournament for teams with a given seed. For example, No. 1 seeds have historically won 3.34 games per tournament since 1985. In order for a No. 1 seed to overachieve with a positive PASE score, it would need to win four games and advance to the Final Four.
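
To make the arithmetic concrete, here is a minimal Python sketch of the PASE calculation. The 3.34 figure for No. 1 seeds comes from the paragraph above. The function name, the layout of the lookup table, and the two sample tournament runs are illustrative assumptions, not the actual data or code behind Table 1.

```python
# Average wins per tournament by seed. Only the No. 1 seed value (3.34)
# is taken from the article; a full table would need all 16 seeds.
AVG_WINS_BY_SEED = {1: 3.34}

def pase(appearances, avg_wins_by_seed=AVG_WINS_BY_SEED):
    """Sum of (actual wins - historical average wins for that seed).

    `appearances` is a list of (seed, wins) pairs, one per tournament.
    """
    return sum(wins - avg_wins_by_seed[seed] for seed, wins in appearances)

# Hypothetical coach: two trips as a No. 1 seed, one ending in the
# Final Four (4 wins) and one ending in the Sweet 16 (2 wins).
sample = [(1, 4), (1, 2)]
print(round(pase(sample), 2))  # (4 - 3.34) + (2 - 3.34) = -0.68
```

A single Final Four run is worth +0.66 here, while anything short of four wins is negative, which matches the point that a No. 1 seed has to reach the Final Four to overachieve.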

PARIS (performance against round-independent seed):

PARIS is a metric that I created (prior to hearing about PASE) that measures almost the same thing as PASE. The difference is that I consider the historical win percentage for each seed in each round separately and not for the tournament as a whole.

PAD (performance against exact seed differential):

PAD is a variation on PARIS that I created, which takes into account the seed of the opponent for each tournament game. For example, playing a No. 15 seed in the second round is quite a bit easier than facing a No. 2 seed. PAD accounts for this difference, while PASE and PARIS do not.
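
The contrast between PARIS and PAD can be sketched in a few lines of Python. This is one plausible reading of the definitions above (a sum of result minus expected win probability over the games a team actually played), not the exact calculation behind the published numbers. Every probability in the two lookup tables is a made-up placeholder, and the dictionary keys and field names are hypothetical.

```python
# Placeholder win probabilities, invented for illustration only.
# PARIS looks up (seed, round); PAD looks up (seed, opponent seed).
P_BY_SEED_ROUND = {(3, 1): 0.85, (3, 2): 0.60}
P_BY_MATCHUP = {(3, 14): 0.85, (3, 6): 0.55, (3, 11): 0.70}

def paris(games):
    """Result minus the seed's historical win rate in that round."""
    return sum(g["won"] - P_BY_SEED_ROUND[(g["seed"], g["round"])] for g in games)

def pad(games):
    """Result minus the historical win rate for that exact seed matchup."""
    return sum(g["won"] - P_BY_MATCHUP[(g["seed"], g["opp_seed"])] for g in games)

# Hypothetical No. 3 seed: beats a No. 14 seed, then beats a No. 11 seed
# that had upset the No. 6 seed. `won` is 1 for a win and 0 for a loss.
games = [
    {"seed": 3, "round": 1, "opp_seed": 14, "won": 1},
    {"seed": 3, "round": 2, "opp_seed": 11, "won": 1},
]
print(round(paris(games), 2))  # 0.55
print(round(pad(games), 2))    # 0.45
```

With these placeholder inputs, PAD gives less credit for the second-round win than PARIS does, because the opponent was a lower seed than a No. 3 seed would usually face in that round.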

PAKE (performance against Kenpom expectation):

PAKE is the other commonly used metric, and it is similar to my PAD metric. PAKE accounts for the true strength of each opponent in each tournament game, regardless of seed, based on Kenpom efficiencies. However, this metric only goes back in time as far as 2002.

Chalk (+/-)

This is a simple accounting stat that measures the total number of games won by a coach or team relative to a scenario in which the higher seed wins every tournament game up to the Final Four. Chalk and PASE give similar information.
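
As a rough sketch of the Chalk calculation in Python: if the higher seed always wins within a 16-team region, a No. 1 seed wins four games, a No. 2 seed wins three, and so on. The function names and the sample appearances are hypothetical. The sketch also assumes that First Four games are ignored and that Final Four games carry no chalk expectation, since the article does not spell out those details.

```python
def chalk_wins(seed):
    """Wins a seed collects if the higher seed wins every regional game."""
    if not 1 <= seed <= 16:
        raise ValueError("seed must be between 1 and 16")
    if seed == 1:
        return 4  # wins the region and reaches the Final Four
    if seed == 2:
        return 3  # loses to the No. 1 seed in the regional final
    if seed <= 4:
        return 2  # loses in the Sweet 16
    if seed <= 8:
        return 1  # loses in the second round
    return 0      # loses in the first round

def chalk_plus_minus(appearances):
    """Sum of (actual wins - chalk wins) over (seed, wins) pairs."""
    return sum(wins - chalk_wins(seed) for seed, wins in appearances)

# Hypothetical coach: a No. 7 seed that reached the Final Four (4 wins)
# and a No. 2 seed that lost in the Sweet 16 (2 wins).
print(chalk_plus_minus([(7, 4), (2, 2)]))  # (4 - 1) + (2 - 3) = 2
```

Because the chalk expectation is a whole number for every seed, this stat is coarser than PASE, which uses historical averages such as 3.34 wins for a No. 1 seed.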

With these definitions in mind, Table 1 below summarizes these NCAA Tournament metrics for 32 notable head coaches, sorted by PASE.

Table 1: Summary of performance versus expectation metrics for 32 notable men’s basketball coaches through the 2022 season.

As we can see, when it comes to performance relative to expectation (i.e., seed), Tom Izzo is the best NCAA Tournament coach in the modern era.

Coach Izzo’s current PASE score of 14.94 is two games better than Louisville legend Denny Crum. The story for PARIS is similar. Coach Izzo’s current PAD score of 8.07 is a half-game better than Villanova legend Rollie Massimino. Izzo is in first place all-time in all three metrics and also is at the top of the leaderboard in the Chalk metric.

The only metric where Coach Izzo is not currently in first place is in PAKE, where he currently sits in fourth place behind Jim Boeheim (Syracuse), Roy Williams (Kansas and North Carolina) and John Beilein (West Virginia and Michigan). That said, PAKE does not account for Izzo’s first four tournaments, which included a Sweet 16 berth and three consecutive Final Fours.

For some additional context, Izzo’s current PARIS and PAD scores are higher than any other coach at any point in their career. In fact, the only time a coach achieved a higher PARIS or PAD score was Izzo himself following the 2015 tournament when his PARIS score was 9.31 and his PAD score was 8.75.

As for PASE, Krzyzewski did surpass Izzo’s current score back in 2001 with a score of 16.02. However, Izzo does hold the record for the highest PASE of all time with a score of 16.46 following the 2015 tournament. Note from Table 1 that Coach K retired with a final PASE of 11.60, and his PAKE is actually negative (-1.90) since 2002.

Comparing the Metrics

Many other websites will reference the PASE and PAKE metrics, and they both certainly have value. But the PARIS and PAD metrics have certain mathematical properties that allow us to extract some additional interesting information. Specifically, the PARIS metric compares performance per round to the historically average performance for every team of the same seed in that round. The PAD metric is very similar, but it references the specific seed of each opponent, meaning that it is more specific to the actual difficulty of each game.

In other words, PAD more accurately reflects the true difficulty of a team’s path in the tournament. For example, did a highly seeded team suffer an upset earlier in the bracket that then made the path easier for the other team in question? The difference between PARIS and PAD therefore represents the amount of “luck” that a team or coach has had in the opponents that they have faced.

This effect is best shown below in Figure 1.

Figure 1: Comparison of NCAA Tournament luck (as measured by the difference between PARIS and PAD) and true NCAA Tournament performance relative to expectation (PAD).

Figure 1 compares the “luck score” (PAD subtracted from PARIS) to the PAD metric, which is indicative of the “true” performance versus expectation in NCAA Tournament play. Figure 1 includes data from all 666 head coaches who have appeared on the sidelines of at least one Tournament game.
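
The luck score itself is a one-line subtraction. The Python sketch below applies it to figures quoted in this article (Izzo's PARIS and PAD after the 2015 tournament, and Rick Barnes' current PAD and luck score). The `describe` helper and its `neutral_band` cutoff are hypothetical additions for illustration, and are not part of how Figure 1 was built.

```python
def luck_score(paris, pad):
    """Luck is PARIS minus PAD, as defined for Figure 1."""
    return paris - pad

def describe(pad, luck, neutral_band=0.25):
    """Label a coach by the sign of PAD and of the luck score.

    `neutral_band` is a hypothetical cutoff, not a value from the article.
    """
    skill = "overachieved" if pad > 0 else "underachieved"
    if abs(luck) <= neutral_band:
        fortune = "neutral luck"
    elif luck > 0:
        fortune = "lucky"
    else:
        fortune = "unlucky"
    return f"{skill}, {fortune}"

# Izzo after the 2015 tournament: PARIS 9.31, PAD 8.75 (from the article).
izzo_2015_luck = luck_score(paris=9.31, pad=8.75)
print(round(izzo_2015_luck, 2))        # 0.56
print(describe(8.75, izzo_2015_luck))  # overachieved, lucky

# Rick Barnes as of 2022: PAD -6.67, luck score -0.02 (from the article).
print(describe(-6.67, -0.02))          # underachieved, neutral luck
```

A positive luck score means the seed-by-round expectation (PARIS) was harder to beat than the expectation based on the opponents actually faced (PAD), so the path was easier than usual.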

The vast majority of these data points are clustered near the origin. However, several notable coaches appear in the area outside of this middle region. Each coach’s position on the graph gives information about the relative impact of “luck” on their tournament performance relative to expectation.

The upper right-hand corner of the graph highlights coaches whose PAD and luck scores are both positive. In other words, on average these coaches have been both lucky and good. Most notable in this section of the graph are Krzyzewski, Beilein, Boeheim and the all-time king of NCAA Tournament luck, former Florida coach Billy Donovan.

Coach Donovan’s example helps to illustrate the meaning of the luck metric. History tells us that a No. 15 seed has defeated a No. 2 seed in the first round a total of 10 times in NCAA Tournament history. Naturally, this upset will usually favor the remaining teams particularly in that half of the bracket, as the nominally “strong” No. 2 seed has been eliminated. While at Florida, Donovan benefited from this type of upset in both the 2012 tournament (as a No. 7 seed) and in the 2013 tournament (as a No. 3 seed).

While Donovan enjoyed a lot of tournament success, his performance relative to expectation was “padded” later in his career by some fortunate upsets in his part of the bracket. Coach K, Beilein and Boeheim have likewise been “lucky” compared to the average NCAA Tournament coach.

Michigan State’s Izzo has also been slightly lucky over his tenure, but by just over half of a win. Rick Pitino (Kentucky and Louisville) and Crum (Louisville) were similarly good and more neutral on the luck scale.

As for other coaches of note: Massimino was almost as good as Izzo in the PAD metric, but the Villanova legend rarely caught a break with upsets in his bracket, while Williams was also good, but not so lucky. Former Arizona great Lute Olson was average in the PAD metric and almost as unlucky as Massimino. Meanwhile, Kansas’ Bill Self is also average based on PAD, but has been noticeably lucky.

Then, there are the coaches that have underachieved over the years, based on the PAD metric. Bob Huggins (Cincinnati and West Virginia) and Tony Bennett (Washington State and Virginia) both have negative PAD scores, but they cannot blame the difficulty of their tournament draws. On balance, both coaches have been lucky.

In a part of the graph all by himself is Rick Barnes (Texas and Tennessee). Coach Barnes’ current PAD score of -6.67 is the worst of all NCAA Tournament coaches as of 2022. He cannot blame luck either, as his luck score is -0.02.

For the final comparison in today’s installment, Figure 2 compares the PAKE metric to the PAD metric, as calculated since 2002.

Figure 2: Comparison of the PAKE metric to the PAD metric since 2002.

Figure 2 shows that these two metrics are strongly correlated, which makes sense. Both metrics are attempting to measure the number of actual wins compared to the number of expected tournament wins.

PAKE measures expected tournament wins based on the victory probability derived from Kenpom efficiency data (which correlates very strongly with Las Vegas betting lines). The seeds of the teams do not factor in at all. This is likely the most accurate way to measure performance versus expectation, but the data set is limited.

PAD measures expected tournament wins based on historical data that relates win probability to the combination of seeds playing in each game. As I have shown previously, the results of this calculation also correlate strongly with historical Vegas lines.

Most of the data points in Figure 2 fall onto or near the trendline. What is interesting about Figure 2 are the coaches whose data deviates noticeably from the trendline. Coach Izzo, for example, has a higher PAD score than his PAKE score. Mark Few (Gonzaga) and Bo Ryan (Wisconsin) similarly appear above the trendline in Figure 2, while Boeheim, Williams and Self all fall below the line.

I interpret this deviation as related to the accuracy of the seeding by the Selection Committee. If a coach has a higher PAD than PAKE, the Kenpom data assign that coach more expected wins than the seed combinations do. That coach’s team, on average, has been better than their seeds imply (and/or their opponents have on average been worse). In other words, on average, that coach has been historically under-seeded. Izzo, Few and Ryan fall into this category.

The opposite is also true, as Figure 2 suggests that Boeheim, Williams and Self on average have received a higher seed than they deserve.
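
The reasoning behind this interpretation can be sketched in Python. Both metrics subtract an expected win total from the same actual win total, so the gap between PAD and PAKE reduces to the Kenpom-based expectation minus the seed-based expectation. The function names and the sample numbers below are hypothetical and do not describe any coach in Figure 2.

```python
def seeding_gap(actual_wins, expected_wins_seed, expected_wins_kenpom):
    """PAD minus PAKE for a coach over the same set of games."""
    pad = actual_wins - expected_wins_seed
    pake = actual_wins - expected_wins_kenpom
    # The actual wins cancel, leaving
    # expected_wins_kenpom - expected_wins_seed.
    return pad - pake

def seeding_label(gap):
    """Interpret the sign of PAD minus PAKE as described in the article."""
    if gap > 0:
        return "under-seeded"
    if gap < 0:
        return "over-seeded"
    return "seeded as expected"

# Hypothetical coach: 20 actual wins, 15.0 expected from seed matchups,
# 17.5 expected from Kenpom win probabilities.
gap = seeding_gap(20, 15.0, 17.5)
print(gap)                 # 2.5
print(seeding_label(gap))  # under-seeded
```

A positive gap means the efficiency data rated the coach's teams as stronger (or their opponents as weaker) than the seeds suggested, which is the under-seeded case.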

One of the concepts that we touched on above is the idea that not all NCAA Tournament paths are equal. In the next installment of this series, we will dive into this topic in more detail.