There are a lot of things that make for a great college basketball game: the players, the coaches, the fans, the mascots, the venues... each one brings something different and exciting to the table. But, there is one part of the game that seems to bring out the ire of fans on all sides: the officials.
Obviously, officials are a necessary part of the game. But, every fan can likely think of a game where the officials seemed to “take over” and effectively became the story. Every fan can likely think of one call that they believe single-handedly cost their team a game (most Michigan State fans would likely point to the non-call on Draymond Green in the 2010 Final Four).
At the end of the day, officials are human beings trying to do a job to the best of their ability. If they do their job perfectly, they likely won’t get noticed at all. It is only when they are at the center of a controversy that we learn their names. Every generation of Spartans fans likely knows the names of a few officials that they feel might be either biased against their team or simply bad at their craft.
In past years, names like Ed Hightower and Ted Valentine used to raise my blood pressure slightly. These days, the name Bo Boroski is one that most Spartans know. But, were/are these officials actually biased against Michigan State? If they were, is there a way that we could tell?
As for the first question, it simply is not reasonable to suggest that any official out there simply has it out for MSU, Tom Izzo, or any other coach or program. These men and women are professionals and it is a bit paranoid to believe that they are “out to get us.”
That said, no person is truly unbiased. As hard as any single person (including officials) tries to be fair in all situations, it is literally impossible to set aside the lifetime of experiences that weighs on one’s soul when one is trying to decide in a split second if the play was a charge or a blocking foul.
Maybe the kid taking the charge reminds the official of the son of one of his friends. Maybe the player called for the block was mouthing off in the last huddle. Maybe, the ref once had a bully in middle school that wore a University of Michigan hat. Maybe, the ref’s wife cheated on him with a Kentucky grad. OK, some of those are unlikely, but who knows? Unconscious bias will creep into the brains of all of us imperfect human beings. Officials are no different, no matter how they try.
But if at least some level of bias (unconscious or not) exists in the officiating of college basketball, how could it be detected? This is a problem that I have thought about for a while, and I think that I found a way to potentially measure it.
How to Grade an Official
I decided to approach this analysis just like I do any other sports-related analysis that I conduct (and write about here): I treated it like a scientific experiment. One thing about science is that individual observations are basically useless. So, looking at individual plays, fouls, or games would be meaningless. What I needed was as large a set of consistent data as possible.
Instead of focusing on something like the total number of fouls called or free throw rates, I settled on two related metrics: performance against the spread (ATS) and wins or losses relative to expectation (R2E). Performance ATS is more straightforward. The folks in Vegas and the betting community are really good, on average, at getting the line to a place where the favored team will cover the spread almost exactly 50 percent of the time.
Good teams tend to beat the spread a little more often than not (because they are good) but deviations of more than a few percentage points are extremely rare. In the database that I assembled for this project, which extends back to the 2005-2006 season, Michigan State is 267-245-9 (51.2 percent) against the final Vegas line.
Performance relative to expectation is a little more complicated. The idea here is to measure actual wins and losses, but in a weighted fashion. I have written extensively about the way to correlate the Vegas line to the odds that a favored team will win or lose. I once again leaned on this data here.
To give an example, let’s say that Michigan State plays a total of 20 games. In the non-conference season, MSU is favored by 13 points in the first 10 games. Then, Big Ten play begins, and the Spartans are only favored by 2.5 points in each of the next 10 games.
According to the historical correlations and in this example, the Spartans would have a 90 percent chance of winning each of their first 10 games, but only a 60 percent chance of winning each of the second group of games. So, the most likely outcome would be for MSU to win nine of the first 10 games (90 percent) and go 6-4 (60 percent) in the second batch of games, for 15 expected wins in total.
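The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not the historical correlation used in this analysis: it assumes the final margin is roughly normally distributed around the spread, and the 10.5-point standard deviation and the function names are hypothetical choices made for the sketch.

```python
from statistics import NormalDist

# Assumption: the final margin is roughly normal around the spread with a
# standard deviation of about 10.5 points. This is a hypothetical value for
# illustration, not the fitted historical correlation used in the article.
ASSUMED_MARGIN_SD = 10.5


def win_probability(spread: float, sd: float = ASSUMED_MARGIN_SD) -> float:
    """Approximate chance that a team favored by `spread` points wins outright."""
    return NormalDist(mu=0.0, sigma=sd).cdf(spread)


def expected_wins(spreads: list[float]) -> float:
    """Sum the per-game win probabilities to get the expected number of wins."""
    return sum(win_probability(s) for s in spreads)


# The 20-game example: ten games favored by 13, ten games favored by 2.5.
schedule = [13.0] * 10 + [2.5] * 10
print(f"13-point favorite:  {win_probability(13.0):.2f}")
print(f"2.5-point favorite: {win_probability(2.5):.2f}")
print(f"Expected wins over 20 games: {expected_wins(schedule):.1f}")
```

Under that assumed standard deviation, the two probabilities land close to the 90 percent and 60 percent figures in the example, and the expected total comes out at roughly 15 wins.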
If I use the actual spread data for all of the Michigan State games in the database back to 2006, I can calculate the expected number of wins and that can be compared to MSU’s actual performance in these contests. Over that span, I calculate that MSU was expected to have a record of 364.5 wins and 156.5 losses (70.0 percent). In reality, MSU went 371-150 (71.2 percent) or just slightly better than expected. MSU won about six-and-a-half more games (out of 521) than the spread predicted.
Just as Michigan State’s overall performance can be measured ATS and against expectation, MSU’s performance involving any individual referee or combination of referees can also be measured ATS and against expectation.
As an example, consider the case of the official Kelly Pfeifer. Pfeifer has officiated a total of 22 Michigan State games since 2013. In those 22 games, MSU was expected to have a record of 16.6-5.4. In reality, the Spartans went 17-5 in those games and 11-11 against the spread.
In other words, in games where Pfeifer was an official, MSU won 0.4 more games than expected and was -0.3 wins ATS (this number is slightly negative since MSU overall is slightly over .500 against the spread). This is about as close to “unbiased” as possible based on this analysis. I applied this same methodology to every official who has worked a Michigan State game since 2006.
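As a rough sketch, the two numbers for a single official can be computed as shown below. The class and function names are hypothetical, and the ATS baseline is an assumption: it uses MSU's overall 267 wins ATS in 521 games (pushes counted as games), which appears consistent with the -0.3 figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class OfficialRecord:
    """MSU results in games worked by one official (hypothetical structure)."""
    wins: int
    losses: int
    expected_wins: float  # sum of spread-implied win probabilities
    ats_wins: int
    ats_losses: int
    ats_pushes: int = 0


def wins_vs_expectation(rec: OfficialRecord) -> float:
    """Actual wins minus expected wins (R2E)."""
    return rec.wins - rec.expected_wins


def wins_vs_ats_baseline(rec: OfficialRecord, baseline_rate: float) -> float:
    """Wins ATS minus what the team's overall ATS rate would predict."""
    games = rec.ats_wins + rec.ats_losses + rec.ats_pushes
    return rec.ats_wins - baseline_rate * games


# Assumption: baseline is MSU's overall ATS record, 267 wins in 521 games.
MSU_ATS_BASELINE = 267 / 521

# Figures from the Pfeifer example in the text.
pfeifer = OfficialRecord(wins=17, losses=5, expected_wins=16.6,
                         ats_wins=11, ats_losses=11)
r2e = wins_vs_expectation(pfeifer)
ats_delta = wins_vs_ats_baseline(pfeifer, MSU_ATS_BASELINE)
print(f"R2E: {r2e:+.1f}, ATS: {ats_delta:+.1f}")
```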
While Pfeifer seems perfectly neutral when it comes to MSU, not all officials are. As we will see, MSU tends to under-perform in the presence of some officials, and over-perform in the presence of others. But, how can we tell if this performance is “suspicious” or not?
In this case, I applied the statistical principle of the binomial test to each measurement. Briefly, the binomial test reveals the odds that a result at least as extreme as the one observed would happen just by chance. By convention, these odds have to be below five percent to be considered “significant.” For the purpose of this analysis, I also note any observations with odds below 20 percent as “notable.”
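For readers who want to see the mechanics, here is a minimal sketch of a one-sided binomial test with the two thresholds described above. Several things here are assumptions for illustration: the function names, the use of a one-sided tail, and a single fixed success probability per game (for wins relative to expectation, the per-game odds actually vary with the spread). The example record is made up and is not taken from the database.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float, lower: bool = True) -> float:
    """Exact tail probability for X ~ Binomial(trials, p).

    Returns P(X <= successes) if lower is True, otherwise P(X >= successes).
    """
    ks = range(0, successes + 1) if lower else range(successes, trials + 1)
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in ks)


def label(odds: float) -> str:
    """Apply the thresholds from the text: below 5% and below 20%."""
    if odds < 0.05:
        return "significant"
    if odds < 0.20:
        return "notable"
    return "within chance"


# Hypothetical record: a team covers the spread in only 6 of 20 games,
# tested against a 50 percent chance of covering in each game.
odds = binomial_tail(successes=6, trials=20, p=0.5)
print(f"Odds of 6 or fewer covers in 20 games: {odds:.3f} ({label(odds)})")
```

A two-sided test would roughly double these odds when the baseline is 50 percent, so the choice of tail matters when a result sits near a threshold.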
OK, Let’s Look at the Data
With that introduction, it is time to look at the results. For context, a total of 159 different referees have officiated a Michigan State game since 2006. However, over 100 of those officials have worked fewer than seven games total. While I made the relevant calculations for all 159 refs, I will mainly focus on the officials who have worked at least 20 games in the past 16 years, which is only 23 total.
Also, I will mention here that it is pretty common for fans to refer to specific “officiating crews.” While each game has three total officials, those officials do not work in consistent groups. It is rare for a pair of officials to work a large number of games together and even more rare to see an identical “crew.” This study focuses primarily on individual officials.
Figure 1 below summarizes Michigan State’s performance, both against the spread (on the y-axis) and relative to expectation (on the x-axis) when each of the 23 most frequent officials is working. The size of the data point scales with the total number of games.
In addition, I have color-coded the data points to reflect the statistical significance for the data points that are straying from the average. Yellow and light green data points have odds between five percent and 20 percent (“notable”). If a measurement falls below five percent into the truly statistically significant zone, I have shaded that data point red (if it is negative for MSU) or dark green (if it is positive).
As Figure 1 shows, there are definitely officials who are either a net negative or a net positive with respect to actual Michigan State wins and losses and relative to the spread. However, none of these results rise to the level of statistical significance.
In total, there are five officials who historically seem to be trouble for MSU, i.e. the yellow data points. When it comes to wins and losses, Bo Boroski has been a net negative (-2.5 wins relative to expectation out of 60 games), but this is not very significant. Note that for some of the officials highlighted in this study, I have included a link to the full data table of MSU’s results on their watch. Click on the official’s name to view it.
In the past 16 years, there have been three officials who have been worse for Michigan State than Boroski: Ed Hightower (-3.5 wins over 40 games; he retired in 2014), Pat Driscoll (-3.6 wins over 35 games), and the most significant, Terry Wymer (-4.8 wins over 65 games, which also makes him the official who has worked the most MSU games over this span). The odds that MSU’s poor performance on Wymer’s watch is only due to chance are just 13 percent.
The two other officials whose presence is correlated with below-average performance on the court are Jim Burr and D.J. Carstensen, but for a different reason. In these two cases, it is not the wins and losses that are notable, it is MSU’s performance against the spread, which is -3.9 wins ATS for Burr (out of 29 games, ending in 2014) and a shocking -8.2 wins ATS (out of 57 games) for Carstensen.
MSU is 21-33-3 against the closing spread (37 percent) in games officiated by Carstensen. The odds of that percentage being so low by chance are only 11 percent, which is certainly notable.
That said, there is also another side to this coin. Figure 1 also shows a total of three officials who seem to have a positive impact on MSU’s results: Robert Riley, Bill Ek, and especially Terry Oglesby. Unlike the yellow data points, the light green ones tend to fall closer to a diagonal line from the bottom left to the upper right. In other words, these officials seem to have a positive effect on MSU’s ability both to win and to beat the spread.
As for Riley, Michigan State is 18-11 ATS in the games in which he has officiated. As for Ek, MSU is 14-7 ATS, which is also high, but more notably, MSU is 21-0 straight up with Ek on the court. While that might look suspicious, MSU has been heavily favored in most of those games. The spread was double-digits in 14 of the 21 games, it was below seven points only three times, and he has never officiated a game where MSU was the underdog.
As for Oglesby, the Spartans are 26-13-1 ATS in the games that he has worked, which is just barely outside of the threshold for statistical significance. Also notable is that MSU has won in three of the five total games where the Spartans were the underdogs, including the win over Kansas in the 2015 Champions Classic, and the road win over Michigan in 2019. Perhaps the title of this piece should actually be, does Terry Oglesby love Michigan State?
In order to provide a little more context, I wanted to see if there was any noticeable change in the data if home games and road games were separated. Figure 2 below shows the wins relative to expectation data for the same group of officials. This time, however, I have plotted the data for home games only on the x-axis and road games only on the y-axis.
With this data, we get a little more interesting information. In Figure 2, we can see that Terry Wymer has been very tough on the Spartans in road games. Michigan State is just 8-18 on his watch away from the Breslin Center, including seven upset losses in 13 games where MSU was favored. This result is statistically significant.
Interestingly, Bo Boroski has also entered the “yellow notable zone” for Michigan State road games. That said, MSU has actually won a bit over one game more than expected in the Breslin when Boroski was in the house. In contrast, Pat Driscoll seems to be tough on the Spartans at home, but is a net positive for the Spartans on the road.
Figure 2 also reveals that official Larry Scirotto has had a strongly positive impact on MSU wins at Breslin, but is neutral to slightly negative in road games.
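The home/road split behind Figure 2 amounts to grouping each official's games by venue before summing wins relative to expectation. A minimal sketch follows, assuming a hypothetical per-game record layout; the field names, venue labels, and sample rows are invented for illustration and are not taken from the database.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    """One MSU game as worked by one official (hypothetical layout)."""
    official: str
    venue: str       # "home" or "road"
    won: bool
    win_prob: float  # spread-implied chance that MSU wins


def r2e_by_venue(games: list[Game]) -> dict[tuple[str, str], float]:
    """Sum (actual win minus win probability) per (official, venue) pair."""
    totals: defaultdict[tuple[str, str], float] = defaultdict(float)
    for g in games:
        totals[(g.official, g.venue)] += (1.0 if g.won else 0.0) - g.win_prob
    return dict(totals)


# Made-up sample rows, only to show the shape of the calculation.
sample = [
    Game("Official A", "home", True, 0.90),
    Game("Official A", "home", True, 0.80),
    Game("Official A", "road", False, 0.60),
    Game("Official B", "road", True, 0.40),
]
split = r2e_by_venue(sample)
for (official, venue), r2e_value in sorted(split.items()):
    print(f"{official} ({venue}): {r2e_value:+.1f}")
```

Plotting the home total on one axis and the road total on the other for each official would give a chart in the style of Figure 2.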
In regard to results against the spread, Figure 3 separates the data based on home and road games as well.
Here we can see that D.J. Carstensen’s negative impact on MSU’s performance against the spread is consistent, regardless of venue, but at a fairly low level of statistical significance. The trends for Pat Driscoll and Bo Boroski here are similar to those in Figure 2.
MSU actually does really well ATS in home games where Boroski officiates and really poorly ATS when they see him on the road. Both of these deviations are statistically significant. In contrast, Driscoll grades out as tough on the home team.
There is a group of five additional officials who seem to have a positive impact on Michigan State’s performance ATS at the Breslin Center. Ek and Riley are once again in this category, and they are joined by Lamont Simpson and Donnie Eppley. That said, Eppley’s profile (easy on the home team) resembles Boroski’s profile, just with a smaller sample size. Finally, Terry Oglesby’s impact, while generally positive, also seems to mostly occur at the Breslin Center. MSU is 15-5 ATS at home on his watch, which is statistically significant.
Other Odds and Ends
The data above focuses just on single officials who have worked at least 20 Michigan State games. That said, there are a few other notable results from the analysis that I will briefly comment on here in lightning round form:
- When it comes to pairs of officials, the Wymer/Valentine combination was not great for MSU. The Spartans were 5-8 (-3.2 R2E) and 3-10 ATS all time.
- The Boroski/Scirotto combination is notably bad for MSU on the road. The Spartans are 1-4 straight up (SU) and 0-5 ATS.
- Wymer teamed up with Lamont Simpson has not been good for MSU straight up. The Spartans are 2-5 (-2.9 R2E) with this pair.
- Officials with fewer games and a notably negative impact on MSU wins, relative to expectation (R2E): Donnee Gray (7-10 SU, -3.3 R2E, from 2006-2009), Reggie Greenwood (2-5 SU, -2.3 R2E, from 2006-2009), and Antonio Petty (3-3 SU, -1.9 R2E, from 2008-2012). MSU also did poorly ATS with these three officials.
- Officials with fewer games and a notably positive impact on MSU wins, relative to expectation: Keith Kimble (12-2 straight up, +2.6 R2E, active), Chris Beaver (14-0, +1.7 R2E, active), and Earl Walton (11-0, +2.1 R2E, active).
- Finally, there are two additional officials who post notably positive results ATS for MSU: Tom Eades (12-4 ATS) and Mark Whitehead (10-3 ATS).
In summary, there are clearly some officials whose presence correlates with both above and below average performance by the Spartans. Does this mean that unconscious bias is the cause? Not necessarily. While it is certainly possible, most of the results shown above are not statistically significant. Even when they are, correlation does not imply causation.
That said, the results are what they are. The fact remains that, historically, Michigan State’s performance has been either better or worse than average when some officials are working. Whether these trends continue into the future remains to be seen.
That is all for today. As always, enjoy, and Go Green.