With football season starting to wrap up, it is time to turn my mathematical focus to the hardwood. While I have developed my own methods for predicting the point spreads of football games, I believe that the efficiency-based methods used and refined by Ken Pomeroy are the current gold standard in basketball analytics. Over the past few years, I have developed my own methodology using Kenpom data to try to answer interesting questions regarding college basketball.
As this weekend marks the beginning of Big Ten basketball, it seems like a good time to take a closer look at what the current Kenpom data can tell us about which team(s) might be hoisting a Big Ten championship banner at the close of the regular season.
I use the following method. The current Kenpom efficiency margins can be used to project point spreads for all future Big Ten games. These point spreads can then be converted into the probability that either team will win any given future contest. I actually use a different formula than Kenpom does, but from what I can tell the two give the same result to within a percentage point. (I use the Normal / Gaussian distribution, while Kenpom seems to use the “Elo” or “Log5” method, which uses the simpler logistic distribution. Based on my analysis of historical data, my method is more accurate, thank you very much.)
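To make the spread-to-probability step concrete, here is a minimal Python sketch comparing a Normal curve with a logistic curve. The 11-point standard deviation of the final margin is an assumed, illustrative value, and the logistic curve is a generic one rescaled with the well-known 1.702 approximation constant. It is not Kenpom's actual formula, and the function names are hypothetical.

```python
import math

SIGMA = 11.0  # ASSUMPTION: std. dev. of the final scoring margin, in points


def win_prob_normal(spread, sigma=SIGMA):
    """P(team wins) if the final margin is Normal(spread, sigma)."""
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))


def win_prob_logistic(spread, sigma=SIGMA):
    """Generic logistic curve, rescaled so it tracks the Normal curve.

    The factor 1.702 is the standard constant that makes a logistic
    curve approximate the Normal CDF; it is not taken from Kenpom.
    """
    return 1.0 / (1.0 + math.exp(-1.702 * spread / sigma))


if __name__ == "__main__":
    # Positive spread = the team is favored by that many points.
    for spread in (0, 3, 7, 12):
        p_n = win_prob_normal(spread)
        p_l = win_prob_logistic(spread)
        print(f"spread {spread:>2}: normal {p_n:.3f}  logistic {p_l:.3f}")
```

With this scaling the two curves never differ by as much as one percentage point, which is consistent with the agreement described above.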
Once you have the probabilities of a series of events, it is possible to set up a “Monte Carlo” simulation. This is essentially a very large set of weighted coin flips, using a random number generator as the coin. Basically, I can use this “coin” to simulate the entire Big Ten season, based on the current Kenpom ratings of all 14 Big Ten teams and the projected point spreads of all 140 Big Ten conference games. Using a simple Excel macro, I can run this simulation 120,000 times and generate a good set of statistics on the likelihood that each team will win “x” games and the likelihood that each team will at least tie for first place.
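The weighted coin-flip idea can be sketched in a few lines of Python. This is only an illustration, not the Excel macro itself: the three teams (A, B, C), their schedule, and the home win probabilities are all made up, and the function name is hypothetical.

```python
import random
from collections import Counter


def simulate_seasons(games, n_sims=10_000, seed=0):
    """Monte Carlo season simulation.

    games: list of (home, away, p_home_win) tuples.
    Returns (expected wins per team, share of seasons in which each team
    finishes at least tied for first).
    """
    rng = random.Random(seed)
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    total_wins = Counter()
    titles = Counter()
    for _ in range(n_sims):
        wins = dict.fromkeys(teams, 0)
        for home, away, p_home in games:
            # One weighted coin flip per game.
            winner = home if rng.random() < p_home else away
            wins[winner] += 1
        best = max(wins.values())
        for team, w in wins.items():
            total_wins[team] += w
            if w == best:  # ties for first count for every tied team
                titles[team] += 1
    expected = {t: total_wins[t] / n_sims for t in teams}
    title_odds = {t: titles[t] / n_sims for t in teams}
    return expected, title_odds


# Made-up double round robin among three hypothetical teams.
games = [
    ("A", "B", 0.70), ("B", "A", 0.45),
    ("A", "C", 0.85), ("C", "A", 0.30),
    ("B", "C", 0.75), ("C", "B", 0.40),
]
expected, title_odds = simulate_seasons(games)
print("expected wins:", expected)
print("at least a share of first:", title_odds)
```

Because ties count for every tied team, the title odds can add up to more than 100%.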
With these tools in place, I pulled the Kenpom data from late this week before any of the Big Ten games were played and simulated the season. The key outputs of this simulation are the expected value of conference wins for each team as well as the win probability matrix. That first set of data is shown here:
As of Thursday night (12/05), Ohio State, by virtue of their big win over UNC in the ACC-Big Ten Challenge, is the highest ranked Big Ten team, and as a result has the highest expected win total at 15.16. This translates to a most probable record of 15-5. MSU is currently sitting at #4 with a slightly lower expected value of 14.39, which translates to a record of 14-6.
While this gives us some information about what to expect, it is more useful to look at the probabilities that each team will actually finish at least in a tie for first place. Fortunately, my simulation gives me that data as well, which is summarized below:
In this case, the probability matrix gives the odds that each team will win or tie for first with the number of wins given in the column heading. For example, there is a 0.7% chance that MSU wins the Big Ten by going 19-1, a 3% chance MSU wins with a record of 18-2, about an 8% chance MSU wins with a record of 17-3, and so on. Note that the win totals here are higher than the win totals shown in the previous table. This simply reflects the obvious point that in order to win a championship, you usually need at least a little bit of luck, or at the very least you need to “beat the odds.” You have to win more toss-up games than you lose.
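A matrix like the one described above can be tallied by recording, in each simulated season, the win total of every team that finishes at least tied for first. The sketch below uses a made-up three-team schedule and a hypothetical function name; it illustrates the bookkeeping and is not the actual spreadsheet.

```python
import random
from collections import defaultdict


def title_matrix(games, n_sims=20_000, seed=1):
    """Return {team: {wins: P(team at least ties for first with that many wins)}}."""
    rng = random.Random(seed)
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    counts = {t: defaultdict(int) for t in teams}
    for _ in range(n_sims):
        wins = dict.fromkeys(teams, 0)
        for home, away, p_home in games:
            wins[home if rng.random() < p_home else away] += 1
        best = max(wins.values())
        for t in teams:
            if wins[t] == best:
                counts[t][best] += 1
    return {
        t: {w: c / n_sims for w, c in sorted(row.items())}
        for t, row in counts.items()
    }


# Made-up schedule: (home, away, probability that the home team wins).
toy_games = [
    ("A", "B", 0.70), ("B", "A", 0.45),
    ("A", "C", 0.85), ("C", "A", 0.30),
    ("B", "C", 0.75), ("C", "B", 0.40),
]
for team, row in title_matrix(toy_games).items():
    print(team, {w: round(p, 3) for w, p in row.items()})
```

In this toy league the weakest team, C, expects only about 1.1 wins, yet every title scenario for C involves at least 2 wins. That is the same effect as in the real matrix: champions are the teams that outperformed their expected value.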
Based on the current Kenpom rankings, Ohio State has basically 50-50 odds to at least tie, MSU’s odds are about 1 in 3, Purdue’s odds are 23%, Maryland’s are 15%, and Michigan’s are less than 5%.
Baked into this analysis is the idea that each team will continue to play at exactly the same level for the next two and a half months, and that the current Kenpom ratings are an accurate snapshot of how good each team actually is and will continue to be. We know that this is not true, but it is the best set of tools that we have short of a crystal ball.
In the face of this uncertainty, I find that it is interesting to take a look at the overall schedule to see which teams may be at a specific advantage or disadvantage. In general, the calculation of “strength of schedule” is a tricky one, but I have a couple of methods that I believe make sense.
The first thing to do is simply to try to visualize the Big Ten schedule. Even with a 20-game season, it is impossible for each Big Ten team to play a full double round robin, and as such the schedule is unbalanced. The matrix below shows the double and single play games for all 14 Big Ten teams.
At first glance, it is interesting that Ohio State, MSU, and Purdue each only play each other once. That would seem to be a major advantage for all three teams. In contrast, the next two highest ranked teams, Maryland and Michigan, play each of those teams twice, with the exception of Maryland and Purdue, who meet only once. That seems significant, but how do we quantify that?
The method that I have developed is to use the Kenpom data to simulate each team’s season while giving the team in question the same assumed strength every time. In this case, I assumed that each team had a Kenpom efficiency equal to that of a middle-of-the-pack Big Ten team, which for this year’s analysis is Penn State. I can run my simulation to calculate the expected number of wins each team would have if it were only as good as the Nittany Lions. When I do this, I can generate the following graph of normalized expected conference wins:
Based on this data, Purdue is the team with the easiest Big Ten road. Looking again at the single play matrix shows why, as 4 of their 6 single play opponents are among the Top 6 other teams in the conference. In the next tier are Ohio State, MSU, and Iowa, who are about 0.3 of a game back. The schedules get increasingly harder as we move to the right, with teams like Minnesota, Wisconsin, Illinois, and Northwestern almost a full game back in expected value. I should also note that based on this analysis, MSU has a roughly 0.7-game advantage on Michigan right out of the gate.
While this analysis is pretty good, in my humble opinion, there is one thing that bothers me a bit. That is the fact that the expected win values above clearly correlate with the strength of each team. On one hand, this makes sense. After all, MSU has an easier schedule than Northwestern, in part because MSU does not have to play a team as good as MSU and Northwestern doesn’t get the benefit of playing a team as bad as Northwestern. So, I think the chart above is a pretty good reflection of reality.
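The common-rating idea described above can be sketched as follows. Since an expected win total is just the sum of the individual win probabilities, the sketch adds them up directly instead of simulating. Everything specific here is an assumption for illustration: the four hypothetical teams, their ratings (treated as points above average), the 3-point home edge, the 11-point standard deviation, and the function names.

```python
import math

HOME_EDGE = 3.0  # ASSUMPTION: home-court advantage in points
SIGMA = 11.0     # ASSUMPTION: std. dev. of the final margin in points


def win_prob(spread):
    """Normal-distribution win probability for a team favored by `spread`."""
    return 0.5 * (1.0 + math.erf(spread / (SIGMA * math.sqrt(2.0))))


def expected_wins(team, ratings, schedule):
    """Sum of win probabilities over every game `team` plays."""
    total = 0.0
    for home, away in schedule:
        if team not in (home, away):
            continue
        p_home = win_prob(ratings[home] - ratings[away] + HOME_EDGE)
        total += p_home if team == home else 1.0 - p_home
    return total


def schedule_strength(team, ratings, schedule, reference):
    """Expected wins if `team` were exactly as good as `reference`."""
    adjusted = dict(ratings)
    adjusted[team] = ratings[reference]
    return expected_wins(team, adjusted, schedule)


# Made-up ratings and an unbalanced schedule of (home, away) pairs:
# A and B meet once, C and D meet once, all other pairs meet twice.
ratings = {"A": 20.0, "B": 15.0, "C": 10.0, "D": 5.0}
schedule = [
    ("A", "B"),
    ("A", "C"), ("C", "A"), ("A", "D"), ("D", "A"),
    ("B", "C"), ("C", "B"), ("B", "D"), ("D", "B"),
    ("C", "D"),
]
for team in ratings:
    value = schedule_strength(team, ratings, schedule, reference="C")
    print(team, round(value, 2))
```

Even in this toy league the best team, A, ends up with a higher value than the reference team C, partly because A never has to play a team as good as A. That is the same correlation with team strength noted above.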
That said, I was still curious if I could correct for this “bias,” and I think that I found a way to do that. My strategy is to run the same simulation as before, where each team takes on the strength of Penn State, except that I also set Penn State’s rating equal to that of the team in question. For example, when making the calculation for MSU, I map Penn State’s Kenpom efficiency onto MSU, but I also map MSU’s efficiency onto Penn State. In this way, the overall strength of the conference is not affected. When I perform this analysis, the results are no longer correlated with the current Kenpom efficiencies.
The results of this final calculation are shown below:
This plot perhaps gives us a feel for how the strength of schedule might change based on the uncertainty and potential inaccuracy of the current Kenpom efficiencies. There is no major change in the relative ratings. In this case, Purdue and Iowa still have 2 of the easiest schedules in the conference, while it is clear that Rutgers and Nebraska would also have a pretty good schedule if they weren’t so bad.
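The rating swap can be sketched the same way. As before, the teams, ratings, home edge, standard deviation, and function names are assumptions made up for illustration, and the expected wins are summed directly rather than simulated.

```python
import math


def win_prob(spread, sigma=11.0):
    """Normal win probability; sigma is an ASSUMED std. dev. of the margin."""
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))


def swapped_schedule_strength(team, ratings, schedule, reference, home_edge=3.0):
    """Expected wins for `team` after trading ratings with `reference`.

    The swap keeps the set of ratings in the league unchanged, so the
    overall strength of the conference is not affected.
    """
    adjusted = dict(ratings)
    adjusted[team], adjusted[reference] = ratings[reference], ratings[team]
    total = 0.0
    for home, away in schedule:
        if team not in (home, away):
            continue
        p_home = win_prob(adjusted[home] - adjusted[away] + home_edge)
        total += p_home if team == home else 1.0 - p_home
    return total


# Made-up ratings (points above average) and an unbalanced schedule.
toy_ratings = {"A": 20.0, "B": 15.0, "C": 10.0, "D": 5.0}
toy_schedule = [
    ("A", "B"),
    ("A", "C"), ("C", "A"), ("A", "D"), ("D", "A"),
    ("B", "C"), ("C", "B"), ("B", "D"), ("D", "B"),
    ("C", "D"),
]
for team in toy_ratings:
    value = swapped_schedule_strength(team, toy_ratings, toy_schedule, "C")
    print(team, round(value, 2))
```

A useful sanity check on this design: in a perfectly balanced double round robin, every team gets the same value after the swap, so any differences that remain come from the unbalanced schedule and from who gets the home games.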
MSU and Ohio State sit at 5th and 6th by this measure, roughly a half game behind Iowa and Purdue. Meanwhile, no matter how you slice it, Michigan, Wisconsin, and Illinois seem to have the toughest schedules overall with almost a full game disadvantage based on this analysis.
All that said, there is nothing in this analysis that can account for things such as leg stress reactions, personal tragedies, or road games on short rest. Ultimately, probability is not destiny. But I think this analysis gives us an initial look at what to expect. My goal is to continue to update some of this data throughout the season.
That is all for now. Until next time, enjoy, and Go Green.