Warning: math geekery ahead.
Last year I tracked the Bradley-Terry ratings for college football and basketball, plus a modified system I created to try to include margin of victory. The two biggest complaints (from others and myself) were:
- Unbeaten teams were predicted to have unrealistically high probabilities of beating any team that was not also unbeaten.
- Home field isn't accounted for.
The first issue is now handled by increasing the weight of the fictional game that is added to guard against zero and infinite ratings; given equivalent strength of schedule, an N-0 team is now a 5:3 favorite over an N-1 team instead of a 3:1 favorite.
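The post doesn't spell out the fitting procedure, so here is a minimal sketch under stated assumptions: the fictional game is treated as a tie (half a win, half a loss) against an imaginary opponent whose rating is pinned at 1, it is weighted by a hypothetical `prior_weight` parameter, and ratings are fit with the standard Zermelo / minorization-maximization fixed-point iteration for Bradley-Terry. Pinning the imaginary opponent also fixes the overall scale, so no renormalization step is needed.

```python
def fit_bradley_terry(games, n_teams, prior_weight=1.0, iters=10000, tol=1e-10):
    """Fit Bradley-Terry ratings with one weighted fictional tie per team.

    games: list of (winner, loser) pairs of team indices.
    prior_weight: hypothetical weight of the fictional game; each team gets
        prior_weight/2 wins and prior_weight/2 losses against an imaginary
        opponent whose rating is fixed at 1 (an assumption, not the post's spec).
    """
    wins = [0.0] * n_teams
    opponents = [[] for _ in range(n_teams)]
    for w, l in games:
        wins[w] += 1.0
        opponents[w].append(l)
        opponents[l].append(w)

    r = [1.0] * n_teams
    for _ in range(iters):
        new_r = []
        for i in range(n_teams):
            # Sum of 1/(r_i + r_j) over real games, plus the weighted
            # fictional game against the rating-1 opponent.
            denom = sum(1.0 / (r[i] + r[j]) for j in opponents[i])
            denom += prior_weight / (r[i] + 1.0)
            # Zermelo/MM update: (actual + fictional wins) / denominator.
            new_r.append((wins[i] + prior_weight / 2.0) / denom)
        converged = max(abs(a - b) for a, b in zip(new_r, r)) < tol
        r = new_r
        if converged:
            break
    return r
```

In a three-team round robin where team 0 goes 2-0, team 0's rating stays finite instead of running off to infinity, and raising `prior_weight` pulls it further toward 1. That is the lever that shrinks an unbeaten team's odds against a one-loss team.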
As for how to deal with home field: I'm not a fan of the NCAA's RPI adjustment, where a road win or home loss counts as more than one win or loss when computing winning percentage, and a road loss or home win counts as less than one. It makes guaranteeing that the ratings converge complicated, and it just doesn't seem intuitively right. A better approach, I think, is to adjust strength of schedule to account for the added difficulty of road games and the reduced difficulty of home games. To that end, when calculating strength of schedule, each opponent's rating is multiplied by a factor H (the home-field advantage parameter) if the game was on the road, or divided by H if it was at home. All that remains is to decide what the value of H should be.
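A small sketch of that adjustment, assuming plain Bradley-Terry probabilities; the function names and the "home"/"away"/"neutral" labels are hypothetical. Dividing a visitor's rating by H is the same as multiplying the host's rating by H, so the host's win probability is H·r_home / (H·r_home + r_away), and the two sides of any game still sum to 1.

```python
def effective_opponent_rating(r_opp, venue, H):
    """Opponent's rating as it enters a team's strength of schedule.

    venue is from the team's point of view: "home", "away", or "neutral".
    """
    if venue == "away":
        return r_opp * H  # road game: the opponent effectively looks stronger
    if venue == "home":
        return r_opp / H  # home game: the opponent effectively looks weaker
    return r_opp          # neutral site: no adjustment


def win_prob(r_team, r_opp, venue, H):
    """Bradley-Terry probability that the team beats this opponent."""
    r_eff = effective_opponent_rating(r_opp, venue, H)
    return r_team / (r_team + r_eff)
```

For two evenly matched teams, `win_prob(1.0, 1.0, "home", H)` reduces to H / (1 + H).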
The naive answer is simply to look at the home teams' record for the season and use their ratio of wins to losses. However, this doesn't account for the fact that power-conference teams play many home games against weaker opponents, which artificially inflates the record of home teams. The best check of the value of H seems to be to calculate the ratings with different values of H and compare the expected number of home-team wins to the actual number; when the two are equal, we have the correct value. Using the 2010 football season data, home teams won 464 of 754 true home games (semi-home* games, such as us hosting FAU at Ford Field, and neutral-site games like the bowls are excluded), so the naive estimate would be H = 464/290 = 1.6. For different values of H, the expected home-team win totals are:
- H = 1.0 (no advantage): 428.9
- H = 1.2 (if the teams are considered equal on a neutral field, the home team is a 54.5% favorite): 450.0
- H = 1.3 (56.5% favorite): 459.2
- H = 1.35 (57.4% favorite): 463.4
- H = 1.36 (57.6% favorite): 464.3
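A sketch of the bookkeeping behind this table, assuming the ratings have already been refit for each candidate H (the fitting step is not shown) and that the host's probability is H·r_home / (H·r_home + r_away). The neutral-field percentages in parentheses are just H / (1 + H), and linear interpolation between the two rows that bracket the actual 464 wins locates the crossing. All function names are hypothetical.

```python
def neutral_field_prob(H):
    """Home team's win probability when both teams are equal on a neutral field."""
    return H / (1.0 + H)


def expected_home_wins(ratings, home_games, H):
    """Sum of home-win probabilities over true home games.

    ratings: dict team -> rating, assumed already fit using this same H.
    home_games: list of (home_team, away_team) pairs.
    """
    return sum(H * ratings[h] / (H * ratings[h] + ratings[a]) for h, a in home_games)


def interpolate_h(h_lo, exp_lo, h_hi, exp_hi, actual):
    """Linear interpolation for the H at which expected wins equal actual wins."""
    return h_lo + (actual - exp_lo) / (exp_hi - exp_lo) * (h_hi - h_lo)


for H in (1.2, 1.3, 1.35, 1.36):
    print(f"H = {H}: {100 * neutral_field_prob(H):.1f}% favorite")

# Bracketing rows from the table above: 463.4 at H = 1.35, 464.3 at H = 1.36.
crossing = interpolate_h(1.35, 463.4, 1.36, 464.3, 464)
print(f"crossing near H = {crossing:.4f}")
```

The crossing lands at roughly 1.357, which rounds to 1.36.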
So for the no-margin version, the correct value of H to two decimal places is 1.36. What about the margin-aware ratings? We can apply the same logic, but to the "victory point" totals instead of pure wins. (As a reminder, for football the victory points are assigned according to a logistic curve: a tie is worth 0.5, a win by 7 about 0.75, a win by 14 about 0.9, a win by 21 about 0.96, and a win by 35 about 0.995.) For 2010, the actual home victory point total is 458.4. Expected values:
- H = 1.0 (no advantage): 424.3
- H = 1.2 (59.5% favorite; the rating scale for the margin-aware system is compressed because wins earn less than 1 victory point, so the same multiplying factor has a larger effect on the probability): 447.8
- H = 1.3 (63.4% favorite): 458.1
- H = 1.31 (63.8% favorite): 459.0
To two decimal places, the right value is H = 1.30. This is equivalent to making the home team favored by an extra 3.41 points, which fits well with the common wisdom that home field is worth about a field goal. It's interesting to me that the home-field adjustment, in terms of win probability, is so much larger for the margin-aware ratings. Without a closer look at the data set I can't be sure why, but I suspect road teams pulled off an unusually large number of close wins. Perhaps, once there's enough data this season to judge by, the home-field factors for the two systems will be closer together.
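The victory-point reminder values are reproduced by a logistic curve in which every 7 points of margin triples the odds: 1/(1 + 3^(-m/7)) gives exactly 0.5, 0.75 and 0.9 at margins of 0, 7 and 14, then 27/28 ≈ 0.964 at 21 and 243/244 ≈ 0.996 at 35. That exact constant is inferred from the listed values, not stated in the post, so treat this as a sketch; the function names are hypothetical.

```python
def victory_points(margin):
    """Victory points for a game margin (points for minus points against).

    Assumed curve: logistic with the odds tripling every 7 points,
    i.e. 1 / (1 + 3**(-margin/7)). A loss by m earns 1 - victory_points(m).
    """
    return 1.0 / (1.0 + 3.0 ** (-margin / 7.0))


def home_victory_point_total(home_margins):
    """Actual home-team victory points, given each game's home-side margin."""
    return sum(victory_points(m) for m in home_margins)


for m in (0, 7, 14, 21, 35):
    print(f"win by {m:2d}: {victory_points(m):.3f}")
```

Because the curve is symmetric, a road team's narrow win still hands the home team nearly half a victory point.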
Starting next week, I plan to track the conference race as I did last year with probabilities of winning the division and going to the Rose Bowl for each team.
*(Semi-home games are handled by applying a factor of sqrt(H) instead of H; for the margin-aware ratings, this is worth exactly half as many points as a true home game. As a general rule, these games are played substantially closer to one campus than the other and within easy single-day-trip distance of the closer school. LSU-Oregon in the JerryDome is far enough from both campuses to count as neutral despite being obviously much closer to LSU; Boise State-Georgia in Atlanta and MSU-FAU in Detroit are semi-home. Teams that regularly play home games away from campus, such as Arkansas playing in Little Rock, still count those games as full home games.)
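Why sqrt(H) gives exactly half: if a venue factor's point equivalent is proportional to the logarithm of that factor (as it is under a logistic curve), then ln sqrt(H) = ½ ln H. A tiny sketch; the venue labels and `venue_factor` are hypothetical.

```python
import math


def venue_factor(venue, H):
    """Odds multiplier applied in favor of the host side."""
    factors = {"home": H, "semi-home": math.sqrt(H), "neutral": 1.0}
    return factors[venue]


H = 1.30
full = math.log(venue_factor("home", H))
semi = math.log(venue_factor("semi-home", H))
print(f"semi-home shift as a fraction of a true home game: {semi / full:.3f}")

# Probability for two equal teams at a semi-home site, using the
# no-margin form factor / (1 + factor).
f = venue_factor("semi-home", H)
print(f"equal teams, semi-home: {f / (1 + f):.3f}")
```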