- by John Whelan
Q. What is KRACH?
A: KRACH stands for "Ken's Ratings for American College Hockey". It is the implementation of a sophisticated mathematical model known as the Bradley-Terry rating system, first applied to college hockey by a statistician named Ken Butler.
College Hockey News endorses KRACH as a way of determining the relative strength of one team to another. KRACH would replace the NCAA's standard Ratings Percentage Index (RPI), and the system of comparisons (commonly known as the PairWise Rankings) that is used to select and seed the NCAA tournament.
While the model is sophisticated, and needs a computer to calculate, its essential meaning is actually quite simple.
Q. Can you tell us a little more?
A: Getting a bit more technical: The Bradley-Terry system is based on a statistical technique called logistic regression, in essence meaning that teams' ratings are determined directly from their won-loss records against one another. KRACH's strength of schedule component is calculated directly from the ratings themselves, which is a key point. It means that KRACH, unlike many ratings (including RPI), cannot easily be distorted by teams with strong records against weak opposition.
The ratings are on an odds scale, so if Team A's KRACH rating is three times as large as Team B's, Team A would be expected to amass a winning percentage of .750 and Team B a winning percentage of .250 if they played each other enough times. The correct ratings are defined such that the "expected" winning percentage for a team in the games it has already played equals its "actual" winning percentage.
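Here is a minimal Python sketch of the odds scale. The function name `win_probability` is our own and is not part of any published KRACH code. The expected share of games Team A wins against Team B is A's rating divided by the sum of both ratings:

```python
def win_probability(k_a: float, k_b: float) -> float:
    """Expected fraction of games Team A wins against Team B (odds scale)."""
    return k_a / (k_a + k_b)


# Team A's rating is three times Team B's.
print(win_probability(300.0, 100.0))  # 0.75
print(win_probability(100.0, 300.0))  # 0.25
```

Only the ratio matters. Ratings of 3 and 1 give the same .750/.250 split as 300 and 100.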
Q. And so why is this so great?
A: In other words, if you took one team's schedule to date, and played a theoretical "game" for each game already actually played, using the KRACH ratings themselves in order to predict the winner, then the end result would be a theoretical won-loss percentage that matches the team's actual won-loss percentage. Pretty cool.
It is not possible to do any better than that with a completely objective method. Any other method would introduce arbitrariness and/or subjectivity.
Q. What are the limitations?
A: Well, KRACH can't predict the future. Nothing can. The idea behind such rating systems is to use them to properly select and seed tournaments; champions are then determined on the ice. All such systems are designed to analyze past results, not necessarily to predict future ones. In theory, though, the sounder the analysis of the past, the better the ability to predict future results.
KRACH is "perfect" in its analysis of past results. But that should not be construed to mean that it definitively decides which team is better. With sample sizes this small, you never know. Team A could lose to Team B, sit below it in KRACH, and then turn around and beat Team B the next three times. KRACH would then change. That does not invalidate what KRACH represented at the time, however.
Q. What is RPI?
A: RPI is a method created by the NCAA in the late '70s to factor a team's strength of schedule into its winning percentage. RPI weights your winning percentage at 25 percent, your opponents' winning percentage at 24 percent, and your opponents' opponents' winning percentage at 51 percent (25-24-51). "Opponents' winning percentage" is the average of the individual opponents' winning percentages. It is not a single winning percentage computed from the sum of all their wins, losses and ties. (The formula changed from 25-50-25 in 2006-07. Another past formulation was 35-50-15.)
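Here is a minimal Python sketch of the 25-24-51 weighting as described above. It assumes the three percentages are already known. The helper names are hypothetical, how repeat opponents are counted is left to the caller, and any further adjustments the NCAA may apply are ignored. The sketch also contrasts averaging opponents' individual winning percentages with pooling their records, using two made-up opponents:

```python
from statistics import mean


def winning_pct(wins: float, losses: float, ties: float = 0.0) -> float:
    """Winning percentage, counting each tie as half a win."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


def rpi(wp: float, opp_wps: list[float], opp_opp_wps: list[float],
        weights: tuple[float, float, float] = (0.25, 0.24, 0.51)) -> float:
    """Weighted sum of WP, opponents' WP and opponents' opponents' WP.

    Opponent figures are plain averages of the individual percentages
    passed in, not pooled records. Other NCAA refinements are ignored.
    """
    w_wp, w_owp, w_oowp = weights
    return w_wp * wp + w_owp * mean(opp_wps) + w_oowp * mean(opp_opp_wps)


# Averaging percentages vs. pooling records for two hypothetical opponents.
records = [(1, 9, 0), (15, 5, 0)]
averaged = mean(winning_pct(*r) for r in records)           # (.100 + .750) / 2
pooled = winning_pct(*(sum(col) for col in zip(*records)))  # one 16-14-0 record
print(round(averaged, 3), round(pooled, 3))                 # 0.425 0.533
print(round(rpi(0.600, [0.500, 0.550], [0.520, 0.480]), 3))  # 0.531
```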
Q: Why is something like RPI or KRACH needed to begin with?
A: Because not everyone plays everyone else the same number of times. Comparing the winning percentages of teams playing vastly different schedules is unfair, so the NCAA developed the RPI.
Q: What is the problem with RPI?
A: Unfortunately, RPI sometimes fails to work as designed. If a team plays a weak opponent and wins, its RPI can still go down, because the drop in strength of schedule hurts more than the improved winning percentage helps. This should not happen.
The NCAA tried to address this by increasing the weight of winning percentage in the hockey RPI formula to 35 percent, but this just exposed RPI's other flaw: teams that play very weak schedules can inflate their RPIs by racking up impressive winning percentages. Because the strength of schedule component of RPI is built from opponents' winning percentages (KRACH does it differently), a team's weak opponents can keep respectable winning percentages by playing each other a lot. That makes the team's schedule look stronger than it is. (This was the problem with the MAAC and CHA.) In response, the NCAA went back to the original formula for the hockey RPI, but now teams are once more dropping in the RPI when they beat weak opponents.
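To make the first flaw concrete, here is a hedged numerical sketch with invented figures: a .500 team beats a weak opponent and its RPI still falls. It uses the 25-24-51 weights quoted earlier and counts each game once in the opponent averages. It also holds every other team's record fixed, which simplifies the real bookkeeping:

```python
def rpi(wp: float, owp: float, oowp: float,
        weights: tuple[float, float, float] = (0.25, 0.24, 0.51)) -> float:
    """RPI as a weighted sum of its three components."""
    w_wp, w_owp, w_oowp = weights
    return w_wp * wp + w_owp * owp + w_oowp * oowp


def with_one_more(avg: float, n: int, value: float) -> float:
    """Average of n values after appending one more value."""
    return (avg * n + value) / (n + 1)


# Hypothetical team: 10-10 in 20 games; its opponents average .600 and
# their opponents average .550 (one entry per game).
before = rpi(10 / 20, 0.600, 0.550)

# It beats a .200 team whose own opponents average .400.
after = rpi(11 / 21,
            with_one_more(0.600, 20, 0.200),
            with_one_more(0.550, 20, 0.400))

print(f"{before:.4f} -> {after:.4f}")  # 0.5495 -> 0.5472
```

The win adds about .006 through winning percentage. The two strength-of-schedule terms together subtract about .008, so the net change is negative.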
Q: So what's the real problem with RPI?
A: RPI's problems might make it seem that too much or too little weight is being given to strength of schedule, so that finding the right weight would fix everything. But the real problem is that RPI's definition of strength of schedule is completely inadequate. Adding the strength of schedule components (50-25) to the winning percentage (25) is the wrong thing to do.
Q: How is KRACH better?
A: KRACH does what RPI is designed to do -- combine a team's won-lost record with its strength of schedule -- but it does a better job of it.
While RPI is winning percentage plus strength of schedule, KRACH is winning ratio times strength of schedule. (A more complete methodology is in the section below.)
Q: What about all of the other PairWise components besides RPI? Are they still needed?
A: You could theoretically take each PairWise component -- record in Last 16 games, record vs. common opponents, head-to-head record, record vs. other Teams Under Consideration -- and "KRACH-ify" them. In other words, use KRACH's strength of schedule method to modify those criteria.
But straight KRACH is much simpler: a single list of all the teams, ranked in order. This eliminates some ambiguities in the comparison system, which is not transitive. For example, if Team A beats Team B in a head-to-head comparison, and Team B beats Team C, that does not necessarily mean Team A beats Team C. This kind of inconsistency leads to complications.
As a result, straight KRACH is preferred.
Q: But some of those other PairWise components are nice to have, aren't they?
A: A straight KRACH does overlook such vague concepts as "being hot down the stretch" (record in Last X games), and "doing well against the better teams" (record vs. Teams Under Consideration). But those distinctions are dubious anyway.
Q: The committee recently added a new component to the selection process, intended to help compensate teams forced to play a large number of non-conference road games. How could KRACH help with this?
A: KRACH, like the undoctored RPI, doesn't take home ice effects into account when assessing the strength of a team's schedule. However, the KRACH methodology can be used to factor in the home-ice advantage. As a result, you can then level the playing field in a much more sophisticated way than the highly arbitrary "good win" system used by the NCAA.
Q. What is the simplest definition of a team's KRACH rating?
A. Winning ratio times strength of schedule. Winning ratio is a cousin of winning percentage; instead of wins divided by games played, it's wins divided by losses. Strength of schedule is a weighted average of your opponents' KRACH ratings.
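For readers who prefer symbols, here is one way to write the definition. The notation is ours, not CHN's. K_i is team i's rating, W_i and L_i are its wins and losses with ties split in half, and n_ij is the number of games it played against team j. The weighting factor w_ij is the one explained in the answers that follow. Because W_i + L_i equals the total games played, the sum of n_ij over all opponents j, a line of algebra (assuming L_i > 0) shows the product form is equivalent to "expected wins equal actual wins":

```latex
K_i = \frac{W_i}{L_i}\cdot\frac{\sum_j w_{ij}\,K_j}{\sum_j w_{ij}},
\qquad w_{ij} = \frac{n_{ij}}{K_i + K_j}
\quad\Longleftrightarrow\quad
W_i = \sum_j n_{ij}\,\frac{K_i}{K_i + K_j}
```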
Q. You talk about wins and losses. What about ties?
A. Ties count as half a win and half a loss, just like in winning percentage, RPI, or any other rating. Wherever you see "number of wins" you should think "number of wins plus one-half number of ties" and similarly for losses.
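Here is a small Python sketch of the tie rule. The function names are illustrative only:

```python
def effective_record(wins: int, losses: int, ties: int) -> tuple[float, float]:
    """Split each tie into half a win and half a loss."""
    return wins + 0.5 * ties, losses + 0.5 * ties


def winning_ratio(wins: int, losses: int, ties: int = 0) -> float:
    """Wins divided by losses, after splitting ties.

    Raises ZeroDivisionError for a team with no losses and no ties.
    """
    eff_wins, eff_losses = effective_record(wins, losses, ties)
    return eff_wins / eff_losses


print(effective_record(20, 10, 4))         # (22.0, 12.0)
print(round(winning_ratio(20, 10, 4), 4))  # 1.8333
```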
Q. A weighted average? What's that?
A. For each opponent, there is a weighting factor. Multiply that opponent's KRACH rating by its weighting factor. Add up those products and divide the total by the sum of all the weighting factors.
Q. So what is the weighting factor?
A. It's the number of times you played that opponent, divided by the sum of your KRACH rating and that opponent's KRACH rating.
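The weighted average, using the weighting factor just described, can be sketched in Python as follows. The function name, team names and ratings are hypothetical:

```python
def strength_of_schedule(my_rating: float,
                         games_vs: dict[str, int],
                         ratings: dict[str, float]) -> float:
    """Weighted average of opponents' KRACH ratings.

    games_vs maps each opponent to the number of games played against it;
    ratings maps each opponent to its current KRACH rating.
    """
    # Weighting factor: games played / (my rating + opponent's rating).
    weights = {opp: n / (my_rating + ratings[opp]) for opp, n in games_vs.items()}
    weighted_sum = sum(w * ratings[opp] for opp, w in weights.items())
    return weighted_sum / sum(weights.values())


# Hypothetical: a team rated 100 played X (rated 80) three times and Y (rated 150) once.
sos = strength_of_schedule(100.0, {"X": 3, "Y": 1}, {"X": 80.0, "Y": 150.0})
print(round(sos, 2))  # 93.55, versus a plain per-game average of 97.5
```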
Q. Why did you choose that weighting factor? It seems like you're undercounting the opponents with high KRACH ratings.
A. Because what's important with KRACH ratings is the ratios between them (one divided by the other), not the differences (one minus the other). Suppose I play one game each against teams with KRACH ratings of 50 and 200, and split. I should have a KRACH rating of 100, which is twice as good as 50 but half as good as 200. My winning ratio is 1 (same number of wins and losses), so whatever my strength of schedule is, that should be my KRACH rating. If I just averaged 50 and 200, I would get 125, which is more than twice as good as 50 and less than half as good as 200. But the weighting factors are chosen in just the right way that the weighted average is indeed 100.
For those who want to see the math: The weighting factor is 1/(100+50)=1/150=2/300 for the first team and 1/(100+200)=1/300 for the second team. So the weighted sum is 50*(2/300)+200*(1/300)=(100+200)/300=300/300=1. We need to divide this by the sum of the weighting factors, which is 2/300+1/300=3/300=1/100. 1 divided by 1/100 is 100.
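The same arithmetic can be checked exactly with Python's standard `fractions` module, which stores 2/300 as the reduced fraction 1/150:

```python
from fractions import Fraction

my_rating = Fraction(100)
opponents = [Fraction(50), Fraction(200)]  # one game against each

weights = [1 / (my_rating + k) for k in opponents]  # 1/150 (= 2/300) and 1/300
weighted_sum = sum(w * k for w, k in zip(weights, opponents))
sos = weighted_sum / sum(weights)

print(weighted_sum)                      # 1
print(sum(weights))                      # 1/100
print(sos)                               # 100
print(sum(opponents) / len(opponents))   # 125, the plain average
```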
The other reason we chose this weighting factor has to do with interpreting the ratings. Pick any other team, and the ratio of your KRACH rating to theirs will be the winning ratio you'd be expected to rack up if you played them many times. So if your KRACH is 200 and you played a team with a KRACH of 100, you'd be expected to win twice as many games as you lost. The definition of KRACH ensures that if you use this formula to see how many games you'd be expected to win, given your actual schedule, the answer will be exactly the number of games you actually won. Consider the team with a KRACH of 100 that played teams rated 50 and 200: they'd be expected to win 2/3 of the games they played against the weaker team and 1/3 of the games they played against the stronger team. So if they played each of them once, the expected number of wins would be 2/3+1/3=1, and sure enough we saw above that a .500 record against those teams corresponded to a 100 rating. (Of course, they won't actually have won 2/3 or 1/3 of a game; only the total will match up.)
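Here is a hedged Python sketch of that expected-wins check. The function name and schedules are hypothetical:

```python
def expected_wins(my_rating: float, schedule: list[tuple[float, int]]) -> float:
    """Expected wins over a schedule of (opponent rating, games played) pairs.

    Each game is worth my_rating / (my_rating + opponent rating) of a win,
    the odds-scale interpretation of KRACH ratings.
    """
    return sum(n * my_rating / (my_rating + opp) for opp, n in schedule)


# The team rated 100 that split games against teams rated 50 and 200.
print(round(expected_wins(100.0, [(50.0, 1), (200.0, 1)]), 6))  # 1.0, i.e. 2/3 + 1/3
# A team rated 200 facing a team rated 100 four times: 8/3 expected wins.
print(round(expected_wins(200.0, [(100.0, 4)]), 4))  # 2.6667
```

In the second case, 8/3 expected wins against 4/3 expected losses is the 2-to-1 winning ratio described above.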
Q. It sounds like you need to know everyone's KRACH ratings before you can calculate anyone's. Isn't the definition circular? How can you calculate anything?
A. The technical term is recursive, and we can solve most recursive equations by a technique called iteration. Start with an educated guess for everyone's rating (say, all 100, or 100 times each team's winning ratio), then calculate the ratings that come out of the formula, plugging in your guesses. If you had guessed the right answer, all the ratings coming out of the formula would match those going in. (In the real world, they won't, but they'll be closer to what you're looking for.) Now take those output ratings and use them as a new set of guesses; plug them into the formula and see what comes out. Repeat this process, using the ratings calculated from one set of guesses as the next set of guesses. Eventually, the numbers coming out will be very close to the numbers going in, differing only in, say, the fifth decimal place. If you only want to quote the ratings to four decimal places, you can stop there, since you were going to round off the fifth decimal place anyway.
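Here is a minimal Python sketch of the iteration, under stated assumptions. The game format and the function name `krach` are hypothetical, and this is not CHN's code. It uses Zermelo's form of the update, K_i ← W_i / Σ_j n_ij/(K_i + K_j). That update has the same solutions as winning ratio times strength of schedule, and it is the classic form known to converge under the chain-of-wins condition discussed next. Plugging guesses into the product form literally can bounce back and forth in small cases, such as a two-team schedule, rather than settle. The sketch fixes the arbitrary overall scale with a geometric mean of 100. That is our choice, not the "typical team = 100" convention described later:

```python
import math
from collections import defaultdict


def krach(games, tol=1e-12, max_iter=100_000):
    """Iterative KRACH (Bradley-Terry) ratings from a list of game results.

    games: iterable of (team_a, team_b, score_a), where score_a is 1 for an
    A win, 0.5 for a tie and 0 for a B win. Assumes the chain-of-wins
    condition holds, so every team has at least one win or tie.
    """
    wins = defaultdict(float)                       # ties count as half a win
    played = defaultdict(lambda: defaultdict(int))  # played[i][j] = games i vs j
    for a, b, score_a in games:
        wins[a] += score_a
        wins[b] += 1.0 - score_a
        played[a][b] += 1
        played[b][a] += 1

    ratings = {team: 100.0 for team in played}      # initial guess: everyone 100
    for _ in range(max_iter):
        # Zermelo's update: K_i <- W_i / sum_j n_ij / (K_i + K_j)
        new = {
            i: wins[i] / sum(n / (ratings[i] + ratings[j]) for j, n in opps.items())
            for i, opps in played.items()
        }
        # Pin down the arbitrary overall scale (geometric mean 100).
        log_mean = sum(math.log(k) for k in new.values()) / len(new)
        new = {t: k * 100.0 / math.exp(log_mean) for t, k in new.items()}
        change = max(abs(new[t] / ratings[t] - 1.0) for t in new)
        ratings = new
        if change < tol:
            break
    return ratings


# Hypothetical example: A beat B twice and lost to B once.
r = krach([("A", "B", 1), ("A", "B", 1), ("A", "B", 0)])
print(round(r["A"] / r["B"], 6))  # 2.0
```

At convergence, each team's expected wins over its actual schedule match its actual wins, which is the defining property described above.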
Q. Does this always work?
A. In practice, yes. You know that game you play where you say the last place team in the weakest conference once beat the fifth place team, who beat the first place team, who tied the third place team in another conference, and so on until you've "proven" that the weakest team in the country is better than the national champion? As long as you can make such a chain of wins (and/or ties) from any team to any other team, the iteration we described will give you an answer. In any reasonable college hockey season, that condition is satisfied by around December. (There's still a way to define KRACH, or at least RRWP, even in weird cases, but it's more complicated, and almost certainly irrelevant.)
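In graph terms (our framing, not the article's), the chain-of-wins condition says something specific about a directed graph. Draw an edge from each winner to each loser, and edges both ways for a tie. The condition holds when that graph is strongly connected. Here is a hedged Python sketch of that check; the function names and games are hypothetical:

```python
from collections import defaultdict, deque


def reachable(graph, start):
    """All teams reachable from start by following edges in graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for nxt in graph[team]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def chain_condition_holds(games):
    """True if every team links to every other by a chain of wins and/or ties.

    games: iterable of (team_a, team_b, score_a) with score_a 1, 0.5 or 0.
    """
    beat_or_tied = defaultdict(set)  # edge a -> b: a beat or tied b
    reverse = defaultdict(set)       # the same edges, reversed
    teams = set()
    for a, b, score_a in games:
        teams.update((a, b))
        if score_a > 0:              # A won or tied
            beat_or_tied[a].add(b)
            reverse[b].add(a)
        if score_a < 1:              # B won or tied
            beat_or_tied[b].add(a)
            reverse[a].add(b)
    if not teams:
        return False
    start = next(iter(teams))
    # Strongly connected: start reaches everyone, and everyone reaches start.
    return reachable(beat_or_tied, start) == teams and reachable(reverse, start) == teams


early = [("A", "B", 1), ("B", "C", 1)]                 # C has no win or tie yet
print(chain_condition_holds(early))                    # False
print(chain_condition_holds(early + [("C", "A", 1)]))  # True
```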
Q. And will it give the same answer no matter what guesses we start with?
A. It will always give the same ranking (as long as the chain-of-wins condition from the last answer is satisfied). The only difference that can arise is that all of the ratings might be multiplied by the same number, so one guess might make everyone's KRACH three times what another guess does. Since only the ratios are meaningful, this ambiguity doesn't matter, but we remove it anyway by requiring that a completely typical team, one that would be expected to go .500 if it played every other team the same number of times, has a KRACH of 100.
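Here is a hedged Python sketch of that normalization. It rests on our own reading of "typical team": a hypothetical team that plays each listed team once and is expected to go .500. The sketch finds the rating at which that happens and rescales everyone so the typical rating becomes 100. The function names, search bounds and ratings are assumptions for illustration:

```python
import math


def typical_rating(ratings, lo=1e-9, hi=1e9, steps=200):
    """Rating at which a hypothetical team would go .500 against the field.

    Assumes the answer lies between lo and hi. Bisects on a log scale,
    since only ratios of ratings are meaningful.
    """
    values = list(ratings.values())

    def round_robin_pct(x):
        return sum(x / (x + k) for k in values) / len(values)

    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if round_robin_pct(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def normalize(ratings):
    """Rescale ratings so the typical (.500) team sits at exactly 100."""
    scale = 100.0 / typical_rating(ratings)
    return {team: k * scale for team, k in ratings.items()}


raw = {"A": 2.0, "B": 1.0, "C": 0.5}  # any overall scale, hypothetical values
print({t: round(k, 6) for t, k in normalize(raw).items()})
# {'A': 200.0, 'B': 100.0, 'C': 50.0}
```

Rescaling never changes the ratios, so the ranking is untouched, as the answer above says.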
Q. What is RRWP? What is the difference between KRACH and RRWP?
A. The RRWP, or Round-Robin Winning Percentage, is the winning percentage a team would be expected to accumulate, if they played a completely balanced schedule, i.e., if the NCAA held a round-robin tournament with all the teams in one big group. It's calculated from the KRACH, using the interpretation that the winning ratio you'd be expected to run up against a team is given by your KRACH rating divided by theirs. If you calculated the KRACH ratings for a league playing a balanced schedule, each team's RRWP at the end of the season would equal their actual winning percentage.
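Here is a minimal Python sketch of the RRWP calculation. It assumes each team plays every other team once; any equal number of games gives the same percentages. The ratings are made up:

```python
def rrwp(ratings):
    """Round-robin winning percentage: expected win share vs. every other team once."""
    result = {}
    for team, k in ratings.items():
        others = [k_opp for opp, k_opp in ratings.items() if opp != team]
        result[team] = sum(k / (k + k_opp) for k_opp in others) / len(others)
    return result


ratings = {"A": 200.0, "B": 100.0, "C": 50.0}  # hypothetical KRACH ratings
print({t: round(p, 4) for t, p in rrwp(ratings).items()})
# {'A': 0.7333, 'B': 0.5, 'C': 0.2667}
```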
An alternative definition of a team's KRACH rating is as the product of its Winning Ratio (winning percentage divided by one minus winning percentage) with the weighted average of its opponents' KRACH ratings. (The definition of the weighting factor makes this equivalent to the first definition of the KRACH ratings.) In addition to KRACH and RRWP, the KRACH table lists each team's Winning Percentage, Winning Ratio and Strength of Schedule (the aforementioned weighted average of their opponents' KRACH ratings).
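Here is a quick exact check, using Python's `fractions` module, that winning percentage divided by one minus winning percentage gives the same winning ratio as wins divided by losses. The record is hypothetical:

```python
from fractions import Fraction


def ratio_from_pct(pct: Fraction) -> Fraction:
    """Winning ratio expressed through winning percentage: p / (1 - p)."""
    return pct / (1 - pct)


# Hypothetical 20-10-4 record: ties split, so 22 effective wins and 12 losses.
pct = Fraction(22, 34)
print(ratio_from_pct(pct), Fraction(22, 12))  # 11/6 11/6
```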
Q. Where does Bradley-Terry/KRACH come from?
A. The rating method was first invented in 1929 by a German named Zermelo to evaluate the results of a chess tournament in which a full round-robin was not completed. In 1952 a pair of Americans, Bradley and Terry, unaware of Zermelo's work, rediscovered the method while trying to model the outcomes of taste tests, and this rating system came to be called the Bradley-Terry method. In the 1990s, an English statistician named Kenneth Butler, studying in Canada, decided to apply the Bradley-Terry method to US college hockey, and when prodded for a name chose Ken's Ratings for American College Hockey.
Q. Where can I read further explanations of KRACH?
A. Ken Butler's original KRACH explanation page (with examples) is still on the web. Note that the "fictitious games" he described are no longer part of the KRACH ratings and are not used in CHN's calculations. John Whelan has a more detailed mathematical analysis of KRACH.
©2013 College Hockey News.