The Pairwise ... Re-Revisited
A Fresh Look at the NCAA Tournament Process
by Adam Wodon/Managing Editor
So many people have asked me recently whether I think the Pairwise is a good system or not, and every time they do, I point them to the voluminous previous articles I've written on the topic.
But recently I realized that the last time I went into it in depth was 2008, mainly because there hasn't really been anything new to write (beyond the usual annual analysis of why the seeds were as they were). Nothing has really changed since then, there haven't been any new controversies, the criteria are essentially the same, and we've all settled into an "it is what it is" mentality.
Which is both good and bad.
While most of it is covered in past articles, this is a good time to summarize the issues again. Anything that hasn't been done in five years is probably worth revisiting ... beginning with the existential question.
Is the Pairwise Good or Bad?
Ultimately, I believe the good far outweighs the bad. It's an objective system where everyone knows the criteria for qualifying for the NCAAs in advance. The various criteria used could be considered subjective, or somewhat arbitrary, but they are known in advance. And ultimately, the Pairwise does come up with a good field that no one really worries about once the NCAA Tournament games begin.
There are problems, though. The existence of a "TUC Cliff," for example (see below), is indicative of that. Also, the Ratings Percentage Index (the heart of the Pairwise) is not as strong a ratings method as something like KRACH. The two are usually close, but not always, with enough difference to impact a couple of teams per year. And there are always tweaks that could be made.
This is why you'll see me both defend and criticize the Pairwise, depending on the day and who's asking. I'll defend it to people ignorant of the system, or who exaggerate the problems; but I'll criticize it with people who know the system well.
Is the Pairwise "Standings List" Gospel?
Yes it is. Now. In practice. But not in definition.
Most people these days don't realize that the "Standings" as you see them published on CHN and elsewhere were not always done this way. In fact, it's safe to say committee members don't even know this, or care.
The system as defined in the Men's Ice Hockey Handbook only describes the criteria used to compare the teams to each other, and it defines what a "Team Under Consideration" is. It does not mandate how exactly the criteria should be applied to order the teams for selection and seeding. Originally, the RPI was used to rank teams, and in the case of "close calls" or "bubble teams" the committee would apply the criteria among sets of teams to decide from that group.
A few different groups, most notably USCHO, decided to total up the number of comparisons won by each team against all other teams and publish the results as "Standings." This difference led to confusion among those inside the system, who disputed that the two approaches were the same, even though they basically were.
Eventually, there was so much published, written and said about it, that somewhere along the way, around 2003, the committee decided to adopt the Pairwise Standings as is. The differences this creates are small enough so as not to worry about too much, but it can definitely make a difference for a team or two.
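The "Standings" computation described above is, mechanically, just a counting exercise: run every head-to-head comparison and total the wins. Here's a minimal sketch, assuming the comparison logic itself is supplied from elsewhere; `comparison_winner`, the team names, and the ratings are all hypothetical stand-ins, since the actual NCAA comparison uses multiple criteria not shown here.

```python
from itertools import combinations

def pairwise_standings(teams, comparison_winner):
    """Total up comparisons won by each team against every other team.

    `comparison_winner(a, b)` is a hypothetical stand-in for the
    multi-criteria head-to-head comparison; it returns whichever of
    the two teams wins the comparison.
    """
    wins = {t: 0 for t in teams}
    for a, b in combinations(teams, 2):
        wins[comparison_winner(a, b)] += 1
    # Sort by comparisons won, most first -- the published "Standings."
    return sorted(teams, key=lambda t: wins[t], reverse=True)

# Toy example: pretend the team with the higher rating wins every comparison.
ratings = {"Minnesota": 0.58, "Quinnipiac": 0.61, "Yale": 0.55}
standings = pairwise_standings(
    list(ratings), lambda a, b: a if ratings[a] > ratings[b] else b
)
```

Note that this is exactly why the old approach and the published standings can diverge: ranking by raw RPI and breaking ties with the criteria is not the same operation as totaling every comparison won.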
Should the System Be So Rigid?
Selections should be set in stone. The only issue is whether or not the old method — using criteria solely to break ties within groups — should be used. We're probably not going back to that, but it's worth knowing you could. Ultimately, there are pros and cons to each method.
With seeding, however, the Committee's approach is definitely too rigid. It's good to have some system for selecting teams, even if that system isn't perfectly precise. But the numbers are not precise enough to rely on so rigidly when it comes to seeding the teams.
These days, the committee seems to be more rigid than ever. It has ordered the teams 1-16 and treated that order like gospel. Again, I wouldn't want things to be willy-nilly, but it's silly to treat the order as some sort of biblical writ, given the small sample sizes and obvious wild fluctuations based on a game or two here and there.
This is probably my biggest complaint, as evidenced by the near-annual rants I once did on the topic.
So, Should Human Judgment Come Into Play?
No. I'm not saying that. Certainly not for selection. But some leeway would be worthwhile when it comes to seedings. The committee has boxed itself into a corner due to a variety of rigid rules it uses to seed teams. Not only should those rules be relaxed, but I would not even be against letting the committee have a little fun, within reason, like the basketball committee does when it quite intentionally creates intriguing matchups.
But the basketball committee also makes judgment calls on selections, subjectively factoring in things like whether a team is "hot," whether a player has just gotten hurt, or whether a team was missing a key player for much of the season and now has that player back.
Most hockey people agree that, as tantalizing and logical as that might sound on one level, it's not worth opening that can of worms. In other words, be careful what you wish for. I'd rather the system err on the side of rigidity than willy-nilly decisions.
Should We Use KRACH?
I've advocated for this many times in the past. In fact, in 2004, I co-authored an extensive piece with John Whelan on how exactly the Committee could incorporate the KRACH, either wholesale, or as components of other criteria. As a purely mathematical system, it's definitely better than the RPI/Pairwise.
But any momentum for this is long gone, so I've pretty much given up on the idea. We still endorse KRACH at CHN as a better ratings system, but it's been a while since I urged the Committee to incorporate it.
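For readers curious what makes KRACH mathematically stronger: it is, at bottom, a Bradley-Terry rating, in which each team's rating K satisfies K_i = W_i / Σ_j [n_ij / (K_i + K_j)], where W_i is team i's wins and n_ij is games played between i and j. Winning percentage is automatically weighed against the strength of the teams you played. Here's a minimal sketch of that fixed-point iteration; the team names and results are invented, and ties (normally counted as half a win) are omitted for brevity, so this is an illustration of the underlying method, not the official KRACH implementation.

```python
def krach(games, iters=1000):
    """Iteratively solve the Bradley-Terry fixed point underlying KRACH.

    `games` is a list of (winner, loser) results; ties are omitted
    in this sketch. Each rating satisfies
        K_i = W_i / sum_j( n_ij / (K_i + K_j) )
    so a win over a highly rated opponent boosts K_i more than a win
    over a weak one.
    """
    teams = {t for g in games for t in g}
    wins = {t: 0 for t in teams}
    played = {t: {} for t in teams}
    for w, l in games:
        wins[w] += 1
        played[w][l] = played[w].get(l, 0) + 1
        played[l][w] = played[l].get(w, 0) + 1
    K = {t: 1.0 for t in teams}
    for _ in range(iters):
        K = {
            t: wins[t] / sum(n / (K[t] + K[o]) for o, n in played[t].items())
            for t in teams
        }
    return K

# Toy schedule: A beats B twice, B beats C, C beats A.
sample = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
krach_ratings = krach(sample)
```

In this toy example, C ends up rated ahead of B despite the identical 1-1 head-to-head structure against the others, because C's lone win came against the strongest team. That schedule-aware behavior is exactly what raw RPI approximates only crudely.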
So Is There Any Better System?
Beyond the KRACH-based ones I've advocated in the past, no.
While the Pairwise may have flaws on its fringes, ultimately no system with a 35-game sample size per team (approximately) is ever going to be able to get it completely "right," whatever that means. At least we know the criteria in advance, and no one can scream of conspiracies. Taking the human element out of it eliminates some major controversies that just aren't worth it, no matter how much fun that may be to some of the sadists among us.
I've yet to hear a better system (again, besides a KRACH-based one) that doesn't have many more flaws than the current system.
Why is Team X So Low/High in the Pairwise?
I've gotten a number of cards and letters lately wondering how such-and-such team could be so high in the conference standings, and so low in the Pairwise. Or, conversely, so high in the Pairwise and low in the conference. People forget — or never knew — that NCAA Tournament selections are not based on conference play whatsoever. Only your overall record matters, so if your non-league record is significantly better or worse than your league record, it might look odd. This concept seems to escape a lot of people.
I have no problem with this. I believe your season should be judged as a whole, and conference play specifically should have no bearing, besides being part of your overall record.
Should "Hot" Teams Be Rated Higher?
There used to be a "Last 20" (and then "Last 16") component to the Pairwise, but it was eliminated. That wasn't so much because the Committee disagreed with the idea philosophically, but because the "Record in Last 16" was so skewed by strength of schedule.
I would not be against a "down the stretch" component to the Pairwise, but it would have to be normalized against your schedule strength — something that KRACH could do.
But, there is also an argument to be made that the season should be judged as a whole, period. I can see both sides to this argument, and I honestly don't care either way, so long as, if "down the stretch" is added back, it's normalized. It also should be codified, and not applied subjectively, as it is in basketball.
What about the "TUC Cliff?"
The "TUC Cliff" is a moniker that arises from the committee's definition of "Team Under Consideration." Since the cutoff for being a TUC is an RPI of .500 or better, teams can bounce above and below that line as the season goes on. Because "Record vs. TUC" is a component of the Pairwise, this leads to some interesting, and sometimes drastic, fluctuations in the Pairwise.
This wouldn't be evident if the Pairwise wasn't publicized, and it's not meant to be looked at until the season ends. However, it illustrates a flaw in the system. Why? Because a team could benefit by losing. To wit, in 2005, when Wisconsin defeated Alaska-Anchorage in Game 3 of their WCHA playoff series, it bumped UAA off the TUC Cliff, and Wisconsin suddenly got drastically worse in the Pairwise. From an NCAA standpoint, Wisconsin would have been better off losing the third game of the series. That's not the sign of a good system. See the complete article from 2005 for more detail.
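The mechanics of the cliff are easy to see in miniature. The sketch below uses entirely hypothetical records and RPI numbers (not the actual 2005 figures): once an opponent's RPI dips below .500, every game against that opponent simply vanishes from your "Record vs. TUC," which is how beating a bubble team can hurt you.

```python
def record_vs_tuc(results, rpi):
    """Winning percentage against Teams Under Consideration only.

    `results` is one team's list of (opponent, won) tuples; a TUC is
    any team whose RPI is .500 or better. Games against non-TUCs are
    excluded entirely, which is the source of the "cliff."
    """
    games = [(opp, won) for opp, won in results if rpi[opp] >= 0.500]
    if not games:
        return None  # no games against TUCs
    return sum(won for _, won in games) / len(games)

# Hypothetical numbers: two wins over UAA, one loss to Denver.
results = [("UAA", True), ("UAA", True), ("Denver", False)]

# While UAA sits at .501, both wins count: record vs. TUC is 2-1.
before = record_vs_tuc(results, {"UAA": 0.501, "Denver": 0.550})

# If UAA slips to .499 -- perhaaps because it lost to you again --
# those wins vanish and your record vs. TUC collapses to 0-1.
after = record_vs_tuc(results, {"UAA": 0.499, "Denver": 0.550})
```

So a team really can be better off, under this criterion, if a beatable opponent stays above the line, which is exactly the perverse incentive the Wisconsin/UAA example illustrates.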
The Regional System
We didn't even get into the structure of the NCAA Tournament itself — four teams in each of four regions. That's another story for another day. But again, despite flaws here too, I like the current setup the way it is, though the more we can get away from teams hosting regionals in their own building, the better.
Why Do You Care So Much?
I've had people ask me this too. And I admit to being a math geek, and a Bill James disciple. But the human element of sports is by far the most interesting thing about it and always will be.
The most relevant answer to this question is — I care about the Pairwise system because that's the system that's used. Therefore, I want to know everything about it, so I can inform others, and so I can understand what's going on concerning the most important parts of the season.
I don't understand those who remain willfully ignorant of the system just because they don't like the geekiness of it — especially if you rip the system while doing so, or ridicule those who pay attention to it. Whether it's geeky is irrelevant — it's the system we have, and we owe it to ourselves to get it.