Introducing the EVTB Predictor: A Stat-Driven Ranking Model Trying to Make Sense of the Madness


By: Eric Eichelberger

Why bother making a computer ranking system? Aren’t there tons out there?

Rankings have always fascinated me. I used to check up on my favorite football and basketball teams weekly to see how the polls shifted and, normally, how my team got shafted. The BCS ushered in the era of computer ranking systems, and I have been following them for years, trying to understand what makes for a solid ranking. Last year, I took on the project of building my own stat-based NCAA basketball ranking system. Honestly, the Hokies played a large part: all the preseason hype and regular season success they enjoyed motivated me to track their progress. I tweaked the ranking system throughout the beginning of the season, with the goal of figuring out a formula that would accurately predict the NCAA field of 68. For this season, the model will be doing the same, and from here forward will be named the EVTB Predictor.

Alright, you made a model, but was it any good?

Without tooting the EVTB Predictor’s horn too much, I was impressed with its accuracy the first year. The model correctly predicted 64 out of 68 teams in the field. Of those 64, it hit 26 exact seeds, was off by one seed on 24, off by two on 10, off by three on 2, and had 2 big misses at four seeds out of place. For the record, there were no judgement calls on seeding; it was just a straight by-the-numbers prediction.
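For anyone who wants to check the math, here is a quick tally of that seed breakdown. The only inputs are the counts quoted above; the variable names and the "average miss" summary are my own additions for illustration.

```python
# Seed-miss distribution quoted above: {seed lines off: number of teams}
seed_misses = {0: 26, 1: 24, 2: 10, 3: 2, 4: 2}

# Teams the model correctly placed in the field (should match the 64 quoted)
teams_in_field = sum(seed_misses.values())

# Teams seeded exactly right or within one seed line
within_one = seed_misses[0] + seed_misses[1]

# Average number of seed lines missed, across the correctly predicted teams
avg_miss = sum(off * count for off, count in seed_misses.items()) / teams_in_field

print(teams_in_field, within_one, round(avg_miss, 2))  # prints: 64 50 0.91
```

So 50 of the 64 correctly picked teams landed within one seed line, and the typical miss was a little under one line.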

OK, OK, the model was decent, so how can I learn more about it?

The plan for this series of articles will be to deliver a weekly update highlighting certain aspects:

  1. The model’s top 25, including big movers and explanations
  2. A focus on the ACC placements, movers and shakers, and comparisons to other ranking systems
  3. A weekly update of Virginia Tech’s standing
  4. Overall comments on how the model compares to other rankings (human and computer)

If you just want to enjoy that content and don’t want to get into the weeds of the numbers, feel free to check out the future articles. If you’re still reading, let’s get down and dirty with some behind-the-scenes aspects of the model.

I’m a huge nerd, please tell me more about how it works!

There are countless models out there you can read about that range from simple to complex. I have kept this model on the simple side for a few reasons: 1) simple models make it easier to understand and track what makes a team move; 2) simple models tend to be nearly as accurate without the extra effort, computing power or complexity. We’re just talking basketball here, people, not going to the friggin’ moon. The model boils down to seven factors that are aggregated into a single number:

  1. Wins/losses (of course)
  2. How much “luck” your team has against its expected performance
  3. Credit for road wins
  4. Dings for home losses
  5. Strength of schedule
  6. Margin of victory
  7. End of season adjustments for Power 5/Mid-Major schools
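To make "aggregated into a single number" concrete, here is a minimal sketch that assumes the aggregation is a plain weighted sum. That is an assumption on my part for illustration: the actual formula and weights are not published here, and every name and number below (`WEIGHTS`, `POWER5_BUMP`, `rating`, `example_team`) is hypothetical.

```python
# Hypothetical weights for six of the seven factors. The real model's
# weights are not given in this article; these only show the mechanics.
WEIGHTS = {
    "win_pct": 1.0,
    "luck": -0.5,          # negative: a "lucky" record is corrected downward
    "road_wins": 0.02,     # credit per road win
    "home_losses": -0.02,  # ding per home loss
    "sos": 1.0,
    "avg_margin": 0.01,
}

# Hypothetical size of the seventh factor, the end-of-season Power 5 bump.
POWER5_BUMP = 0.05


def rating(factors, power5=False, season_over=False):
    """Combine the factor values into one number (assumed weighted sum)."""
    score = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    # The bump is applied only once the regular season is finished.
    if season_over and power5:
        score += POWER5_BUMP
    return score


# Made-up factor values for an imaginary team, not real data.
example_team = {
    "win_pct": 0.75, "luck": 0.04, "road_wins": 5,
    "home_losses": 2, "sos": 0.03, "avg_margin": 8.5,
}

print(rating(example_team, power5=True))                    # mid-season
print(rating(example_team, power5=True, season_over=True))  # with bump
```

The nice part of a structure like this is that you can see exactly which factor moved a team from one week to the next, which is the whole point of keeping things simple.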

Wins and losses: I can’t imagine a model that doesn’t take into account how often a team won or lost. This should be pretty self-explanatory, but the more a team wins, the better it will do in the model. It’s worth noting that the model does not use any prior year data or preseason rankings.

Luck: By luck, I mean how your team fared against expectations. Said another way, if the team is winning a lot of close games and has an inflated record against expectations, they are “lucky.” If a team has been losing close games that they should have won, they are “unlucky.” I know all the Boomers out there will say, “I want my team to perform great in close games and that’s a sign of a tough, gritty team.” Ok Boomer, but this is a computer model that is making a prediction, and correcting for this “luck” does matter. Let the committee take your feelings into account.
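One common way to put a number on "luck" is to compare a team's actual winning percentage to the winning percentage its scoring would suggest. The sketch below uses the Pythagorean expectation as a stand-in for "expected performance." That choice, the exponent of 10, and the function names are assumptions for illustration, not necessarily how the EVTB Predictor does it.

```python
def expected_win_pct(points_for, points_against, exponent=10.0):
    """Pythagorean expectation: win % implied by points scored and allowed.

    The exponent is a hypothetical default; different sports and models
    use different values.
    """
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)


def luck(wins, losses, points_for, points_against, exponent=10.0):
    """Actual win % minus expected win %. Positive means 'lucky'."""
    actual = wins / (wins + losses)
    return actual - expected_win_pct(points_for, points_against, exponent)


# A made-up team that is 20-10 despite scoring exactly as much as it allows:
# its record is better than its scoring says it should be.
print(luck(20, 10, 2000, 2000))
```

A team like the made-up one above has been winning the close ones, so a luck correction would pull its rating back toward what the scoring supports.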

Road wins and home losses: Similar to regular wins and losses, I think this is self-explanatory. With modern travel schedules (VT had to go halfway across the world to play), rabid crowds and the comfort of being on your own hardwood, home court advantage is a real thing. Therefore, teams get extra credit for overcoming that disadvantage, or dinged for not being able to rise to the occasion. Neutral games are just that, neutral, so no extra credit or punishment.
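In code, the venue rule is about as simple as it sounds. This is a rough sketch; the size of the credit and the ding, and the names `ROAD_WIN_CREDIT`, `HOME_LOSS_DING` and `venue_adjustment`, are hypothetical placeholders.

```python
ROAD_WIN_CREDIT = 1.0   # hypothetical credit for each road win
HOME_LOSS_DING = -1.0   # hypothetical penalty for each home loss


def venue_adjustment(games):
    """Sum venue credits and dings over a list of (site, won) games.

    site is "home", "road" or "neutral"; won is True or False.
    Home wins, road losses and all neutral games contribute nothing extra.
    """
    total = 0.0
    for site, won in games:
        if site == "road" and won:
            total += ROAD_WIN_CREDIT
        elif site == "home" and not won:
            total += HOME_LOSS_DING
    return total


# Made-up schedule: two road wins, one home loss, one neutral-site win.
games = [("road", True), ("road", True), ("home", False), ("neutral", True)]
print(venue_adjustment(games))  # prints: 1.0
```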

Strength of schedule: There are many ways to skin a cat, and even more ways to calculate strength of schedule. The model uses a version where it compares the records of the team’s opponents and opponents’ opponents against the average, then assigns credit for playing a more difficult schedule and deducts for playing a soft schedule.
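Here is a rough sketch of that idea on a toy four-team league. The two-thirds/one-third split between opponents and opponents' opponents is borrowed from RPI-style formulas and is an assumption, as are the function names and the made-up records; the model's actual weighting is not spelled out here.

```python
from statistics import mean

# Made-up (wins, losses) records and opponent lists for a toy league.
records = {"A": (8, 2), "B": (5, 5), "C": (2, 8), "D": (5, 5)}
schedule = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}


def win_pct(team):
    wins, losses = records[team]
    return wins / (wins + losses)


def sos(team, opp_weight=2 / 3):
    """Blend of opponents' win % and opponents' opponents' win %."""
    opponents = schedule[team]
    opp_pct = mean(win_pct(o) for o in opponents)
    # Skip the team itself when looking at its opponents' schedules.
    opp_opp_pct = mean(
        win_pct(oo) for o in opponents for oo in schedule[o] if oo != team
    )
    return opp_weight * opp_pct + (1 - opp_weight) * opp_opp_pct


def sos_adjustment(team):
    """Credit (positive) or deduction (negative) versus the average schedule."""
    average = mean(sos(t) for t in schedule)
    return sos(team) - average


for team in schedule:
    print(team, round(sos_adjustment(team), 3))
```

In this toy league, the 2-8 team comes out with the toughest schedule and gets credit for it, while the 8-2 team gets a deduction for feasting on weaker opponents.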

Margin of victory: This is probably the most controversial portion to add to a computer model, but here we are embracing the controversy. I prefer to include margin of victory because it is a simple metric that captures the effectiveness of both the offense and defense in one number. Yes, you could have a terrible defense and sick offense and the model wouldn’t know, but in aggregate it is able to compare a team to other teams. Also, we may expand this model out to use for gambling predictions, and margin of victory plays a big part in that.
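A bare-bones version of a margin-of-victory metric is just the average point differential. The optional `cap` below is a hypothetical add-on that some models use to keep blowouts from dominating; nothing here says the EVTB Predictor caps margins, and the scores are made up.

```python
def average_margin(scores, cap=None):
    """Average point differential over (points_for, points_against) pairs.

    cap is an optional, hypothetical limit on any single game's margin.
    """
    margins = []
    for points_for, points_against in scores:
        margin = points_for - points_against
        if cap is not None:
            margin = max(-cap, min(cap, margin))
        margins.append(margin)
    return sum(margins) / len(margins)


# Made-up results: a 10-point win, a 5-point loss and a 40-point blowout win.
scores = [(80, 70), (60, 65), (90, 50)]
print(average_margin(scores))          # prints: 15.0
print(average_margin(scores, cap=20))  # blowout counted as a 20-point win
```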

End of season adjustments: A reality of the NCAA Tournament is that being from a Power 5 school matters a ton for both seeding and getting an at-large bid. Remember, the ultimate goal is to nail that field! As such, Power 5 teams enjoy the privilege of an end-of-the-year bump. I repeat… end of the year. There are no adjustments made until the final whistle is blown on Selection Sunday.

If you are all the way down here, thanks for being a fellow basketball nerd and I hope you check out our weekly updates. Please comment each week with your thoughts and any additional insight you may want or have.
