Predicting Upsets in the NCAA Tournament with Machine Learning

Matt Worley
Towards Data Science
8 min read · Mar 9, 2018


Update: Check out this follow-up where I discuss 3 additional upset signals.

The madness is nearly upon us! The annual men’s college basketball tournament begins on March 15, and soon millions of fans will be filling out their brackets.

For most of us, upsets are the best part of the tournament. Few fan experiences in sports beat the excitement of seeing an upset unfold in real-time and the pure joy of the underdog squad when they pull off the victory.

The only thing better than watching crazy upsets go down is nailing the upset picks in your bracket. That’s easier said than done: in a recent national online bracket contest, the average upset was picked by only 19% of brackets. Upsets also “bust brackets” by sending some highly-seeded favorites home early. Case in point: in 2016, 62% of ESPN Challenge brackets had Michigan State in the Final Four, and the Spartans had the 2nd-most picks to win the championship. But madness ensued: Tom Izzo’s squad was eliminated by 15-seed Middle Tennessee State in the first round.

Picking upsets correctly can distinguish your bracket and give you a competitive edge in your pool. I set out to beat the odds by using data and machine learning to predict upsets in the NCAA Tournament.

Defining upsets using team seeds

Each year the 64 eligible teams are slotted into four groups of 16 (“regions”) and each team is assigned a “seed”, a ranking from 1 (best) to 16 (worst), as in the 2016 Midwest region. One team is technically favored in almost every game, but people aren’t shocked by a 9-seed beating an 8-seed, or even a 10-seed beating a 7-seed. These match-ups are basically “toss-ups”. I’m interested in predicting upsets that are more shocking and unexpected.

I define an upset as a victory by a team seeded at least 4 slots lower than its opponent, such as a 1-seed losing to a 5-seed or lower, a 2-seed losing to a 6-seed or lower, etc. Because the tournament uses the same structure every year, the seed match-ups are a quick, easy way to identify upsets and study them.
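In code, that definition is a one-liner. Here’s a minimal sketch (the function name is just illustrative):

```python
def is_upset(winner_seed: int, loser_seed: int) -> bool:
    """A win counts as an upset if the winner is seeded at least 4 slots
    lower (i.e., a numerically higher seed) than the team it beat."""
    return winner_seed - loser_seed >= 4

print(is_upset(15, 2))  # True: Middle Tennessee State over Michigan State
print(is_upset(9, 8))   # False: a toss-up, not an upset
```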

Data Preparation and Exploration

Data

I obtained data on team and player performance for the regular season and tournament games for each tournament team since 2003. The data includes 584 tournament games that had “upset-potential”, meaning the teams had a seed differential of at least 4.

Feature engineering

I created a dataset of 82 characteristics for each team, including measures of season-long performance, team efficiency metrics, coach tourney experience/success, and team travel. Because the goal is prediction, all features are characteristics available prior to each tournament game. Then I did some data exploration to identify trends that might predict upsets.

Upsets and seed match-ups

The plot below shows the yearly count of upsets, along with the yearly mean. Don’t go crazy picking too many upsets; on average only 9 occur each year.

On average only 9 upsets occur in each tournament

Which match-ups are most likely to result in an upset? It turns out that 14 common seed pairings in Rounds 1 and 2 account for 85% of all “upset-potential” games. These match-ups are shown below, along with a heat map of their “upset rates” (the proportion of games that result in an upset).

Heat map of upset rates for the most common seed pairings

The plot identifies some good upset candidates based on seed match-ups:

  • In Round 1, focus on 12 vs 5 and 11 vs 6. The 12-seeds and 11-seeds win these games around 40% of the time.
  • If you’re feeling bold, take one 13-seed over a 4-seed in Round 1. The 13-seeds win 20% of these games.
  • The 2-seeds rarely lose in Round 1 (7%) but in Round 2 they are surprisingly vulnerable to upsets by 7-seeds (33%) and 10-seeds (37%).
  • Those 10-seeds and 7-seeds are much better candidates for a Sweet-16 run than 8 and 9-seeds, who rarely beat 1-seeds (12–13%) in Round 2.
  • Consider putting an 11-seed in the Sweet 16: the 11-seeds win one-third of their match-ups with 3-seeds in Round 2.
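Upset rates like those in the heat map fall out of a simple group-by. Here’s a rough pandas sketch, assuming a hypothetical games table with favorite_seed, underdog_seed, and upset columns (the file and column names are placeholders, not my actual dataset):

```python
import pandas as pd

# Hypothetical file: one row per upset-potential game (seed differential >= 4),
# with an 'upset' column equal to 1 if the underdog won
games = pd.read_csv("tourney_games.csv")

upset_rates = (
    games.groupby(["favorite_seed", "underdog_seed"])["upset"]
         .agg(rate="mean", n="size")   # upset rate and game count per seed pairing
         .reset_index()
         .sort_values("n", ascending=False)
)
print(upset_rates.head(14))  # the 14 most common pairings cover ~85% of these games
```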

Spotting Cinderellas and Underachievers

Beyond the seed match-ups, what are the characteristics of underdogs that win, and favorites that lose?

Team efficiency margin

A team’s adjusted efficiency margin, from Ken Pomeroy’s excellent advanced basketball analytics website, is a single number that indicates overall strength of play. In essence, it represents a team’s expected margin of victory over an average opponent per 100 possessions. Below I plot the efficiency margins of the underdog and favorite for each game in my dataset, with each data point showing whether the game resulted in an upset.

Upsets involve underdogs with higher efficiency and favorites with lower efficiency

Most upsets occur when underdogs with margins over 10 play favorites with margins under 25. In fact, games meeting these criteria result in an upset 35% of the time, compared to 13% in games that don’t. As a simple rule of thumb for picking upsets, you could certainly do much worse!
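Applying that rule of thumb as a filter is straightforward. Here’s a sketch that reuses the hypothetical games table from the earlier snippet, with made-up column names for the two teams’ adjusted efficiency margins:

```python
import pandas as pd

games = pd.read_csv("tourney_games.csv")  # same hypothetical table as above

# Simple profile: strong underdog (margin over 10) facing a beatable favorite (margin under 25)
profile = (games["underdog_adj_em"] > 10) & (games["favorite_adj_em"] < 25)

print("Upset rate when the profile holds:", games.loc[profile, "upset"].mean())
print("Upset rate otherwise:             ", games.loc[~profile, "upset"].mean())
```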

Earning extra possessions — rebounds and turnovers

In basketball, teams exchange possession of the ball after missed shots or scores, but teams can also earn “extra” possessions by rebounding their own misses (which produces an extra shot opportunity) and forcing turnovers (which takes an opportunity away from the opponent). To compute “Offensive rebound & turnover margin”, I calculated each team’s average per-game advantage in offensive rebounds and turnovers over the course of the season.
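Here’s a rough sketch of how a feature like that could be computed from regular-season box scores; the column names are placeholders, and folding rebounds and turnovers into a single margin is my reading of the feature:

```python
import pandas as pd

# Hypothetical file: one row per team per regular-season game
box = pd.read_csv("regular_season_box_scores.csv")

# Extra possessions earned in a game: own offensive rebounds plus turnovers forced,
# minus the same quantities earned by the opponent
box["extra_poss_margin"] = (
    (box["off_rebounds"] + box["opp_turnovers"])
    - (box["opp_off_rebounds"] + box["turnovers"])
)

# Season-long per-game average for each team: the offensive rebound & turnover margin
orb_to_margin = box.groupby(["season", "team"])["extra_poss_margin"].mean()
```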

Winning underdogs have better offensive rebounding and turnover margins

The chart above shows the average margin for underdogs and favorites who played in non-upsets (left) and upsets (right).

In non-upsets, the favorites have higher margins than the underdogs, but in upsets the opposite is true. If an underdog is better at getting offensive boards and turnovers than their opponent, that game is ripe for an upset.

Machine Learning for Upset Prediction

Rather than examine 82 features individually, it’s time to let the algorithms do their thing. I split the games into a training set (80%) and test set (20%) and trained each algorithm for upset prediction.
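A minimal sketch of that split with scikit-learn, where X and y stand in for the 82-feature matrix and the upset labels (stratifying on the label is my own addition to keep the upset rate similar in both sets):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 80% training, 20% held-out test
    stratify=y,       # keep the upset/non-upset ratio consistent across the split
    random_state=42,
)
```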

Training classification algorithms

I treated upset prediction as a classification problem, with the goal to classify each game as an “upset” or not. I chose 5 classification algorithms to train:

  • Logistic regression
  • Neural network
  • Support vector machines
  • Random forests
  • Gradient tree boosted classifier

Using python’s scikit-learn package, I fit each algorithm to the training data and used 5-fold cross-validation to tune the model hyperparameters (settings that affect model fit). To identify the optimal settings for each algorithm, I used the cross-validation F1 score. Here, the F1 score is a balance of upset precision (minimizing the incorrect upset predictions) and recall (predicting most of the actual upsets). See here for more about F1.
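For one of the algorithms (random forest), the tuning step looks roughly like this; the hyperparameter grid is purely illustrative, not the search space I actually used:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid only -- not the actual search space
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",   # optimize the balance of upset precision and recall
    cv=5,           # 5-fold cross-validation
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```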

I should also acknowledge that upsets are “imbalanced” in these data, with only 22% of games being upsets. These algorithms typically do best with examples that are closer to a 50–50 split. Without going into too much detail, I’ll note the training data was resampled to create balance between upsets and non-upsets (read more about imbalanced data and resampling here).
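A minimal sketch of one such approach, random oversampling of the minority class with the imbalanced-learn package (other resampling schemes would work too):

```python
from imblearn.over_sampling import RandomOverSampler  # pip install imbalanced-learn

# Oversample the minority (upset) class so the training data is roughly 50-50.
# Only the training set is resampled -- the test set keeps its natural upset rate.
sampler = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = sampler.fit_resample(X_train, y_train)
```

The balanced set would then feed the tuning step above; strictly speaking, resampling inside each cross-validation fold (e.g., with an imbalanced-learn Pipeline) avoids leaking duplicated rows across folds.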

Testing model performance

For each algorithm, the best model from training was evaluated on the held-out test set. The ROC curves for the 5 models are shown below:

The plot shows that for each classifier, as the true positive rate increases, so does the false positive rate. This is true of most classification problems: as more true cases are predicted, more false cases are predicted along with them. I also notice that the algorithms reach their best true positive rates at different points on the ROC curve, so combining them in an ensemble might produce better results.
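For reference, each curve comes from sweeping the decision threshold over a model’s predicted upset probabilities on the test set. A minimal sketch for a single fitted model, reusing the hypothetical search object from the tuning snippet:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Predicted probability that each test game is an upset
probs = search.best_estimator_.predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, probs)
plt.plot(fpr, tpr, label=f"Random forest (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```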

“Future” matchup classification

To evaluate a more “real-world scenario”, I wanted to predict upsets from a single tournament. I used the logistic regression model to predict 42 upset-potential games from 2017, which I held out from prior training and testing.

Matrix showing prediction results for 2017

This matrix shows the breakdown of predictions into correct non-upsets (upper-left), correct upset predictions (lower-right), incorrect non-upset predictions (lower-left), and incorrect upset predictions (upper-right).

The model predicted 15 total upsets in 2017. Of the 10 actual upsets in 2017, 8 were identified by the model, but the model also made 7 “false-positive” upset predictions. The model is a bit too aggressive in predicting upsets, but it’s still correct more often than not for these hard-to-predict games.
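From those counts (8 correct upset calls, 7 false alarms, 2 missed upsets, 25 correct non-upsets), upset precision is 8/15 ≈ 0.53 and recall is 8/10 = 0.80. With hypothetical label and prediction arrays for the 42 games, scikit-learn reproduces the matrix in the layout described above:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# y_2017 and preds_2017 are hypothetical arrays of true labels and model
# predictions for the 42 upset-potential games from the 2017 tournament
cm = confusion_matrix(y_2017, preds_2017)
# Rows = actual, columns = predicted:
# [[correct non-upsets,          incorrect upset predictions],
#  [incorrect non-upset preds,   correct upset predictions ]]
print(cm)

print("Upset precision:", precision_score(y_2017, preds_2017))  # 8 / 15 ≈ 0.53
print("Upset recall:   ", recall_score(y_2017, preds_2017))     # 8 / 10 = 0.80
```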

A closer look at the predictions provides more insight into where the model performed well, and where it could use some improvement:

Upset predictions are most accurate for Round 1

The model is really good at predicting the first round, with 23/24 games predicted correctly. For all other rounds, the model went 10/18. I’d like the model to be more precise throughout the tournament, but the first round is also when most upsets occur. Most bracket pool players can gain an advantage by taking the model-predicted upsets for the first round. I also calculated that placing money-line bets on these upset predictions would have returned a 21% net profit on the amount wagered.

Conclusions

I have several ideas for improving the model predictions, including adding more data to create more features, weighting a team’s recent performance, and trying other algorithms or algorithm combinations. I’d also like to try modeling game scores instead of the final outcome, and looking at other NCAA tournament prediction problems, like identifying the characteristics of overrated squads who go home early, and “Cinderella” underdogs poised for a deep tournament run.

I’ll announce future work on Twitter, so give me a follow if you enjoy!

