Model Behavior

Building a model to predict the NBA's MVP Award winner.

Jul 19, 2024

About 15 years ago, when I was still working on Basketball-Reference.com, I came up with a model to predict the NBA’s MVP Award winner in a given season. The goal was to model the behavior of the voters, not to determine the “most deserving” player. Since it’s been quite some time since I did this, I thought it would be fun to revisit the issue.

I decided to use the 1980-81 season as my starting point, as that’s when the voting shifted from the players to the media. Every MVP Award winner since then has met the following criteria:

Played for just one team during the season.
Played in at least 80% of his team’s games (the standard under the current collective bargaining agreement is 65 games in an 82-game season, which is a little under 80%).
Averaged at least 30 minutes per game.

Those seem like reasonable standards to use to establish the player pool. For example, last season there were 62 eligible players, an average of a little more than two players per team.
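The three criteria above can be sketched as a simple filter. This is a minimal illustration, not the code used for the post: the `PlayerSeason` record and its field names are hypothetical, and the 80% rule is applied literally using integer arithmetic to avoid floating-point edge cases.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout; the field names are assumptions for illustration.
    name: str
    teams_played_for: int
    games_played: int
    team_games: int
    minutes_per_game: float


def is_eligible(p: PlayerSeason) -> bool:
    """Apply the three pool criteria: one team, 80% of team games, 30+ MPG."""
    one_team = p.teams_played_for == 1
    # games_played / team_games >= 0.8, written with integers only
    enough_games = 5 * p.games_played >= 4 * p.team_games
    enough_minutes = p.minutes_per_game >= 30.0
    return one_team and enough_games and enough_minutes


if __name__ == "__main__":
    # Made-up stat lines, only to show the filter in action.
    pool = [
        PlayerSeason("Player A", 1, 79, 82, 34.6),
        PlayerSeason("Player B", 2, 75, 82, 33.0),  # traded mid-season
        PlayerSeason("Player C", 1, 70, 82, 27.5),  # too few minutes
    ]
    print([p.name for p in pool if is_eligible(p)])
```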

I decided to build a logistic regression model with an indicator for MVP Award winner as my dependent variable (the winner would have a value of one, all others would have a value of zero). For my independent variables, I wanted to stick to basic statistics, so factors like points per game and rebounds per game were considered, while advanced metrics like Player Efficiency Rating and Win Shares were not.

I tried numerous combinations of independent variables, but in the end only four factors were statistically significant:

Points per game
Rebounds per game
Assists per game
Team winning percentage

Those factors produced a logistic regression model with the following parameters:

Intercept = –37.7
Points per game = 0.40
Rebounds per game = 0.46
Assists per game = 0.49
Team winning percentage = 26.6

The coefficients above are rounded off for presentation purposes, but I carried all of the decimals for any calculations that follow.

Let’s use Nikola Jokic’s 2023-24 season as an example. Jokic averaged 26.4 PPG, 12.4 RPG, and 9.0 APG for a team with a winning percentage of .695. Using the coefficients above, this yields a logit of:

L = -37.7 + 0.40 * 26.4
          + 0.46 * 12.4
          + 0.49 * 9.0
          + 26.6 * 0.695
  = 1.39

The logit is converted into a probability as follows:

P = 1 / (1 + e^-L)
  = 1 / (1 + e^-1.39)
  = 0.801

To force the probabilities in a given season to sum to one, I divided the individual P values by the sum of the P values for all players. That sum was 2.536 in 2023-24, so Jokic’s adjusted probability is:

P_adj = 0.801 / 2.536 = 0.316
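For readers who want to follow the arithmetic, here is a minimal sketch of the three steps (logit, probability, adjusted probability). The function names are my own, and the coefficients are the rounded ones listed above, so the logit for Jokic comes out to about 1.46 rather than the 1.39 obtained with the unrounded coefficients.

```python
import math

# Rounded coefficients as published above; the post's own calculations
# used unrounded values, so results here differ slightly.
INTERCEPT = -37.7
COEF_PPG = 0.40
COEF_RPG = 0.46
COEF_APG = 0.49
COEF_WIN_PCT = 26.6


def logit(ppg: float, rpg: float, apg: float, win_pct: float) -> float:
    """Linear predictor L of the logistic regression model."""
    return (INTERCEPT
            + COEF_PPG * ppg
            + COEF_RPG * rpg
            + COEF_APG * apg
            + COEF_WIN_PCT * win_pct)


def probability(l: float) -> float:
    """Convert a logit into a probability: P = 1 / (1 + e^-L)."""
    return 1.0 / (1.0 + math.exp(-l))


def adjusted_probability(p: float, season_total: float) -> float:
    """Rescale P so that all eligible players in a season sum to one."""
    return p / season_total


if __name__ == "__main__":
    # Jokic, 2023-24, with the rounded coefficients
    l_rounded = logit(26.4, 12.4, 9.0, 0.695)
    print(round(l_rounded, 3))

    # Reproducing the post's figures from its stated logit of 1.39
    p = probability(1.39)
    print(round(p, 3), round(adjusted_probability(round(p, 3), 2.536), 3))
```

Dividing by the season total is a simple rescaling, so it preserves the ranking of players within a season.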

Here are the players with an adjusted probability greater than 0.05 last season:

Nikola Jokic, 0.316
Luka Doncic, 0.295
Jayson Tatum, 0.191
Giannis Antetokounmpo, 0.094
Shai Gilgeous-Alexander, 0.066

I would classify this as a success, as Jokic, the actual winner, also had the highest individual P value.

For the rest of this post, I’m going to summarize the results in Q&A format. Keep in mind throughout that the model wasn’t constructed to pick the “best” player, but rather the most likely MVP Award winner.

How often does the model identify the winner?

The model correctly predicts the MVP Award winner in 32 out of 44 seasons (72.7%), including 15 out of the last 18 (83.3%). The only misses over the last 18 seasons are Derrick Rose in 2010-11, Russell Westbrook in 2016-17, and Nikola Jokic in 2021-22, all of whom had the second-highest P value in the given season.

In the 12 seasons in which the model incorrectly identified the MVP Award winner, the player with the highest P value finished second in the voting seven times, third four times, and ninth once.
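The hit rate is simply the share of seasons in which the player with the highest P value was also the actual winner. A minimal sketch, assuming each season is stored as a pair of (actual winner, top-P player); the `hit_rate` helper is hypothetical, and the sample covers only five seasons mentioned in this post, not the full 44.

```python
def hit_rate(seasons: dict) -> float:
    """Share of seasons where the top-P player matched the actual winner."""
    hits = sum(1 for winner, top_p in seasons.values() if winner == top_p)
    return hits / len(seasons)


# season -> (actual MVP, player with the highest P value); a small sample
# drawn from seasons discussed in this post, not the full data set.
SAMPLE = {
    "1993-94": ("Hakeem Olajuwon", "David Robinson"),
    "2005-06": ("Steve Nash", "Dirk Nowitzki"),
    "2009-10": ("LeBron James", "LeBron James"),
    "2021-22": ("Nikola Jokic", "Giannis Antetokounmpo"),
    "2023-24": ("Nikola Jokic", "Nikola Jokic"),
}

if __name__ == "__main__":
    print(hit_rate(SAMPLE))

    # The full-sample figures quoted above are plain ratios.
    print(round(100 * 32 / 44, 1), round(100 * 15 / 18, 1))
```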

What’s the biggest miss by the model?

In 2005-06, MVP Award recipient Steve Nash had just the ninth-highest P value (0.013), which is also the lowest such figure ever recorded by a winner. The top P value belonged to Nash’s former teammate, Dirk Nowitzki, who won the award the following season (and once again had the highest P value).

The second-biggest miss was Nash in 2004-05. Nash had the seventh-highest P value that season (0.047), while the model pegged Nash’s teammate, Amar’e Stoudemire, as the probable winner (P = 0.290).

Who had the highest P value?

LeBron James had a P value of 0.909 in 2009-10, a season in which he received 116 out of a possible 123 first-place votes. Kobe Bryant was a distant second with a P value of 0.033. That gap of 0.876 is the largest between the top two candidates.

What’s the highest P value by a non-winner?

Karl Malone had a P value of 0.699 in 1997-98, when he finished second in the voting to Michael Jordan. Malone’s Utah Jazz tied Jordan’s Chicago Bulls for the NBA lead in wins with 62, and Malone averaged more rebounds (10.3 to 5.8) and assists (3.9 to 3.5) than Jordan while almost matching his scoring average (27.0 to 28.7).

What’s the lowest P value by a winner?

As noted above, it’s Steve Nash in 2005-06 (0.013), followed by Nash in 2004-05 (0.047) and Allen Iverson in 2000-01 (0.063).

Who was “snubbed” the most times?

Granted, “snubbed” isn’t exactly the right word here, but Larry Bird is the only player to finish with the highest P value multiple times in seasons in which he did not win the award. Bird had the highest P values in 1980-81 (the winner was Julius Erving), 1981-82 (Moses Malone), and 1987-88 (Michael Jordan). To be clear, none of those selections were controversial.

Who has the most seasons with the highest P value?

Larry Bird finished with the highest P value six times:

1980-81 (second in voting)
1981-82 (second)
1983-84 (first)
1984-85 (first)
1985-86 (first)
1987-88 (second)

LeBron James is second with five such seasons, followed by Michael Jordan with four.

What’s the smallest gap between the top two?

David Robinson held a slim 0.007 edge over Hakeem Olajuwon in 1993-94 (0.382 to 0.375), a season in which Olajuwon picked up his first and only MVP Award. Robinson finished second in the voting, but won the award the following season.
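The gap between the top two candidates is the difference between the two largest P values in a season. A minimal sketch, using a hypothetical `top_two_gap` helper and only the top-two values quoted in this post for 2009-10 and 1993-94:

```python
def top_two_gap(p_values: dict) -> float:
    """Difference between the largest and second-largest P value."""
    if len(p_values) < 2:
        raise ValueError("need at least two players to compute a gap")
    first, second = sorted(p_values.values(), reverse=True)[:2]
    return first - second


# Only the top two players are listed, as quoted in the post.
SEASON_2009_10 = {"LeBron James": 0.909, "Kobe Bryant": 0.033}
SEASON_1993_94 = {"David Robinson": 0.382, "Hakeem Olajuwon": 0.375}

if __name__ == "__main__":
    print(round(top_two_gap(SEASON_2009_10), 3))
    print(round(top_two_gap(SEASON_1993_94), 3))
```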

What was the deepest race?

Four players had a P value of 0.200 or higher in 2021-22, the most in any one season: Giannis Antetokounmpo (0.294), Nikola Jokic (0.277), Joel Embiid (0.203), and Devin Booker (0.202). Those players finished in the top four in the actual voting, with the order being Jokic, Embiid, Antetokounmpo, and Booker.
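Measuring the depth of a race comes down to counting the players at or above a P-value threshold. A minimal sketch, with a hypothetical `contenders` helper and the 2021-22 and 2023-24 values quoted in this post (players not listed in the post are omitted):

```python
def contenders(p_values: dict, threshold: float = 0.200) -> list:
    """Names of players whose P value meets the threshold, highest first."""
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    return [name for name, p in ranked if p >= threshold]


SEASON_2021_22 = {
    "Giannis Antetokounmpo": 0.294,
    "Nikola Jokic": 0.277,
    "Joel Embiid": 0.203,
    "Devin Booker": 0.202,
}

SEASON_2023_24 = {
    "Nikola Jokic": 0.316,
    "Luka Doncic": 0.295,
    "Jayson Tatum": 0.191,
    "Giannis Antetokounmpo": 0.094,
    "Shai Gilgeous-Alexander": 0.066,
}

if __name__ == "__main__":
    print(len(contenders(SEASON_2021_22)))
    print(contenders(SEASON_2023_24))
```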

What’s the highest P value by a player who received no MVP votes?

Klay Thompson had a P value of 0.046 in 2015-16, but did not receive so much as a single fifth-place vote in the balloting. Thompson’s backcourt mate, Stephen Curry, had the highest P value that season (0.531) and became the first (and still only) unanimous MVP Award winner.