Back when I was working on Basketball-Reference.com, I built a Hall of Fame probability model. I was a little surprised by how popular it became; its results were cited quite often (for better and for worse). The model has been slightly tweaked since I left Sports Reference, but the framework remains the same.
I decided to look into this topic again, basically starting from scratch. My player pool consisted of players who met the following criteria:
Played at least 10 seasons in the NBA.
Had the entirety of their NBA career fall between the 1968-69 and 2019-20 seasons. I chose to start with the 1968-69 season because that’s the first time the NBA named All-Defensive teams. The 2019-20 season was chosen as the stopping point because players who retired after that are not yet eligible.
That gave me a pool of 691 players, 77 of whom are Hall of Famers. The response variable in my model was simply Hall of Fame status, with electees assigned a value of one and all others assigned a value of zero.
After much experimentation, I settled on five predictor variables:
Career value, which is a weighted sum of the player’s individualized wins for each season. The player’s best season received a weight of 1.00, their second-best season received a weight of 0.95, their third-best season received a weight of 0.90, and so on.
Number of All-Star Game selections.
Number of All-NBA points, where a First Team selection earns the player five points, a Second Team selection is worth three points, and a Third Team selection is worth one point.
Number of All-Defensive points, where a First Team selection earns the player two points and a Second Team selection is worth one point.
Number of championships won. The player must have appeared in at least one postseason game for the league champion to receive credit in a given season.
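For readers who want to reproduce these predictors, here is a minimal Python sketch of the career value weighting and the two award point scales described above. The function names are hypothetical, and flooring the season weights at zero past the 20th-best season is an assumption, since the description only says "and so on."

```python
def career_value(season_wins):
    """Weighted sum of single-season win totals, best season first."""
    ranked = sorted(season_wins, reverse=True)
    total = 0.0
    for rank, wins in enumerate(ranked):
        # Weights run 1.00, 0.95, 0.90, ...
        # Flooring at zero is an assumption, not stated in the text.
        weight = max(0.0, 1.0 - 0.05 * rank)
        total += weight * wins
    return total


def all_nba_points(first, second, third):
    """First Team = 5 points, Second Team = 3, Third Team = 1."""
    return 5 * first + 3 * second + 1 * third


def all_defensive_points(first, second):
    """First Team = 2 points, Second Team = 1."""
    return 2 * first + 1 * second
```

For example, a player with one First Team and one Second Team All-NBA selection would get `all_nba_points(1, 1, 0)`, or eight points.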
Those factors produced a logistic regression model with the following parameters:
Intercept = –9.63926
Career Value = 0.04991
All-Star Games = 0.85433
All-NBA Points = 0.18568
All-Defensive Points = 0.28016
Championships = 1.03844
Let’s use former Buffalo Braves star Bob McAdoo as an example. Here are McAdoo’s values for each predictor:
Career Value = 85.3
All-Star Games = 5
All-NBA Points = 8
All-Defensive Points = 0
Championships = 2
These are used to calculate McAdoo’s logit:
L = –9.63926 + 0.04991 * 85.3
+ 0.85433 * 5
+ 0.18568 * 8
+ 0.28016 * 0
+ 1.03844 * 2
= 2.452
Which is then converted into a probability:
P = 1 / (1 + e^-L)
= 1 / (1 + e^-2.452)
= 0.921
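The same arithmetic can be checked in a few lines of Python. This is only a sketch: the coefficients and McAdoo's predictor values come straight from the text, while the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
import math

# Coefficients as listed in the text; the key names are illustrative.
COEFFICIENTS = {
    "career_value": 0.04991,
    "all_star_games": 0.85433,
    "all_nba_points": 0.18568,
    "all_defensive_points": 0.28016,
    "championships": 1.03844,
}
INTERCEPT = -9.63926


def hof_logit(predictors):
    """Intercept plus the coefficient-weighted sum of the predictors."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in predictors.items()
    )


def hof_probability(predictors):
    """Convert the logit into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-hof_logit(predictors)))


mcadoo = {
    "career_value": 85.3,
    "all_star_games": 5,
    "all_nba_points": 8,
    "all_defensive_points": 0,
    "championships": 2,
}

print(round(hof_logit(mcadoo), 3))        # 2.452
print(round(hof_probability(mcadoo), 3))  # 0.921
```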
If the model produced a probability of 0.5 or higher for a player, I predicted that player to be a Hall of Famer; otherwise, I predicted a non-Hall of Famer. Here is a summary of the results:
There were 614 non-Hall of Famers in the player pool. The model correctly classified 610 of them, or 99.3%.
There were 77 Hall of Famers in the player pool. The model correctly classified 72 of them, or 93.5%.
Overall, the model correctly classified 682 of the 691 players, or 98.7%.
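The percentages above follow directly from the reported counts, as this short Python sketch shows. The 0.5 cutoff and the counts are taken from the text. The names `specificity` and `sensitivity` are standard classification terms used here as labels; they do not appear in the original write-up.

```python
def predict_hall_of_famer(probability, threshold=0.5):
    """Classify as a Hall of Famer when the probability meets the cutoff."""
    return probability >= threshold


# Counts reported in the text.
non_hof_total, non_hof_correct = 614, 610
hof_total, hof_correct = 77, 72

# Share of non-Hall of Famers classified correctly.
specificity = non_hof_correct / non_hof_total
# Share of Hall of Famers classified correctly.
sensitivity = hof_correct / hof_total
# Share of all players classified correctly.
accuracy = (non_hof_correct + hof_correct) / (non_hof_total + hof_total)

print(f"{specificity:.1%}")  # 99.3%
print(f"{sensitivity:.1%}")  # 93.5%
print(f"{accuracy:.1%}")     # 98.7%
```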
Let’s take a closer look at the misses, starting with the five Hall of Famers who were not pegged as such by the model: