It’s been a long, winding journey to get here, one that started with plans for a video (which wouldn’t have been enough time to discuss all ten metrics). Then I recorded a podcast that ended up being just under an hour and a half long. I’m hoping to fall somewhere in the middle here, providing as much information as possible in a digestible amount of time. Ladies and gentlemen, I present to you (finally…), the 10 best NBA impact metrics.
Ranking impact metrics proved to be no easy task. To limit the errors that would come from an arbitrary approach, I chose to run with a strict set of criteria:
To qualify for the list, the impact metric had to (at the very least) be a measure of the past, of what has already happened. Some metrics in the rankings enlist predictive techniques as well, but as long as past data is used to also measure the past, a metric checks this box. It’s also worth noting that most metrics that aren’t strictly predictive don’t inject themselves with predictive power for traditionally “predictive” purposes. Model overfitting is a consistent problem among pure Plus/Minus metrics, meaning scores can change severely from team to team even if the quality of the player stays the same. Combating these phenomena will actually create some clearer images of the past.
- Type of Measurements
Because almost every “good” modern metric is based in some way on Regularized Adjusted Plus/Minus (RAPM), which employs philosophically fair principles, I figured it would be “fairest” to evaluate metrics based on their adherence to the original “ideology” of RAPM: to measure a player’s impact on his team in a makeshift vacuum that places him alongside average teammates while facing similarly average opponents. Because this approach would, in theory, cancel out a lot of the noise that stems from extreme team circumstances and measure the player independent of his teammates, impact metrics are judged on how well they align with these ideas. (Impact metrics are still distinct measures of value to only one team, but some can move the needle more in overcoming problems like these.)
- No Sniff Tests
A lot of NBA critics or fans who aren’t mathematically inclined will skim a metric’s leaderboard to see how it aligns with their personal lists or player rankings. Because this approach places too much stock in prior information, and a lot of those critics may not actually evaluate players well, the sniff test isn’t really going to help anyone judge a metric. For this list, all specific player placements are set aside to view each metric only through a lens that focuses on how it performs against the aforementioned criteria.
This doesn’t concern how the metric itself is judged, but the last qualification for a metric to appear on this list is its current availability. A metric I reviewed for this list called “Individual Player Value” (IPV) may have held a spot, but there were virtually no opportunities to view its results from recent seasons. Thus, all metrics on this list were available (not necessarily free to the public, but in public knowledge) through the beginning of the 2021 season. If it isn’t clear, I really wanted to include PIPM here.
- Modern Era Metrics
Not all metrics on this list can extend as far back in the past as others. Most will be available from the 2014 season (when the NBA first started recording widespread tracking data) onward, while some can be calculated as far back as turnovers can be estimated (basically as far back as the institution of the shot clock). Because this really takes a “modern era” approach to evaluating these metrics, only a metric’s performance and value in the 2014 season and beyond will be in consideration during these rankings. So, for example, PIPM’s shaky nature in seasons predating Plus/Minus data is out of the equation here.
People use impact metrics improperly in debates all the time, but the most specific case I want to address can be explained by the following example. Let’s say, hypothetically, LeBron James finished at +7 in LEBRON in 2021 and +8 in BPM. If someone instigates a conversation with the BPM score, an interlocutor may offer the +7 LEBRON as a “better” or “more meaningful” representation of James. This is not a good way to compare scores across impact metrics. Different metrics sway toward various playstyles, team constructions, etc. Just because LEBRON is a “better” metric (this shouldn’t really be a spoiler), it won’t measure every player better than, say, BPM.
If only this were as simple as only needing one list… Because different metrics treat different sample sizes differently, and the time period during which a metric is taken affects its accuracy relative to other metrics, I’ll split this list into two. The first, which will include the condensed metric profiles, assesses the metrics’ performances across three (full) seasons or more. Three years is the general threshold for stability, a point at which scores aren’t significantly fluctuating. The second list will evaluate metrics in a sample taken within a single season. Since players will often be analyzed using single-season impact metrics, this distinction will hopefully separate some of the metrics’ strengths and weaknesses in various environments.
10. Luck-Adjusted RAPM
Developer: Ryan Davis
Based on the original works of Regularized Adjusted Plus/Minus (RAPM), Ryan Davis’s model adds a prior to the calculations as a “luck adjustment.” It’s not a traditional prior that would, for example, use counting or on-off statistics to bring in outside information we know to hold value. Rather, the adjustment normalizes a player’s three-point shooting to his career average based on the location of the shot; free-throw shooting is also normalized to career average. I’m particularly low on the function of this prior because, to me, it would make more sense to adjust teammate and opponent performance instead (as is done in luck-adjusted on-off calculations).
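To make the idea concrete, here’s a minimal sketch of what normalizing three-point results to a career rate might look like. This is my own toy formulation, not Davis’s actual code: the function name and the point-swing framing are assumptions, and his version also conditions on shot location, which this ignores.

```python
def three_point_luck_swing(makes, attempts, career_3p_pct):
    """Estimate the point swing attributable to 'lucky' three-point
    shooting: the gap between actual makes and the makes expected
    at the player's career percentage, worth three points each."""
    expected_makes = career_3p_pct * attempts
    return 3 * (expected_makes - makes)

# A player shooting 42/100 against a career 36% rate would have
# roughly 18 points of his impact attributed to shooting luck.
swing = three_point_luck_swing(42, 100, 0.36)
```

A player shooting exactly at his career rate gets a swing of zero, which is the sense in which the adjustment “normalizes” shooting variance out of the signal.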
My largest concern is that long-term samples of LA-RAPM will struggle to capture improvements over time. And if someone were to map out a player’s career, it would probably be too smooth, and not in a good way. Because the career averages are used, it might be a bit better in evaluating career-long samples as a whole, but it’s not going to measure individual seasons or even long samples much better than RAPM with no prior. The ideology and the processes behind the metric are impressive, but their applications seem a bit wonky to me.
9. NPI RAPM
Developer: Jeremias Engelmann
The predecessor to every other metric on this list, non-prior-informed RAPM was Jerry Engelmann’s improvement on Daniel Rosenbaum’s Adjusted Plus/Minus (APM), an estimate of the correlation between a player’s presence and the shift in his team’s Net Rating. Although a promising metric, APM was never truly fit for use in practice because of its inherent noisiness and the high-variance solutions to its linear system. Engelmann employed ridge regression, an equal-treatment form of Tikhonov regularization in which a penalty added to the traditional OLS objective, scaled by a lambda value (nowadays found through cross-validation), suppresses the variance of the APM coefficients and draws all scores toward average, or net-zero.
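As a concrete sketch of that ridge machinery, here is a toy version of the setup. Every number here is invented, and real RAPM runs use thousands of stints and a cross-validated lambda far larger than this one; the point is only to show how the penalty shrinks the noisy APM solution toward zero.

```python
import numpy as np

# Each row is a stint: +1 if the player was on the court for the home
# team, -1 for the away team, 0 if off the floor. The target is the
# stint's point differential per 100 possessions. All numbers invented.
X = np.array([[1.0, -1.0,  0.0],
              [1.0,  0.0, -1.0],
              [0.0,  1.0, -1.0],
              [1.0, -1.0,  0.0]])
y = np.array([5.0, 8.0, 2.0, 7.0])

lam = 10.0  # ridge penalty (lambda); cross-validated in practice

# Ridge closed form: beta = (X'X + lam * I)^-1 X'y.
# The penalty draws every coefficient toward zero (league average).
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Plain least squares (APM) for comparison; its solutions are
# higher-variance, which is exactly what the penalty suppresses.
apm, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The ridge coefficients always have a smaller norm than the least-squares ones, which is the statistical price paid (a little bias) for taming APM’s variance.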
A lot of great analytical minds will say long-term RAPM is the cream of the crop of impact metrics. However, as with APM, it’s still unavoidably noisy in practice. And since players are treated entirely as dummy variables in the RAPM calculations, devoid of any causal variables, my confidence in the accuracy of the metric is lower than for others. RAPM is built to provide strong correlations between players and their teams, but because the lack of any outside information creates a greater level of uncertainty regarding its accuracy, I’m going to rank it near the back end of this list. However, I have it higher than Davis’s luck-adjusted version for the aforementioned reasons relating to career mapping.
8. Basketball-Reference Box Plus/Minus
Developer: Daniel Myers
The signature metric of Basketball-Reference and probably the most popular Plus/Minus metric on the market, Daniel Myers’s BPM 2.0 is arguably the most impressive statistical model on this list, though I have some philosophical qualms with it that I’ll discuss shortly. BPM continues the signature Myers trademark of dividing credit for the team’s success across the players on the roster. This time, however, he updated the metric to include a player’s offensive role and position to add context to the environment in which the box score stats were accrued. This means assists are worth more for centers, blocks are worth more for point guards, etc.
BPM incorporates a force-fit, meaning the minutes-weighted sum of BPM scores for a team’s players will equal the team’s seasonal Net Rating (adjusted for strength of schedule). However, that NRtg/A uses a “trailing adjustment,” which adds a small boost for good teams and a downgrade for poor teams based on how teams often perform slightly better when they are trailing in a game. My aforementioned gripes are mainly with how BPM treats offensive roles. The metric will sometimes treat increments in offensive load as actual improvements, something we know isn’t always true. I also have some questions about the fairness of measuring offensive roles relative to the other players on the team.
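A minimal sketch of the force-fit idea, under my own simplification (a constant shift applied to every player, hypothetical numbers throughout; Myers’s actual adjustment has more moving parts):

```python
def force_fit(raw_bpm, minutes, team_nrtg_a):
    """Shift every player's raw score by a constant so that the
    minutes-weighted team sum equals the team's adjusted Net Rating."""
    # Each player's weight is his share of the five on-court slots;
    # the weights always sum to exactly 5.
    slot = sum(minutes) / 5
    weights = [mp / slot for mp in minutes]
    weighted_sum = sum(w * s for w, s in zip(weights, raw_bpm))
    # Because the weights sum to 5, shifting everyone by
    # (target - current) / 5 lands the sum exactly on the target.
    shift = (team_nrtg_a - weighted_sum) / 5
    return [s + shift for s in raw_bpm]

# Three-man toy roster reconciled to a +3.0 adjusted Net Rating.
adjusted = force_fit([2.0, 0.0, -1.0], [2000.0, 1500.0, 1000.0], 3.0)
```

The appeal of a force-fit is that no team-level impact goes unexplained; the cost is that any team-level noise (like the trailing adjustment above) is pushed down onto individual players.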
7. Backpicks BPM
Developer: Ben Taylor
I’ve gone back and forth between the two major Box Plus/Minus models for some time now, but after learning some new calculation details from the creator of the metric himself, I’m much more comfortable leaning toward Ben Taylor’s model. He doesn’t reveal too much information (in fact, not very much of anything at all), even to the subscribers of his website, but he was willing to share a few extra details. Backpicks BPM uses two Taylor-made (double entendre?) stats, Box Creation and Passer Rating: estimates of the number of shots a player creates for teammates every 100 possessions and of a player’s passing ability on a scale of one to ten. This is a very significant upgrade over assists in the metric’s playmaking components and certainly moves the needle in overcoming team-specific phenomena that don’t represent players fairly.
Backpicks BPM is also trained on a more suitable response, something I didn’t mention in the Basketball-Reference profile. Myers’s model uses five-year RAPM runs, notably decreasing the metric’s ability to measure stronger players. Conversely, Taylor’s two-to-three-year runs include stronger levels of play in the training data, meaning All-NBA, MVP-level, and beyond-caliber players are better represented. Teammate data is also handled differently. Rather than measuring a player strictly within the confines of his roster, the Backpicks model makes a clear attempt to neutralize the environment. To put this into perspective, Myers’s model saw Russell Westbrook as a +7.8 player in 2016 (with Durant) and a +11.1 player in 2017; Taylor’s model saw Westbrook as a +7 player in both seasons.
6. Player Impact Plus/Minus
Developer: Jacob Goldstein
As was the case for me when I was first learning about impact metrics, the introductory stages of basketball data will often lead people to believe PIPM is arguably the best one-number metric in basketball. It combines the box score with on-off data and has very strong correlative power relative to its training data. But when I started to look under the hood a bit more, it was clear there were more issues than immediately met the eye. Compared to other box metrics, PIPM’s box component is comparatively weak: it uses offense to measure defense, and vice versa, and it doesn’t account for any positional factors. The response is also probably the most problematic of any metric on this list. I’ll discuss this more.
Recently, I’ve been leaning away from on-off ratings. They’re inherently noisy, perhaps even more so than RAPM, and easily influenced by lineup combinations and minute staggering, which can make an MVP-level player look like a rotational piece, and vice versa. The luck adjustment does cancel out some of the noise, but I’m not sure it’s enough to overcome the overall deficiencies of on-off data. PIPM is also based on one fifteen-year sample of RAPM, meaning its high R^2 values are significantly inflated. Again, this means very good players won’t be well represented by PIPM. This excerpt may have sounded more critical than anything, and the more I explore PIPM, the more I’m met with confounders that weaken my view of it. Perhaps the box-only metrics are slightly better, but I’ll give PIPM the benefit of the doubt in the long term.
5. Augmented Plus/Minus
Developer: Ben Taylor
Augmented Plus/Minus (AuPM) similarly functions as a box score / on-off hybrid. It incorporates the Backpicks BPM directly into the equation while also roping in raw Plus/Minus data such as On-Court Plus/Minus and net on-off (On-Off Plus/Minus). It includes a teammate interaction term that measures the player’s Plus/Minus portfolio relative to other high-minute teammates, and the 2.0 version added blocks and defensive rebounds per 48 minutes. There’s no direct explanation as to why those two variables were included; perhaps they improved the regression results a bit, despite having likely introduced a new form of bias.
Pertaining to the AuPM vs. PIPM debate, it should be abundantly clear that AuPM has the stronger box component. And while PIPM bests its opponent in the on-off department, the inclusion of shorter RAPM stints in the regression for AuPM means more players will be measured more accurately. So, despite arguably weaker explanatory variables, the treatment of those variables leans heavily in favor of Augmented Plus/Minus.
4. RAPTOR
Developers: Jay Boice, Neil Paine, and Nate Silver
The Robust Algorithm using Player Tracking and On-Off Ratings (RAPTOR) metric is the highest-ranked hybrid on this list, meaning every metric above it uses RAPM directly in its series of calculations, not just as a response for a regression. RAPTOR pairs a complex series of box score and tracking data with regressed on-off ratings that consider the performances of the teammates alongside the player, and then the teammates of those teammates. The regression amounts to an approximation of one six-year sample of NPI RAPM. My high opinion of it may seem inconsistent with the contents of this list, but one major theme has made itself clear throughout this research: tracking data is the future of impact metrics.
Despite a “weaker” response variable, RAPTOR performs excellently alongside other major one-number metrics. In Taylor Snarr’s retrodiction testing of some of these metrics, which involved estimating a team’s schedule-adjusted point differential (SRS) from its players’ scores in the metrics from previous seasons (all rookies were assigned -2), RAPTOR was outperformed by only two metrics, both prior-informed RAPM models. This is a good sign RAPTOR is assigning credit to the right players while also taking advantage of the most descriptive types of data in the modern NBA.
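A stripped-down sketch of how this sort of retrodiction estimate could work. The players, minutes, and scores here are hypothetical, and the real testing involves additional adjustments; the only detail taken from the source is the -2 default for rookies.

```python
def predict_srs(minutes, prior_scores, rookie_value=-2.0):
    """Predict a team's SRS as the slot-weighted sum of its players'
    prior-season metric scores. Rookies, who have no prior-season
    score, fall back to a default value (-2, as in Snarr's test)."""
    slot = sum(minutes.values()) / 5  # minutes available per on-court slot
    return sum(
        (mp / slot) * prior_scores.get(player, rookie_value)
        for player, mp in minutes.items()
    )

# Veterans A (+3) and B (0) plus rookie C, who gets the -2 default.
projection = predict_srs(
    {"A": 1000.0, "B": 1000.0, "C": 500.0},
    {"A": 3.0, "B": 0.0},
)
```

Comparing these projections against each team’s actual SRS, metric by metric, is what produces the error rates the rankings below keep referring to.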
3. Real Plus/Minus
Developers: Jeremias Engelmann and Steve Ilardi
Real Plus/Minus (RPM) is ESPN’s signature one-number metric, touted for its combination of descriptive and predictive power. According to co-creator Steve Ilardi, RPM uses the standard series of RAPM calculations while adding a box prior. This likely means that, instead of regressing all the coefficients toward zero as explained in the NPI RAPM segment, Engelmann and Ilardi built a long-term RAPM approximation, which then acts as the data points the player scores are regressed toward. Value changes and visual instability aside, RPM is among the premier group of metrics in its ability to divvy the right amounts of credit to players, having finished with the second-lowest SRS error rate in Snarr’s retrodiction testing.
2. LEBRON
Developers: Krishna Narsu and “Tim” (pseudonym?)
BBall Index’s shiny new product, as new as the latest NBA offseason, the Luck-adjusted Player Estimate using a Box prior Regularized On-Off (LEBRON) metric makes a tangible case as the best metric on this list. Similar to RPM, it combines the raw power of RAPM with the explanatory power of the box score. LEBRON’s box prior was based on the findings from PIPM, but it upgrades the strength of the model through the incorporation of offensive roles, treating stats differently based on how certain playstyles will or won’t accrue them. Three-point and free-throw shooting are also luck-adjusted, in a fashion similar to PIPM’s on-off ratings, to cancel out noise.
Part of what makes LEBRON so valuable in a single-season context is its padding techniques, which involve altering a player’s box score profile based on his offensive role. For example, if Andre Drummond shot 45% from three during his first ten games, LEBRON’s box prior will take his offensive role and regress his efficiency downward based on how he is “expected” to stabilize. This makes LEBRON especially useful in evaluating players during shorter stints, and while these adjustments aren’t perfect, the metric’s treatment of box score stats probably uses the best methods of any metric on this list.
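To make the padding idea concrete, here’s a toy sketch (my own formulation, not LEBRON’s actual math) that blends early-season shooting with a role-based expectation by adding phantom attempts at the expected rate:

```python
def padded_3p_pct(makes, attempts, role_expected_pct, pad_attempts=200):
    """Regress an observed three-point percentage toward the rate
    'expected' for the player's offensive role by padding the sample
    with phantom attempts shot at the expected rate."""
    return (makes + role_expected_pct * pad_attempts) / (attempts + pad_attempts)

# A big man at 45% (18/40) with a 33% role expectation gets pulled
# most of the way back down; a small real sample barely moves the prior.
padded = padded_3p_pct(18, 40, 0.33)
```

As real attempts accumulate, the phantom attempts matter less and less, so the padding fades out exactly when the observed sample becomes trustworthy. That is why this style of stabilization helps most in short stints.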
1. Estimated Plus/Minus
Developer: Taylor Snarr
I don’t want to say Estimated Plus/Minus (EPM) runs away with the top spot on this list, because it doesn’t, but it’s clear to me that it’s the best widespread impact metric on the market today. Roughly as young as LEBRON, EPM is the product of data scientist Taylor Snarr. As noted by RPM co-creator Ilardi, EPM is similar to RPM in that it uses RAPM calculations regressed toward the box score, but it also includes tracking data. The tracking data, as has been shown with RAPTOR, makes all the difference here. During his retrodiction testing, Snarr constructed linear regression models to estimate the effect of lineup continuity on each metric’s performance.
To the envy of every other metric, EPM’s reliance on lineup continuity was estimated to be roughly half that of the runner-up metric, RPM. That may not sound like a crucial piece of information, but given that dependence on team context is one of the largest problems for these types of metrics, EPM performs fantastically. It’s also worth mentioning EPM is predictive as well, having led the retrodiction testing in SRS error throughout the examined seasons. I let these details simmer in my head for some time in case I was having some type of knee-jerk reaction to new information, but the points still stand tall and clear: EPM is currently unmatched.
For the single-year list, I only made two significant changes. Because shooting is much noisier in smaller samples, such as a season or less, I’ll give the edge to Davis’s LA-RAPM over NPI RAPM. Additionally, PIPM’s response issues drop it down a peg for me in the one-year context. I did consider putting LEBRON over EPM due to its padding (EPM doesn’t employ any stabilization methods to my knowledge), but the tracking data leading to greater team independence is too large an advantage for me to overlook.
- Estimated Plus/Minus
- LEBRON
- Real Plus/Minus
- RAPTOR
- Augmented Plus/Minus
- Backpicks BPM
- Basketball-Reference BPM
- Player Impact Plus/Minus
- Luck-adjusted RAPM
- NPI RAPM
There was also some noise surrounding DARKO’s Daily Plus/Minus (DPM) and its standing relative to this list. I did evaluate the metric despite its breaking the criteria, simply to see how it stacks up against the other metrics in model strength. Based on the statistical model, I would rank it fifth on this list, bumping AuPM down a spot and slotting DPM right behind RAPTOR.
To my surprise, some people saw DPM as the premier impact metric on the market today. Some questioning led back to the game-level retrodiction testing during the tracking era from DPM’s developer, Kostya Medvedovsky, which saw DPM lead all metrics. However, DPM is specifically designed to act as a predictive metric, giving it an unjustified advantage in those types of settings. Based on how I “expect” it would perform in descriptive duties given the construction of its model, I don’t really see an argument for it cracking the inner circle (the Final Four).
Thanks for reading everyone! Leave your thoughts in the comments on these metrics. Which is your favorite? And which one do you believe is the best?