How to Interpret NBA Impact Metrics [Video & Script]
NBA impact metrics, despite acting as some of the best tools available to the public today, are frequently misinterpreted. What are some ways to look at impact metric scores and use the data to our best advantage? Are there cases in which largely lower-score players are “better” than higher-score players? As a relatively avid user of these stats, I dive into the tips and tricks I’ve gained over the past few months on how to make the best use of these metrics.
Good afternoon everyone, and welcome back to another discussion on NBA impact metrics. Recently, I’ve been talking and writing about impact metrics more than usual, and that’s because there’s still some level of mystery surrounding them in the public eye. Anyone can hop onto Basketball-Reference and see that Russell Westbrook had a league-leading +11.1 Box Plus/Minus in 2017, but what does that actually mean? For the more dedicated NBA fan, these numbers might be seen every day through an endless series of acronyms: BPM, EPM, RPM, PIPM, RAPM, RAPTOR, or whichever metric happens to be perused that day. Because of this, I’ve decided to sit down today and discuss the interpretations of these impact metrics, not only to set the record straight on what they truly mean but also as an aid to drawing more reliable conclusions from these types of data.
Before we begin, if you missed my discussion of how impact metrics are calculated, feel free to scroll down to the description box; I’ve linked that article as a precursor to this video as some optional context going into this.
With that out of the way, let’s go over the definition of an impact metric that I laid out in that article, which goes as follows: “Impact metrics are all-in-one measurements that estimate a player’s value to his team. Within the context we’ll be diving into here, a player’s impact will be portrayed as his effect on his team’s point differential every 100 possessions he’s on the floor.” So when we refer back to Russell Westbrook’s Box Plus/Minus in 2017, we know that metric says he provided an estimated +11.1 points of extra impact every 100 possessions he played. Now, this understanding may suffice to recognize a score, but what we’re looking for here is a strong interpretation of these metrics. Unless the nuances of each metric are kept in mind, some of the inferences drawn from these data points will lead to much poorer readings of what’s happening on the basketball court.
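To make the per-100 framing concrete, here is a minimal Python sketch that rescales a per-100-possession score to the possessions a player actually logs. The function name and the 70 on-court possessions are my own assumptions for illustration, not figures from Basketball-Reference.

```python
def impact_over_possessions(per_100_score, possessions_on_court):
    """Scale a per-100-possession impact estimate to the possessions actually played."""
    return per_100_score * possessions_on_court / 100

# Westbrook's 2017 Box Plus/Minus, as quoted above.
westbrook_bpm = 11.1
# Hypothetical workload: 70 possessions on the floor in one game (assumed, not sourced).
possessions = 70

estimate = impact_over_possessions(westbrook_bpm, possessions)
print(f"Estimated impact over {possessions} possessions: {estimate:+.2f} points")
```

The only thing this shows is the unit: the score is a rate, so the estimated point swing grows or shrinks with how long the player is on the floor.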
Let’s begin with a universal rule that can be applied to nearly every widespread metric: they only capture a player’s value to his specific team in his specific role. And if that doesn’t sound like a problem at first, consider this example. Arguably the top-2 NBA point guards in 2016 and 2017 were Stephen Curry and Russell Westbrook, and both had the best seasons of their careers in that span. Curry had his mega-spike in the 2015-16 season while Westbrook followed that MVP case with one of his own in 2017 when he averaged a triple-double. Let’s look at their scores in Box Plus/Minus during these seasons. When Curry had his peak, he posted an Offensive Box Plus/Minus of +10.3 and an overall score of +11.9, which to this day stands as one of the highest marks since the merger. The following season, he had a very impressive yet significantly lower +6.7 Offensive Box Plus/Minus and a +6.9 total score. So was Steph really 5 points worse in 2017 than he was in 2016? Of course not. This phenomenon was created when the addition of Kevin Durant absorbed some of Curry’s role in the effort to help the two stars coexist on the court. If the logic behind that alone doesn’t make sense, just look at some of Curry’s box score stats when he played without Durant in 2017 compared to his bigger role in 2016.
2017 (w/o KD): 39.5 PTS/100, 9.9 AST/100, 7.3 FTA/100
2016: 42.5 PTS/100, 9.4 AST/100, 7.2 FTA/100
The same events happened in reverse with Russell Westbrook, coincidentally enough, because of the same player. With Durant in OKC in 2016, Westbrook had an Offensive BPM of +6.4 and an overall score of +7.8. After Durant left for Golden State that offseason, Westbrook posted an offensive score of +8.7 and an overall score of +11.1. So we ask again: was he really 3.3 points better a mere season apart? Of course, we answer again as we did with Curry: no. Westbrook simply took on a larger role; in fact, a much larger role. His usage rate increased a whopping TEN percentage points between the 2016 and 2017 seasons. That’s an unprecedented amount of change in such a short amount of time! Let’s use the same technique we did for Curry and compare Westbrook’s 2016 box scores when Durant was off the floor against his historically great solo act the following season.
2016 (w/o KD): 40.7 PTS/100, 15.3 AST/100, 13.0 FTA/100
2017: 44.8 PTS/100, 14.7 AST/100, 14.7 FTA/100
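Putting the two comparisons side by side, here is a small Python sketch that only rearranges numbers already quoted above. The dictionary layout and function names are mine, and keep in mind that BPM and points per 100 are different units, so this is a rough side-by-side rather than a formal test.

```python
# Overall Box Plus/Minus by season, as quoted in the text above.
overall_bpm = {
    "Curry": {2016: 11.9, 2017: 6.9},
    "Westbrook": {2016: 7.8, 2017: 11.1},
}

# Points per 100 possessions, as quoted above: the full season without Durant
# on the roster vs. the minutes played with Durant off the floor.
pts_per_100 = {
    "Curry": {"solo_season": 42.5, "without_kd": 39.5},
    "Westbrook": {"solo_season": 44.8, "without_kd": 40.7},
}

def bpm_change(player):
    """Change in overall BPM from 2016 to 2017, rounded to one decimal."""
    seasons = overall_bpm[player]
    return round(seasons[2017] - seasons[2016], 1)

def scoring_gap_pct(player):
    """Percent gap in PTS/100 between the without-KD minutes and the solo season."""
    stats = pts_per_100[player]
    gap = stats["without_kd"] - stats["solo_season"]
    return round(100 * gap / stats["solo_season"], 1)

for player in overall_bpm:
    print(f"{player}: BPM change {bpm_change(player):+.1f}, "
          f"scoring gap {scoring_gap_pct(player):+.1f}%")
```

Both players' overall scores swing by several points between seasons, while their scoring rates in the comparable "lead star" minutes sit within about ten percent of each other.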
The takeaway here is that impact metrics are extremely sensitive to a player’s role and only estimate what they’re doing in their specific situations. This means players who are poorly coached or are assigned to a role that doesn’t fit their playstyle will always be underrated by these metrics, while players who are utilized perfectly and play a role tailored to their style will always be a bit overrated. This type of significant confoundment will be found more often in box-only metrics than in play-by-play-informed ones, but star players on more top-heavy rosters will also see inflation in their on-off ratings, even after adjusting for some forms of luck.
The next thing I’d like to discuss is how to interpret impact metrics in large groups. I’ve seen some claims that one metric on its own isn’t entirely telling, but a healthy mix of a lot of metrics will be significantly more telling. Despite the fact that almost all notable one-number metrics attempt to estimate the same base measurement, RAPM, I still struggle with this idea, and that’s because every one of these metrics is created differently. Box Plus/Minus only uses the box score; PIPM uses the box score as well as luck-adjusted on-off ratings; RAPTOR uses both of those in addition to tracking data. I wouldn’t go so far as to call this an apples-to-oranges comparison, perhaps more along the lines of red apples to green apples. Averaging out five or so metrics might get closer to a true value, but it doesn’t necessarily move the needle as effectively as viewing each metric individually and considering the nuances. But I also won’t say that viewing them individually is entirely more useful, as these metrics still use a lot of the same information. One form of confoundment in one metric will likely be present to some degree in another.
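As a rough illustration of the point about blending, here is a Python sketch that compares a simple average against the spread between metrics and checks which inputs they share. The input sets come from the descriptions above; the three scores are invented for a hypothetical player and are not real metric values.

```python
from statistics import mean

# Inputs each metric draws on, as described in the text above.
metric_inputs = {
    "BPM": {"box score"},
    "PIPM": {"box score", "luck-adjusted on-off"},
    "RAPTOR": {"box score", "luck-adjusted on-off", "tracking data"},
}

# Hypothetical scores for one player, in points per 100 possessions.
# These values are made up for illustration only.
scores = {"BPM": 6.5, "PIPM": 4.8, "RAPTOR": 5.6}

def blend(scores):
    """Simple average of all metric scores."""
    return round(mean(scores.values()), 2)

def spread(scores):
    """Gap between the highest and lowest metric score."""
    return round(max(scores.values()) - min(scores.values()), 2)

# Inputs that every metric has in common; bias here survives any averaging.
shared_inputs = set.intersection(*metric_inputs.values())

print("Blend:", blend(scores))
print("Spread:", spread(scores))
print("Shared by all metrics:", shared_inputs)
```

The average hides the spread, and the spread is where the differences in inputs show up. The shared input is the reason a bias in one metric tends to leak into the others.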
The last main topic I’ll talk about here is how to interpret impact metrics within their sample size. At the time of this writing, NBA teams have played an average of 52 games so far, yet there have been cases of 50-game samples of these metrics being treated just the same as a 250-game sample. This is where I’ll introduce the variance among these metrics. I’m a part of a biweekly MVP ladder ranking over at Discuss The Game, whose profile I’ll also link in the description box, and in the panelists’ discussion room, I saw a lot of talk early in the season that compared the candidates’ impact metric scores. I found this interesting because, despite being the panelist with arguably the highest overall usage of impact metrics, I was the one relying on these stats the least. So why was this shift so significant? It paints a picture of how variance is often treated among basketball stats. NBA analyst Ben Taylor discusses “sample-size insensitivity” in his book, Thinking Basketball: the idea that most fans will often not consider the possibilities that lie beyond the scope of an allotted time period. This means that almost every team that wins the NBA championship is crowned the best team of the season. But what if the same teams replayed the same series a second time? Perhaps a third time? Hypothetically, if we could simulate these environments thousands of times, we’d have a much better idea of how good players and teams were during certain seasons. Because, after all, a lot of confounding results that don’t align with observable traits could be nothing more than random variance.
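The "replay the series thousands of times" idea can be sketched in a few lines of Python. This is a toy model under my own assumptions: the better team wins each game with a fixed 55 percent probability, games are independent, and the series is a best-of-seven. None of those numbers come from real NBA data.

```python
import random

def series_win_rate(p_game, trials=10_000, wins_needed=4, seed=0):
    """Share of simulated series won by a team that wins each game with probability p_game."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    series_won = 0
    for _ in range(trials):
        wins = losses = 0
        # Play games until one side reaches the required number of wins.
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < p_game:
                wins += 1
            else:
                losses += 1
        if wins == wins_needed:
            series_won += 1
    return series_won / trials

# Hypothetical: the "truly better" team has a 55% chance in any single game.
rate = series_win_rate(0.55)
print(f"Better team wins the series in {rate:.1%} of replays")
```

Under these assumptions the better team takes the series only about 61 percent of the time, which means the "worse" team is crowned in roughly four of every ten replays.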
So, with the bulk of this discussion concluded, let’s go over the biggest points in interpreting an impact metric score. When working with larger samples that span at least a season, perhaps the largest factor to consider is role sensitivity. Because these metrics only estimate how valuable a player is in his specific context, these aren’t estimates of how good a player would be across multiple environments. So in this sense, “value” versus “goodness” has some separation here. Look at these measures as ballparking values for how a team’s point differential shifts with versus without a player, subject to the inflations or deflations that come along with the circumstances of a player’s role and fit on that team. The next part of this relates back to assessing scores in groups. A simple averaging won’t suffice; each metric was made differently and should be treated as such. Instead, I prefer to use these different calculations of impact as a reference for which types of data favor which types of players. So while almost all of Westbrook’s value can be detected by the box score, often with some overestimation, someone like Curry, who provides a lot of unseen gravity and rotational stress, won’t have many of his more valuable skills captured by those same measurements. The last, and arguably the most important, point is to interpret an impact metric based on its duration. Similar to how an RAPM model’s scores should be interpreted relative to its lambda-value, an impact metric score should be interpreted relative to its sample size. After ten or twenty games, they may be more “right” than they are “wrong,” but they aren’t definitive measures of a player’s situational value, and they are confounded even beyond the limitations of the data that goes into these stats. This means one player can appear to be more valuable to his team over a short stretch when, in fact, his counterpart will prove to have done more in the long run.
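To show how a lambda-value and a sample size interact, here is a one-player toy version of ridge shrinkage in Python. It is not a real RAPM model, which fits every player at once; it is a sketch under my own assumptions, with a hypothetical player whose observed net impact is +6.0 in every game and an arbitrary lambda of 50.

```python
def ridge_mean(observations, lam):
    """Ridge estimate of a single constant effect.

    Minimizing sum((y - b)^2) + lam * b^2 over b gives b = sum(y) / (n + lam).
    """
    return sum(observations) / (len(observations) + lam)

# Hypothetical penalty strength (assumed for illustration, not a published value).
lam = 50.0

for games in (10, 50, 250):
    # Hypothetical player observed at +6.0 in every single game.
    estimate = ridge_mean([6.0] * games, lam)
    print(f"{games:>3} games: estimate {estimate:+.1f}")
```

With the same observed +6.0, the estimate is pulled toward zero much harder at 10 games than at 250, which is why the same score deserves less trust in a shorter sample.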
So the next time you’re on Basketball-Reference or FiveThirtyEight or wherever you get your stats, I hope this helps in understanding the values of these metrics and how to use them appropriately and in their safest contexts. Thanks for watching, everyone.