Modeling NBA Scoring Proficiency
The idea that scoring efficiency declines as shot attempts increase has long existed, yet very few public models demonstrate it. Recently, I tinkered with data from Basketball-Reference to estimate how the context of a player’s shot selection affects his shooting efficiency, and built a few new statistics to help quantify these alternate approaches to “scoring proficiency.”
With this project, I had one overarching goal in mind: to estimate the number of points a player scored “over expectation” based on the distances of his field-goal attempts, whether or not each shot was assisted (i.e. whether or not he is “creating” his own shots), and how often he shoots. This would hopefully take out some of the noise in pure rate stats like Effective Field-Goal Percentage and True Shooting and identify the most “proficient” scorers based on the context of their shooting attempts.
The first step in calculating this “Points Over Expectation” statistic is looking at how far a player’s field-goal attempts were from the hoop. Using Basketball-Reference (BBR) as the data source, I split the court into seven different zones:
- 0-3 feet from basket.
- 3-10 feet from basket.
- 10-16 feet from basket.
- 16 feet from basket to 3P line.
- Above-the-break 3P.
- Corner 3P.
The first building block in measuring scoring proficiency is comparing a given player’s efficiency and volume in these zones to league-average expectations and estimating a “Points Above Average” (PAA) of sorts. For example, Luka Doncic has taken 215 attempts (through 4/21) within three feet of the hoop and made them at a 70.7% rate, which is slightly more than three percentage points better than league average; so based on the volume of his attempts, he “added” an estimated 14 points from this range. The process is repeated for all seven ranges, looking at how often and how efficiently a player shoots from each zone on the court and comparing that to the expected output of an “average” player.
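The arithmetic behind that Doncic example can be sketched in a few lines. This is only an illustration of the idea, not the exact spreadsheet formula: the function name is made up, and the league-average rate of 67.4% is an assumed placeholder chosen to match the “slightly more than three points better” description rather than a value taken from Basketball-Reference.

```python
def zone_points_above_average(attempts, player_fg_pct, league_fg_pct, point_value):
    """Points added over a league-average shooter taking the same attempts in one zone."""
    return attempts * (player_fg_pct - league_fg_pct) * point_value

# Doncic within three feet: 215 attempts at 70.7% (figures from the text).
# ASSUMPTION: the league-average rate below is a placeholder, not a BBR value.
ASSUMED_LEAGUE_FG_0_3 = 0.674

luka_rim_paa = zone_points_above_average(215, 0.707, ASSUMED_LEAGUE_FG_0_3, 2)
print(round(luka_rim_paa, 1))  # roughly 14 points added at the rim
```

A player’s total PAA would then be the sum of this quantity over every zone, using a point value of 2 inside the arc and 3 beyond it.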
To add some additional context to a player’s shot selection and produce more accurate results, there are two regressions incorporated here:
- Efficiency based on how frequently a player’s field goals are assisted.
- Efficiency based on how often a player shoots from the field.
The first regression looks at league-wide trends to estimate how efficiently a player will score based on how much help he receives from his teammates. The results showed a significant positive trend between the two statistics. Namely, the more of a player’s field goals are assisted, the more efficiently he’s expected to score. The “PAA” results are adjusted to this context accordingly.
The second regression is incorporated next. This repeats the process used for “shooting help,” but instead compares location-adjusted efficiency to shooting volume, measured in total field-goal attempts. The results showed a distinct negative relationship between efficiency and volume; the more a player shoots, the less efficient he’s expected to be. The results from the previous regression are then fitted to these data points.
I calculated the scores for every NBA player in the 2021 season through April 21st; the spreadsheet can be found here. At first glance, the player who immediately popped up was Luka Doncic, leading the NBA with 285.3 “Points Over Expectation.” He’s certainly not the best scorer in the league, so what’s going on here? The model’s approach loves how often Doncic creates his own attempts; only 16% of his field goals were the products of assists. He also shoots the ball a lot, standing fifth among all players in field-goal attempts at the time.
Because of how the model works, the results will be slanted toward the following:
- Players who receive little “help.”
- Players who shoot a lot.
This confounds results for two big reasons: 1) The regressions used to model “luck” aren’t perfect measurements; in other words, how much a player is rewarded or penalized will vary with the adjustment factors the model uses. 2) Not all shot profiles are created equal. Over thousands and thousands of chances, different players would see different changes in their efficiency as help and volume change. The above regressions use a “best fit” to estimate this change, which means there will sometimes be large errors or outliers.
The major takeaway here is that these results are mere estimates of a player’s scoring proficiency, not definitive measures. Because a heap of evidence shows Luka Doncic isn’t the NBA’s best scorer, we can treat his data point as a product of error in the model’s calculations.
Although the primary goal of this project was the “POE” statistic, the data going into the calculations produced some other neat results. To the right of the POE column in the spreadsheet is a column labeled “cFG%”, which stands for “creation-adjusted Effective Field-Goal %.” This simply converts POE to a rate stat (adjusted efficiency per shot) and places it on an eFG% scale, meaning the league-average cFG% will always equal the league-average eFG%. This offers a quick look at adjusted efficiency on a more familiar scale, and one that gives more leniency to lower-volume scorers.
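A conversion along these lines might look like the sketch below. The exact formula in the spreadsheet isn’t spelled out, so this is an assumption: because eFG% is field-goal points per attempt divided by two, a per-shot point surplus is halved before being added to the league-average eFG%. The function name and the sample inputs (50 POE on 500 attempts, a 54% league eFG%) are hypothetical.

```python
def creation_adjusted_efg(poe, fga, league_efg):
    """Put Points Over Expectation on an eFG% scale centered on league average."""
    # eFG% is field-goal points per attempt divided by 2, so a per-shot
    # point surplus translates to eFG% at half its value.
    return league_efg + (poe / fga) / 2

# Hypothetical player: 50 POE on 500 attempts, with an assumed 54% league eFG%.
example_cfg = creation_adjusted_efg(poe=50, fga=500, league_efg=0.54)
print(f"{example_cfg:.3f}")
```

A player with exactly zero POE lands on the league-average eFG% under this formula, which matches the property described above.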
To the right of that is “xFG%,” which stands for “expected Effective Field-Goal %.” Because a player’s shot profile was the driver behind the main metric, the locations could also be used to determine how efficient a player is “expected” to be based on where he shoots the ball. This counterpart doesn’t look at the proportion of shots that are assisted or shooting volume, instead being based purely on location.
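As a rough sketch of how a location-only expectation could be computed: weight the league-average make rate in each zone by the player’s attempts there, convert to points, and divide by twice the total attempts to land on the eFG% scale. The zone keys, the league rates, and the sample shot profile below are all placeholders invented for illustration, not Basketball-Reference figures.

```python
# Point value of a made shot in each zone (zone names follow the list above).
POINT_VALUE = {
    "0-3 ft": 2, "3-10 ft": 2, "10-16 ft": 2, "16 ft-3P": 2,
    "above-break 3P": 3, "corner 3P": 3,
}

# ASSUMPTION: placeholder league-average FG% by zone, not real BBR data.
ASSUMED_LEAGUE_FG = {
    "0-3 ft": 0.67, "3-10 ft": 0.43, "10-16 ft": 0.43, "16 ft-3P": 0.41,
    "above-break 3P": 0.36, "corner 3P": 0.39,
}

def expected_efg(attempts_by_zone, league_fg=ASSUMED_LEAGUE_FG):
    """eFG% a league-average shooter would post on this player's shot locations."""
    total_attempts = sum(attempts_by_zone.values())
    expected_points = sum(
        attempts * league_fg[zone] * POINT_VALUE[zone]
        for zone, attempts in attempts_by_zone.items()
    )
    return expected_points / (2 * total_attempts)

# Hypothetical shot profile: half at the rim, half from the corners.
profile = {"0-3 ft": 100, "corner 3P": 100}
print(f"{expected_efg(profile):.4f}")
```

Because only attempt locations enter the calculation, two players with identical shot charts get the same xFG% no matter how well either of them actually shoots.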
So does this stat really identify the best scorers in the league? Of course not. There are a few kinks that can’t be ironed out; but for the most part, I hope this acts as a comprehensive and accessible way to look at the effect of a player’s shot profile on how efficiently he scores, and how this influences the landscape of current NBA scoring.