How often does the winning team score more baskets in NBA games?

Dec 23, 2021 r basketball

Okay, the title sounds kind of stupid at first glance, but hear me out…

Recently, I tried to explain the rules of basketball to a friend. While trying to boil the game down to its bare bones, I resorted to “just count how many times the ball goes through the hoop.” Forget about whether it’s a free throw, a 2-pt field goal, or a three; just count how many times the ball goes through the hoop for each team. It sounds intuitive, but I realized I didn’t actually know how well this shortcut works in real games.

This led me to a simple question: how often would counting the number of times the ball goes through the hoop for each team correctly tell you which team actually won?
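To pin down what “correctly tell you” means, here’s a minimal base R sketch of the rule for a single game. The helper name `bucket_rule_correct` and the example totals are made up for illustration. It returns TRUE only when the team with more points also made strictly more baskets, so a tie in baskets counts as the rule failing.

```r
# Hypothetical helper: does counting baskets pick the same winner as the score?
# pts and buckets are length-2 vectors, one entry per team, in the same order.
bucket_rule_correct <- function(pts, buckets) {
  winner <- which.max(pts)   # NBA games can't end tied, so there is one max
  loser  <- 3 - winner       # the other of the two positions (1 <-> 2)
  buckets[winner] > buckets[loser]
}

# Invented totals: in the second game the winner made fewer baskets
# (more of its makes were threes), so the rule gets it wrong.
bucket_rule_correct(pts = c(110, 104), buckets = c(56, 51))  # TRUE
bucket_rule_correct(pts = c(106, 103), buckets = c(52, 55))  # FALSE
```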

Data

I found a Kaggle dataset of NBA games with player-level stats. It covers the 2004 season through December 2020. I’m going to use only full seasons, so 2004-2019.

Analysis

First, I import both datasets, keeping only the columns I need. The games dataset is used only to attach season information to game_details.

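library(tidyverse)

# games.csv: used only to look up the SEASON for each GAME_ID
games <- readr::read_csv("games.csv",
                         col_select = c("GAME_DATE_EST",
                                        "GAME_ID",
                                        "SEASON"))

# Keep one row per game so the join below cannot duplicate player rows
# if a GAME_ID happens to appear more than once in games.csv
games <- dplyr::distinct(games, GAME_ID, .keep_all = TRUE)

# games_details.csv: one row per player per game
game_details <- readr::read_csv("games_details.csv",
                                col_select = c("GAME_ID",
                                               "TEAM_ABBREVIATION",
                                               "PLAYER_ID",
                                               "FGM",
                                               "FG3M",
                                               "FTM",
                                               "PTS"))

# Attach SEASON (and the game date) to every player row
game_details <- game_details %>% dplyr::left_join(games, by = "GAME_ID")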
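Because this is a left join, any player row whose GAME_ID has no match in games.csv is kept but gets SEASON = NA, and the season filter below would then drop it without any warning. A quick safeguard is to count those rows. The sketch below shows the idea on two tiny made-up tables using base R’s `merge()`, where `all.x = TRUE` makes it a left join; the table names are hypothetical.

```r
# Tiny invented stand-ins for games.csv and games_details.csv
games_toy   <- data.frame(GAME_ID = c(1, 2), SEASON = c(2010, 2011))
details_toy <- data.frame(GAME_ID = c(1, 1, 2, 3),
                          PLAYER_ID = c(101, 102, 201, 301))

# Left join: every details row is kept, matched or not
joined <- merge(details_toy, games_toy, by = "GAME_ID", all.x = TRUE)

# Rows with no matching game end up with a missing SEASON
n_unmatched <- sum(is.na(joined$SEASON))
n_unmatched  # 1 (GAME_ID 3 is not in games_toy)
```

On the real data, the same check is `sum(is.na(game_details$SEASON))` right after the join; zero means every player row found its season.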

I’m only interested in full seasons, so I keep just the full seasons in the dataset (2004-2019).

game_details <- game_details %>% dplyr::filter(SEASON >= 2004 & SEASON <= 2019)

I want to compare PTS (points) to B (buckets) for each game. B (buckets) is a field I’m creating that adds up FGM (field goals made) and FTM (free throws made). Note that FG3M (3-point field goals made) is a subset of FGM, which is why it’s not included in B.
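Since FG3M is already counted inside FGM, a made three is worth 2 points as a field goal plus 1 extra point, which gives PTS = 2 × FGM + FG3M + FTM. The sketch below checks that identity on a few invented stat lines (the `players` table and `B_double_counted` column are hypothetical) and shows why adding FG3M to B would count every three twice.

```r
# Invented player stat lines (FG3M is a subset of FGM)
players <- data.frame(FGM = c(10, 4, 7), FG3M = c(3, 0, 5), FTM = c(6, 2, 0))

# Points: every field goal is worth 2, each three adds 1 more, free throws 1
players$PTS <- 2 * players$FGM + players$FG3M + players$FTM

# Buckets: each made shot counts once, regardless of its value
players$B <- players$FGM + players$FTM

# Adding FG3M as well would count each made three a second time
players$B_double_counted <- players$FGM + players$FG3M + players$FTM

players
```

The same identity could serve as a consistency check on the real player rows, e.g. `with(game_details, all(PTS == 2 * FGM + FG3M + FTM, na.rm = TRUE))`.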

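game_sum <- game_details %>% 
  dplyr::group_by(GAME_ID, TEAM_ABBREVIATION) %>%
  # na.rm = TRUE: players who did not play can have blank (NA) stat lines,
  # and a single NA would otherwise turn the team total into NA
  dplyr::summarise(PTS = sum(PTS, na.rm = TRUE),
                   B = sum(FGM, FTM, na.rm = TRUE),
                   # the result stays grouped by GAME_ID (dplyr's default)
                   .groups = "drop_last")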
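A detail worth knowing when summing player rows: in base R, `sum()` returns NA if any input is NA, so one blank stat line (for example, a player who did not play, if the file leaves those stats empty) makes the whole team total NA. The toy vectors below are invented and show the difference `na.rm = TRUE` makes, including when several vectors are passed at once, as in `sum(FGM, FTM)`.

```r
# Invented team: the third player did not play and has missing stats
fgm <- c(9, 6, NA)
ftm <- c(4, 2, NA)

sum(fgm, ftm)                 # NA: one missing value poisons the total
sum(fgm, ftm, na.rm = TRUE)   # 21: missing values are skipped

# Counting how many team totals came out NA
team_totals <- c(sum(fgm, ftm), 55, 48)
sum(is.na(team_totals))       # 1
```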

To answer the question, I have to compare points and buckets between the two teams for each game.

If, for a given game, the winning team scores more buckets, my basic basket-counting method worked! Otherwise (including when both teams score the same number of buckets), it didn’t. :(

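# RESULT: result of the game in points, WIN or LOSE
# (NBA games cannot end tied, so the higher total is always the winner)
game_sum <- game_sum %>% 
  dplyr::group_by(GAME_ID) %>%
  dplyr::mutate(RESULT = ifelse(PTS == max(PTS), "WIN", "LOSE"))

# MORE_B: did this team score strictly more buckets than its opponent?
# YES, NO, or TIE (equal buckets count as neither team having more)
game_sum <- game_sum %>% 
  dplyr::group_by(GAME_ID) %>%
  dplyr::mutate(MORE_B = dplyr::case_when(B == max(B) & B == min(B) ~ "TIE",
                                          B == max(B)               ~ "YES",
                                          TRUE                      ~ "NO"))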

# How often did the winning team score more buckets?
answer <- game_sum %>%
  dplyr::filter(RESULT == "WIN") %>%
  dplyr::group_by(MORE_B) %>%
  dplyr::summarise(GAME_COUNT = n()) %>%
  dplyr::mutate(PCT = round(GAME_COUNT / sum(GAME_COUNT), 3))
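Ranking teams with `row_number() == 1` always hands the top spot to exactly one team, even when both teams made the same number of baskets, so a tie can end up counted as a YES. Comparing each team’s buckets with the game’s maximum and minimum avoids that. Here’s a base R sketch on invented per-game totals; the column names mirror the post, but `game_sum_toy` and its numbers are made up.

```r
# Invented totals for three games, two teams each
game_sum_toy <- data.frame(
  GAME_ID = c(1, 1, 2, 2, 3, 3),
  PTS     = c(108, 101, 99, 104, 112, 110),
  B       = c(55, 50, 52, 51, 53, 53)
)

# Winner: the higher point total within each game
game_sum_toy$RESULT <- ifelse(
  game_sum_toy$PTS == ave(game_sum_toy$PTS, game_sum_toy$GAME_ID, FUN = max),
  "WIN", "LOSE")

# Compare each team's buckets with the game's max and min
b_max <- ave(game_sum_toy$B, game_sum_toy$GAME_ID, FUN = max)
b_min <- ave(game_sum_toy$B, game_sum_toy$GAME_ID, FUN = min)
game_sum_toy$MORE_B <- ifelse(b_max == b_min, "TIE",
                              ifelse(game_sum_toy$B == b_max, "YES", "NO"))

# Share of winners in each category
winners <- game_sum_toy[game_sum_toy$RESULT == "WIN", ]
round(prop.table(table(winners$MORE_B)), 3)
```

In this toy data each category gets one winner: game 1 is a YES, game 2 a NO (the winner made one fewer basket), and game 3 a TIE.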

Result

How often does the winning team score more baskets, regardless of whether each basket is worth 1, 2, or 3 points?

Nearly all of the time.

# geom_col() plots PCT as given (same as geom_bar(stat = "identity"))
ggplot(answer, aes(x = MORE_B, y = PCT)) +
  geom_col() +
  geom_text(aes(label = sprintf("%1.1f%%", 100 * PCT)), vjust = -0.5) +
  labs(title = "Counting baskets is a surprisingly accurate strategy",
       x = "Did the winning team score more baskets?",
       y = "Percent of NBA games (2004-2019)") +
  theme_classic() +
  theme(axis.text = element_text(size = 12),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

Result Chart