Publication:

Breaking Basketball: Using Logistic Regression and SVMs to Predict Basketball Game Outcomes

Files

sc5133_written_final_report-2.pdf (462.24 KB)

Date

2025

Abstract

Predicting the outcome of sports games is a significant and exciting problem, and sports analytics keeps finding better ways to understand and break down a sport. Basketball, a dynamic sport, offers many avenues for analysis and predictive modeling. Most previous approaches have either relied on rudimentary, descriptive data or built expensive, complex models. This thesis uses dynamic and complementary feature engineering to model matchup-specific strengths and weaknesses between competing teams. Key methodological innovations include rolling averages to capture temporal trends, complementary metrics (offensive vs. defensive efficiencies, rebounding differential) to account for interactions between opposing teams, and era-based segmentation to analyze how feature importance has evolved across basketball history. Logistic regression with L1 regularization achieved 70% prediction accuracy, a significant improvement over other models, and yielded interpretable insights into feature contributions. The most accurate model was trained on 40 seasons (35,000 games) of NBA data from 1985-2023.
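
The following is a minimal sketch of how the ingredients named in the abstract could fit together, not the thesis's actual implementation. It computes a rolling average from prior games only and fits an L1-regularized logistic regression. The function names (`rolling_mean_prior`, `fit_l1_logistic`), the hyperparameters, the proximal gradient solver, and all numbers are assumptions chosen so the example needs nothing beyond the Python standard library.

```python
import math
from collections import deque


def rolling_mean_prior(values, window):
    """Mean of up to `window` earlier values at each position.

    Only games played before the current one are used, so the feature
    never leaks the result being predicted. None means no history yet.
    """
    recent = deque(maxlen=window)
    out = []
    for v in values:
        out.append(sum(recent) / len(recent) if recent else None)
        recent.append(v)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(X, y, lam=0.05, lr=0.1, epochs=500):
    """Fit weights and an unpenalized intercept by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Made-up numbers, for illustration only (hypothetical offensive ratings).
home_off = rolling_mean_prior([100, 110, 120, 90], window=2)

# Toy design matrix: column 0 stands in for a matchup differential
# (e.g. home offense minus away defense); column 1 carries no signal.
X = [[1.0, 0.1], [2.0, -0.1], [-1.0, 0.1], [-2.0, -0.1]]
y = [1, 1, 0, 0]  # 1 = home win
w, b = fit_l1_logistic(X, y)

# Predicted home-win probability for a hypothetical new matchup.
p_home = sigmoid(b + w[0] * 1.5 + w[1] * 0.0)
```

In this toy setup the L1 penalty leaves the weight on the uninformative column at zero while keeping a positive weight on the matchup differential, which illustrates why L1-regularized models tend to be easy to interpret. An established library solver would be the more natural choice for real data.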
