Operations Research and Financial Engineering, 2000-2025
Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp011r66j119j
Browsing Operations Research and Financial Engineering, 2000-2025 by Issue Date
From xG to WAR: A Comprehensive Framework for Evaluating NHL Player Value
(2025) Larson, Thomas P.; Kornhauser, Alain Lucien. This thesis presents a machine learning-based Wins Above Replacement (WAR) model for NHL skaters, integrating play-by-play and shift data from the 2023–24 and 2024–25 seasons. A Random Forest classifier predicts expected goals (xG) at the shot level, capturing offensive and defensive contributions, while a team-level Random Forest regressor translates performance metrics into win probabilities. Individual player contributions are standardized per 60 minutes, compared to replacement-level baselines, and weighted using feature importances from the win model to compute WAR. The result is a single, context-aware metric that quantifies a skater's total value in terms of added team wins.
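The aggregation step described above lends itself to a short illustration. The sketch below shows one way per-60 rates, replacement-level baselines, and win-model feature importances could be combined into a single WAR figure; the function, its names, and the scaling are hypothetical, not the thesis's implementation.

```python
def war_from_rates(player_rates, replacement_rates, win_importances, toi_minutes,
                   wins_per_unit=1.0):
    """Hypothetical WAR aggregation: per-60 edges over replacement, weighted by
    win-model feature importances, then scaled by ice time. All names are illustrative."""
    value = 0.0
    for metric, rate in player_rates.items():
        edge = rate - replacement_rates[metric]            # per-60 edge over replacement level
        value += win_importances.get(metric, 0.0) * edge   # weight by importance in the win model
    return value * (toi_minutes / 60.0) * wins_per_unit    # convert accumulated value to wins
```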
AI-Enhanced Adaptive Portfolio Optimization: Beyond the Markowitz Model
(2025) Jimenez, Julian C.; Almgren, Robert. This thesis examines the progression of portfolio optimization techniques from traditional approaches (Markowitz and CAPM) to computationally advanced methods such as machine learning and LLMs. Using a 15-year dataset of daily S&P 500 returns, we show that Long Short-Term Memory (LSTM) networks excel at short-term return forecasting, while Deep Neural Networks (DNNs) excel at discerning complex, otherwise invisible long-term patterns and map directly from input data to asset weights. Both approaches surpass classical benchmarks in risk-adjusted performance. Lastly, we introduce a Large Language Model (LLM)-based simulator, demonstrating how ChatGPT can synthesize textual information (e.g., news headline sentiment, policy announcements) into allocation decisions. Our findings highlight the promise of prompt engineering and of LLMs' ability to combine numerical and textual insight into potentially more interpretable portfolio strategies.
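As a deliberately simplified illustration of the LSTM branch described above, the sketch below maps a window of past daily returns to next-day forecasts and converts them into long-only weights via a softmax; the window length, layer sizes, and weighting rule are assumptions, not the thesis's configuration.

```python
import numpy as np
import tensorflow as tf

lookback, n_assets = 20, 10                       # assumed window length and universe size
model = tf.keras.Sequential([
    tf.keras.Input(shape=(lookback, n_assets)),
    tf.keras.layers.LSTM(32),                     # captures short-term temporal structure
    tf.keras.layers.Dense(n_assets),              # next-day return forecast per asset
])
model.compile(optimizer="adam", loss="mse")

def forecasts_to_weights(forecasts, temperature=10.0):
    """One simple rule for turning return forecasts into long-only weights (softmax)."""
    z = temperature * np.asarray(forecasts)
    w = np.exp(z - z.max())
    return w / w.sum()
```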
Optimizing Management Strategy for Biochar Production at Scale
(2025) Limor, Emma R.; Sircar, Ronnie. Artisan biochar is a carbon-rich charcoal made from crop waste, commonly produced in soil pits adjacent to the fields where the waste is sourced. Biochar's potential for carbon sequestration and soil health improvement is widely recognized, so many social enterprises have begun to replace open field burning with biochar production, funded by the sale of carbon credits. However, large-scale implementations face challenges related to cost, labor, and methodological rigor. This paper builds a model and then determines an optimal strategy for a manager to inspect a set of n workers making biochar in a cluster of nearby soil pits over the duration of a biochar production shift. The efficacy of this strategy is then tested against common inspection strategies such as randomization and shortest-path decision making, and the results confirm its superiority. Drawing on real-world data from the “Biochar for Burning” initiative in West Bengal, India, a case study is then developed to answer the question: does this optimal strategy make a significant difference when the inspector wants to maximize not work quality but biochar quality? We revise the model to factor in a dataset of known confounders as noise, run it on the project's real-world soil-pit coordinate dataset, and determine whether the strategy still offers a significant improvement. We found a modest improvement over the alternative models, one we expect would be greater if worker effort level were weighted more heavily than flame temperature, a known confounder; we expect further studies to verify this. Our model also consistently produced more reliable results than the alternatives.
Optimizing Ambulance Dispatch and Relocation
(2025-04) Patel, Shlok D.; Stellato, Bartolomeo. This thesis presents a high-fidelity simulator-based framework for modeling, optimizing, and learning real-time ambulance dispatch and relocation strategies. We begin by formulating and solving three static optimization models to identify optimal ambulance placements across a real road network in Princeton, NJ. These models are solved as mixed-integer linear programs using Gurobi and rely on a synthetic, spatially heterogeneous, and realistic demand generator. To evaluate real-time operational policies, we build a discrete-event ambulance simulator that handles call arrivals with time limits, dynamic dispatching, and patient transport. We test a greedy baseline strategy and show that even with ample fleet size, calls can time out due to poor relocation logic. This motivates the use of reinforcement learning (RL) for dynamic decision-making. We develop two OpenAI Gym-compatible environments and train Proximal Policy Optimization (PPO) agents to minimize response delay and maximize patient coverage. Our environments incorporate static travel-time routing on a road graph built from OpenStreetMap (OSM), stochastic on-scene and hospital service durations, and priority-weighted call handling. While our RL agents match or outperform baselines in low-fleet settings, they underperform in high-fleet settings, highlighting the importance of reward shaping, evaluation fidelity, and simulation accuracy. Crucially, RL agents execute decisions in under 0.3 ms post-training, offering real-time applicability that static optimization cannot match. We conclude with a discussion of how EMS resource allocation is impacted by privatized healthcare structures in the United States. We outline future work exploring social-welfare-maximizing incentives, adversarial multi-agent RL between competing ambulance providers, and simulation-grounded regulatory strategies for equitable emergency care.
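To make the RL setup above concrete, here is a minimal sketch of a Gym-style dispatch environment trained with PPO. The observation, reward, and dynamics are toy stand-ins for the thesis's simulator, and all names are hypothetical; only the Gymnasium and Stable-Baselines3 interfaces are real.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO

class ToyDispatchEnv(gym.Env):
    """Toy stand-in: observe each ambulance's travel time to the active call, choose which unit to send."""
    def __init__(self, n_ambulances=3):
        super().__init__()
        self.n = n_ambulances
        self.observation_space = gym.spaces.Box(0.0, 60.0, shape=(self.n,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(self.n)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.travel = self.np_random.uniform(1.0, 30.0, size=self.n).astype(np.float32)
        return self.travel, {}

    def step(self, action):
        reward = -float(self.travel[action])     # penalize response delay of the dispatched unit
        self.travel = self.np_random.uniform(1.0, 30.0, size=self.n).astype(np.float32)
        return self.travel, reward, False, False, {}

model = PPO("MlpPolicy", ToyDispatchEnv(), verbose=0)
model.learn(total_timesteps=10_000)              # short illustrative training run
```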
Finding Skill Changes in Major League Baseball Player Development: A Bayesian Approach with LSTM Networks and Hidden Markov Models
(2025-04-07) Kram, Kaden; Fan, Jianqing. Major League Baseball (MLB) organizations invest heavily in player evaluation and development, often relying on end-of-season statistics and traditional regression-to-the-mean models to assess talent. However, regression-to-the-mean assumes fixed skill levels and fails to account for the dynamic nature of player performance over a season. My thesis presents a novel approach to evaluating and forecasting MLB player development using Bayesian inference and changepoint detection models, including CUSUM, Bayesian Online Changepoint Detection (BOCPD), Hidden Markov Models (HMMs), and Long Short-Term Memory (LSTM) networks.
I use a Bayesian framework to iteratively update beliefs about a player's true skill level across various performance metrics such as batting average, slugging percentage, and weighted on-base average. This approach incorporates uncertainty and provides richer comparisons between players than single-point estimates. I tested my models on both synthetic and real MLB play-by-play data, with synthetic data used to benchmark changepoint detection accuracy across controlled scenarios.
My analysis shows that while Bayesian inference effectively captures player skill trends and variation, the changepoint detection models struggle to identify subtle but significant shifts in skill due to the high noise inherent in binary baseball outcomes. The LSTM model initially showed promise but ultimately failed to outperform simpler methods in accuracy or consistency. Nevertheless, this work provides a foundation for future efforts to disentangle random fluctuations from true skill changes in athlete performance.
By offering a probabilistic framework for evaluating player development, this thesis contributes a more nuanced perspective to player scouting and performance forecasting, with implications for team decision-making, player strategy, and contract valuation.
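The iterative Bayesian updating described in this abstract can be illustrated with a minimal Beta-Binomial sketch for a single hit/no-hit metric; the prior and the outcome sequence below are invented for illustration and are a simplification of the thesis's framework.

```python
from scipy import stats

alpha, beta = 27.0, 73.0                  # assumed prior, centered near a .270 batting average
for hit in [1, 0, 0, 1, 0, 1, 0, 0]:      # illustrative plate-appearance outcomes
    alpha += hit                          # conjugate Beta-Binomial update after each outcome
    beta += 1 - hit

posterior = stats.beta(alpha, beta)
print(posterior.mean(), posterior.interval(0.95))   # posterior skill estimate and 95% credible interval
```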
Visualizing Harmony: Transfer Learning in Music Genre Classification
(2025-04-08) Caras, George W.; Rigobon, Daniel. This thesis investigates the application of transfer learning and embedding-based approaches to music genre classification, addressing the challenge of limited labeled data in music information retrieval. We explore three complementary approaches using the GTZAN dataset: a baseline multilayer perceptron with hand-crafted audio features, a convolutional neural network leveraging VGGish embeddings pre-trained on YouTube audio, and a k-nearest neighbors classifier operating in the embedding space. Analysis of confusion patterns provides insights into genre boundaries and overlaps, suggesting that the embedding space effectively captures musical similarity beyond rigid genre categorization. We conclude by proposing a framework for transforming the genre classifier into a music recommendation system by utilizing the learned embeddings for similarity-based retrieval, potentially enabling more nuanced music discovery that transcends traditional genre limitations.
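A minimal sketch of the embedding-space k-nearest-neighbors approach mentioned above, assuming precomputed 128-dimensional VGGish embeddings per clip; the random arrays stand in for real GTZAN features, and the neighbor count and metric are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(800, 128))          # stand-in VGGish clip embeddings
y_train = rng.integers(0, 10, size=800)        # stand-in labels for the 10 GTZAN genres
X_test = rng.normal(size=(200, 128))

knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(X_train, y_train)
genre_predictions = knn.predict(X_test)        # nearest-neighbor genres in embedding space
```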
A Look Into Risk and Returns: The Predictive Value of Risk Indicators in Emerging Market Equities
(2025-04-08) Sukha, Deven P.; Rigobon, Daniel. Emerging Markets (EMs) are known to exhibit greater volatility in risk, meaning the indicators used to track risk fluctuate more than those in developed markets. This raises an important question: does the movement of risk indicators contain information that can aid in predicting returns in EM equity markets? To address this, we focused on five types of risk (credit, financial, political, economic, and composite), using values from several financial services firms. However, we found that changes in risk scores were not correlated across providers during the period studied, leading us to rely on one provider per indicator: S&P Global for sovereign credit and the International Country Risk Guide (ICRG) for the remaining risk indicators. We also modeled equity returns using pooled and country-specific Random Forest models, incorporating a range of macroeconomic variables that are relevant for return prediction. Predictive performance was evaluated using R2 and root mean squared error (RMSE), and the contribution of risk indicators was assessed through feature importance. We trained baseline models excluding risk indicators to test whether macroeconomic factors could compensate. Our results highlight the inherent difficulty of predicting equity returns: model performance was poor across the board, with R2 values near or below zero. While models that included risk indicators performed slightly better, the improvement was marginal. These findings suggest that changes in the selected risk indicators provide limited additional predictive value under this modeling approach. However, this does not necessarily mean that such indicators have no predictive usefulness in general. One plausible explanation is that equity indices may already reflect or even precede changes in these risk metrics, making any subsequent shifts in the risk indicators appear to have little effect. Further research could investigate different lags and modeling strategies to understand whether, and under what conditions, these risk indicators might enhance equity return predictions.
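The with/without-risk-indicator comparison described above can be sketched as follows; the synthetic columns, train/test split, and hyperparameters are placeholders rather than the thesis's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 600
macro = rng.normal(size=(n, 3))                        # stand-ins for macroeconomic variables
risk_chg = rng.normal(size=(n, 4))                     # stand-ins for changes in risk scores
returns = 0.1 * macro[:, 0] + rng.normal(size=n)       # mostly-noise equity returns

def holdout_r2(X, y, split=400):
    rf = RandomForestRegressor(n_estimators=300, random_state=0)
    rf.fit(X[:split], y[:split])
    return r2_score(y[split:], rf.predict(X[split:]))

baseline = holdout_r2(macro, returns)                          # macro variables only
with_risk = holdout_r2(np.hstack([macro, risk_chg]), returns)  # macro plus risk indicators
print(baseline, with_risk)
```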
The Gas Gambit: Optimizing Iran's Natural Gas Exports Through Risk Minimization
(2025-04-09) Madaeni, Ghazal S.; Sircar, Ronnie. Iran, despite possessing vast natural gas reserves, faces significant export constraints due to geopolitical isolation, sanctions, and infrastructure limitations. This thesis examines Iran's optimal natural gas export strategy, aiming to mitigate risk through the optimal allocation of exports between pipelines and liquefied natural gas (LNG). In this paper, risk is defined as political and dyadic risk, which measure the risk associated with a country and the risk associated with the relationship between two countries, respectively. A comparison with Oman, a more politically stable LNG-focused exporter, provides another means for assessing Iran's optimal exports. Using an optimization model based on portfolio theory, we minimize export risk while accounting for capacity constraints, export commitments, profit targets, transportation costs, and trade limitations. Analysis of different scenarios assesses Iran's export allocation under a lack of export commitments, increased sanctions, and infrastructure investments, with Oman serving as a benchmark. Results show that export commitments (especially with high-risk countries) and sanctions increase risk, while diversifying and expanding the overall capacity decrease risk. Oman's strategy highlights the advantages of export flexibility, contracts with low-risk countries, and LNG. These findings suggest that Iran's reliance on pipelines heightens geopolitical vulnerabilities, while LNG expansion could enhance trade resilience. More broadly, the study contributes to understanding how geopolitical circumstances, infrastructure investments, and trade policies shape global natural gas markets.
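A portfolio-theory-style risk minimization of the kind described above might look like the following sketch: choose export volumes per destination to minimize a quadratic risk measure subject to capacity, throughput, and profit-target constraints. The risk matrix, profits, and capacities are invented placeholders, not data from the thesis.

```python
import cvxpy as cp
import numpy as np

risk = np.array([[0.20, 0.05, 0.02, 0.01],      # illustrative pairwise (dyadic/political) risk matrix
                 [0.05, 0.15, 0.03, 0.02],
                 [0.02, 0.03, 0.10, 0.01],
                 [0.01, 0.02, 0.01, 0.08]])
profit = np.array([3.0, 2.5, 2.0, 1.8])          # profit per unit exported to each destination
capacity = np.array([10.0, 8.0, 12.0, 6.0])      # pipeline/LNG route capacities

x = cp.Variable(4, nonneg=True)                  # export volume to each destination
constraints = [x <= capacity,                    # route capacity limits
               cp.sum(x) <= 25.0,                # total production/throughput limit
               profit @ x >= 40.0]               # profit target
problem = cp.Problem(cp.Minimize(cp.quad_form(x, risk)), constraints)
problem.solve()
print(x.value, problem.value)                    # risk-minimizing allocation and its risk level
```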
We Need More Data: The Promise and Peril of Training Large Language Models on Synthetically Generated Text
(2025-04-09) Lam, Gordon K.; Cattaneo, Matias Damian. Our research investigates the viability of using Large Language Models (LLMs) for natural language text augmentation. One major factor behind the significant improvement in LLM performance in recent years has been the increased volume of data being used to train models. However, state-of-the-art models have already been trained on nearly the entire internet, effectively exhausting the supply of unique, human-generated data. As a result, availability of unique data is emerging as a significant bottleneck for further advances in model performance. To address this issue, we explored data augmentation as a method of synthetic data generation aimed at expanding the size of existing training corpora. While data augmentation has proven effective for expanding dataset sizes and improving model performance in domains like image classification, robust methods for text data remain underdeveloped due to the complex structure of natural language. In our study, we used the Gutenberg English dataset to generate augmented versions of long-form passages using a state-of-the-art large language model. We then trained three identical ~124M parameter GPT-2 style models to convergence: one on the original dataset, one on the synthetic dataset, and one on a combination of both. Across nearly all evaluation benchmarks, including in-distribution and zero-shot tasks, the model trained solely on human-generated data outperformed the others. These findings highlight the importance of data quality in pretraining, underscoring not only the role it plays in improving model performance but also the potential risks associated with relying on synthetically generated data, even as past gains have largely been driven by data volume. Our work also highlights limitations in current approaches to text data quality assessment, such as the inadequacy of cosine similarity as a proxy. While our results tell a cautionary tale about the risks of training LLMs on synthetic data, we also suggest potential directions for future work, particularly in refining synthetic data generation and filtering strategies.
Incorporating Skew in Hedge Fund Evaluation and Portfolio Allocation
(2025-04-09) Woolbert, Avery C.; Ahmadi, Amir Ali. We have reason to believe that negatively-skewed assets are commonly overvalued, despite having significant downside risk. This paper investigates how the skewness of returns should change investors' evaluation of and allocation to hedge funds. We first provide evidence to support the claim that many hedge funds have negatively skewed returns. We then propose a new evaluation benchmark for negatively-skewed funds. Finally, we discuss how investors can construct portfolios that take into account the skew of the underlying assets. We find that many common hedge fund indices have return distributions that resemble the shape of short put option payoffs, which are known to be left-skewed. We argue that when choosing a benchmark against which to measure a fund's performance, investors should choose one with similar skew. Therefore, instead of measuring hedge fund performance relative to the S&P 500, we propose that funds be compared to a strategy of shorting monthly put options on the S&P 500. Not only should skew affect the way investors evaluate hedge fund performance, but it should also influence their capital allocation. We solve a mean-variance-skewness (MVS) portfolio optimization problem to construct an optimal portfolio across common hedge fund indices. We compare this optimal portfolio to the traditional Markowitz portfolio containing the same assets. The differences between these two portfolios provide evidence that incorporating skew in portfolio optimization should change how investors optimally allocate to hedge funds.
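The mean-variance-skewness (MVS) allocation step described above can be sketched numerically as follows; the synthetic monthly returns and the preference weights on variance and skewness are assumptions, not the thesis's calibration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import skew

rng = np.random.default_rng(0)
returns = rng.normal(0.005, 0.02, size=(120, 5))    # synthetic monthly returns for 5 fund indices
lam, gam = 4.0, 0.5                                 # assumed penalty on variance, reward on skewness

def neg_mvs_utility(w):
    port = returns @ w
    return -(port.mean() - lam * port.var() + gam * skew(port))

n = returns.shape[1]
res = minimize(neg_mvs_utility, np.ones(n) / n,
               bounds=[(0.0, 1.0)] * n,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
weights_mvs = res.x                                 # fully invested, long-only MVS weights
```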
Blood Glucose Prediction and Control for Type I Diabetes Management: A Machine Learning Approach
(2025-04-09) Dantzler, Aaron; Akrotirianakis, Ioannis. Type I Diabetes is a chronic disease in which patients produce little or no insulin to regulate their blood glucose. It affects over 1.7 million adults in the United States. People with Type I Diabetes are reliant on taking insulin every day, and recently insulin pumps, and specifically Automated Insulin Delivery (AID) systems, have revolutionized diabetes care, making treatment easier and more effective. An AID system has three components: a Continuous Glucose Monitor, which measures patient blood glucose; an insulin pump, which infuses insulin into the body; and an algorithm, which translates information from the first two components into the amount of insulin needed to keep blood glucose in the target range. Our focus is on the last component. First, this thesis provides an overview of machine learning techniques for blood glucose prediction on the novel DiaTrend dataset (2023), which has not been extensively studied before (although machine learning models have been applied to earlier datasets). Our work finds that adding complexity to our model only marginally improves performance and does not justify longer run times and less interpretable results. Rather, we recommend a simple autoregressive time series model that achieves performance comparable to our more complex models while being easier for healthcare providers to interpret. In the second part of the thesis, we propose two new AID algorithms that utilize our autoregressive model: the Threshold Controller and the IOB Controller. Rather than a PID or MPC approach, these algorithms rely on a set of simple heuristics similar to what an actual patient would use. We find that in a stressful scenario, these controllers improve time in target range by up to 12% more than the leading open-source OpenAPS oref0 algorithm, while providing safety by mitigating low blood glucose. This work lays the foundation for researchers and healthcare providers to implement new AID algorithms that combine machine learning models with patient-based heuristics.
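The recommended autoregressive forecaster can be sketched in a few lines; the synthetic CGM trace and the lag order below are illustrative assumptions, not the thesis's fitted model.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
glucose = 120 + np.cumsum(rng.normal(0, 2, size=500))     # synthetic CGM readings (mg/dL, 5-min grid)

ar_model = AutoReg(glucose, lags=6).fit()                 # autoregression on the most recent readings
forecast = ar_model.predict(start=len(glucose), end=len(glucose) + 5)   # roughly the next 30 minutes
```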
Exploring the Benefits of Multimodal Sensor Fusion in Autonomous Driving: A Comparative Study of Camera and LiDAR Using Transformer Architectures for Object Detection
(2025-04-09) Doniger, Sammy; Kornhauser, Alain Lucien. Accurate and robust object detection is critical for advancing autonomous driving systems. In recent years, transformer-based architectures have shown significant promise in this domain, offering improved performance over previous state-of-the-art technologies, largely due to their ability to handle long-range dependencies. This thesis explores the potential benefits of multimodal sensor fusion in autonomous driving by evaluating three transformer-based architectures for object detection tasks, each trained on the nuScenes dataset. The first model, TransFusion, integrates camera and LiDAR data within a unified transformer framework. The second model is a LiDAR-only variant, adapted from the TransFusion implementation to isolate the contribution from the LiDAR sensors. The third model, FCOS3D, is a camera-only model that isolates the contribution from the camera sensors. The primary goal of this research is to identify scenarios in which single-modality models (camera-only or LiDAR-only) produce conflicting detections and to analyze how the fusion-based approach handles these discrepancies. By closely examining these instances, the study evaluates whether LiDAR offers critical advantages over camera-only systems in consumer vehicles. Given the higher cost and complexity associated with LiDAR sensors, understanding whether these advantages justify the integration of LiDAR is vital for automotive manufacturers and researchers seeking to optimize safety, reliability, and system efficiency under cost constraints. Through extensive experimental evaluations, this thesis contributes insights into how multimodal fusion impacts object detection, revealing that while the LiDAR-only variant yields higher overall detection metrics in limited training environments, the camera-only approach excels at identifying near-range objects, and the fusion model effectively refines extraneous predictions. This synergy underscores trade-offs between cost and detection coverage, providing guidance for future sensor design and deployment strategies in the pursuit of a fully autonomous driving system.
Multi-Period Optimization of Portfolio Transitions: Incorporating Short-Term Alpha Signals and Practical Constraints
(2025-04-09) Zhao, Helen Y.; Almgren, Robert. This thesis develops a multi-period portfolio optimization framework that integrates short-term alpha signals with practical trading constraints, including market impact and deviation risk. By transforming the problem from portfolio-space variables to impact-space variables, our model captures the primary trade-off between harnessing alpha and mitigating market impact, while a risk penalty is imposed to ensure adherence to a target portfolio. After deriving the foundational objective function, the framework is enhanced through the incorporation of multiple alpha signals with distinct decay profiles and Monte Carlo simulations to account for forecast uncertainty. Comprehensive performance evaluations are conducted using an array of benchmarks—including linear trading, all-at-once execution, and half-at-midpoint trading—across metrics such as final wealth, cumulative return, volatility, maximum drawdown, turnover, tracking error, and implementation shortfall. The approach is further extended to multi-asset portfolios, where outcomes are compared across varying levels of stock correlation. Our results demonstrate that, despite the optimized trade schedule often resembling a nearly linear strategy, subtle deviations to exploit alpha allow for meaningful improvements in risk-adjusted performance. This work contributes both theoretical insights and practical tools for managing portfolio transitions in the presence of realistic market frictions and dynamic return forecasts, offering a pathway for future research into more complex cross-asset dynamics and nonlinear impact functions.
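As a rough sketch of the trade-off the abstract describes, a generic discrete-time portfolio-transition objective with alpha capture, quadratic impact cost, and a deviation-risk penalty can be written as below; this is a standard textbook form, not the thesis's derived objective, and the symbols ($x_t$ holdings, $u_t$ trades, $x^{\star}$ target portfolio, $\alpha_t$ forecasts, $\eta$, $\gamma$ penalty weights) are assumptions.

```latex
\max_{\{u_t\}} \; \sum_{t=1}^{T} \Big( \alpha_t^{\top} x_t
  \;-\; \tfrac{\eta}{2}\,\|u_t\|^2
  \;-\; \tfrac{\gamma}{2}\,(x_t - x^{\star})^{\top} \Sigma \,(x_t - x^{\star}) \Big),
\qquad x_t = x_{t-1} + u_t .
```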
Success Prediction and Release Strategy Optimization for Independent Musicians
(2025-04-10) Raghunathan, Harit; Stellato, Bartolomeo. In an era where digital platforms have democratized music distribution, small and emerging artists face overwhelming competition and a lack of data-driven frameworks for strategic decision-making. Thus, we address two core questions: (1) Which features of an artist best predict future success? and (2) How can emerging musicians optimize their release strategies to maximize audience growth?
To answer the predictive question, we train interpretable machine learning models—including decision trees and logistic regression—on Spotify metadata for over 9,000 emerging artists. We find that recent release frequency and a high proportion of singles in an artist’s catalog are strong predictors of follower growth, with the best models achieving F1 scores over 0.80.
With the knowledge that the frequency of single releases is predictive of artist success, we then formulate a prescriptive framework to optimize artist release strategies. We model the daily follower growth of an artist over time as the sum of exponentially decaying functions triggered by single and album releases. We define these functions in terms of the time between successive releases, an artist's follower count, and the number of new tracks featured in an album. Using non-linear least squares, we fit these functions to follower time-series data from Songstats. Incorporating these functions into a mixed-integer nonlinear program, we then solve for optimal single and album release schedules over a fixed planning horizon.
Our model's optimal solutions recommend releasing as many singles as possible, spaced evenly across the desired release period to maximize follower growth. Conversely, our model places less importance on saving unreleased tracks for an album release. This result highlights a functional distinction between the two mediums: singles are particularly effective at driving audience expansion for smaller artists, while albums are better at generating revenue for established artists.
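The exponentially decaying release-impulse model described above can be sketched as follows; the functional form is simplified and every parameter value is an illustrative assumption rather than a fitted Songstats estimate.

```python
import numpy as np

def daily_growth(t_days, releases, base=0.0):
    """Sum of decaying follower-growth impulses. releases: (release_day, peak_gain, decay_rate) triples."""
    growth = np.full(t_days.shape, base, dtype=float)
    for day, peak, decay in releases:
        active = t_days >= day
        growth[active] += peak * np.exp(-decay * (t_days[active] - day))
    return growth

t = np.arange(0, 180)
schedule = [(0, 40.0, 0.08), (45, 35.0, 0.08), (90, 50.0, 0.08)]   # three evenly spaced singles
total_followers_gained = daily_growth(t, schedule).sum()
```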
Portfolio Optimization under Polynomial Trading Costs with Mean-Reverting Assets
(2025-04-10) Rasmussen, Bryce; Tangpi, Ludovic. This thesis explores optimal portfolio allocation under polynomial trading costs created by market illiquidity, focusing on mean-reverting assets modeled by an Ornstein-Uhlenbeck process. Traditional portfolio optimization models often assume proportional transaction costs or ignore them entirely, leading to strategies that may be impractical due to excessive trading. We extend previous research by incorporating higher-order polynomial cost functions to better reflect the impact of trading volume on costs. Using deep neural networks, we approximate optimal no-trade boundaries and compare performance against cost-free analytic solutions. Our findings suggest that incorporating polynomial trading costs significantly alters optimal rebalancing behavior, particularly at higher wealth levels, where the importance of asset diversification increases. Backtesting results demonstrate that learned strategies outperform traditional cost-free models and trained linear models by reducing excessive trading and improving long-term wealth accumulation. This research provides insights into the scalability of deep learning approaches for real-world portfolio optimization problems and suggests ways to address market limitations.
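For reference, the mean-reverting asset dynamics assumed above correspond to an Ornstein-Uhlenbeck process; the short Euler-Maruyama simulation below uses invented parameter values purely to illustrate the model, not the thesis's calibration.

```python
import numpy as np

kappa, mu, sigma = 2.0, 1.0, 0.3          # mean-reversion speed, long-run level, volatility (assumed)
dt, n_steps = 1.0 / 252, 252              # one year of daily steps
rng = np.random.default_rng(0)

x = np.empty(n_steps + 1)
x[0] = mu
for t in range(n_steps):                  # Euler-Maruyama step for dX = kappa*(mu - X)dt + sigma dW
    x[t + 1] = x[t] + kappa * (mu - x[t]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
```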
Evaluating Individual Player Value and Positional Spending Efficiency in the National Football League
(2025-04-10) Jasti, Rahul; Scheinerman, Daniel. This thesis introduces a data-driven framework for evaluating player value in the National Football League (NFL) by linking advanced performance metrics to player salaries. Despite the proliferation of advanced metrics in professional football, translating measures such as Wins Above Replacement, Expected Points Added, and Pro Football Focus grades into fair salary valuations remains challenging. The proposed framework addresses this gap by combining unsupervised learning with predictive modeling. Specifically, we use k-means to group players into performance-based archetypes. We then train XGBoost regression models for each archetype to predict players' expected average per-year salary. Finally, we design a constrained roster optimization model to maximize expected team wins under the salary cap. This segmented modeling approach enables a fine-grained evaluation of cost-efficiency across player roles and reveals systematic market inefficiencies. Results indicate that certain roles are consistently undervalued, whereas others are overvalued relative to their on-field contributions. We acknowledge that our findings are limited when considering an entire NFL roster due to the scarcity of advanced tracking data, and that our results are subject to uncertainty due to limited robustness checks and validation. Nevertheless, our results are intriguing and provide an immediate practical application for researchers or general managers who want to improve their spending efficiency.
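A minimal version of the two-stage archetype-then-salary pipeline described above might look like the sketch below; the synthetic features, cluster count, and XGBoost settings are placeholders rather than the thesis's choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))                    # per-player performance metrics (synthetic stand-ins)
apy = rng.lognormal(1.5, 0.4, size=300)          # average per-year salary in $M (synthetic)

archetypes = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

salary_models = {}
for k in range(5):                               # one salary model per archetype
    mask = archetypes == k
    salary_models[k] = XGBRegressor(n_estimators=200, max_depth=3).fit(X[mask], apy[mask])

expected_apy_player0 = salary_models[archetypes[0]].predict(X[[0]])   # fair-salary estimate for one player
```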
Physically Misinformed Neural Networks: Evaluating PINN Assumptions, with Applications to European Option Pricing
(2025-04-10) Colchamiro, Jacob B.; Klusowski, Jason Matthew. In recent years, Physics Informed Neural Networks (PINNs) have emerged as a powerful technique for incorporating domain knowledge into the machine learning modeling process. Specifically, the modeler trains a neural network on observed data while simultaneously penalizing deviations of the learned function from a set of posited PDE conditions. Understandably, if the modeler unwittingly enforces PDE constraints that poorly describe a given problem, this may significantly hinder the model's generalizability to unseen data. The purpose of this paper is therefore to design a hypothesis test that evaluates how likely a PINN formulation enforcing a chosen PDE prior is to improve fit over a baseline physically-uninformed neural network. We design a novel approach using conformal prediction techniques and outline conditions under which our algorithm can test null hypotheses that quantify the expected performance of the PINN. We validate the hypothesis test on simulated data generated to adhere to the Heat Equation, showing that our test functions as expected. We then show how to employ our hypothesis test to conclude that the Black-Scholes equation is a useful regularizer within a PINN framework for European call option pricing, an important potential application of our work to a regime where the data-governing function is unknown.
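To ground the PINN terminology above, here is a minimal sketch of a PINN-style loss that fits observed option prices while penalizing the Black-Scholes PDE residual at collocation points. The network architecture, parameter values, and weighting are illustrative assumptions, and this is a generic PINN formulation rather than the thesis's test procedure.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
r, sigma, lam = 0.03, 0.2, 1.0                     # assumed rate, volatility, and PDE penalty weight

def black_scholes_residual(S, t):
    """Residual of V_t + 0.5*sigma^2*S^2*V_SS + r*S*V_S - r*V at the points (S, t)."""
    x = torch.stack([S, t], dim=1).requires_grad_(True)
    V = net(x)
    dV = torch.autograd.grad(V.sum(), x, create_graph=True)[0]
    V_S, V_t = dV[:, 0], dV[:, 1]
    V_SS = torch.autograd.grad(V_S.sum(), x, create_graph=True)[0][:, 0]
    S_c = x[:, 0]
    return V_t + 0.5 * sigma**2 * S_c**2 * V_SS + r * S_c * V_S - r * V.squeeze()

def pinn_loss(S_obs, t_obs, price_obs, S_col, t_col):
    data_fit = torch.mean((net(torch.stack([S_obs, t_obs], dim=1)).squeeze() - price_obs) ** 2)
    pde_fit = torch.mean(black_scholes_residual(S_col, t_col) ** 2)   # PDE prior acting as a regularizer
    return data_fit + lam * pde_fit
```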
Modeling Movement in Ancient Pompeii: An Urban Network Analysis of Pedestrian Flow and Economic Change
(2025-04-10) Chen, Samantha; Holen, Margaret. This project uses network-based pedestrian modeling to investigate how movement and economic activity in ancient Pompeii were shaped by the city's infrastructure, particularly in the aftermath of the 62–63 CE earthquake and the resumption of gladiatorial games in 64 CE. Drawing on urban planning tools, we applied the Urban Network Analysis toolkit to simulate pedestrian flows across Pompeii's street network. A sensitivity analysis was conducted to calibrate key parameters, including detour ratio and distance decay, to generate behaviorally realistic simulations in the absence of complete archaeological records. Citywide simulations were run on both the reconstructed pre-earthquake network and the excavated 79 CE network. Surprisingly, there was no significant difference in the distribution of pedestrian flows between the two, challenging prior hypotheses that post-earthquake changes to street connectivity drove Pompeii's economic shift from industrial production to commerce. To assess the potential impact of game-day movement, a scenario-based simulation was conducted from the amphitheater to the city gates. This highlighted several unexcavated segments that likely experienced high foot traffic. If gladiatorial games played a role in economic reorientation, future excavations along these paths may reveal a transition from industrial to retail-based activity. Overall, this study demonstrates how adapting urban planning models to archaeological contexts can offer new frameworks for interpreting urban change and testing hypothetical reconstructions. It also emphasizes the need for further research into the influence of event-based foot traffic on economic investment patterns, as well as a re-examination of assumptions about the long-term urban impacts of the 62–63 CE earthquake.
Exploiting Arbitrage Opportunities in Live Sports Betting: An Automated Approach
(2025-04-10) Neely, Julian A.; Kulkarni, Sanjeev Ramesh. This thesis explores the existence and feasibility of arbitrage opportunities in live sports betting markets by analyzing odds from two major sportsbooks—FanDuel and BetMGM—across multiple NBA games. Arbitrage is defined as a risk-free profit opportunity that occurs when betting on both sides of a game across platforms results in a positive return. Using real-time data collected through web scraping, the study identifies moments when the sum of opposing moneyline odds exceeds zero, signaling a profitable opportunity. The results show that while arbitrage opportunities do occur, they are rare and short-lived, typically lasting around 13 seconds under favorable conditions. On average, arbitrage was present in only 4.52% of total scraped game time. Arb1 configurations—pairing FanDuel's Team 1 odds with BetMGM's Team 2 odds—were more frequent than Arb2, the opposite pairing, likely due to structural team ordering and sportsbook pricing tendencies. Locking behavior was also analyzed, revealing that sportsbooks occasionally freeze odds, but these lock events were generally not correlated with arbitrage instances. The thesis also investigates cross-state arbitrage by comparing odds both between sportsbooks across states (e.g., FanDuel in New Jersey vs. BetMGM in Indiana) and within the same sportsbook across states (e.g., BetMGM in New Jersey vs. BetMGM in Indiana). While some differences were observed—particularly on BetMGM—no arbitrage opportunities emerged. Moreover, even if such opportunities existed, they would be impractical to exploit due to geolocation restrictions and the inability to place bets in two states simultaneously. An algorithm was developed to detect and execute arbitrage in real time. While it successfully placed a test bet under controlled conditions, practical challenges—such as reloading betslips, site restrictions, and account risk—limit the scalability of automated arbitrage betting in live markets.
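One standard way to express the two-sided arbitrage check described above is to convert each American moneyline to an implied probability and flag cases where the two sides sum to less than one; the helper below is a generic illustration with made-up odds, not the thesis's detection code.

```python
def implied_prob(moneyline):
    """Implied win probability from an American moneyline quote."""
    return 100 / (moneyline + 100) if moneyline > 0 else -moneyline / (-moneyline + 100)

def arbitrage_check(book1_team1_odds, book2_team2_odds):
    total = implied_prob(book1_team1_odds) + implied_prob(book2_team2_odds)
    return total < 1.0, 1.0 - total       # (arbitrage exists?, guaranteed margin per unit staked)

print(arbitrage_check(+150, -120))        # illustrative odds pair -> (True, ~0.055)
```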
Optimal Execution Against Strategic Traders: A Stackelberg Mean-Field Game Formulation
(2025-04-10) Garcia-De La Jara, Christian A.; Carmona, Rene A.This thesis investigates optimal execution strategies within a predatory trading environment through the lens of a Stackelberg mean field game. Specifically, it addresses the problem faced by a distressed institutional investor (leader) forced to liquidate a significant asset position over a finite horizon, anticipating strategic reactions from a large population of high-frequency traders (HFTs), modeled collectively as followers. Extending previous models, the framework introduced here leverages mean field approximations to capture the aggregate behavior of HFTs and the hierarchical decision-making inherent in such scenarios. Under the assumption of linear price impacts consistent with the Almgren–Chriss framework, we adopt cost functionals in the spirit of Cartea–Jaimungal. Building on the probabilistic approach of Carmona and Delarue, the equilibrium dynamics are fully determined by the fixed points of coupled forward–backward stochastic differential equations (FBSDEs), which can be solved to yield an explicit open-loop feedback control. We derive analytical solutions and present numerical results alongside a sensitivity analysis. Ultimately, this thesis proposes a realistic model that can serve as a benchmark for evaluating execution strategies, whether for a large institution or for a high-frequency trading desk.