Operations Research and Financial Engineering, 2000-2024
Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp011r66j119j
Browsing Operations Research and Financial Engineering, 2000-2024 by Issue Date
AI-Enhanced Adaptive Portfolio Optimization: Beyond the Markowitz Model
(2025) Jimenez, Julian C.; Almgren, Robert
This thesis examines the progression of portfolio optimization techniques from traditional approaches (Markowitz and CAPM) to computationally advanced methods such as machine learning and LLMs. Using a 15-year dataset of daily S&P 500 returns, we show that Long Short-Term Memory (LSTM) networks excel at short-term return forecasting, while Deep Neural Networks (DNNs) excel at discerning complex, otherwise invisible long-term patterns and map directly from input data to asset weights. Both approaches surpass classical benchmarks in risk-adjusted performance. Lastly, we introduce a Large Language Model (LLM)-based simulator, demonstrating how ChatGPT can synthesize textual signals (e.g., news headline sentiment, policy announcements) into allocation decisions. Our findings highlight the promise of prompt engineering and of LLMs' ability to combine numerical and textual insight into potentially more interpretable portfolio strategies.
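The LSTM forecasting idea described above can be illustrated with a minimal sketch. The snippet below assumes PyTorch and uses an arbitrary lookback window, hidden size, and asset count as placeholders; it is not the thesis's architecture, only a schematic example of mapping a window of past daily returns to a one-step-ahead forecast.

```python
# Minimal sketch (placeholder sizes, not the thesis's model): an LSTM that maps a
# window of past daily returns to a next-day return forecast per asset.
import torch
import torch.nn as nn

class ReturnLSTM(nn.Module):
    def __init__(self, n_assets: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_assets, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_assets)   # forecast next-day return per asset

    def forward(self, x):              # x: (batch, window, n_assets)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # use the last hidden state

model = ReturnLSTM(n_assets=5)
window = torch.randn(16, 60, 5)        # 16 samples, 60-day lookback, 5 assets
forecast = model(window)               # (16, 5) next-day return forecasts
```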
Optimizing Management Strategy for Biochar Production at Scale
(2025) Limor, Emma R.; Sircar, Ronnie
Artisan biochar is a carbon-rich charcoal made from crop waste, commonly produced in soil pits adjacent to the fields where the waste is sourced. Biochar's potential for carbon sequestration and soil health improvement is widely recognized, so many social enterprises have begun to replace open-field burning with biochar production, funded by the sale of carbon credits. However, large-scale implementations face challenges related to cost, labor, and methodological rigor. This paper builds a model and determines an optimal strategy for a manager to inspect a set of n workers who make biochar in a cluster of nearby soil pits over the duration of a biochar production work shift. The efficacy of this strategy is then tested against common inspection strategies such as randomization and shortest-path decision making, and the results confirm its superiority. Drawing on real-world data from the "Biochar for Burning" initiative in West Bengal, India, a case study is then developed to answer the question: does this optimal strategy make a significant difference when the inspector wants to maximize not work quality but biochar quality? We revise the model to factor in a dataset of known confounders as noise, run it on the project's real-world soil-pit coordinate dataset, and determine whether the strategy still offers a significant improvement. We found a modest improvement over the alternative models, which we expect would be greater if worker effort level were weighted more heavily than flame temperature, a known confounder; we expect further studies to verify this. Our model also consistently produced more reliable results than the alternatives.
From xG to WAR: A Comprehensive Framework for Evaluating NHL Player Value
(2025) Larson, Thomas P.; Kornhauser, Alain Lucien
This thesis presents a machine learning-based Wins Above Replacement (WAR) model for NHL skaters, integrating play-by-play and shift data from the 2023–24 and 2024–25 seasons. A Random Forest classifier predicts expected goals (xG) at the shot level, capturing offensive and defensive contributions, while a team-level Random Forest regressor translates performance metrics into win probabilities. Individual player contributions are standardized per 60 minutes, compared to replacement-level baselines, and weighted using feature importances from the win model to compute WAR. The result is a single, context-aware metric that quantifies a skater's total value in terms of added team wins.
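As an illustration of the shot-level xG component, the sketch below fits a Random Forest classifier and reads goal probabilities from `predict_proba`. The feature names and synthetic shot data are hypothetical placeholders, not the thesis's play-by-play schema.

```python
# Illustrative sketch of shot-level expected goals (xG): a Random Forest classifier
# estimating goal probability from shot features. Data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))               # e.g. shot distance, angle, rebound flag
y = (rng.random(1000) < 0.09).astype(int)    # roughly 9% of shots become goals (placeholder)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
xg = clf.predict_proba(X)[:, 1]              # per-shot xG = predicted goal probability
print(xg[:5])
```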
Optimizing Ambulance Dispatch and Relocation
(2025-04) Patel, Shlok D.; Stellato, Bartolomeo
This thesis presents a high-fidelity simulator-based framework for modeling, optimizing, and learning real-time ambulance dispatch and relocation strategies. We begin by formulating and solving three static optimization models to identify optimal ambulance placements across a real road network in Princeton, NJ. These models are solved as mixed-integer linear programs using Gurobi and rely on a synthetic, spatially heterogeneous, and realistic demand generator. To evaluate real-time operational policies, we build a discrete-event ambulance simulator that handles call arrivals with time limits, dynamic dispatching, and patient transport. We test a greedy baseline strategy and show that even with ample fleet size, calls can time out due to poor relocation logic. This motivates the use of reinforcement learning (RL) for dynamic decision-making. We develop two OpenAI Gym-compatible environments and train Proximal Policy Optimization (PPO) agents to minimize response delay and maximize patient coverage. Our environments incorporate static travel-time routing on a road graph built from OpenStreetMap (OSM), stochastic on-scene and hospital service durations, and priority-weighted call handling. While our RL agents match or outperform baselines in low-fleet settings, they underperform in high-fleet settings, highlighting the importance of reward shaping, evaluation fidelity, and simulation accuracy. Crucially, RL agents execute decisions in under 0.3 ms post-training, offering real-time applicability that static optimization cannot match. We conclude with a discussion of how EMS resource allocation is affected by privatized healthcare structures in the United States, and outline future work exploring social-welfare-maximizing incentives, adversarial multi-agent RL between competing ambulance providers, and simulation-grounded regulatory strategies for equitable emergency care.
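A minimal sketch of the RL setup described here is given below: a toy Gym-style dispatch environment and a PPO training call. The state and action encoding, the reward, and the use of the gymnasium and stable-baselines3 packages are all illustrative assumptions; the thesis's simulator and environments are far richer.

```python
# Toy Gym-style dispatch environment plus PPO training via stable-baselines3 (both
# assumed packages). Reward is a crude "travel time" proxy, purely illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyDispatchEnv(gym.Env):
    """Pick which ambulance answers the current call."""

    def __init__(self, n_ambulances: int = 3, episode_len: int = 50):
        super().__init__()
        self.n = n_ambulances
        self.episode_len = episode_len
        # observation: per-ambulance "distance to call" plus two call features
        self.observation_space = spaces.Box(0.0, 1.0, shape=(self.n + 2,), dtype=np.float32)
        self.action_space = spaces.Discrete(self.n)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.state = self.np_random.random(self.n + 2).astype(np.float32)
        return self.state, {}

    def step(self, action):
        self.t += 1
        reward = -float(self.state[action])               # closer ambulance -> higher reward
        self.state = self.np_random.random(self.n + 2).astype(np.float32)
        truncated = self.t >= self.episode_len
        return self.state, reward, False, truncated, {}

model = PPO("MlpPolicy", ToyDispatchEnv(), verbose=0)
model.learn(total_timesteps=2048)                         # tiny budget, just to show the loop
```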
Finding Skill Changes in Major League Baseball Player Development: A Bayesian Approach with LSTM Networks and Hidden Markov Models
(2025-04-07) Kram, Kaden; Fan, Jianqing
Major League Baseball (MLB) organizations invest heavily in player evaluation and development, often relying on end-of-season statistics and traditional regression-to-the-mean models to assess talent. However, regression to the mean assumes fixed skill levels and fails to account for the dynamic nature of player performance over a season. My thesis presents a novel approach to evaluating and forecasting MLB player development using Bayesian inference and changepoint detection models, including CUSUM, Bayesian Online Changepoint Detection (BOCPD), Hidden Markov Models (HMMs), and Long Short-Term Memory (LSTM) networks.
I use a Bayesian framework to iteratively update beliefs about a player's true skill level across various performance metrics such as batting average, slugging percentage, and weighted on-base average. This approach incorporates uncertainty and provides richer comparisons between players than single-point estimates. I tested my models on both synthetic and real MLB play-by-play data, with synthetic data used to benchmark changepoint detection accuracy across controlled scenarios.
My analysis shows that while Bayesian inference effectively captures player skill trends and variation, the changepoint detection models struggle to identify subtle but significant shifts in skill due to the high noise inherent in binary baseball outcomes. The LSTM model initially showed promise but ultimately failed to outperform simpler methods in accuracy or consistency. Nevertheless, this work provides a foundation for future efforts to disentangle random fluctuations from true skill changes in athlete performance.
By offering a probabilistic framework for evaluating player development, this thesis contributes a more nuanced perspective to player scouting and performance forecasting, with implications for team decision-making, player strategy, and contract valuation.
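A minimal sketch of the Bayesian skill-updating idea in this abstract is a Beta-Binomial model: treat each at-bat as a Bernoulli trial and update a Beta prior on the player's true batting average game by game. The prior and game data below are illustrative placeholders, not values from the thesis.

```python
# Beta-Binomial sketch of iterative skill updating: posterior after each game.
hits_by_game = [1, 0, 2, 1, 3, 0]
abs_by_game  = [4, 3, 5, 4, 5, 2]

alpha, beta = 27.0, 73.0          # prior roughly centered on a .270 hitter (placeholder)
for hits, at_bats in zip(hits_by_game, abs_by_game):
    alpha += hits
    beta  += at_bats - hits
    mean = alpha / (alpha + beta)
    print(f"posterior mean batting skill: {mean:.3f}")
```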
Visualizing Harmony: Transfer Learning in Music Genre Classification
(2025-04-08) Caras, George W.; Rigobon, Daniel
This thesis investigates the application of transfer learning and embedding-based approaches to music genre classification, addressing the challenge of limited labeled data in music information retrieval. We explore three complementary approaches using the GTZAN dataset: a baseline multilayer perceptron with hand-crafted audio features, a convolutional neural network leveraging VGGish embeddings pre-trained on YouTube audio, and a k-nearest neighbors classifier operating in the embedding space. Analysis of confusion patterns provides insights into genre boundaries and overlaps, suggesting that the embedding space effectively captures musical similarity beyond rigid genre categorization. We conclude by proposing a framework for transforming the genre classifier into a music recommendation system by utilizing the learned embeddings for similarity-based retrieval, potentially enabling more nuanced music discovery that transcends traditional genre limitations.
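The k-nearest-neighbors approach in the embedding space can be sketched as below; random vectors stand in for the pre-trained VGGish embeddings, so the data and accuracy are purely illustrative.

```python
# kNN genre classification in an embedding space (synthetic embeddings as placeholders).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(500, 128))     # 128-d VGGish-style embeddings (stand-ins)
genres = rng.integers(0, 10, size=500)       # 10 GTZAN genres (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, genres, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine").fit(X_tr, y_tr)
print("held-out accuracy:", knn.score(X_te, y_te))
```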
A Look Into Risk and Returns: The Predictive Value of Risk Indicators in Emerging Market Equities
(2025-04-08) Sukha, Deven P.; Rigobon, Daniel
Emerging Markets (EMs) are known to exhibit greater volatility in risk, meaning the indicators used to track risk fluctuate more than those in developed markets. This raises an important question: does the movement of risk indicators contain information that can aid in predicting returns in EM equity markets? To address this, we focused on five types of risk (credit, financial, political, economic, and composite), using values from several financial services firms. However, we found that changes in risk scores were not correlated across providers during the period studied, leading us to rely on a single provider: S&P Global for sovereign credit and the International Country Risk Guide (ICRG) for the remaining risk indicators. We also modeled equity returns using pooled and country-specific Random Forest models, incorporating a range of macroeconomic variables relevant for return prediction. Predictive performance was evaluated using R² and root mean squared error (RMSE), and the contribution of risk indicators was assessed through feature importance. We trained baseline models excluding risk indicators to test whether macroeconomic factors could compensate. Our results highlight the inherent difficulty of predicting equity returns: model performance was poor across the board, with R² values near or below zero. While models that included risk indicators performed slightly better, the improvement was marginal. These findings suggest that changes in the selected risk indicators provide limited additional predictive value under this modeling approach. However, this does not necessarily mean that such indicators have no predictive usefulness in general. One plausible explanation is that equity indices may already reflect, or even precede, changes in these risk metrics, making subsequent shifts in the risk indicators appear to have little effect. Further research could investigate different lags and modeling strategies to understand whether, and under what conditions, these risk indicators might enhance equity return predictions.
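The with-versus-without-risk-indicators comparison can be sketched as follows, using synthetic data in place of the actual macroeconomic and risk features; only the evaluation pattern (R² and RMSE on a held-out split) mirrors the abstract.

```python
# Two Random Forest regressors, with and without "risk" features, scored by R^2 and RMSE.
# All variables are synthetic placeholders for the thesis's macro/risk data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
macro = rng.normal(size=(400, 4))                 # stand-in macroeconomic features
risk_chg = rng.normal(size=(400, 3))              # stand-in changes in risk scores
returns = 0.1 * macro[:, 0] + rng.normal(scale=1.0, size=400)

for name, X in [("macro only", macro), ("macro + risk", np.hstack([macro, risk_chg]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, returns, random_state=0)
    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: R2 = {r2_score(y_te, pred):.3f}, RMSE = {rmse:.3f}")
```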
Incorporating Skew in Hedge Fund Evaluation and Portfolio Allocation
(2025-04-09) Woolbert, Avery C.; Ahmadi, Amir Ali
We have reason to believe that negatively-skewed assets are commonly overvalued, despite having significant downside risk. This paper investigates how the skewness of returns should change investors' evaluation of and allocation to hedge funds. We first provide evidence to support the claim that many hedge funds have negatively skewed returns. We then propose a new evaluation benchmark for negatively-skewed funds. Finally, we discuss how investors can construct portfolios that take into account the skew of the underlying assets. We find that many common hedge fund indices have return distributions that resemble the shape of short put option payoffs, which are known to be left-skewed. We argue that when choosing a benchmark against which to measure a fund's performance, investors should choose one with similar skew. Therefore, instead of measuring hedge fund performance relative to the S&P 500, we propose that funds be compared to a strategy of shorting monthly put options on the S&P 500. Not only should skew affect the way investors evaluate hedge fund performance, but it should also influence their capital allocation. We solve a mean-variance-skewness (MVS) portfolio optimization problem to construct an optimal portfolio across common hedge fund indices. We compare this optimal portfolio to the traditional Markowitz portfolio containing the same assets. The differences between these two portfolios provide evidence that incorporating skew in portfolio optimization should change how investors optimally allocate to hedge funds.
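One simple way to pose a mean-variance-skewness allocation, sketched below with placeholder data and preference parameters (not the thesis's formulation or fund indices), is to maximize expected return minus a variance penalty plus a skewness reward over long-only weights.

```python
# MVS allocation sketch: maximize mean - lam*variance + gam*skewness over long-only weights.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import skew

rng = np.random.default_rng(3)
returns = rng.normal(0.005, 0.02, size=(500, 4))   # monthly returns, 4 fund indices (synthetic)
lam, gam = 5.0, 1.0                                # variance penalty, skewness reward (illustrative)

def neg_utility(w):
    port = returns @ w
    return -(port.mean() - lam * port.var() + gam * skew(port))

n = returns.shape[1]
res = minimize(neg_utility, np.full(n, 1 / n),
               bounds=[(0, 1)] * n,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("MVS weights:", np.round(res.x, 3))
```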
Blood Glucose Prediction and Control for Type I Diabetes Management: A Machine Learning Approach
(2025-04-09) Dantzler, Aaron; Akrotirianakis, Ioannis
Type I Diabetes is a chronic disease in which patients make little or no insulin to regulate their blood glucose. It affects over 1.7 million adults in the United States. People with Type I Diabetes rely on taking insulin every day, and recently insulin pumps, and specifically Automated Insulin Delivery (AID) systems, have revolutionized diabetes care, making treatment easier and more effective. An AID system requires three components: a Continuous Glucose Monitor, which measures patient blood glucose; an insulin pump, which infuses insulin into the body; and an algorithm, which translates information from the first two components into the amount of insulin needed to keep blood glucose in the target range. Our focus is on the last component. First, this thesis provides an overview of machine learning techniques for blood glucose prediction on the novel DiaTrend dataset (2023), which has not been extensively studied before (although machine learning models have been applied to previous datasets). Our work finds that adding complexity to our model only barely improves performance and does not justify longer run times and less interpretable results. Rather, we recommend a simple autoregressive time series model that achieves performance comparable to the rest of our models while being simpler for healthcare providers to interpret. In the second part of the thesis, we propose two new AID algorithms that utilize our autoregressive model: the Threshold Controller and the IOB Controller. Rather than a PID or MPC approach, these algorithms rely on a set of simple heuristics similar to what an actual patient would use. We find that in a stressful scenario, these controllers are able to improve time in target range by up to 12% relative to the leading open-source OpenAPS oref0 algorithm, while providing safety by mitigating low blood glucose. This work lays the foundation for researchers and healthcare providers to implement new AID algorithms that combine machine learning models with patient-based heuristics.
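The autoregressive forecasting idea can be sketched with a standard AR(p) fit, assuming statsmodels; the synthetic glucose trace and lag order below are placeholders rather than values tuned to the DiaTrend data.

```python
# AR(p) sketch: fit an autoregressive model to a CGM series and forecast the next readings.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(4)
glucose = 120 + np.cumsum(rng.normal(0, 2, size=288))   # one synthetic day of 5-minute readings

model = AutoReg(glucose, lags=6).fit()
forecast = model.predict(start=len(glucose), end=len(glucose) + 5)   # next 30 minutes
print(np.round(forecast, 1))
```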
We Need More Data: The Promise and Peril of Training Large Language Models on Synthetically Generated Text
(2025-04-09) Lam, Gordon K.; Cattaneo, Matias Damian
Our research investigates the viability of using Large Language Models (LLMs) for natural language text augmentation. One major factor behind the significant improvement in LLM performance in recent years has been the increased volume of data used to train models. However, state-of-the-art models have already been trained on nearly the entire internet, effectively exhausting the supply of unique, human-generated data. As a result, the availability of unique data is emerging as a significant bottleneck for further advances in model performance. To address this issue, we explored data augmentation as a method of synthetic data generation aimed at expanding the size of existing training corpora. While data augmentation has proven effective for expanding dataset sizes and improving model performance in domains like image classification, robust methods for text data remain underdeveloped due to the complex structure of natural language. In our study, we used the Gutenberg English dataset to generate augmented versions of long-form passages using a state-of-the-art large language model. We then trained three identical ~124M-parameter GPT-2-style models to convergence: one on the original dataset, one on the synthetic dataset, and one on a combination of both. Across nearly all evaluation benchmarks, including in-distribution and zero-shot tasks, the model trained solely on human-generated data outperformed the others. These findings highlight the importance of data quality in pretraining, underscoring not only its role in improving model performance but also the risks of relying on synthetically generated data, even though past gains have largely been driven by data volume. Our work also highlights limitations in current approaches to assessing text data quality, such as the inadequacy of cosine similarity as a proxy. While our results tell a cautionary tale about the risks of training LLMs on synthetic data, we also suggest directions for future work, particularly in refining synthetic data generation and filtering strategies.
Exploring the Benefits of Multimodal Sensor Fusion in Autonomous Driving: A Comparative Study of Camera and LiDAR Using Transformer Architectures for Object Detection
(2025-04-09) Doniger, Sammy; Kornhauser, Alain Lucien
Accurate and robust object detection is critical for advancing autonomous driving systems. In recent years, transformer-based architectures have shown significant promise in this domain, offering improved performance over previous state-of-the-art technologies, largely due to their ability to handle long-range dependencies. This thesis explores the potential benefits of multimodal sensor fusion in autonomous driving by evaluating three transformer-based architectures for object detection tasks, each trained on the nuScenes dataset. The first model, TransFusion, integrates camera and LiDAR data within a unified transformer framework. The second model is a LiDAR-only variant, adapted from the TransFusion implementation to isolate the contribution from the LiDAR sensors. The third model, FCOS3D, is a camera-only model that isolates the contribution from the camera sensors. The primary goal of this research is to identify scenarios in which single-modality models (camera-only or LiDAR-only) produce conflicting detections and to analyze how the fusion-based approach handles these discrepancies. By closely examining these instances, the study evaluates whether LiDAR offers critical advantages over camera-only systems in consumer vehicles. Given the higher cost and complexity associated with LiDAR sensors, understanding whether these advantages justify the integration of LiDAR is vital for automotive manufacturers and researchers seeking to optimize safety, reliability, and system efficiency under cost constraints. Through extensive experimental evaluations, this thesis contributes insights into how multimodal fusion impacts object detection, revealing that while the LiDAR-only variant yields higher overall detection metrics in limited training environments, the camera-only approach excels at identifying near-range objects, and the fusion model effectively refines extraneous predictions. This synergy underscores trade-offs between cost and detection coverage, providing guidance for future sensor design and deployment strategies in the pursuit of a fully autonomous driving system.
Multi-Period Optimization of Portfolio Transitions: Incorporating Short-Term Alpha Signals and Practical Constraints
(2025-04-09) Zhao, Helen Y.; Almgren, Robert
This thesis develops a multi-period portfolio optimization framework that integrates short-term alpha signals with practical trading constraints, including market impact and deviation risk. By transforming the problem from portfolio-space variables to impact-space variables, our model captures the primary trade-off between harnessing alpha and mitigating market impact, while a risk penalty is imposed to ensure adherence to a target portfolio. After deriving the foundational objective function, the framework is enhanced through the incorporation of multiple alpha signals with distinct decay profiles and Monte Carlo simulations to account for forecast uncertainty. Comprehensive performance evaluations are conducted against an array of benchmarks (including linear trading, all-at-once execution, and half-at-midpoint trading) across metrics such as final wealth, cumulative return, volatility, maximum drawdown, turnover, tracking error, and implementation shortfall. The approach is further extended to multi-asset portfolios, where outcomes are compared across varying levels of stock correlation. Our results demonstrate that, despite the optimized trade schedule often resembling a nearly linear strategy, subtle deviations to exploit alpha allow for meaningful improvements in risk-adjusted performance. This work contributes both theoretical insights and practical tools for managing portfolio transitions in the presence of realistic market frictions and dynamic return forecasts, offering a pathway for future research into more complex cross-asset dynamics and nonlinear impact functions.
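A stylized, single-asset version of the trade-off described above (alpha capture versus quadratic market impact and a deviation-risk penalty over a transition schedule) might look like the sketch below; the coefficients and decay profile are illustrative, not the thesis's calibrated model.

```python
# Stylized multi-period transition: choose per-period trades toward a target position,
# balancing decaying alpha, quadratic impact, and deviation risk. Numbers are placeholders.
import numpy as np
from scipy.optimize import minimize

T = 10                                  # trading periods
target = 1.0                            # target position (fully transitioned)
alpha = 0.02 * 0.7 ** np.arange(T)      # short-term alpha signal decaying each period
eta, kappa = 0.5, 0.05                  # impact and deviation-risk coefficients

def cost(x):                            # x: trade sizes per period
    h = np.cumsum(x)                    # holdings path starting from 0
    return (eta * np.sum(x ** 2)                 # market impact
            - np.sum(alpha * h)                  # alpha captured while holding
            + kappa * np.sum((h - target) ** 2)) # deviation risk vs. target

res = minimize(cost, np.full(T, target / T),
               constraints={"type": "eq", "fun": lambda x: x.sum() - target})
print("optimized trade schedule:", np.round(res.x, 3))
```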
The Gas Gambit: Optimizing Iran's Natural Gas Exports Through Risk Minimization
(2025-04-09) Madaeni, Ghazal S.; Sircar, Ronnie
Iran, despite possessing vast natural gas reserves, faces significant export constraints due to geopolitical isolation, sanctions, and infrastructure limitations. This thesis examines Iran's optimal natural gas export strategy and aims to mitigate risk through the optimal mix of natural gas exports via pipeline and liquefied natural gas (LNG). In this paper, risk is defined as political and dyadic risk, which measure the risk associated with a country and the risk associated with the relationship between two countries, respectively. A comparison with Oman, a more politically stable LNG-focused exporter, provides another means for assessing Iran's optimal exports. Using an optimization model based on portfolio theory, we minimize export risk while accounting for capacity constraints, export commitments, profit targets, transportation costs, and trade limitations. Analysis of different scenarios assesses Iran's export allocation under a lack of export commitments, increased sanctions, and infrastructure investments, with Oman serving as a benchmark. Results show that export commitments (especially with high-risk countries) and sanctions increase risk, while diversifying and expanding overall capacity decrease risk. Oman's strategy highlights the advantages of export flexibility, contracts with low-risk countries, and LNG. These findings suggest that Iran's reliance on pipelines heightens geopolitical vulnerabilities, while LNG expansion could enhance trade resilience. More broadly, the study contributes to understanding how geopolitical circumstances, infrastructure investments, and trade policies shape global natural gas markets.
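As a simplified stand-in for the portfolio-style model, the sketch below minimizes a linear risk measure over export routes subject to a profit target and capacity bounds; all figures are placeholders, and the thesis's actual model also handles commitments, transport costs, and trade limitations.

```python
# Simplified linear stand-in: minimize total export risk subject to a profit target
# and per-route capacities. Routes, risks, and profits are illustrative placeholders.
import numpy as np
from scipy.optimize import linprog

risk   = np.array([0.8, 0.5, 0.3, 0.4])    # per-unit risk of each route (pipeline/LNG)
profit = np.array([1.0, 0.9, 1.2, 1.1])    # per-unit profit of each route
cap    = np.array([50, 40, 30, 30])        # route capacity

# minimize risk' x  s.t.  profit' x >= 100,  0 <= x <= cap
res = linprog(c=risk,
              A_ub=[-profit], b_ub=[-100],
              bounds=list(zip(np.zeros(4), cap)))
print("export allocation:", np.round(res.x, 1))
```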
Smart Bidding for Smart Homes: Multi-Market Electricity Trading
(2025-04-10) Reddy, Laya P.; Ahmadi, Amir Ali
The 2021 Texas blackouts exposed the vulnerability of ERCOT's deregulated electricity grid to extreme weather and price volatility, and these risks intensify with climate change, electrification, and rising electricity demand. At the same time, residential distributed energy resources (DERs), such as rooftop solar, home batteries, and electric vehicles, are becoming more widespread and capable of providing valuable grid flexibility. Virtual Power Plants (VPPs) aggregate these resources to bid into wholesale markets, but typically rely on opaque, automated platforms that offer users little visibility or control. As a result, most households lack tools to understand or optimize their DERs for financial benefit.
This thesis presents a transparent, data-driven framework that enables household DERs to actively participate in ERCOT’s two-settlement electricity market, where day-ahead bids must anticipate uncertainty in real-time prices and solar generation. We combine probabilistic forecasting with two-stage stochastic optimization to model household market decisions. Quantile LSTM (QLSTM) models, trained on historical price and solar data, generate scenario-based forecasts that feed into a two-stage optimization model capturing DER dynamics, including battery degradation, EV availability, and flexible load scheduling. To align forecasts with downstream outcomes, we fine-tune the QLSTM using a decision-focused learning (DFL) loss that minimizes regret in the two-stage problem.
Simulations across four Texas cities and three levels of residential load profiles show that the baseline predict-then-optimize strategy consistently recovers most of the profit achievable under perfect foresight, while DFL improves robustness under volatile conditions. This work demonstrates how machine learning and optimization approaches can empower households to participate meaningfully in electricity markets by offering a user-centric alternative to DER optimization and supporting a more distributed, resilient, and responsive energy grid.
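The quantile forecasts feeding the two-stage problem are typically trained with a pinball (quantile) loss; a minimal PyTorch version is sketched below with placeholder price values, as an illustration of the loss rather than the thesis's QLSTM training code.

```python
# Pinball (quantile) loss: asymmetric penalty for under- vs. over-prediction at quantile q.
import torch

def pinball_loss(pred, target, q):
    # pred, target: tensors of the same shape; q: quantile level in (0, 1)
    diff = target - pred
    return torch.mean(torch.maximum(q * diff, (q - 1) * diff))

pred   = torch.tensor([30.0, 42.0, 55.0])   # forecasted real-time prices ($/MWh, placeholders)
actual = torch.tensor([35.0, 40.0, 60.0])
print(pinball_loss(pred, actual, q=0.9))    # loss for the 90th-percentile forecast
```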
Branching Out: An Alternative Approach to Variational Inference Based Clonal Tree Reconstruction Using Wilson’s Algorithm
(2025-04-10) Tsai, Kyle; Raphael, Ben
In this thesis, we explore the application of variational inference in reconstructing tumor phylogenies, or clone trees, from copy number aberrations measured in single-cell DNA sequencing data. As a first step, we identify a key computational bottleneck in existing variational inference algorithms for clone tree inference [10] and propose a computationally attractive alternative. Specifically, we analyze and test the weighted spanning tree sampling algorithm LARS used in the clone tree inference pipeline VicTree [10]. Through comprehensive testing, we discover that LARS is not robust and fails to properly sample from its target sampling distribution. As an alternative, we propose applying Wilson's sampling algorithm [13], and find that it significantly outperforms LARS at sampling from the target distribution. Furthermore, Wilson's algorithm provides substantial computational benefits over LARS and scales much better with the problem size. Having demonstrated the superior performance of Wilson's sampling algorithm over LARS, we attempt to incorporate it into the VicTree variational inference pipeline. Preliminary results show that clone tree reconstruction with the modified VicTree algorithm is promising, as it is more accurate and significantly faster than before, though our analysis also identifies several issues with the modified VicTree pipeline.
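For reference, a compact implementation of Wilson's algorithm (loop-erased random walks from each unvisited node into the growing tree) is sketched below on a toy weighted graph; it illustrates the sampler itself, not the clone-tree setting or the VicTree pipeline.

```python
# Wilson's algorithm via the "last exit" formulation: random-walk from each node not yet
# in the tree, remembering the most recent successor, then trace the loop-erased path.
import random

def wilson_spanning_tree(weights, root=0, seed=0):
    """weights: dict {u: {v: w}} for a connected, undirected, weighted graph."""
    rng = random.Random(seed)
    in_tree = {root}
    parent = {}
    for start in weights:
        if start in in_tree:
            continue
        nxt = {}                       # last exit recorded for each visited node
        u = start
        while u not in in_tree:        # walk until hitting the current tree
            nbrs, w = zip(*weights[u].items())
            u_next = rng.choices(nbrs, weights=w)[0]
            nxt[u] = u_next
            u = u_next
        u = start                      # loop erasure: follow recorded successors
        while u not in in_tree:
            parent[u] = nxt[u]
            in_tree.add(u)
            u = nxt[u]
    return parent                      # tree edges as child -> parent

graph = {0: {1: 1.0, 2: 2.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 2.0, 1: 1.0, 3: 0.5}, 3: {2: 0.5}}
print(wilson_spanning_tree(graph))
```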
Opinion Dynamics in Digital Networks: Integrating Bounded Confidence and Expressed Private Opinion Models
(2025-04-10) Riendeau-Krause, Dominic; Rebrova, Elizaveta
This paper examines how opinions form in social networks, particularly when individuals look to a centralized source for the majority opinion. Motivated by the increasingly connected and selective nature of digital information platforms, this study introduces a new extension to the bounded confidence model that distinguishes between the expressed and private opinions of individuals. The proposed Expressed Private Opinion-Bounded Confidence (EPO-BC) model integrates two existing models to support a more complete understanding of how opinion clusters form, polarization emerges, and pluralistic ignorance develops in networked environments. Key findings show the potential role of centralized broadcasting in creating a perceived consensus that hides underlying opinion diversity. While primarily theoretical, this research helps to explain how digital platforms impact opinion formation and offers insights into mechanisms that may mitigate unfavorable dynamics in these networks.
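A toy update rule combining the two ingredients named here (a bounded-confidence average for private opinions and conformity of expressed opinions toward a centrally broadcast signal) is sketched below; the exact EPO-BC equations are the thesis's, so this is only a schematic stand-in.

```python
# Schematic stand-in: private opinions follow a bounded-confidence average of expressed
# opinions, while expressed opinions are pulled toward a broadcast "majority" signal.
import numpy as np

rng = np.random.default_rng(5)
n, eps, phi = 50, 0.2, 0.5           # agents, confidence bound, conformity pressure (placeholders)
private = rng.random(n)
expressed = private.copy()

for _ in range(100):
    broadcast = expressed.mean()      # centralized source reports the average expressed view
    new_private = np.empty(n)
    for i in range(n):
        close = np.abs(expressed - private[i]) <= eps    # peers within the confidence bound
        new_private[i] = expressed[close].mean() if close.any() else private[i]
    private = new_private
    expressed = (1 - phi) * private + phi * broadcast    # public conformity to the broadcast

print(f"private spread: {private.std():.3f}, expressed spread: {expressed.std():.3f}")
```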
Reading Between the Lines: A Quantitative Analysis on the Importance of Moneyline Odds in NBA Game Prediction Accuracy
(2025-04-10) Bigharassen, Malik M.; Almgren, Robert
Prediction markets have grown in popularity, and sports-gambling advertising has become increasingly prevalent, throughout 2024 and the early months of 2025. To explore whether these spaces could offer consistent profit comparable to other investment techniques (excluding arbitrage opportunities), this work attempts to measure the profitability of NBA sportsbook wagers by developing a data-driven machine learning model that estimates the true probability of an NBA team winning a game on a specific night. This work is novel in that it places greater emphasis on exploring the relationship between moneyline odds and game outcomes. First, we perform feature engineering to expand our initial dataset, which contains historical moneyline odds alongside game outcomes, into a set of features associated with the home team's win rate. We then train our model on data from the first 41 games that each of the 30 NBA teams played in a given season, and use machine learning algorithms to predict the true probability a team has of winning a game. This is compared to the implied probabilities sportsbooks set, derived from their listed moneyline odds, to provide novel insights. While the algorithms achieve an average accuracy of 70%, the insight gained from attempting to measure profitability in the first half of the 2022-23 NBA season ultimately lays computational and methodological foundations for analyzing associations with moneyline odds.
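The implied probabilities referenced here come from the standard conversion of American moneyline odds (before removing the bookmaker's vig), shown below.

```python
# Standard conversion from American moneyline odds to implied win probability.
def implied_probability(moneyline: int) -> float:
    if moneyline < 0:                        # favorite, e.g. -150
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)           # underdog, e.g. +130

print(implied_probability(-150))   # ~0.600
print(implied_probability(+130))   # ~0.435  (vig not removed in this simple form)
```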
Rethinking Rail: The Feasibility of Modern High-Speed Rail Projects in the U.S.
(2025-04-10) Aronow, Nicholas; Massey, William Alfred
Rail travel in the U.S. is extremely outdated and operates on legacy infrastructure, both caused by and contributing to an overreliance on alternative modes of transportation, most notably automobiles and planes. Driven in part by a rise in international initiatives, public demand for high-speed rail (HSR) expansion within the U.S. has surged. This paper seeks to identify and optimize the implementation of hypothetical HSR routes, as well as analyze why the U.S. lags behind other developed nations in advanced rail infrastructure. We use a two-level mixed-integer optimization problem to identify the optimal set of lines to implement. The lower-level problem maximizes revenue across each possible rail segment. This is done by pairing a logit function, which models mode split between HSR and flying, with the total trip demand for each segment. Demand is derived from a Markov chain constructed from the DOT's sample of 7 million annual airline tickets. The higher-level problem leverages the results from the lower-level problem to select the subset of segments that maximizes the profit from HSR upgrades under a given budget constraint. The results indicate that the optimal budget to realize benefits from HSR upgrades is $216 billion, yielding an annual profit of $10 billion and an ROI of 4.6%. This budget encompasses upgrades of the Portland–Seattle–Vancouver, Chicago–Detroit, Los Angeles–Las Vegas, Miami–Tampa, and Boston–New York City–Washington, D.C. (Northeast Corridor) lines. A sensitivity analysis identified construction costs, interest rates, and construction time as the dominant risks to HSR projects. A cost/benefit analysis of incremental upgrades of existing tracks to higher speeds versus new HSR lines indicated that the Northeast Corridor and a line from San Francisco–Los Angeles–Las Vegas would benefit the most from dedicated HSR. The other lines identified by the optimization problem, including additional lines in Texas and the Midwest, would yield the greatest benefit from incremental upgrades to higher speeds. This suggests that, in many cases, the additional cost of constructing new HSR lines outweighs the marginal benefits compared to incremental speed upgrades. Even under conservative U.S. cost and demand assumptions, the results of this study show that rail upgrades and dedicated HSR may be commercially viable transportation options. These improvements could displace millions of short-haul flights and vehicle miles traveled.
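The lower-level mode-split component can be illustrated with a binary logit share between HSR and flying; the utility coefficients and trip attributes below are hypothetical, not calibrated to the DOT ticket data used in the thesis.

```python
# Binary logit mode split between HSR and flying for one city pair (illustrative values).
import math

def hsr_share(time_hsr, time_fly, fare_hsr, fare_fly, b_time=-0.4, b_cost=-0.01):
    u_hsr = b_time * time_hsr + b_cost * fare_hsr    # systematic utility of HSR
    u_fly = b_time * time_fly + b_cost * fare_fly    # systematic utility of flying
    return math.exp(u_hsr) / (math.exp(u_hsr) + math.exp(u_fly))

# e.g. 3.2 h HSR at $90 vs. 3.5 h door-to-door flight at $150
print(round(hsr_share(3.2, 3.5, 90, 150), 3))
```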
r/LinguisticPolarization: Lexical and Semantic Variation between Political Communities on Reddit
(2025-04-10) McGonigle, Evelyn R.; Ahmadi, Amir Ali
Political polarization is a growing issue in the US and undermines the stability of our democracy. Linguistic polarization is the manifestation of political polarization in the language used by ideological groups, and it can serve to deepen ideological divides. In this thesis, we investigate two forms of linguistic polarization: lexical polarization and semantic polarization. Lexical polarization concerns vocabulary differences between ideological groups, while semantic polarization captures shifts in the meanings of words. We examine four corpora of Reddit data, collected from r/democrats and r/Republican in 2019 and 2023. We use frequency- and embedding-based analysis methods to characterize the linguistic polarization in our datasets. This allows us to identify polarizing issues and political figures, and to identify communication gaps between the two sides of the ideological spectrum that may be exacerbating overall polarization.
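A small sketch of the frequency-based side of the analysis is a smoothed log-odds comparison of word usage across the two communities; the toy corpora below are placeholders for the Reddit datasets.

```python
# Smoothed log-odds of word usage in community A vs. community B (toy corpora as placeholders).
from collections import Counter
import math

corpus_a = "climate healthcare climate voting rights".split()
corpus_b = "border economy taxes economy voting".split()
counts_a, counts_b = Counter(corpus_a), Counter(corpus_b)
vocab = set(counts_a) | set(counts_b)

for word in sorted(vocab):
    pa = (counts_a[word] + 1) / (len(corpus_a) + len(vocab))   # add-one smoothing
    pb = (counts_b[word] + 1) / (len(corpus_b) + len(vocab))
    print(f"{word:12s} log-odds(A vs B) = {math.log(pa / pb):+.2f}")
```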
From Policy to Patient: A Finite-Horizon Markov Decision Process for Optimizing Non-Small Cell Lung Cancer Treatment
(2025-04-10) Parikh, Krishna V.; Cattaneo, Matias Damian
Advancements in immunotherapy have transformed treatment for advanced stages of non-small cell lung cancer (NSCLC). However, the optimal sequencing of chemotherapy, immunotherapy, and combination chemoimmunotherapy remains unresearched. Chemotherapy may prime the tumor microenvironment, enhancing immune activation and, as a result, immunotherapy's effectiveness. To explore this timing advantage, we develop a finite-horizon Markov Decision Process (MDP) to model treatment selection over a course of ten cycles. The model incorporates four clinical variables to guide decision making: toxicity, PD-L1 expression (as a proxy for immune activation), disease progression, and overall survival. Transition probabilities and survival outcomes are derived from clinical trial data, and cost is defined as a normalized ratio of burden (toxicity and disease progression) to survival. The results indicate that chemotherapy is only optimal under extreme exaggeration of its role in immune activation or when parameters like progression are eliminated. However, there is benefit in combined regimens: chemoimmunotherapy followed by immunotherapy proves optimal in all initial states without toxicity or disease progression. Compared with each of the three therapies on its own, the cost of the optimal policy is significantly lower in all cases, highlighting the benefit of an adaptive treatment plan. These results can inform future clinical trial planning for NSCLC. This work is the first of its kind to integrate immunotherapy and account for dynamic immune activation, providing a novel starting point for more complex treatment optimization.
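A generic finite-horizon MDP solved by backward induction is sketched below as a schematic stand-in for the ten-cycle treatment model; the states, transition probabilities, and costs are tiny random placeholders, not the clinical parameters derived in the thesis.

```python
# Finite-horizon MDP solved by backward induction over a 10-cycle horizon (placeholder data).
import numpy as np

n_states, n_actions, horizon = 3, 2, 10
rng = np.random.default_rng(6)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))   # P[a, s, s']
cost = rng.random((n_actions, n_states))                           # per-cycle cost c(a, s)

V = np.zeros(n_states)                        # terminal value
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    Q = cost + P @ V                          # Q[a, s] = c(a, s) + E[V(s')]
    policy[t] = Q.argmin(axis=0)              # pick the lower-cost treatment in each state
    V = Q.min(axis=0)

print("optimal first-cycle action per state:", policy[0])
```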