Operations Research and Financial Engineering, 2000-2024
Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp011r66j119j
Recent Submissions
Know When to Fold'Em: A Supervised Machine Learning Approach to Tilt Detection in Online Poker
(2025-04-10) Umar, Farouk; Hanin, Boris
Tilt, a psychological state where players lose emotional control, significantly affects decision-making and financial outcomes in poker. This paper addresses the challenge of detecting tilt in online poker by leveraging supervised machine learning on a publicly available repository of online hand history data. We first pre-process the data by converting the hands into chronological sequences played by each player on a given table and extract key features related to player behavior in accordance with existing research. We then propose and validate the Composite Tilt Indicator (CTI), intended to represent the likelihood that a player was tilted in a given sequence, in order to label the dataset. We then train and evaluate a supervised machine learning model to detect tilt, achieving high performance on key metrics such as precision and recall. This work contributes to poker research by providing a systematic statistical framework to approach tilt detection where previous methods have relied on subjective player testimony or facial recognition.
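A minimal sketch of the label-then-classify pipeline this abstract describes. The behavioral features, CTI weights, and tilt threshold below are illustrative assumptions, not the thesis's actual definitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Hypothetical per-sequence behavioral features (already z-scored),
# e.g. aggression spike, VPIP drift, bet-size variance.
X = rng.normal(size=(n, 3))
cti_weights = np.array([0.5, 0.3, 0.2])            # assumed CTI weights
cti = X @ cti_weights + rng.normal(scale=0.3, size=n)
y = (cti > np.quantile(cti, 0.8)).astype(int)      # label top quintile "tilted"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f}, "
      f"recall={recall_score(y_te, pred):.2f}")
```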
Optimal Execution Against Strategic Traders: A Stackelberg Mean-Field Game Formulation
(2025-04-10) Garcia-De La Jara, Christian A.; Carmona, Rene A.
This thesis investigates optimal execution strategies within a predatory trading environment through the lens of a Stackelberg mean field game. Specifically, it addresses the problem faced by a distressed institutional investor (leader) forced to liquidate a significant asset position over a finite horizon, anticipating strategic reactions from a large population of high-frequency traders (HFTs), modeled collectively as followers. Extending previous models, the framework introduced here leverages mean field approximations to capture the aggregate behavior of HFTs and the hierarchical decision-making inherent in such scenarios. Under the assumption of linear price impacts consistent with the Almgren–Chriss framework, we adopt cost functionals in the spirit of Cartea–Jaimungal. Building on the probabilistic approach of Carmona and Delarue, the equilibrium dynamics are fully determined by the fixed points of coupled forward–backward stochastic differential equations (FBSDEs), which can be solved to yield an explicit open-loop feedback control. We derive analytical solutions and present numerical results alongside a sensitivity analysis. Ultimately, this thesis proposes a realistic model that can serve as a benchmark for evaluating execution strategies, whether for a large institution or for a high-frequency trading desk.
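For reference, a linear-impact price specification of the kind the abstract invokes might be written as follows; the notation (ν for the leader's liquidation rate, μ̄ for the aggregate HFT rate, κ and λ for permanent and temporary impact) is an assumed rendering, not taken from the thesis:

```latex
S_t = S_0 + \sigma W_t + \kappa \int_0^t \left( \nu_s + \bar{\mu}_s \right) ds,
\qquad
\tilde{S}_t = S_t + \lambda \nu_t ,
```

where $S_t$ is the midprice carrying the permanent impact of both the leader and the HFT population, and $\tilde{S}_t$ is the price actually received while trading at rate $\nu_t$.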
Modeling Movement in Ancient Pompeii: An Urban Network Analysis of Pedestrian Flow and Economic Change
(2025-04-10) Chen, Samantha; Holen, Margaret
This project uses network-based pedestrian modeling to investigate how movement and economic activity in ancient Pompeii were shaped by the city’s infrastructure, particularly in the aftermath of the 62–63 CE earthquake and the resumption of gladiatorial games in 64 CE. The Urban Network Analysis toolkit, drawn from urban planning, was applied to simulate pedestrian flows across Pompeii’s street network. A sensitivity analysis was conducted to calibrate key parameters, including detour ratio and distance-decay, to generate behaviorally realistic simulations in the absence of complete archaeological records. Citywide simulations were run on both the reconstructed pre-earthquake network and the excavated 79 CE network. Surprisingly, there was no significant difference in the distribution of pedestrian flows between the two, challenging prior hypotheses that post-earthquake changes to street connectivity drove Pompeii’s economic shift from industrial production to commerce. To assess the potential impact of game-day movement, a scenario-based simulation was conducted from the amphitheater to the city gates. This highlighted several unexcavated segments that likely experienced high foot traffic. If gladiatorial games played a role in economic reorientation, future excavations along these paths may reveal a transition from industrial to retail-based activity. Overall, this study demonstrates how adapting urban planning models to archaeological contexts can offer new frameworks for interpreting urban change and testing hypothetical reconstructions. It also emphasizes the need for further research into the influence of event-based foot traffic on economic investment patterns, as well as a re-examination of assumptions about the long-term urban impacts of the 62–63 CE earthquake.
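As a toy illustration of the flow-proxy idea, length-weighted betweenness centrality on a small invented street graph can stand in for a pedestrian-flow estimate; UNA's detour-ratio and distance-decay parameters are not reproduced here.

```python
import networkx as nx

# Invented street segments with lengths in meters.
G = nx.Graph()
edges = [("gate", "forum", 120), ("forum", "amphitheater", 300),
         ("gate", "amphitheater", 500), ("forum", "baths", 80),
         ("baths", "amphitheater", 260)]
G.add_weighted_edges_from(edges, weight="length")

# Shortest-path betweenness weighted by street length: a crude proxy for
# how much through-traffic each node would carry.
flow = nx.betweenness_centrality(G, weight="length", normalized=True)
for node, score in sorted(flow.items(), key=lambda kv: -kv[1]):
    print(f"{node:13s} {score:.3f}")
```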
Pardon My French: Assessing the Potential for Data Centers in Northern Quebec With Machine Learning Models
(2025-05) Berretta, Tyler R.; Kornhauser, Alain Lucien
This thesis aims to explore the feasibility of data centers in Northern Quebec, using machine learning models to determine the feature importance of site-selection factors. Specifically, Random Forests are used to learn feature importance on a large, multisource dataset of hyperscale data centers and corresponding relevant data points captured at national and regional levels from 2006 to 2024. SVMs, LASSO regression, and XGBoost models are used to corroborate the feature importance results of the Random Forest. Installed Solar PV Power Capacity and Internet Adoption—representing categorical features of Renewable Electricity Supply and Closeness to Customers, respectively—are determined to be robust predictors for the existence of a data center at a given location in a given year. Canada boasts a healthy supply of renewable electricity, with abundant hydro energy and rapid growth in nuclear energy, as well as comparable closeness to customers, with large cities and proximity to American cities across the border. Correspondingly, Canadian hyperscale data centers have begun to arise across Montreal, Toronto, and Vancouver. Northern Quebec, specifically, has high potential for renewable electricity supply, with favorable geographical factors for nuclear plants and cheap hydro energy. However, its rural geography lacks closeness to customers, making it currently viable only for latency-tolerant use cases such as model training. While unlikely in the near future, advancements in technology and corresponding reductions in latency may unlock the potential for data centers in Northern Quebec.
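A hedged sketch of the feature-importance workflow: fit a Random Forest on site-level records and rank the predictors. Feature names, units, and the synthetic data-generating process are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "solar_pv_capacity": rng.gamma(2.0, 50, n),   # installed MW (assumed proxy)
    "internet_adoption": rng.uniform(0.3, 1.0, n),
    "electricity_price": rng.normal(0.10, 0.03, n),
    "avg_temperature":   rng.normal(10, 8, n),
})
# Synthetic target loosely tied to the two predictors the thesis highlights.
logit = 0.02 * df["solar_pv_capacity"] + 3 * df["internet_adoption"] - 5
y = (logit + rng.logistic(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(df, y)
ranked = sorted(zip(df.columns, rf.feature_importances_), key=lambda kv: -kv[1])
for name, imp in ranked:
    print(f"{name:18s} {imp:.3f}")
```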
Bayesian Adaptive Clinical Trials: A Soft Actor-Critic Reinforcement Learning Approach
(2025-04-13) Willer, Matt; Rigobon, Daniel
Adaptive clinical trial designs aim to improve efficiency and enhance ethical considerations by dynamically allocating patients to treatments based on accruing evidence. In this thesis, we formulate an adaptive clinical trial as a finite-horizon Markov Decision Process (MDP). The trial state comprises patient outcomes and Bayesian-updated treatment success probabilities, and is sequentially updated at each decision point. To solve the resulting treatment allocation decision-making problem, we implement a Soft Actor-Critic (SAC) framework that leverages maximum entropy reinforcement learning to balance exploration and exploitation effectively. To further capture this balance, we add a weight-adjusted Total Variation Distance (TVD) component to the reward function, enabling us to quantify the value of information gathered between decision points. We conducted numerical simulations under two training schemes: one in which outcomes were generated using the true treatment success probabilities, and another where outcomes were based on the agent’s estimated probabilities. Across diverse hypothetical scenarios varying in cohort size, trial length, and prior knowledge, our SAC-based policy consistently approximated the ideal (oracle) policy in the true-probability setting. The agent was able to achieve success proportions close to those of the optimal policy while judiciously allocating more patients to the superior treatment. When the model was trained on estimated probabilities, performance degraded under high uncertainty or poorly specified priors, sometimes favoring a fixed, non-adaptive approach. Our results underscore the potential and limitations of employing SAC in adaptive trial design. Our proposed model provides a foundation for utilizing reinforcement learning in a clinical trial setting, highlighting the need for accurate prior information to fully realize its benefits. Our framework establishes a rigorous testbed for adaptive patient allocation, providing both theoretical insights and practical guidelines for future clinical trial designs.
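The Bayesian state update and the TVD information bonus can be sketched as follows; the Beta(1, 1) priors, the weight w, and the numerical integration are generic assumptions rather than the thesis's implementation.

```python
import numpy as np
from scipy.stats import beta

def tvd_between_betas(a1, b1, a2, b2):
    """Total variation distance between two Beta densities (Riemann sum)."""
    grid = np.linspace(1e-4, 1 - 1e-4, 2001)
    p, q = beta.pdf(grid, a1, b1), beta.pdf(grid, a2, b2)
    return 0.5 * np.sum(np.abs(p - q)) * (grid[1] - grid[0])

# Posterior over one arm's success probability: Beta(a, b), updated per outcome.
a, b = 1.0, 1.0
outcomes = [1, 0, 1, 1]      # hypothetical results for one treatment arm
w = 0.1                      # assumed information-value weight
reward = 0.0
for s in outcomes:
    a_new, b_new = a + s, b + (1 - s)
    # Reward = clinical outcome + weighted information gain (posterior TVD).
    reward += s + w * tvd_between_betas(a, b, a_new, b_new)
    a, b = a_new, b_new
print(f"posterior Beta({a:.0f}, {b:.0f}), cumulative reward {reward:.3f}")
```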
The Gold Standard of Robust Mixed-Frequency Portfolio Optimization? A MIDAS-CVaR Application for Renewable Asset Investments
(2025-04-18) Gualy, Cristian E.; Sircar, Ronnie
Integrating high‑frequency market signals with mixed‑frequency explanatory variables is essential for accurate tail risk modeling in traditionally volatile renewable energy portfolios. Characteristically heavy‑tailed return distributions and the influence of exogenous variables on historical prices challenge a traditional mean‑variance optimization approach. Analyzing 13 energy assets and 12 explanatory variables from 2015 to 2024 demonstrates Conditional Value-at-Risk (CVaR) optimization's superiority over traditional Markowitz frameworks in such settings. Both historical simulation and Monte Carlo analyses reveal CVaR portfolios' superior performance across a spectrum of market conditions, establishing CVaR as the study's preferred risk measure. A Forward Search implementation enhances CVaR estimation by mitigating outlier influence while preserving essential tail information, delivering substantial risk reduction without increasing portfolio concentration. Building upon these findings, this study proposes a novel Log-Component MIDAS framework with exogenous variables (LC-MIDAS-X) that decomposes volatility into high-frequency market components and low-frequency policy/macroeconomic drivers. The LC-MIDAS-X model integrates with CVaR estimation, and empirical validation confirms the model produces accurate CVaR forecasts for renewable energy assets, with further improvements in downside risk protection. The model reveals significant differences in volatility response patterns across renewable subsectors, providing insights into how solar, wind, and infrastructure assets respond to changing economic and regulatory conditions. Portfolio construction leverages enhanced CVaR estimates to demonstrate the potential for improved risk-adjusted performance while maintaining strategic exposure to clean energy opportunities. Acknowledging that our study is limited by the absence of extensive robustness checks and an arbitrary selection of assets and variables, these constraints may temper broader generalizations. However, the findings provide valuable insights for refining mixed‑frequency tail‑risk forecasting and portfolio optimization methodologies within the renewable space.
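For orientation, a historical-simulation CVaR estimate, the risk measure at the center of the study, takes only a few lines; the heavy-tailed synthetic returns below are stand-ins for the renewable assets.

```python
import numpy as np

rng = np.random.default_rng(2)
# Heavy-tailed daily returns (Student-t), mimicking renewable-asset tails.
returns = rng.standard_t(df=4, size=2500) * 0.015

def historical_cvar(r, alpha=0.95):
    """Mean loss in the worst (1 - alpha) tail of the empirical distribution."""
    var = np.quantile(r, 1 - alpha)    # Value-at-Risk threshold
    return -r[r <= var].mean()         # average loss beyond VaR, sign-flipped

print(f"95% CVaR: {historical_cvar(returns):.4f}")
```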
How to Win Nodes and Influence Networks: A Multidimensional Approach to Opinion Dynamics and Influence Games
(2025-04-10) Elsheikh, Raafa A.; Rigobon, Daniel
In a period where socialization is hyper-reliant on digital platforms, permitting information to spread instantaneously, understanding strategies for optimal influence is vital. This paper develops a multidimensional, threshold-based opinion dynamics model extending the work of DeGroot and Friedkin-Johnsen. Our model incorporates intertopic dependencies and external influence to model competitive diffusion over networks. We introduce a novel opinion update rule that incorporates local (neighbor) and global (external players) impact on opinion shifts. Through coupling linear threshold dynamics with traditional opinion models (FJ) and introducing intricacies of topic dependencies and multidimensional opinions, our model emulates realistic evolution of opinion and behavior. By simulating over synthetic and real-world data from the General Social Survey (GSS), we assess influence-maximizing strategies in one- and two-player settings. Results reveal that optimal strategies depend critically on initial opinion distribution, network topology, and the interdependence of topics. In particular, optimal strategies emerge that leverage indirect influence by exploiting cross-topic relationships, and in the presence of competition, second movers gain a strategic edge. This work provides practical insight for designing self-regulating environments in polarized societies by strategically disseminating information. The implications of this research span political campaigns, public health messaging, and ethical information diffusion.
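A minimal sketch of an FJ-style multidimensional update with neighbor, innate, and external terms; the mixing weights and the topic-coupling matrix C are invented for illustration, not the paper's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 3                                     # nodes, topics
A = (rng.random((n, n)) < 0.2).astype(float)
np.fill_diagonal(A, 0)
W = A / np.maximum(A.sum(1, keepdims=True), 1)   # row-normalized adjacency
X = rng.random((n, k))                           # current opinions in [0, 1]^k
X0 = X.copy()                                    # innate opinions (FJ anchor)
C = np.eye(k) + 0.1                              # intertopic coupling (assumed)
C /= C.sum(1, keepdims=True)
U = np.full((n, k), 0.8)                         # external player's message
a, b, g = 0.6, 0.3, 0.1                          # neighbor/innate/external weights

for _ in range(50):
    # Neighbors' opinions are mixed across topics via C before averaging in.
    X = a * (W @ X @ C.T) + b * X0 + g * U
print("mean opinion per topic:", X.mean(0).round(3))
```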
From xG to WAR: A Comprehensive Framework for Evaluating NHL Player Value
(2025) Larson, Thomas P.; Kornhauser, Alain Lucien
This thesis presents a machine learning-based Wins Above Replacement (WAR) model for NHL skaters, integrating play-by-play and shift data from the 2023–24 and 2024–25 seasons. A Random Forest classifier predicts expected goals (xG) at the shot level, capturing offensive and defensive contributions, while a team-level Random Forest regressor translates performance metrics into win probabilities. Individual player contributions are standardized per 60 minutes, compared to replacement-level baselines, and weighted using feature importances from the win model to compute WAR. The result is a single, context-aware metric that quantifies a skater’s total value in terms of added team wins.
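The WAR aggregation step might look roughly like this sketch: convert raw totals to per-60 rates, subtract a replacement-level baseline, and weight by the win model's feature importances. All metric names, baselines, and conversion constants are hypothetical.

```python
# Raw season totals for one skater (invented).
metrics = {"xg_for": 1.9, "xg_against": 1.4, "zone_entries": 11.0}
toi_minutes = 850.0
per60 = {k: v / toi_minutes * 60 for k, v in metrics.items()}

replacement = {"xg_for": 0.09, "xg_against": 0.12, "zone_entries": 0.60}
importance  = {"xg_for": 0.50, "xg_against": 0.35, "zone_entries": 0.15}
sign        = {"xg_for": 1, "xg_against": -1, "zone_entries": 1}  # defense inverts

wins_per_unit = 0.15   # assumed marginal wins per weighted unit above replacement
above_repl = sum(importance[k] * sign[k] * (per60[k] - replacement[k])
                 for k in metrics)
war = above_repl * wins_per_unit * toi_minutes / 60
print(f"WAR estimate: {war:.2f}")
```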
Three Toed Pete: Examining equilibria and player behavior in a high-variance game
(2025-08-10) Deschenes, Jack; Cerenzia, Mark
I study “three toed pete,” a high-variance, sequential wagering game in which players decide—over multiple rounds—whether to commit to a common pot based on private signals. I develop and compare a suite of computational methods for characterizing equilibrium behavior: (i) simulation-based grid search to identify candidate cutoff strategies; (ii) gradient-based and simulated-annealing optimizers to navigate the noisy, multi-dimensional payoff landscape; (iii) state-dependent cutoff maps that adjust to current “toe” counts and alternating move order; and (iv) backward-induction algorithms that bootstrap the t = 1 solution to solve for general t recursively. My numerical experiments confirm theoretical predictions in the two-player, one-toe case, reveal how cutoff thresholds rise with increasing target toes, and demonstrate scalability to more complex, n-player settings. I also prove structural lemmas—such as the weak dominance of non-contiguous strategies—that underpin my computational approach. Beyond game theory, my methods have direct applications to multi-round auctions, sequential bidding for large-scale contracts (e.g. Olympic host selection, pension-liability transfers, 401(k) administration), and other contexts where agents face uncertainty, risk, and dynamic strategic interaction.
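Method (i), the simulation-based grid search, can be sketched on a stylized two-player, one-toe variant in which each player holds a uniform private signal and commits iff it exceeds a cutoff c; the payoff rules below are a deliberate simplification, not the game's full specification.

```python
import numpy as np

rng = np.random.default_rng(4)

def expected_payoff(c, n_sims=100_000, pot=1.0, stake=1.0):
    """Player 1's mean payoff when both players adopt the same cutoff c."""
    s1, s2 = rng.random(n_sims), rng.random(n_sims)
    in1, in2 = s1 > c, s2 > c
    p = np.zeros(n_sims)
    p[in1 & ~in2] = pot                       # uncontested: take the pot
    p[in1 & in2 & (s1 > s2)] = pot + stake    # showdown win takes rival's stake
    p[in1 & in2 & (s1 < s2)] = -stake         # showdown loss forfeits stake
    return p.mean()

grid = np.linspace(0.0, 1.0, 101)
best = max(grid, key=expected_payoff)
print(f"best symmetric cutoff on the grid: {best:.2f}")
```

A proper equilibrium search would iterate best responses rather than score a common cutoff, but the grid-search mechanics are the same.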
An Analysis of MOVES Style Transportation in New York City
(2025-05-10) Ginder, Koby; Kornhauser, Alain Lucien
Today, we stand at a critical moment in the evolution of automotive technology. Driverless technology has made tremendous progress over the past decade, and driverless vehicles have begun to permeate our society. The growth of this technology and the path it takes is sure to redefine how we think about mobility. This exploration aims to introduce, simulate, and test an innovative transportation style that has only recently been made possible by the strides in automotive driverless technology. This network, known as MOVES style transportation, will be analyzed in America’s most populous city: New York City. This paper will first analyze the current patterns of transportation systems in the city; by inspecting public transportation data, it will show and visualize the current movement patterns of New Yorkers. It will introduce and describe the MOVES style autonomous driving network as it will be implemented in this specific use case. It will then model and simulate the performance of this system using specialized software developed by the Princeton Department of Operations Research and Financial Engineering. Financial performance will also be discussed based on the simulated results.
Evaluating Domain-Specific Topic Reduction for Sparse Vector Document Retrieval
(2025-05-10) Irons, Carson P.; Hanin, Boris
This thesis investigates the limitations of current document retrieval systems and introduces an alternative architecture leveraging topic-level sparse indexing of contextual embeddings. This theoretical retrieval system seeks to achieve high computational efficiency through low latency and indexing overhead, while also achieving high semantic understanding and respecting local meaning and document cohesion. Additionally, the system supports scalable and context-aware document matching without reliance on user interaction data.
In pursuit of these objectives, the system makes two key assumptions about the structure and content of documents within a chosen application domain. The first is that documents can be broken into self-contained semantic components; the second is that the application domain's distinct meanings can be represented as a finite, discrete set of topics.
At a high level, the proposed system aims to represent a document as a bag of topics, then apply sparse vector ranking algorithms at retrieval time. Topics are inferred by clustering the contextualized embeddings of semantic components within a learned embedding space.
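A self-contained sketch of that pipeline, with random vectors standing in for SBERT sentence embeddings: cluster embeddings into a topic inventory, index each document as a normalized topic-count vector, and rank by dot product.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
dim, n_topics = 32, 8
# Pretend each document is a list of sentence (semantic-component) embeddings.
docs = [rng.normal(size=(rng.integers(3, 9), dim)) for _ in range(50)]

kmeans = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
kmeans.fit(np.vstack(docs))                  # learn the finite topic inventory

def bag_of_topics(emb):
    counts = np.bincount(kmeans.predict(emb), minlength=n_topics)
    return counts / max(counts.sum(), 1)     # sparse, normalized topic vector

index = np.array([bag_of_topics(d) for d in docs])
query = bag_of_topics(docs[0][:2])           # a query built from doc 0's sentences
ranking = np.argsort(index @ query)[::-1]    # sparse-vector dot-product ranking
print("top-3 documents:", ranking[:3])
```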
The contributions of this thesis involve a review of existing retrieval methods, an outline of the proposed system's intuition and architecture, and an exploratory implementation in a strategically chosen application domain. The thesis finds that standard embedding models (SBERT in this case) are insufficient for identifying application-specific topics. Future work will focus on fine-tuning embedding models to better capture domain-specific semantics and fully evaluate the potential of this topic-based retrieval framework.
The thesis also provides the necessary tooling for extension and modification of the retrieval pipeline. Namely, it supports training and querying of the proposed retrieval system while accepting custom implementations at each step.
Evaluating the Geographically Weighted Regression for Modeling Fertility Rates in South Korea
(2025-05-01) Cho, Sung; Cerenzia, Mark
South Korea’s total fertility rate (TFR) has steadily declined to unprecedented levels, reaching 0.72 in 2023, which is well below the replacement level of 2.1. As this decline continues, the trend poses severe economic and demographic challenges, including rapid population aging, labor force contraction, and increasing strain on welfare systems. This thesis evaluates the effectiveness of using the Geographically Weighted Regression (GWR) to model South Korea’s TFR at the local level. In particular, we revisit the work done by Jung et al. (2019), which fitted the model on data from 2019. One aspect of the model not addressed in their paper is its use of “pseudo-t statistics,” which is a result of the model’s violation of classical OLS assumptions. To address this gap, we re-estimate both an Ordinary Least Squares (OLS) model and a GWR model using updated 2023 data across 190 administrative regions. The model’s fit is assessed using test statistics including AICc, Moran’s I, and Koenker (BP). We then implement a 5,000-iteration nonparametric bootstrap procedure to evaluate the stability of the GWR coefficient estimates, computing empirical confidence intervals and percent-opposite-sign metrics for each coefficient. The results suggest that GWR improves model fit relative to OLS, capturing meaningful spatial heterogeneity in the data which OLS does not take into account. However, the bootstrap analysis reveals instability in the coefficient estimates, casting doubt on the reliability of inference drawn from the GWR pseudo-t statistics. These findings ultimately support the use of GWR as an exploratory rather than an immediately inferential tool and underscore the spatial and statistical complexity of TFR modeling in Korea.
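The bootstrap stability check can be sketched on plain OLS with synthetic data (the thesis applies it to GWR coefficient surfaces); the percent-opposite-sign metric follows the description above.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 190, 3                                  # 190 regions, 3 covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([0.7, -0.3, 0.15, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

B = 5000
betas = np.empty((B, p + 1))
for i in range(B):
    idx = rng.integers(0, n, n)                # resample regions with replacement
    betas[i] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

point = np.linalg.lstsq(X, y, rcond=None)[0]
ci = np.percentile(betas, [2.5, 97.5], axis=0)
opp_sign = (np.sign(betas) != np.sign(point)).mean(axis=0)
print("95% empirical CIs:\n", ci.round(3))
print("percent-opposite-sign per coefficient:", (100 * opp_sign).round(1))
```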
A Hybrid GARCH and LSTM Model for Forecasting Volatility and Investment Horizons
(2025-04-10) Le, Jason; Scheinerman, Daniel
Accurately forecasting financial volatility is a critical component of modern finance, underpinning tasks such as risk management, asset pricing, and portfolio optimization. However, the stochastic and dynamic nature of financial markets poses significant challenges for existing models. Econometric approaches like Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models are effective at capturing short-term volatility clustering but are limited in addressing nonlinearities and long-term dependencies in financial time series. Machine learning models such as Long Short-Term Memory (LSTM) networks can model complex patterns and sequential dependencies but often lack the interpretability and theoretical grounding of traditional econometric methods.
This thesis develops a hybrid GARCH-LSTM model designed to improve the precision of volatility forecasts by combining the strengths of both methodologies. The hybrid model uses GARCH to estimate conditional volatilities and feeds these estimates, along with historical price data, into an LSTM network for further refinement. A central application of this hybrid approach lies in solving a practical investment problem: determining the maximum time horizon an investor can remain invested without exceeding a predefined loss tolerance, given a specific confidence level.
The time horizon is estimated by combining the hybrid model's volatility forecasts with Monte Carlo simulations, which generate potential price paths based on predicted volatilities. These simulations provide a probabilistic framework for quantifying the likelihood of maintaining an investment within acceptable loss thresholds.
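A hedged sketch of that estimation: simulate cumulative log-return paths from a placeholder volatility forecast and report the longest horizon at which the probability of breaching the loss tolerance stays within the confidence level.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 0.02            # stand-in for the hybrid model's daily volatility forecast
mu = 0.0003             # assumed daily drift
loss_tol, conf = 0.10, 0.95
horizon_max, n_paths = 252, 20_000

# Cumulative log-return paths under GBM-style dynamics.
shocks = rng.normal(mu - 0.5 * sigma**2, sigma, size=(n_paths, horizon_max))
log_paths = np.cumsum(shocks, axis=1)
# Has the path breached the loss tolerance at any time up to day t?
breached = np.minimum.accumulate(log_paths, axis=1) < np.log(1 - loss_tol)

breach_prob = breached.mean(axis=0)            # monotone nondecreasing in t
ok = np.where(breach_prob <= 1 - conf)[0]
print(f"max horizon: {ok[-1] + 1 if ok.size else 0} trading days")
```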
By focusing on optimizing investment time horizons, this thesis contributes a model for integrating advanced forecasting techniques into practical financial decision-making. Additionally, the results aim to equip investors and risk managers with tools to make informed decisions in the face of uncertainty.
A Look Into Risk and Returns: The Predictive Value of Risk Indicators in Emerging Market Equities
(2025-04-08) Sukha, Deven P.; Rigobon, Daniel
Emerging Markets (EMs) are known to exhibit greater volatility in risk, meaning the indicators used to track risk fluctuate more than those in developed markets. This raises an important question: does the movement of risk indicators contain information that can aid in predicting returns in EM equity markets? To address this, we focused on five types of risk (credit, financial, political, economic, and composite), using values from several financial services firms. However, we found that changes in risk scores were not correlated across providers during the period studied, leading us to rely on a single provider for each: S&P Global for sovereign credit and the International Country Risk Guide (ICRG) for the remaining risk indicators. We also modeled equity returns using pooled and country-specific Random Forest models, incorporating a range of macroeconomic variables that are relevant for return prediction. Predictive performance was evaluated using R2 and root mean squared error (RMSE), and the contribution of risk indicators was assessed through feature importance. We trained baseline models excluding risk indicators to test whether macroeconomic factors could compensate. Our results highlight the inherent difficulty of predicting equity returns: model performance was poor across the board, with R2 values near or below zero. While models that included risk indicators performed slightly better, the improvement was marginal. These findings suggest that changes in the selected risk indicators provide limited additional predictive value under our modeling approach. However, this does not necessarily mean that such indicators generally lack predictive usefulness. One plausible explanation is that equity indices may already reflect or even precede changes in these risk metrics, making any subsequent shifts in the risk indicators appear to have little effect. Further research could investigate different lags and modeling strategies to understand whether, and under what conditions, these risk indicators might enhance equity return predictions.
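The baseline comparison reduces to fitting the same model with and without the risk-indicator columns and comparing out-of-sample R2; the synthetic data below are constructed so the gain is marginal, echoing the reported result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 1500
macro = rng.normal(size=(n, 4))               # macroeconomic covariates
risk = rng.normal(size=(n, 2))                # risk-indicator changes
# Returns are mostly noise with weak loadings, echoing near-zero R2 results.
y = 0.05 * macro[:, 0] + 0.03 * risk[:, 0] + rng.normal(scale=1.0, size=n)

for name, X in [("macro only", macro), ("macro + risk", np.hstack([macro, risk]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(f"{name:13s} R2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```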
A Supervised Learning Framework for Generating DJ Transitions
(2025-04-10) Hein, Michael; Hubert, Emma
A disc jockey (DJ) curates a seamless auditory experience by skillfully transitioning between tracks. While these transitions can sometimes involve complex loops and sound effects, their most fundamental components often involve manipulating volume and adjusting frequency ranges to blend two songs. Prior work on automating DJ transitions has largely relied on heuristics or unsupervised learning approaches such as generative adversarial networks (GANs). In this paper, we present a unique supervised learning framework for generating DJ transitions between two tracks, providing an interpretable, data-driven alternative to previous methods. Using a dataset from 1001Tracklists containing real DJ mixes and their source tracks, we extract mel-spectrograms of the audio and train a convolutional neural network (CNN) to predict control signals that specify how volume and equalizer (EQ) bands should change over time. These predicted control signals are then applied to the source tracks to produce a transition, which is compared to the original transition from the DJ mix. To generate labeled input-output training pairs, we developed a full preprocessing pipeline that includes track-to-mix alignment using dynamic time warping (DTW), supported by both theoretical and empirical analyses of feature selection. While inspired by differentiable digital signal processing (DDSP), our learning phase operates entirely in the mel-spectrogram domain for simplicity and interpretability. We trained the model on a single example and found that it was able to replicate the corresponding ground truth transition with reasonable accuracy, offering early evidence that the task is learnable and that our framework has the capacity to produce non-trivial transitions. This work demonstrates the potential of supervised learning in generating realistic DJ transitions and lays the foundation for future research that trains on more data.
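The track-to-mix alignment step can be sketched with librosa: compute mel-spectrograms for both signals and align them with DTW. Synthetic tones stand in for the 1001Tracklists audio, and the parameters are illustrative.

```python
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 5, 5 * sr, endpoint=False)
track = np.sin(2 * np.pi * 440 * t)                    # "source track"
mix = np.concatenate([np.zeros(sr), track])[: t.size]  # same content, 1 s later

def mel_db(y):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    return librosa.power_to_db(S, ref=np.max)

# DTW over mel-spectrogram frames; wp is the warping path (end to start).
D, wp = librosa.sequence.dtw(X=mel_db(track), Y=mel_db(mix), metric="euclidean")
offset_frames = wp[:, 1] - wp[:, 0]
print(f"median alignment offset: {np.median(offset_frames):.0f} frames")
```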
A Sustainable Extension of the Fama-French Factor Models: The Role of Carbon Emissions-Based Factors in Describing U.S. Stock Returns
(2025-04-10) Huang, Elaine L.; Cattaneo, Matias Damian
Amidst climate change concerns, many investors are incorporating climate-related considerations, such as a company's carbon dioxide (CO2) emissions, into their investment decisions. Unfortunately, CO2 data is often missing or estimated. Therefore, we aim to understand how companies' carbon emissions can describe, and how sector membership and carbon disclosure can impact, excess stock returns. We extend the Fama-French (FF) three-factor and five-factor models, which describe stock returns using financial metrics, to also include our constructed "Green-Minus-Brown" (GMB) factors: GMB_U (based on Log(CO2) emissions) and GMB_S (based on CO2 intensity). Our results show that (1) both GMB_U and GMB_S are statistically significant and have negative associations with excess stock returns; (2) stocks in greener sectors have more positive interactions with the GMB factors, stocks in browner sectors have more negative interactions, and sectors with less polarizing CO2 emissions tend to have statistically insignificant interactions; and (3) the returns of companies with reported CO2 data are more sensitive to changes in the GMB factors than those with estimated CO2 data. Our research supports existing literature that carbon emissions can be used to describe stock returns while being the first to build factors based on both unscaled and scaled carbon emissions and to analyze performance across sectors and CO2 data sources (i.e., estimated vs. reported). In addition, our GMB factors can be used by companies and investors alike to track the monthly spreads between the excess returns of green stocks and the excess returns of brown stocks.
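The regressions behind results (1)-(3) are time-series OLS fits of excess returns on the FF factors plus a GMB factor; the sketch below uses synthetic series and an assumed negative GMB loading purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 240                                          # 20 years of monthly data
factors = rng.normal(size=(T, 4)) * 0.03         # MKT-RF, SMB, HML, GMB_U
betas = np.array([1.0, 0.2, 0.1, -0.3])          # note the negative GMB loading
excess_ret = factors @ betas + rng.normal(scale=0.02, size=T)

X = sm.add_constant(factors)
res = sm.OLS(excess_ret, X).fit()
names = ["alpha", "MKT-RF", "SMB", "HML", "GMB_U"]
for name, coef, pval in zip(names, res.params, res.pvalues):
    print(f"{name:7s} {coef:+.3f} (p={pval:.3f})")
```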
A Temporal Network Approach to Modeling Quantitative Success in Venture Capital Ecosystems
(2025-04-10) Tziampazis, George E.; Akrotirianakis, Ioannis
This thesis investigates how temporal network structures can predict financial success in early-stage startups. Using investment data from Pitchbook, it constructs a dynamic graph of the North American Venture Capital (NAVC) ecosystem, capturing evolving relationships between investors and startups over time. From this network, node-level features such as temporal centrality and community embedding are computed to represent each startup’s structural identity. These features are used as inputs to train an Extreme Gradient Boosting (XGBoost) supervised ML model to predict a binary classification target of successful exits (IPO or acquisition) or failure within a fixed time window. Results show that models incorporating temporal network features consistently outperform baselines and results from similar problems, particularly on Precision@K metrics, which are practically relevant to VC decision-making. The findings demonstrate that interpretable, time-aware network metrics can meaningfully enhance startup evaluation frameworks. This work contributes to the intersection of finance, network science, and predictive modeling, offering new tools for data-driven early-stage investment.
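Precision@K, the metric the evaluation emphasizes, is straightforward to compute from model scores; the scores and exit labels below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
y_true = (rng.random(n) < 0.1).astype(int)   # ~10% successful exits
scores = 0.3 * y_true + rng.random(n)        # imperfect but informative scores

def precision_at_k(y, s, k):
    """Fraction of the top-k scored startups that actually exited."""
    top = np.argsort(s)[::-1][:k]
    return y[top].mean()

for k in (10, 50, 100):
    print(f"Precision@{k}: {precision_at_k(y_true, scores, k):.2f}")
```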
Arbitrage-Free and Simulation-Based Election Forecasting
(2025-04-10) O'Keefe, Edward P.; Tangpi, Ludovic
After mainstream electoral forecasts inaccurately predicted the outcome of the 2016 U.S. Presidential Election, alternative approaches to election forecasting became more prominent. Among these alternatives, prediction markets and arbitrage-free forecasting models have gained attention for offering more disciplined forecasts that can be interpreted as the price one would pay to wager on an election outcome. This thesis extends and enhances a popular-vote forecasting model developed by Fry & Burke. Specifically, our model addresses the inherent measurement errors in polling data, explicitly incorporating them into the forecasting methodology. Evaluations conducted across U.S. presidential elections from 1972 to 2024 demonstrate that this explicit consideration of polling errors significantly improves forecast accuracy. Additionally, comparisons with popular vote prediction market data from 2016 to 2024 show that prediction markets consistently underperform in forecasting outcomes for non-contested states, indicating systematic biases. To forecast electoral vote outcomes – a more complicated problem – we introduce a simulation-based approach that integrates Fry & Burke’s popular-vote forecasting techniques with Monte Carlo simulation. Our electoral forecasting method outperforms forecasts provided by Nate Silver’s FiveThirtyEight and prediction markets from 2016 to 2024. These findings underscore the effectiveness of our electoral vote forecasting model and highlight the potential biases present in electoral vote prediction markets.
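A minimal sketch of the simulation step: draw state-level vote shares around poll means with an explicit shared polling-error term, then tally electoral votes. The slate, means, and error scales are toy values, not the thesis's inputs.

```python
import numpy as np

rng = np.random.default_rng(11)
ev = np.array([38, 29, 16, 10, 6])                  # electoral votes (toy slate)
mean = np.array([0.49, 0.51, 0.52, 0.47, 0.50])     # poll-implied two-party share
n_sims = 100_000
national_err = rng.normal(0, 0.02, (n_sims, 1))     # shared polling bias
state_err = rng.normal(0, 0.03, (n_sims, ev.size))  # idiosyncratic state error

share = mean + national_err + state_err
ev_won = (share > 0.5) @ ev                         # EV total per simulation
need = ev.sum() // 2 + 1
print(f"P(win >= {need} EV) = {(ev_won >= need).mean():.3f}")
```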
Forecasting The Future: Utilizing the Statistical Jump Model to Forecast GDP Output of the US Economy
(2025-04-10) Sinarya, Charles Nelson; Hubert, Emma
The dynamic macroeconomic environment today has placed a spotlight on the importance of economic forecasting. In the last few decades, many mathematical and econometric models have been developed to forecast economic performance; these models include the univariate and multivariate autoregression models. One particularly interesting model is the Markov Switching model, which falls under a subset of models called regime-switching models. Regime-switching models are models whose parameters depend on a series of homogeneous regimes. However, Markov switching models have many fundamental drawbacks, most notably their time-varying transition probabilities. Recently, considerable research has been devoted to another regime-switching model, the Statistical Jump Model. The Statistical Jump Model minimizes an objective function, which consists of a loss function and a penalization term, and builds upon the Markov Switching Model. To date, the Statistical Jump Model has mostly been geared towards equity markets. This thesis seeks to explore the applicability of the Statistical Jump Model in forecasting US economic performance. We propose three methodologies to forecast US GDP using the Statistical Jump Model, including the use of boosting methods and external economic indicators to create a well-informed forecast. We conclude by benchmarking our proposed models against common models in the literature.
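For orientation, the generic jump-model objective trades a per-period fitting loss against a penalty on regime switches; the notation below (ℓ for the loss, θ_s for regime parameters, λ for the jump penalty) is a standard rendering and not necessarily the thesis's exact formulation:

```latex
\min_{\theta,\; s_1, \dots, s_T} \;\; \sum_{t=1}^{T} \ell\left( y_t, \theta_{s_t} \right)
\;+\; \lambda \sum_{t=2}^{T} \mathbf{1}\{ s_t \neq s_{t-1} \}
```

Larger λ yields more persistent regimes, while λ = 0 recovers unpenalized clustering of the observations.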
Inferences on Parameters in Severely Heterogeneous Degree Corrected Stochastic Block Models
(2025-04-10) Jiang, Stephen C.; Fan, Jianqing
With the rise of big data, networks have pervaded many aspects of our daily lives, with applications ranging from the social to the natural sciences. Understanding the latent structure of a network is thus an important question. In this paper, we model the network using a Degree-Corrected Mixed Membership (DCMM) model, in which every node has an intrinsic membership vector measuring its degree of belonging to each of the communities. Our central aim is to construct inferential procedures for the probability matrix and degree parameters of the DCMM, an often overlooked question in the literature. By providing new procedures, we empower practitioners to answer various ranking and dynamics questions related to networks. These questions may prove to be impactful, as they may aid in identifying non-traditional candidates for targeted therapies and detecting subtle shifts within network time series, among other applications. At the end of our work, for example, we present an application for detecting changepoints in real-world global trade networks, revealing a significant changepoint that coincides with the year corresponding to the global financial crisis.