Computer Science, 1987-2025

Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp01mp48sc83w

Now showing 1 - 20 of 58

A Monte-Carlo Hearts Engine

(2025-04) Bendory, Eden R.; Kincaid, Zachary
The card game Hearts is a stochastic, sequential, non-zero-sum, 4-player, partial-information game. These qualities prevent standard game algorithms from finding optimal play in reasonable time. Monte-Carlo tree search partially addresses this issue by approximating the payoff that results from optimal play, so that an action’s value can be evaluated without searching every potential move in the game. However, a standard Monte-Carlo tree search does not address imperfect-information, stochastic, or N-player games. My approach aims to close this gap by integrating other algorithms, such as maxn [2] and Monte-Carlo sampling [3], to address the aspects of the game that Monte-Carlo tree search does not. The combination of these techniques results in a Hearts engine that is able to beat many base-level algorithms, existing Hearts engines, and advanced human Hearts players.
InvestiHate: How Hate Speech Detection Models Identify Language Targeting Different Social Demographics

(2025-04-07) Wachspress, Benjamin; Fong, Ruth Catherine
In recent years, hate speech has risen at an alarming rate, underscoring the urgent need for effective content moderation systems to ensure the safety of online spaces. However, mounting political pressure from the new Trump administration, coupled with widespread skepticism about the reliability of hate speech classification models, has led many social media platforms to significantly reduce their moderation efforts. This thesis investigates the weaknesses and vulnerabilities of three hate speech detection models - logistic regression, SVM, and BERT - on Twitter posts. It explores how these models distinguish hate speech from offensive or neutral language, with a particular focus on the impact of slurs and references to gender, race, and sexuality on classification outcomes. The findings reveal three key insights: (1) While BERT achieves the highest overall accuracy (82%), all models struggle to differentiate hate speech from offensive language. (2) All models also exhibit a clear bias against classifying even blatant misogyny as hate speech. (3) Model performance deteriorates significantly when encountering text that differs from the training data. As the issue of online hate speech continues to escalate, it is crucial that we improve the ability of hate speech detection systems to identify and mitigate the most harmful online discourse.
QuickCase: The AI-Powered Legal Editing Assistant

(2025-04-08) Drapkin, Matthew B.; Singh, Jaswinder Pal
“Law Review” is one of the most sought-after extracurriculars for students in law school. Following a competitive application process, each member serves as an editor for their respective legal journal. While prestigious, the actual work required of editors can be tedious, mundane, and repetitive. Many editors feel that legal editing does not improve skills critical to their success as lawyers. Worse, the editing process can take tens of hours for a single assignment. The most ambitious students are incentivized to exchange their time for a more impressive resume. These students deserve to get their time back, so that they can realign their focus towards what actually matters.

QuickCase, The AI-Powered Legal Editing Assistant, transforms a week-long editing process into one that can be completed in a single sitting. The process is simple: Upload your draft to catch any Bluebook formatting mistakes, gather all referenced sources at once, and find where each source adequately supports the claims made in the manuscript. In this paper, we review current market trends in legal tech, the theory behind QuickCase’s machine learning implementation, and early user feedback from a cohort of user testers from The University of Pennsylvania Carey Law School. In addition to software iteration based on user feedback, future work will focus on marketing and product distribution, specifically targeting institutions like law schools and independent legal journals.
The Price of Bias: A Study of Gender Bias Mitigation Techniques in Financial Loan Decision-Making

(2025-04-09) Lin, Jessica; Li, Xiaoyan
As machine learning becomes increasingly used in financial decision-making, concerns about algorithmic fairness, particularly regarding gender bias, are growing. This thesis evaluates the effectiveness of four common bias mitigation techniques - Reweighing, Learning Fair Representations (LFR), Equality of Odds, and Reject Option-Based Classification - across multiple supervised learning models and under two distinct gender regimes. Rather than use a dataset that is uniformly biased, this study uses real-world HMDA data to investigate how different definitions of fairness may reveal or obscure bias. The study finds that while predictive accuracy remains relatively stable, withholding gender as a variable often improves both accuracy and fairness outcomes. Notably, debiasing techniques produce mixed results: Reweighing and Equality of Odds reduce difference in means (DIM) in outcome significantly, but standard fairness metrics often remain unchanged, raising questions about the adequacy of current fairness definitions. This study also finds that the tested models replicate the original discrepancies in loan approval rates between men and women, and the debiasing techniques are largely unable to improve this gap. These findings highlight the need for lenders to consider multiple fairness dimensions beyond widely accepted metrics and suggest further research into more holistic definitions of fairness in machine learning.
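The difference in means (DIM) mentioned above can be illustrated with a minimal sketch. It assumes DIM is the gap in mean approval rate between two groups; the abstract does not give the thesis's exact definition. The function names and the toy data are hypothetical and are not drawn from the HMDA dataset.

```python
def approval_rate(decisions):
    """Fraction of approved applications (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)


def difference_in_means(decisions, groups, group_a, group_b):
    """Mean approval rate of group_a minus mean approval rate of group_b.

    Assumption: this is one common reading of DIM, not necessarily
    the exact metric used in the thesis.
    """
    a = [d for d, g in zip(decisions, groups) if g == group_a]
    b = [d for d, g in zip(decisions, groups) if g == group_b]
    return approval_rate(a) - approval_rate(b)


# Hypothetical toy data: four applicants per group.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
dim = difference_in_means(decisions, groups, "M", "F")
```

Under this reading, a value near zero means the two groups are approved at similar rates, while a large positive or negative value signals a gap that a debiasing technique would aim to shrink.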
Punishing Memory, Rewarding Amnesia: Direct Preference Optimization as a Framework for Mitigating Undesirable LLM Memorization

(2025-04-10) Jeong, Jonathan J.; Stewart, Brandon Michael
Large language models have revolutionized the fields of machine learning and artificial intelligence. With the advent of the transformer architecture in 2017, companies and governments across the world have invested trillions of dollars into language models. Alongside this investment, language model sizes have scaled rapidly. While the numbers are not official, GPT-3.5 released by OpenAI is reported to be around 3.5 billion parameters. On the other hand, the more recent Llama 4 Behemoth is reported to have more than 2 trillion parameters [17]. Although language models have become larger, more efficient, and higher performing, there is a consistent problem of memorization. Language models across all sizes memorize and regurgitate their training data. While memorization is necessary for language models to learn basic facts and reasoning, there are concerns that undesirable memorization hurts data privacy, creates copyright issues, and compromises output quality. Many papers have been written attempting to mitigate LLM memorization. However, many of the efforts have focused on extensive preprocessing of datasets or postprocessing of outputs. Methods that focus on altering the training often degrade generation performance significantly. We propose mitigating LLM memorization using Direct Preference Optimization (DPO), which is a newer alternative to the standard Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO) framework. With this framework, we reward non-memorized outputs and punish memorized outputs. We hypothesize that this preference framework will incentivize LLMs to generalize better without relying on verbatim reproduction of training data, mitigating memorization concerns without significantly degrading overall performance.
We find that Direct Preference Optimization is a viable framework, reducing the memorization rate of a fine-tuned language model by an average of 88.46% across different temperatures. We also find no significant degradation in performance after DPO training.
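As a rough illustration of the preference framework described in this abstract, the sketch below computes the standard per-pair DPO loss, treating the non-memorized output as the "chosen" response and the memorized output as the "rejected" one. That pairing follows the abstract. The function name, the default `beta`, and the example log-probabilities are assumptions for illustration and do not come from the thesis.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain))).

    Each argument is the summed log-probability of a full response under
    the policy being trained (logp_*) or the frozen reference model
    (ref_logp_*). beta=0.1 is an assumed value, not the thesis's setting.
    """
    chosen_gain = logp_chosen - ref_logp_chosen
    rejected_gain = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities. When the policy equals the reference
# model, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# The policy now prefers the non-memorized (chosen) output more than the
# reference does, so the loss falls below the baseline.
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

The loss shrinks as the policy raises the likelihood of the non-memorized output relative to the reference model and lowers that of the memorized one, which is the sense in which the framework rewards one and punishes the other.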
VocalSep: High-Resolution Target Speaker Extraction

(2025-04-10) Eggert, Sam; Finkelstein, Adam
Target Speaker Extraction (TSE) is the task of isolating an individual speaker from an auditory scene composed of a mixture of multiple speakers and environmental noise. Recent models in the larger field of audio source separation have achieved impressive performance utilizing convolutional neural networks. These models vary in their use cases, from isolating individual instruments in music to more general-use models capable of separating based on a language query (text description). Impressive performance has also been achieved by recent “voice encoder” models capable of creating useful representations of the characteristics of a speaker’s voice. This thesis seeks to combine the methods of recent audio source separation and voice encoder models to isolate individual voices from complex auditory scenes containing multiple speakers and environmental noise. While previous TSE models have succeeded in extracting individual voices from an auditory scene, they can only be used on low-sample-rate audio that captures frequencies less than half the human-audible range. In this work, I introduce VocalSep, a high-resolution TSE model that uses a short audio prompt of a target speaker to recognize and extract their voice from noisy audio mixtures containing multiple speakers.
Improving Depth Completion With Optimization-Guided Neural Iterations For Better Robustness

(2025-04-10) Yang, Willow; Deng, Jia; Zuo, Yiming
Depth completion is the computer vision task of generating a dense depth map by predicting the missing or uncertain parts from an RGB image and a sparse depth map. Some current depth completion models lack the ability to generalize across diverse scenarios, such as varying sparsity in the depth map or outdoor settings. In this thesis, we discuss the novel solution OMNI-DC, which handles sparse depth maps of varying densities and is robust to scenarios including indoor, outdoor, and urban settings. I also discuss my specific contributions to OMNI-DC, including experimenting with multi-res DDI variants, implementing a gradient matching loss and a 3D visualizer, generating visualizations, and running experiments.
A Comparison of Model Predictive Control and Reinforcement Learning Methods for Building Energy Storage Management

(2025-04-10) Toh, Yi Jin; Eysenbach, Benjamin
The residential building sector is a major contributor to energy consumption and greenhouse gas emissions, making electrification and intelligent energy management essential for decarbonization. However, increased electricity demand can strain the power grid, leading to higher costs and emissions. Demand-side flexibility, enabled by on-site power generation, energy storage, and optimized control algorithms, can mitigate this problem by shifting electricity consumption to times when electricity is cheaper and cleaner.

This study evaluates three methods for centralized building energy storage management using CityLearn, an open-source environment for simulating and benchmarking building energy control. The evaluation compares Model Predictive Control (MPC) with two Reinforcement Learning (RL) methods: Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO). The methods are assessed across three dimensions: (1) energy performance, including cost, carbon emissions, electricity consumption, and stability of electricity use over time; (2) computational efficiency, including training time, memory usage, and inference speed; and (3) scalability, measured across different district sizes of two, four, and eight buildings.

Overall, SAC achieved the strongest performance on cost and energy metrics, performing slightly better than PPO in those areas. PPO, however, produced smoother control behavior with more stable electricity use over time while requiring significantly less memory than SAC and less computation than MPC. Both RL methods outperformed MPC across most metrics, with MPC particularly struggling to scale. Nonetheless, MPC remained more interpretable and required no training data, though it involved substantial engineering effort to develop an accurate system model.

These findings highlight trade-offs between performance, stability, and deployability. PPO emerged as the most balanced controller, offering strong performance with scalability and computational efficiency, making it well-suited for real-world use.
Construction and Evaluation of Celltype-Specific Protein-Protein Interaction Networks

(2025-04-10) Mhrous, Emmanuel N.; Zhong, Ellen
Protein-protein interaction (PPI) networks serve as critical tools for probing the molecular mechanisms that define function and connect genotype to phenotype, yet context-agnostic PPI databases fail to capture the celltype-specific contexts in which these interactions occur. This thesis addresses this limitation by integrating single-cell RNA sequencing (scRNA-seq) data into these context-agnostic human PPI networks using two dominant methods in the literature: SCINET (parametric) and PINN (non-parametric). Using a dataset of dopaminergic midbrain neurons implicated in Parkinson's disease, we construct networks with these methods and offer ways to evaluate these networks to ensure they preserve important properties at multiple scales of biology. These include evaluations at the level of functional protein complexes, pathways, celltype-specific processes, and systematic interactions within tissues. Our analyses show that genes implicated in Parkinson's Disease play a significant role in the topology of their respective networks, highlighting the essentiality of these proteins. Furthermore, we construct contextual embeddings using PINNACLE, a graph neural network model for single-cell biology, to represent proteins at a systems-level scale. Despite limitations inherent to PPI representations of biological processes, this thesis emphasizes the importance of context-specificity in these networks, compares different methods of their construction, and offers a robust system of evaluations that show the strengths of different construction methods at various dimensions of biology.
Celebrating Excellence, Equally? A Quantitative Analysis of Social Media Posts during the 2024 Paris Olympics and Paralympics

(2025-04-10) Toujas-Bernaté, Clara L.; Fellbaum, Christiane Dorothea
This study examines X posts from the 2024 Paris Olympics and Paralympics using natural language processing (NLP) techniques to conduct a comparative analysis of public discourse. While much existing work has focused on language surrounding the Olympics, studies on the Paralympics remain scarce, and none provide a direct comparison of public perception between the two events. This research addresses this gap by applying both topical and sentiment analysis through a diverse set of NLP methods, including Latent Dirichlet Allocation (LDA) for topic modeling, word frequency analysis, Word2Vec for contextual word relationships, and Valence Aware Dictionary and sEntiment Reasoner (VADER) for sentiment classification and temporal trends. Six datasets are created, consisting of X posts from the Olympic and Paralympic Games, covering both English- and French-language discussions, as well as posts from the general public and the official Olympic and Paralympic X accounts. By analyzing differences in language and sentiment across these datasets, this study explores how perceptions of the two global sporting events vary across cultures and between public discourse and institutional narratives.
Where Latent Meets Spatial: Cross-Modal Learning Between scRNA-seq and Proteomics

(2025-04-10) Hathwar, Jairam J.; Pritykin, Yuri
Understanding the spatial and molecular heterogeneity of tissues is integral to advancing precision medicine. Here, we present an unsupervised integration framework that bridges single-cell (SC) RNA sequencing and spatial proteomics (SP) CODEX data, focusing on liver hepatocellular carcinoma (HCC). By combining these two weakly linked modalities, we surpass the resolution limits of conventional spot-based spatial transcriptomics (ST) approaches and gain a more nuanced view of cellular organization in tumor tissues. Our pipeline builds on the MaxFuse algorithm, enhanced with biologically grounded receptor-ligand (R-L) interactions derived from SC data using CellPhoneDB. This modification improves cross-modal alignment: F-1 score increased from 0.66 to 0.69, and Adjusted Rand Index from 0.78 to 0.83. Importantly, the inferred SC pseudo-Visium spots constructed demonstrate robust Spearman correlation with their corresponding Visium ST data, validating the fidelity of our approach. Moreover, by mapping RNA readouts onto microenvironments identified via SP and the SPACE-GM method, we reveal distinct spatially organized niches with contrasting enrichment patterns--such as immune-rich regions with inflammatory signaling, stromal areas with hypoxia and mTORC1 activity, and epithelial zones showing metabolic reprogramming--supporting the accuracy of our integration. Beyond these insights, the SC-to-SP mapping provides superior spatial granularity relative to traditional spot-level deconvolution methods like Tangram, thereby enabling finer delineation of molecular heterogeneity. Moving forward, we aim to refine hyperparameter settings, incorporate gene expression-adjusted R-L interaction effects, and extend this strategy to diverse tissue types. By accurately resolving subcellular interactions and microenvironmental structure, our computational pipeline holds promise for guiding target identification and novel therapeutic strategies in translational cancer research.
Court v. Classifier: A Data-Driven Evaluation of Language and Decision-Making on the U.S. Supreme Court

(2025-04-10) Lee, Erin; Kernighan, Brian W.
This thesis investigates the language, behavior, and decision-making of U.S. Supreme Court justices through a computational lens. Grounding my study in structured and curated datasets—including justice- and case-level variables, authored opinions, and over 1,600 transcribed oral arguments—I analyze how justices speak, write, and vote.

I begin with an empirical study of voting patterns, opinion authorship, and judicial trends across natural court eras. I then turn to oral argument behavior, quantifying the participation of justices across alignments and outcomes. Building on these insights, I implement a series of predictive classifiers, replicating and extending a previous statistical model to include oral argument features. While the inclusion of these features yields modest and at times inconclusive improvements in accuracy, they underscore the complexity of predicting voting patterns based on oral argument behavior, given the distinct rhetorical styles and engagement patterns of individual justices. Nonetheless, the findings allude to promising directions for future modeling of case outcomes using alternative features derived from oral arguments.

Finally, I experiment with prompting large language models (LLMs) to classify tones of judicial questioning, given the limitations of more traditional natural language processing techniques. I also simulate justice voting behavior with LLMs on unseen cases, assessing the capabilities of generative AI for legal reasoning. In these experiments, the LLMs proved limited in their capacity for legal judgement, though they also showed potential to be better leveraged when given additional guidance through fine-tuning.

Altogether, this study offers a data-driven portrait of the Supreme Court and its justices, rooted in empirical data and powered by modern machine learning methods.
Computational Models of Goal Inference in Open-Ended Domains

(2025-04-10) Siegel, Zachary S.; Griffiths, Tom
How are people able to quickly infer the goals of others when observing just a few of their actions? This work investigates the cognitive mechanisms underlying human goal inference by building and evaluating computational models that capture how people make these goal inferences in unstructured domains. Drawing on the framework of Bayesian inference, I formalize the idea that people interpret others’ actions by determining how consistent their observed evidence is with given goal hypotheses. I built a cooking domain called Recipe-Graph and ran human experiments to understand how well human predictions agree with those of our computational models. I find that people’s goal predictions correlate with those of our models, suggesting that Bayesian inference can capture how people predict the goals of others around them.
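The Bayesian framing in this abstract can be sketched in a few lines: the posterior over goals is proportional to the prior times the likelihood of the observed actions under each goal. The sketch assumes actions are conditionally independent given the goal. The goals, actions, and probabilities are hypothetical and are not taken from Recipe-Graph or the thesis's models.

```python
def goal_posterior(prior, likelihoods, observed_actions):
    """Posterior P(goal | actions), proportional to P(goal) * prod P(action | goal).

    Assumption: actions are conditionally independent given the goal.
    Actions missing from a goal's likelihood table get probability 0.
    """
    unnormalized = {}
    for goal, p in prior.items():
        for action in observed_actions:
            p *= likelihoods[goal].get(action, 0.0)
        unnormalized[goal] = p
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed actions are impossible under every goal")
    return {goal: p / total for goal, p in unnormalized.items()}


# Hypothetical cooking goals and action likelihoods.
prior = {"salad": 0.5, "soup": 0.5}
likelihoods = {
    "salad": {"chop_tomato": 0.6, "boil_water": 0.1},
    "soup": {"chop_tomato": 0.3, "boil_water": 0.5},
}
posterior = goal_posterior(prior, likelihoods, ["chop_tomato"])
```

With equal priors, a single observed action shifts belief toward the goal under which that action is more likely, and each further observation sharpens the posterior in the same way.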
Reimagining Home: A New Home Interface Framework for the Apple Vision Pro

(2025-04-10) Kim, Irene; Reinfurt, David; Abtahi, Parastoo
With the rise of AR/VR technologies, we are shifting from screen-based computing to spatial computing. In this context, the interface is no longer bounded behind a screen but exists within our space. This thesis questions what it means to design an interface for a space, and reimagines the Apple Vision Pro’s Home View to propose an answer. While the Vision Pro introduces innovative user experiences, its current home screen interface remains rooted in two-dimensional conventions: a window consisting of a multi-page grid of flat application icons. Drawing from Apple’s design legacy of simplicity, playfulness, and deference, this thesis introduces a new Home interface framework of two key components: a new visual library of tactile and playful 3D application icons and an immersive home space that includes a volumetric App Library and custom interaction model to bring applications to life in the user’s physical space. The resulting interface is one that emphasizes play and personalization. The interface is evaluated through both a heuristic analysis and scenario-based walkthroughs; through these evaluations we find that the interface’s strengths lie in its spatial freedom and user autonomy, and that it has opportunities for improvement through diversifying system feedback mechanisms and including user onboarding. This interface aims to propose a framework for future spatial interfaces and, through it, to encourage more research and exploration in spatial UI/UX design.
Bridging Physical and Digital Spaces through AR Zone Triggers in Capybara

(2025-04-10) Mak, Tinney; Monroy-Hernandez, Andres
Augmented Reality (AR) is uniquely positioned to bridge the virtual and physical worlds, enabling new forms of creative expression and interaction. In this work, we introduce the touches zone block in Capybara, an AR-based visual programming environment, that allows children to define spatial regions in their surroundings and program behaviors when virtual characters enter those areas. This feature expands on Capybara’s block-based system by providing a more flexible and intuitive alternative to predefined object detection. We demonstrate the expressive potential of zones through a series of user studies with 20 children in the United States and Argentina. Our findings suggest that zones support rich storytelling tied to physical space, encourage embodied exploration, and enable spatial reasoning through trial-and-error debugging. Participants also proposed future directions such as scanning custom objects, more intelligent interaction with the environment, and designing goal-driven narratives. These insights highlight how spatial triggers like zones can support more active, creative, and personally meaningful AR experiences for children.
Handwritten Chinese Error Correction for Learners of Chinese as a Foreign Language

(2025-04-10) Chan, Emilio; Fong, Ruth Catherine
Handwritten Chinese character error correction (HCCEC) is the process by which machine-learning models assess an image of a handwritten Chinese character, determine whether or not it is written incorrectly, and, if it is written incorrectly, output the character that the writer intended to write. HCCEC has gained more attention in recent years, but so far no work has been done to assess or create models targeted towards learners of Chinese as a foreign language (CFL learners). CFL learners stand to gain a great deal from HCCEC. An effective HCCEC model would be a useful educational tool to help CFL learners learn and practice handwriting Chinese characters. As part of this work, a dataset of handwritten Chinese characters produced by CFL learners was created that contains both correctly and incorrectly written characters. Next, an existing HCCEC model called the Tree-structure Analysis Network (TAN) is trained on a large dataset containing characters written by middle school students in China and then evaluated on test sets of the CFL learner dataset (Li et al. 2023, Li et al. 2023). Finally, TAN is fine-tuned using the training and validation sets of the CFL learner dataset and re-evaluated on the test sets. While performance on key evaluation metrics does not reach that of previous work on different datasets, this work does show that fine-tuning HCCEC models using data produced by CFL learners can improve all key metrics when evaluating the model on characters written by CFL learners (Hu et al. 2023, Li et al. 2023). It is my hope that this work can be the first of many exploring the potential of HCCEC applied to characters written by learners of Chinese as a foreign language.
SpectraLDS: Distilling Spectral Filters into Constant-Time Recurrent Models

(2025-04-10) Fortgang, Shlomo T.; Hazan, Elad
We introduce the first provable method for learning a symmetric linear dynamical system of arbitrarily high effective memory. This allows us to distill the convolutional layers in a leading hybrid state space model, FlashSTU, into O(1) linear dynamical systems, merging Transformer and RNN architectures in a manner suitable for scaling and with application to language modeling and other sequential processing tasks.
Cosmic Computation: Applying an Astrophysics Lens to COS 126 Assignments

(2025-04-10) Slisher, Alex; Moretti, Christopher M.
Over the last couple of years, Princeton has seen a rise in department-specific, introductory computer science (COS) courses offered. This rise demonstrates a potential interest in subject-specific alternatives to COS 126, the introductory course in the department. However, it is unclear if these courses serve as an adequate alternative to COS 126 for students interested in continuing the COS sequence. As a result, I propose astrophysics adaptations of COS 126 assignments and fully implement three of them. Full-scale implementation includes an assignment specification document, an assignment rubric, a sample solution, and student scaffolding code. After surveying nine students and two COS department instructors, responses indicate that the modified assignments were clear, engaging, and similarly challenging to current COS 126 assignments. Further research should explore how assignments can be adapted using other fields of study.
Machine Learning Classification of Biblical Translations Across Languages and Literary Genres

(2025-04-10) Coen, Caroline A.; Moretti, Christopher M.
For many the world over, the Bible is a foundational source of authority, one that is vital to understand. Yet the Bible is also the most translated text in history, and decisions made by translators have a large impact on our understanding of what we read. An important factor in the translation of any text is its genre; Bible translations must take Biblical genre into consideration. While there are many ways to evaluate translation styles, and efforts have been made to provide translators with resources to translate accurately, consistency across translations within Biblical genres has not been deeply studied. We aim to make an initial contribution to this area of research by approaching Biblical genre from a quantitative angle. By training and testing logistic regression, multiclass regression with CatBoost, and random forest models on 33 different translations of the Bible in 4 different languages, we determine not only which model is best suited to the task of classifying verses by Biblical genre, but also whether differences in Biblical genre are distinctive enough to be quantifiably recognizable. This research sets a foundation for future work on the impact of translation methodology and language of translation on understanding Biblical genre.
The Semantic Predictability of Grammatical Gender: A Computational Exploration of Linguistic Relativity in Four European Languages

(2025-04-10) Sinha, Isha; Fellbaum, Christiane Dorothea
