Princeton University users: to view a senior thesis while away from campus, connect to the campus network remotely via the GlobalProtect virtual private network (VPN). If you are not part of the University and are requesting a copy of a thesis, please note that all requests are processed manually by staff and require additional time.
 

Computer Science, 1987-2025

Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp01mp48sc83w


Recent Submissions

Now showing 1 - 20 of 57
  • All About Discourse Particles in Gen Z Text Messages... lol

    (2025-04-10) Liu, Michelle; Kalin, Laura; Chazelle, Bernard

    Whether it’s with our closest BFFs or long-distance lovers, texting has become an integral part of how we communicate with each other in the 21st century. As it becomes an increasingly powerful substitute for chatting in real life, it also begins to naturally develop advanced mechanisms for communicating nuances that have not previously existed in standard writing systems. Discourse particles like lol, lmao, and emojis don’t just fill space — they are a modern way to convey the subtleties of body language, facial expression, and tone. In this paper, we evaluate a corpus of about 200,000 real text messages from one speaker and investigate the typology, pragmatics, and modifiers of these particles. By connecting these phenomena to existing linguistic theories of discourse particles, variation, and digital communication, we explore how texting language has evolved to fit the fast-paced, screen-based communication we engage in every day, becoming a detailed, blossoming structured system of its own that is a fruitful field for extensive exploration.

  • LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages

    (2025-05-04) Agarwal, Tara; Kshirsagar, Mihir

    The Constitution of India designates English as the language of jurisprudence in the Supreme and High Courts, thereby excluding those without English proficiency—the majority of the country’s population—from meaningful participation in legal discourse. To address the dual challenges of linguistic inaccessibility and legal complexity, this thesis investigates a range of large language model (LLM) configurations for summarizing English-language legal judgments into Hindi, India’s most widely spoken language. Recognizing the limitations of traditional rule-based evaluation metrics for cross-lingual tasks, we introduce a task-aligned, interpretable evaluation suite. Key components include pairwise BERTScores to assess output homogeneity, a question-answering framework (LLM-as-a-judge) to measure faithfulness, and named entity preservation metrics as proxies for legal precision. Consistent with prior work, we observe that the summarize-then-translate pipeline outperforms direct end-to-end generation. Surprisingly, across both paradigms, one-shot prompting results in performance declines relative to its zero-shot counterpart for LLaMA 3.1 8B and Qwen 2.5 7B alike. Accompanied by increased homogeneity and extractiveness, this behavior indicates that providing a judgment-summary example encourages stylistic imitation at the cost of information coverage. We find that decoder-only models, despite generating less extractive summaries, achieve substantial gains over baseline ROUGE and BERTScores, contrary to the presumed trade-off between abstractiveness and faithfulness. Finally, our question-answering framework reveals that models are prone to error when reproducing the court’s reasoning. Despite favorable scores from embedding- and overlap-based metrics, this demonstrates that current LLMs fall short of the factuality required for high-stakes legal summarization tasks. Nonetheless, our task-aligned evaluation suite serves as an important institutional readiness check for public-facing deployment, mirrors how humans evaluate summaries, and yields deeper insight into model behavior.
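
    A minimal sketch of the pairwise-BERTScore homogeneity check described above, assuming the open-source bert-score package and a multilingual encoder; the summaries and model choice here are illustrative placeholders, not the thesis’s actual configuration.

        # Homogeneity proxy: mean pairwise BERTScore F1 among one system's summaries.
        # Assumes `pip install bert-score`; the encoder choice is an assumption.
        from itertools import combinations
        from bert_score import score

        summaries = ["placeholder summary one",      # stand-ins for Hindi summaries
                     "placeholder summary two",
                     "placeholder summary three"]

        pairs = list(combinations(range(len(summaries)), 2))
        cands = [summaries[i] for i, _ in pairs]
        refs = [summaries[j] for _, j in pairs]

        # Multilingual BERT covers Hindi; mean pairwise F1 rises as outputs converge.
        _, _, f1 = score(cands, refs, model_type="bert-base-multilingual-cased")
        print(f"mean pairwise BERTScore F1 (homogeneity): {f1.mean().item():.3f}")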

  • Designing for Efficiency: Enhancing the Gig Driver Experience with Driver’s Seat

    (2025-04-29) Ahmed, Abani; Monroy-Hernandez, Andres

    Driving for apps like Uber, Lyft, and DoorDash has become a central part of the modern-day gig economy. Yet, drivers for these apps often struggle to make ends meet, stay safe, and feel respected, whether due to inefficient routing, long unpaid wait times, information asymmetry (e.g., withheld route details), or difficulty accurately tracking earnings from multiple platforms. In 2019, the Driver’s Seat Cooperative launched an app called Driver's Seat to help drivers make informed decisions by giving them access to their own data in an industry that often limits driver control—features included mileage tracking across multiple platforms, crowdsourced earnings-per-hour data, and expense tracking. Since then, the app has lacked regular updates; it now faces usability challenges and no longer fully meets the evolving needs of its users.

    This study proposes a redesign of Driver’s Seat—through targeted features and a more user-friendly interface—to better support gig drivers. We conducted interviews with both current users of Driver’s Seat and drivers who had never used the app, identifying key needs such as access to reliable data, personalized insights, and support for safer working conditions. Based on these findings, we created wireframes incorporating features that could address the identified needs, such as an incident dashboard, a driver resource map, and more individualized insights. We then evaluated these features in workshops and additional semi-structured interviews with drivers. We also consulted with a data ethics expert to assess the broader implications of data sharing and visibility in safety reporting systems. This feedback informed our usability priorities and helped shape future directions for the app’s development, ensuring the redesign effectively supports drivers’ daily driving routines.
    
  • Garbage Upstream, Garbage Downstream: Diagnosing Embedding Model Failures in Yorùbá NLP

    (2025-04-27) Aliu, Aminah O.; Dieng, Adji Bousso

    Embedding models, which map text or other data to a point in vector space, form the backbone of many modern Natural Language Processing (NLP) tasks, including Machine Translation (MT), Question-Answering (QA), and Named Entity Recognition (NER). While an abundance of data and Machine Learning (ML) tools exist for NLP tasks in English, the same cannot be said for low-resource languages. A low-resource language is one that lacks the online data or technical-linguistic tools necessary to effectively train ML models. In particular, Yorùbá is a low-resource African language for which embedding model availability is limited. This scarcity presents a bottleneck across African NLP development efforts, as access to quality embeddings affects multiple downstream tasks. Through application of the Vendiscope, a tool capable of analyzing the composition of data at scale, I uncover insights into presently available Yorùbá-friendly embedding models. Further analysis reveals implicit assumptions within ML development that should be addressed in future African NLP work.
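
    The Vendiscope builds on the Vendi Score family of diversity metrics; below is a toy sketch of the base Vendi Score with a cosine-similarity kernel (an illustrative choice, not necessarily the Vendiscope’s configuration).

        # Toy Vendi Score: the "effective number" of distinct items in an embedding
        # set, computed as exp(entropy of eigenvalues of K/n) for a PSD kernel K.
        import numpy as np

        def vendi_score(embeddings: np.ndarray) -> float:
            X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
            K = X @ X.T                       # cosine-similarity kernel, K_ii = 1
            lam = np.linalg.eigvalsh(K / len(X))
            lam = lam[lam > 1e-12]            # drop numerical zeros
            return float(np.exp(-np.sum(lam * np.log(lam))))

        rng = np.random.default_rng(0)
        near_duplicates = 1.0 + 0.01 * rng.normal(size=(50, 16))   # low diversity
        spread_out = rng.normal(size=(50, 16))                     # high diversity
        print(vendi_score(near_duplicates), vendi_score(spread_out))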

  • LLM Supervised ReAct Agents for Travel Planning

    (2025-04-10) Arnold, Christian T.; Bangalore, Srinivas

    Constraint-driven travel planning, easy for humans, remains difficult for AI systems. We examine the TravelPlanner benchmark introduced by Xie et al. and present a new ManagerAgent architecture for travel planning. We find a preliminary 5% success rate across “Easy” queries, an improvement over the 0% success rates reported for other models and frameworks. This thesis details the ManagerAgent for travel planning, providing an in-depth implementation description. Code is available at github.com/ctarnold/TravelPlanner.

  • Exploring “biaoqing” Through Clustering: Computational Insights Into Chinese Digital Culture

    (2025-04-10) Cao, Guanyi; Li, Xiaoyan

    Memes have become a central aspect of online culture, serving as a medium for individuals to express sentiments, humor, and social commentary in creative and accessible ways. They appear in Chinese social media as biǎoqíng (表情), literally meaning “facial expression,” widely utilized and characterized by their incorporation of cultural references, idiomatic expressions, and conveyance of subtle social messages. This thesis will examine current popular biaoqing based on textual and visual content in an effort to understand how biaoqing is used to comment on current events and daily lives, as well as to provide a valuable glimpse into the mentality and sentiments of China’s younger generation, who are the primary creators and consumers of biaoqing. The analysis will be performed through image and word clustering methods conducted on memes collected from Weibo (one of the largest social media platforms in China) and fabiaoqing.com, a site with an extensive collection of trending memes sourced from various Chinese social media platforms. Through analyzing clustering results and dataset trends, this research finds that biaoqing reflects the emotional and social realities of China’s younger generations, often shaped by humor, irony, and online subcultures. It serves not only as a form of expression and coping mechanism, but also as a key component of developing digital communication and internet culture.
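
    A toy sketch of the kind of image clustering described above, assuming CLIP-style embeddings from the sentence-transformers package and k-means; the model name and random stand-in images are illustrative assumptions, not the thesis’s exact pipeline.

        # Illustrative meme clustering: embed images with a CLIP-style encoder,
        # then k-means the embeddings.
        # Assumes `pip install sentence-transformers scikit-learn pillow`.
        import numpy as np
        from PIL import Image
        from sentence_transformers import SentenceTransformer
        from sklearn.cluster import KMeans

        # Random stand-ins for scraped memes (real data: Weibo / fabiaoqing.com).
        rng = np.random.default_rng(0)
        images = [Image.fromarray((rng.random((64, 64, 3)) * 255).astype("uint8"))
                  for _ in range(6)]

        model = SentenceTransformer("clip-ViT-B-32")  # joint image/text space
        embeddings = model.encode(images)

        labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(embeddings)
        print(labels)  # inspect each cluster for shared visual themes or references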

  • Design and Evaluation of a Modular Architecture To Assess LLMs in Summarizing Electronic Health Records

    (2025-05-02) Chen, Rachel; Kaplan, Alan

    Low health literacy affects nearly half of Americans and poses a major barrier to effective healthcare, particularly when interpreting complex electronic health records (EHRs). While large language models (LLMs) offer promising capabilities for simplifying medical information, little research has explored their performance on personalized patient data. This thesis presents a modular framework for evaluating five state-of-the-art LLMs—GPT-4o-mini, Gemini 2.0 Flash, Claude 3.7 Sonnet, DeepSeek V3, and MiniMax-01 Text—on their ability to generate readable, patient-friendly summaries of structured EHRs. Using synthetic data from the Synthea dataset and prompts targeting sixth-grade (AMA) and eighth-grade (NIH) reading levels, the framework measures outputs using quantitative readability metrics, including Flesch-Kincaid, SMOG, and Gunning Fog scores. The results reveal significant variation in tone, complexity, and prompt adherence across models. GPT-4o-mini consistently produced the most readable summaries, while Claude struggled with prompt sensitivity and cost-effectiveness. The findings highlight the importance of prompt engineering, context length, and model choice in improving health communication. This work contributes a replicable evaluation pipeline and underscores the potential of LLMs to enhance health literacy and patient empowerment.
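
    A small sketch of the readability-scoring step, assuming the textstat package computes the three metrics named above (an assumption about tooling; the thesis’s pipeline may differ).

        # Score a generated summary with the three readability metrics named above.
        # Assumes `pip install textstat`; grade targets follow AMA (6th) / NIH (8th).
        import textstat

        summary = ("Your blood pressure was a little high at your last visit. "
                   "Please take your medicine every day. "
                   "Try to walk for thirty minutes most days of the week.")

        fk = textstat.flesch_kincaid_grade(summary)
        smog = textstat.smog_index(summary)
        fog = textstat.gunning_fog(summary)

        print(f"Flesch-Kincaid: {fk:.1f}  SMOG: {smog:.1f}  Gunning Fog: {fog:.1f}")
        print("meets 6th-grade (AMA) target:", fk <= 6.0)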

  • Handwritten Chinese Error Correction for Learners of Chinese as a Foreign Language

    (2025-04-10) Chan, Emilio; Fong, Ruth Catherine

    Handwritten Chinese character error correction (HCCEC) is the process by which machine-learning models assess an image of a handwritten Chinese character, determine whether or not it is written incorrectly, and if it is written incorrectly, output the character that the writer intended to write. HCCEC has gained more attention in recent years, but so far no work has been done to assess or create models targeted towards learners of Chinese as a foreign language (CFL learners). CFL learners stand to gain a great deal from HCCEC. An effective HCCEC model would be a valuable educational tool to help CFL learners learn and practice handwriting Chinese characters. As part of this work, a dataset containing handwritten Chinese characters produced by CFL learners was created that contains both correctly written Chinese characters and incorrectly written Chinese characters. Next, an existing HCCEC model called the Tree-structure Analysis Network (TAN) is trained on a large dataset containing characters written by middle school students in China and then evaluated on test sets of the CFL learner dataset (Li et al. 2023, Li et al. 2023). Finally, TAN is fine-tuned using the training and validation sets of the CFL learner dataset and re-evaluated on the test sets. While performance on key evaluation metrics does not reach that of previous work on different datasets, this work does show that fine-tuning HCCEC models using data produced by CFL learners can improve all key metrics when evaluating the model on characters written by CFL learners (Hu et al. 2023, Li et al. 2023). It is my hope that this work can be the first of many exploring the potential of HCCEC applied to characters written by learners of Chinese as a foreign language.

  • Mixed Messages: Computational Approaches to Cross-Corpus Comparison of Chinese Media

    (2025-04-10) Cheng, Brandon; Fellbaum, Christiane Dorothea; Truex, Rory

    The Chinese government strategically leverages its state-controlled media to shape both domestic and international perspectives on key issues. However, there exists limited literature comparing China's domestic media objectives with its international media objectives. This thesis first introduces the media objective elicitation model, in which government media objectives can be uncovered by contrasting domestic and international messaging. Computational methods of cross-corpus comparison are then applied to a novel corpus of nearly 1 million articles published between 2020 and 2024 in the home and overseas editions of the China Daily, a broadly circulated newspaper controlled by the Chinese Communist Party. BERTopic and cosine-based topic alignment are used to discover differences in the types of content included. Then, aspect-collocate and aspect-based sentiment analysis are used to characterize differences in how certain topics are framed. Overall, this thesis discovers evidence that (1) bribery convictions of high-level officials are excluded from the home edition, (2) the overseas edition serves as a shield by responding to Western criticism while omitting Western viewpoints in the home edition, and (3) the home edition portrays Western and Taiwanese politicians far more negatively than the overseas edition. In addition, this thesis illustrates the effectiveness of a computational approach to cross-corpus comparison of news media.
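
    A minimal sketch of the topic-discovery and alignment step, assuming the open-source BERTopic package; the corpora below are tiny placeholders, and real home/overseas articles are needed for meaningful topics.

        # Fit one topic model per edition, then align topics across corpora by
        # cosine similarity of topic embeddings. Assumes `pip install bertopic`.
        from bertopic import BERTopic
        from sklearn.metrics.pairwise import cosine_similarity

        home_docs = [f"placeholder home edition article about subject {i % 5}"
                     for i in range(100)]
        overseas_docs = [f"placeholder overseas edition article about subject {i % 5}"
                         for i in range(100)]

        home_model = BERTopic(language="multilingual").fit(home_docs)
        overseas_model = BERTopic(language="multilingual").fit(overseas_docs)

        # Match each home-edition topic to its nearest overseas-edition topic.
        sims = cosine_similarity(home_model.topic_embeddings_,
                                 overseas_model.topic_embeddings_)
        print(sims.argmax(axis=1))  # low-similarity topics flag content gaps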

  • Machine Learning Classification of Biblical Translations Across Languages and Literary Genres

    (2025-04-10) Coen, Caroline A.; Moretti, Christopher M.

    For many the world over, the Bible is a foundational source of authority, one that is vital to understand. Yet the Bible is also the most translated text in history, and decisions made by translators heavily shape our understanding of what we read. An important factor that goes into the translation of any text is its genre; Bible translations must take Biblical genre into consideration. While there are many ways to evaluate translation styles, and efforts have been made to provide translators with resources to translate accurately, consistency across translations within Biblical genres is an aspect that has not been deeply studied. We aim to make an initial contribution to this area of research by approaching Biblical genre from a quantitative angle. By training and testing logistic regression, multiclass regression with CatBoost, and random forest models on 33 different translations of the Bible in 4 different languages, we will determine not only which model is best suited to the task of classifying verses by Biblical genre, but also whether differences in Biblical genre are distinctive enough to be quantifiably recognizable. This research sets a foundation for future work on the impact of translation methodology and language of translation on understanding Biblical genre.
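
    A toy sketch of one of the classifiers named above, assuming the catboost and scikit-learn packages; the verses, labels, and TF-IDF features are illustrative stand-ins for the thesis’s data and feature set.

        # Toy genre classifier: TF-IDF features + CatBoost multiclass, one of the
        # three model families named above. Verses and labels are placeholders.
        from catboost import CatBoostClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import train_test_split

        verses = ["in the beginning god created the heavens",
                  "blessed is the one who walks not in counsel",
                  "woe to you teachers of the law",
                  "and the lord spoke to moses saying"] * 25
        genres = ["narrative", "poetry", "prophecy", "law"] * 25

        X = TfidfVectorizer().fit_transform(verses)
        X_tr, X_te, y_tr, y_te = train_test_split(X, genres, test_size=0.2,
                                                  random_state=0)

        clf = CatBoostClassifier(loss_function="MultiClass", iterations=200,
                                 verbose=False)
        clf.fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))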

  • Algorithmic Auditing under Data Access Mandates: A risk-limiting framework for third-party evaluations of AI fairness

    (2025-04-27) DeLucia, Lacey Rose L.; Liu, Lydia Tingruo

    As AI systems become more prominent in decision-making for domains such as employment and advertising, ensuring fairness in these models is increasingly important. In this work, we design a black-box, risk-limiting audit framework for assessing fairness with the four-fifths rule. Inspired by election auditing techniques and sequential hypothesis testing, we propose two-group and multi-group algorithms that maintain risk-limiting guarantees and stop early for innocent models. Unlike fixed-sample methods, our approach evaluates fairness while continuously sampling, allowing auditors to repeatedly request more data when needed. We demonstrate the effectiveness of our algorithms through empirical evaluations on real-world employment datasets collected for New York City's Local Law 144. The audits detect fairness violations correctly 100% of the time and verify fairness after sampling on average 66% of the data in the multi-group setting. Our approach enables third-party auditors to efficiently and confidently evaluate fairness claims, even in settings with limited transparency.
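
    For reference, the four-fifths rule the audit targets, as a fixed-sample toy check (not the risk-limiting sequential procedure the thesis develops):

        # Four-fifths (80%) rule: every group's selection rate must be at least
        # 0.8x the highest group's rate. Fixed-sample toy, not the sequential audit.
        from collections import defaultdict

        def impact_ratios(samples):
            """samples: iterable of (group, selected) pairs drawn by the auditor."""
            selected, seen = defaultdict(int), defaultdict(int)
            for group, sel in samples:
                seen[group] += 1
                selected[group] += int(sel)
            rates = {g: selected[g] / seen[g] for g in seen}
            best = max(rates.values())
            return {g: r / best for g, r in rates.items()}

        ratios = impact_ratios([("A", 1), ("A", 1), ("A", 0),
                                ("B", 1), ("B", 0), ("B", 0)])
        print(ratios, "violation:", any(r < 0.8 for r in ratios.values()))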

  • QuickCase: The AI-Powered Legal Editing Assistant

    (2025-04-08) Drapkin, Matthew B.; Singh, Jaswinder Pal

    “Law Review” is one of the most sought-after extracurriculars for students in law school. Following a competitive application process, each member serves as an editor for their respective legal journal. While prestigious, the actual work required of editors can be tedious, mundane, and repetitive. Many editors feel legal editing does not improve skills critical to their success as lawyers. Worse, the editing process can exceed tens of hours for a single assignment. The most ambitious students are incentivized to exchange their time for a more impressive resume. These students deserve to get their time back, so that they can realign their focus towards what actually matters.

    QuickCase, The AI-Powered Legal Editing Assistant, transforms a week-long editing process into one that can be completed in a single sitting. The process is simple: Upload your draft to catch any Bluebook formatting mistakes, gather all referenced sources at once, and find where each source adequately supports the claims made in the manuscript. In this paper, we review current market trends in legal tech, the theory behind QuickCase’s machine learning implementation, and early user feedback from a cohort of user testers from The University of Pennsylvania Carey Law School. In addition to software iteration based on user feedback, future work will focus on marketing and product distribution, specifically targeting institutions like law schools and independent legal journals.

  • SpectraLDS: Distilling Spectral Filters into Constant-Time Recurrent Models

    (2025-04-10) Fortgang, Shlomo T.; Hazan, Elad

    We introduce the first provable method for learning a symmetric linear dynamical system of arbitrarily high effective memory. This allows us to distill the convolutional layers in a leading hybrid state space model, FlashSTU, into O(1) linear dynamical systems, merging Transformer and RNN architectures in a manner suitable for scaling and with application to language modeling and other sequential processing tasks.
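
    A toy illustration of why a linear dynamical system yields constant-time-per-step processing: a one-dimensional recurrence reproduces convolution with an exponentially decaying kernel (a drastic simplification of the symmetric LDS distillation described above).

        # The convolution y_t = sum_k a^k * x_{t-k}, computed two ways: O(t) work
        # per step via the explicit sum, O(1) per step via h_t = a*h_{t-1} + x_t.
        a, xs = 0.9, [1.0, 0.5, -0.3, 2.0]

        h, recurrent = 0.0, []
        for x in xs:
            h = a * h + x      # constant work per step, regardless of history
            recurrent.append(h)

        explicit = [sum(a**k * xs[t - k] for k in range(t + 1))
                    for t in range(len(xs))]
        print(recurrent)
        print(explicit)        # identical: the recurrence implements the convolution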

  • SPADE: A Synthetic Paired Dataset for Specular-Diffuse Video Decomposition

    (2025-04-10) Barrett, Matthew W.; Fong, Ruth Catherine

    Computer vision systems struggle with specular highlights—bright spots that obscure underlying visual information—yet video-based removal methods remain unexplored due to the absence of temporally consistent training data. This thesis demonstrates that incorporating temporal information significantly improves highlight removal quality and consistency, addressing a critical gap in computational photography. I introduce SPADE, the first dataset of paired specular-diffuse video sequences, created through controlled synthetic rendering of 250 objects under varied conditions. An ablation study comparing frame-based and sequence-based neural architectures quantifies temporal processing benefits: the temporal model achieves 16.2% higher PSNR, 10.2% better SSIM, and 2.0% improved temporal consistency. Material analysis reveals these improvements are most pronounced for metallic surfaces and moderate camera movements. Beyond highlight removal, this work establishes a paradigm for leveraging temporal information in appearance decomposition tasks, with applications in augmented reality, film production, and medical imaging.
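
    For reference, the fidelity metrics reported above, sketched for a single frame pair assuming scikit-image (an assumption about tooling, not the thesis’s evaluation code):

        # PSNR and SSIM between a predicted diffuse frame and its ground truth.
        # Assumes `pip install scikit-image`; frames here are random placeholders.
        import numpy as np
        from skimage.metrics import peak_signal_noise_ratio, structural_similarity

        rng = np.random.default_rng(0)
        truth = rng.random((64, 64, 3))
        pred = np.clip(truth + rng.normal(scale=0.05, size=truth.shape), 0.0, 1.0)

        print("PSNR:", peak_signal_noise_ratio(truth, pred, data_range=1.0))
        print("SSIM:", structural_similarity(truth, pred, channel_axis=2,
                                             data_range=1.0))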

  • Where Latent Meets Spatial: Cross-Modal Learning Between scRNA-seq and Proteomics

    (2025-04-10) Hathwar, Jairam J.; Pritykin, Yuri

    Understanding the spatial and molecular heterogeneity of tissues is integral to advancing precision medicine. Here, we present an unsupervised integration framework that bridges single-cell (SC) RNA sequencing and spatial proteomics (SP) CODEX data, focusing on liver hepatocellular carcinoma (HCC). By combining these two weakly linked modalities, we surpass the resolution limits of conventional spot-based spatial transcriptomics (ST) approaches and gain a more nuanced view of cellular organization in tumor tissues. Our pipeline builds on the MaxFuse algorithm, enhanced with biologically grounded receptor-ligand (R-L) interactions derived from SC data using CellPhoneDB. This modification improves cross-modal alignment: the F1 score increased from 0.66 to 0.69, and the Adjusted Rand Index from 0.78 to 0.83. Importantly, the inferred SC pseudo-Visium spots demonstrate robust Spearman correlation with their corresponding Visium ST data, validating the fidelity of our approach. Moreover, by mapping RNA readouts onto microenvironments identified via SP and the SPACE-GM method, we reveal distinct spatially organized niches with contrasting enrichment patterns (such as immune-rich regions with inflammatory signaling, stromal areas with hypoxia and mTORC1 activity, and epithelial zones showing metabolic reprogramming), supporting the accuracy of our integration. Beyond these insights, the SC-to-SP mapping provides superior spatial granularity relative to traditional spot-level deconvolution methods like Tangram, thereby enabling finer delineation of molecular heterogeneity. Moving forward, we aim to refine hyperparameter settings, incorporate gene expression-adjusted R-L interaction effects, and extend this strategy to diverse tissue types. By accurately resolving subcellular interactions and microenvironmental structure, our computational pipeline holds promise for guiding target identification and novel therapeutic strategies in translational cancer research.

  • A Comparative Study of Syntax and Word Usage Between Standard French and Cameroonian French Using Natural Language Processing

    (2025-04-10) Hines, Julia R.; Fellbaum, Christiane Dorothea

    This study uses natural language processing (NLP) techniques to analyze the syntactic and lexical differences between Standard French and Cameroonian French, as well as examine how the dialect evolves when used by the Cameroonian diaspora in France. The central methodology involves training and evaluating two distinct NLP models: one fine-tuned on a corpus of Standard French, and the other on Cameroonian French. Of the classifier architectures evaluated, the LSTM model outperformed the Logistic Regression model in all key metrics, including accuracy, precision, recall, and F1-score. The results of this study illustrate the limitations of traditional NLP methods, such as logistic regression, when applied to dialects with syntactic and lexical differences, and they highlight the potential of deep learning approaches to better handle these variations. The findings point to the importance of fostering linguistic diversity within computational models.

  • Punishing Memory, Rewarding Amnesia: Direct Preference Optimization as a Framework for Mitigating Undesirable LLM Memorization

    (2025-04-10) Jeong, Jonathan J.; Stewart, Brandon Michael

    Large language models have revolutionized the machine learning and artificial intelligence field. With the advent of the transformer architecture in 2017, companies and governments across the world have invested trillions of dollars into language models. Along with this level of investment, there has been rapid scaling of language model sizes. While the numbers are not official, GPT-3.5, released by OpenAI, is reported to be around 3.5 billion parameters; the more recent Llama 4 Behemoth is reported to have more than 2 trillion parameters [17]. Although language models have become larger, more efficient, and higher performing, there is a consistent problem of memorization. Language models across all sizes memorize and regurgitate their training data. While memorization is necessary for language models to learn basic facts and reasoning, there are concerns that undesirable memorization hurts data privacy, creates copyright issues, and compromises output quality. Many papers have attempted to mitigate LLM memorization. However, many of these efforts have focused on extensive preprocessing of datasets or postprocessing of outputs, and methods that alter training often degrade generation performance significantly. We propose mitigating LLM memorization using Direct Preference Optimization (DPO), a newer alternative to the standard Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO) framework. With this framework, we reward non-memorized outputs and punish memorized outputs. We hypothesize that this preference framework will incentivize LLMs to generalize better without relying on verbatim reproduction of training data, mitigating memorization concerns without significantly degrading overall performance. We find that Direct Preference Optimization is a viable framework, reducing the memorization rate of a fine-tuned language model by an average of 88.46% across different temperatures. We also find no significant degradation in generation performance after DPO training.
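
    A minimal PyTorch sketch of the DPO objective as applied here, treating a non-memorized continuation as the preferred output and a memorized one as the rejected output; the β value and log-probabilities are illustrative, not the thesis’s settings.

        # DPO loss on one preference pair: reward the non-memorized ("chosen")
        # output, punish the memorized ("rejected") one, anchored to a frozen
        # reference model.
        import torch
        import torch.nn.functional as F

        def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
            """Each argument is the summed log-probability of a sequence."""
            chosen_margin = policy_chosen_logp - ref_chosen_logp
            rejected_margin = policy_rejected_logp - ref_rejected_logp
            # Widen the gap between non-memorized and memorized continuations.
            return -F.logsigmoid(beta * (chosen_margin - rejected_margin))

        loss = dpo_loss(torch.tensor(-12.0), torch.tensor(-8.0),   # policy
                        torch.tensor(-11.0), torch.tensor(-9.0))   # reference
        print(loss)  # gradients push the policy toward chosen, away from rejected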

  • The Hunt for Data Leakage Reviews: Using LLMs to Automate Academic Paper Screening

    (2025-04-27) Jerdee, Alexandra; Narayanan, Arvind

    Machine learning (ML) techniques have been increasingly implemented across diverse fields, and many suffer from a set of methodological errors called data leakage. Some scholars describe this wave of incorrect ML executions as a "reproducibility crisis." However, the pervasiveness of machine learning pitfalls has not been robustly measured, and the task of finding erroneous papers is difficult due to the diverse language used to describe ML across disciplines. This thesis project leverages large language models (LLMs) to build a systematic search pipeline to find papers with data leakage and help quantify the scale of erroneous ML practices. The pipeline uses LLMs to answer questions about the abstracts and full text of academic papers, filtering from a set of 5 million papers down to 1000 papers. In this process, we double the number of known papers affected by data leakage and point toward thousands more. This provides a proof of concept for large-scale LLM-based search pipelines and contributes substantial evidence for the existence of a "reproducibility crisis" in machine learning.
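
    A schematic sketch of the two-stage screening idea; the ask_llm helper is hypothetical, standing in for whatever chat-completion client the pipeline actually uses, and the prompts are illustrative, not the thesis’s.

        # Two-stage LLM screening: a cheap abstract-level filter, then a full-text
        # check on survivors. ask_llm() is a hypothetical stand-in for a real client.
        def ask_llm(prompt: str) -> str:
            raise NotImplementedError("substitute a real chat-completion call here")

        ABSTRACT_Q = ("Does this abstract describe an ML study whose methodology "
                      "might leak test data into training? Answer YES or NO.\n\n{text}")
        FULLTEXT_Q = ("Based on the full text, does the evaluation suffer from data "
                      "leakage (e.g., preprocessing before the train/test split)? "
                      "Answer YES or NO with a supporting quote.\n\n{text}")

        def screen(papers):
            shortlist = [p for p in papers
                         if ask_llm(ABSTRACT_Q.format(text=p["abstract"])).startswith("YES")]
            return [p for p in shortlist
                    if ask_llm(FULLTEXT_Q.format(text=p["full_text"])).startswith("YES")]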

  • Beyond Algorithms: Autonomous Agentic Systems for Personalized Recommendations

    (2025-04-10) Khalid, Roshaan; Adams, Ryan P.

    This paper explores generative AI-based agents for autonomous, personalized content recommendations, utilizing state-of-the-art software for high-performance custom workflows, high-dimensional vector storage and search, language-based tasks, and autonomous 24/7 operation. Using unstructured, unseen, real-time data from YouTube, we utilize large language models to quantitatively handle subjective tasks and evaluate the outcomes. In essence, we created a recommendation system that uses artificial intelligence to autonomously find content and reduce the time spent on manual search. Content recommendation is a prominent problem in the industry; we find that the performance of our system is satisfactory and that the scope of such systems is substantial. If used in conjunction with default recommendation systems, our system can provide an improved, interactive recommendation experience.
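
    A minimal sketch of the embedding-retrieval core such a system relies on, assuming sentence-transformers embeddings and cosine ranking (an illustrative stand-in, not the paper’s actual stack).

        # Rank unseen videos against a profile built from liked content, by cosine
        # similarity of text embeddings. Titles below are illustrative placeholders.
        import numpy as np
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("all-MiniLM-L6-v2")
        liked = ["intro to reinforcement learning", "bayesian optimization tutorial"]
        candidates = ["gradient descent explained", "top 10 cooking hacks",
                      "policy gradients from scratch"]

        profile = model.encode(liked).mean(axis=0)
        cand_vecs = model.encode(candidates)
        scores = cand_vecs @ profile / (np.linalg.norm(cand_vecs, axis=1)
                                        * np.linalg.norm(profile))

        for title, s in sorted(zip(candidates, scores), key=lambda t: -t[1]):
            print(f"{s:.3f}  {title}")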

  • Reimagining Home: A New Home Interface Framework for the Apple Vision Pro

    (2025-04-10) Kim, Irene; Reinfurt, David; Abtahi, Parastoo

    With the rise of AR/VR technologies, we are shifting from screen-based computing to spatial computing. In this context, the interface is no longer bounded behind a screen but exists within our space. This thesis questions what it means to design an interface for a space, and reimagines the Apple Vision Pro’s Home View to propose an answer. While the Vision Pro introduces innovative user experiences, its current home screen interface remains rooted in two-dimensional conventions: a window consisting of a multi-page grid of flat application icons. Drawing from Apple’s design legacy of simplicity, playfulness, and deference, this thesis introduces a new Home interface framework with two key components: a new visual library of tactile and playful 3D application icons, and an immersive home space that includes a volumetric App Library and a custom interaction model to bring applications to life in the user’s physical space. The resulting interface is one that emphasizes play and personalization. The interface is evaluated through both a heuristic analysis and scenario-based walkthroughs; through these evaluations we find that the interface’s strengths lie in its spatial freedom and user autonomy, with opportunities for improvement in diversifying system feedback mechanisms and adding user onboarding. This interface aims to propose a framework for future spatial interfaces and, through it, to encourage further research and exploration in spatial UI/UX design.