MAKE AI GREAT AGAIN: Using supervised learning to finetune LLaMA-3.2B-Instruct to Align with Donald Trump's Persona


Files

Frank_Alexandra.pdf (1.41 MB)

Date

2025-04-19


Abstract

In the growing field of AI safety and alignment, it is important to understand alignment not only at the scale of human ethics but also at the level of the individual. This thesis designs a replicable process for fine-tuning a model aligned with Donald Trump's persona through value alignment, rhetorical mimicry, and factual alignment, and formalizes the data acquisition and training process for persona-aligned AI models. Specifically, it fine-tunes a model to replicate Trump's values, rhetoric, and factual references while investigating how technical and political constraints shape the alignment process. The model was developed with supervised fine-tuning on Trump's official presidential documents, speeches, and tweets, scraped from public sources. Question-answer pairs were generated using the GPT-4o-mini API and used to train LLaMA-3.2B-Instruct via QLoRA fine-tuning. All model variants were evaluated on red-teaming prompts and a test set using a new scoring framework that assigns 0-5 scores across three alignment dimensions: value alignment, rhetorical alignment, and factual accuracy. Among the models tested, the base LLaMA-3.2B-Instruct model, prompted to use Trump's rhetorical style and values, performed best across all metrics; the same base model, additionally prompted to give short responses, performed second best and comparably to the fine-tuned model. This thesis contributes to AI alignment research by demonstrating a novel process for building persona-aligned models and offering a replicable framework for evaluating how closely an AI model reflects an individual's values, style, and factual reference behavior.
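
The data-generation step the abstract describes, turning scraped source passages into question-answer training pairs with the GPT-4o-mini API, might look roughly like the minimal sketch below. It assumes the OpenAI Python client; the helper name and prompt wording are hypothetical, not the thesis's actual code.

# Hypothetical sketch of the question-answer generation step.
# The prompt and function name are illustrative assumptions; only the
# model name (gpt-4o-mini) comes from the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def make_qa_pair(passage: str) -> str:
    """Turn one scraped passage into a question-answer training pair."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Write one question a user might ask, then an answer "
                    "in Donald Trump's voice, grounded only in the passage."
                ),
            },
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content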
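
QLoRA fine-tuning, as used in the thesis, means loading the base model in 4-bit precision and training small LoRA adapter matrices on top of it. A minimal sketch with the transformers and peft libraries follows; the checkpoint ID, target modules, and hyperparameter values are assumptions for illustration, since the abstract does not specify them.

# Hypothetical QLoRA setup: 4-bit quantized base model plus LoRA adapters.
# Checkpoint ID and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                    # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only adapters are trainable

From here, supervised fine-tuning on the generated question-answer pairs would proceed with a standard causal-language-modeling training loop over the adapted model.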
