 

Publication:

PokéChamp: A Human-Expert-Level Language Agent for Competitive Pokémon

datacite.rights: restricted
dc.contributor.advisor: Jin, Chi
dc.contributor.author: Nguyen, Andy L.
dc.date.accessioned: 2025-08-12T16:25:53Z
dc.date.available: 2025-08-12T16:25:53Z
dc.date.issued: 2025-04-28
dc.description.abstract: We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokémon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key components: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively use gameplay history and human knowledge to reduce the search space and address partial observability. In the second phase of our research, we develop a ReAct-like framework and incorporate retrieval-augmented generation (RAG) to evaluate the efficacy of LLMs on the specialized task of competitive team generation. Notably, our frameworks require no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, the battling agent achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, PokéLLMon powered by GPT-4o, with a 64% win rate. For the team generation task, the LLM agent produced high-performing teams on par with a heuristic approach that specifically utilized statistical metagame usage data. These specialized tasks demonstrate the efficacy of LLMs trained only on generalized prior data, especially when given the same tools as current heuristic approaches and real human players. This work led to a publication (Karten et al., 2025).
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp019w0326500
dc.language.iso: en_US
dc.title: PokéChamp: A Human-Expert-Level Language Agent for Competitive Pokémon
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-04-29T03:59:14.446Z
pu.contributor.authorid: 920306009
pu.date.classyear: 2025
pu.department: Electrical and Computer Engineering
pu.minor: Computer Science
pu.minor: Robotics
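
The abstract's three LLM-replaced components of minimax search (action sampling, opponent modeling, and value estimation) can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: the toy running-total game and the `llm_*` stubs stand in for the thesis's actual Pokémon battle states and real LLM calls.

```python
# Hypothetical sketch of LLM-augmented minimax on a toy running-total game.
# The llm_* stubs stand in for the three LLM-replaced components named in
# the abstract; in the real agent, each would query a language model.

def llm_sample_actions(state, actions, k=2):
    """Stand-in for LLM player action sampling: prune to k candidate moves."""
    return sorted(actions, key=lambda a: -abs(a))[:k]

def llm_opponent_model(state, actions, k=2):
    """Stand-in for LLM opponent modeling: predict the opponent's likely moves."""
    return sorted(actions, key=lambda a: abs(a))[:k]

def llm_value(state):
    """Stand-in for LLM value estimation at the search horizon."""
    return state  # toy heuristic: a higher running total favors the player

def minimax(state, depth, maximizing, actions=(-1, 1, 2)):
    if depth == 0:
        return llm_value(state)
    # The LLM components shrink the branching factor at each node.
    cand = (llm_sample_actions if maximizing else llm_opponent_model)(state, actions)
    vals = [minimax(state + a, depth - 1, not maximizing) for a in cand]
    return max(vals) if maximizing else min(vals)

# Pick the root move with the best minimax value for the maximizing player.
best = max([-1, 1, 2], key=lambda a: minimax(0 + a, 2, False))
```

Because the LLM stubs prune both players' action sets, the tree stays small even at deeper search horizons, which is the point of replacing exhaustive expansion with learned priors.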

Files

Original bundle

Name: Nguyen_Andy.pdf
Size: 7.8 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission