Publication: PokéChamp: A Human-Expert-Level Language Agent for Competitive Pokémon
Abstract
We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokémon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key components: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively utilize gameplay history and human knowledge to reduce the search space and address partial observability. In the second phase of our research, we develop a ReAct-style framework and incorporate retrieval-augmented generation (RAG) to evaluate the efficacy of LLMs in the specialized task of competitive team generation. Notably, our framework requires no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, the battling agent achieves a 76% win rate against the best existing LLM-based bot and an 84% win rate against the strongest rule-based bot. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, PokéLLMon powered by GPT-4o, with a 64% win rate. For the team generation task, the LLM agent produced high-performing teams on par with a heuristic approach that specifically utilized statistical metagame usage data. These specialized tasks show the efficacy of LLMs trained only on generalized prior data, especially when given the same tools as current heuristic-based approaches and real human players. This work led to a publication (Karten et al., 2025).
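The abstract's core idea, LLMs replacing action sampling, opponent modeling, and value estimation inside minimax tree search, can be illustrated with a minimal sketch. Everything below is assumed for illustration: the three `llm_*` stubs stand in for LLM queries, and the toy state/transition bear no relation to the actual PokéChamp implementation.

```python
# Sketch of LLM-guided minimax as described in the abstract.
# The three llm_* functions are placeholders for LLM calls; in the
# described system they would prune the search space and score states.

def llm_sample_actions(state):
    # Stub: an LLM would propose a few promising player actions,
    # shrinking the branching factor of the search tree.
    return ["move0", "move1"]

def llm_opponent_actions(state):
    # Stub: an LLM would model the opponent and predict likely replies,
    # which helps under partial observability.
    return ["opp_move0", "opp_move1"]

def llm_value(state):
    # Stub: an LLM would estimate the value of a leaf state;
    # here, a deterministic toy score.
    return len(state) % 5

def apply_joint(state, a, b):
    # Toy transition: record the joint action in the state string.
    return state + a + b

def minimax(state, depth):
    """Depth-limited minimax over LLM-proposed candidate actions."""
    if depth == 0:
        return llm_value(state)
    # Max over our sampled actions, min over predicted opponent replies.
    return max(
        min(minimax(apply_joint(state, a, b), depth - 1)
            for b in llm_opponent_actions(state))
        for a in llm_sample_actions(state)
    )
```

Because the LLM stubs return short candidate lists rather than the full move set, the tree stays small enough to search to a useful depth, which is the role the abstract assigns to the LLM components.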