Optimizing for Interpretable Phutball Policies

Contributors: Jin, Chi; Sixkiller, Kalen S.
Dates: 2025-08-12, 2025-08-12, 2025-04-17
URL: https://theses-dissertations.princeton.edu/handle/88435/dsp018w32r907s
Language: en-US
Collection: Princeton University Senior Theses

Abstract: Phutball (Philosopher's Football) is an impartial game with few rules and an arbitrarily scalable board. These properties make it an appealing testbed for multi-step reasoning that humans can inspect. This thesis introduces PhutballEnv, a turn-based Markov game environment that is fully compatible with AlphaZero-style self-play. The system includes a logging and visual-diagnostic stack that records every board position and action. It also produces gradient-based saliency maps that highlight the board features driving each decision. These traces can be exported automatically as text corpora, so that language models can be fine-tuned on plain moves, saliency-tagged positions, or synthetic rationales generated post hoc. The thesis also describes a lightweight evaluation protocol based on relative-Elo ladders against frozen checkpoints, along with a small user study assessing explanation clarity. Full training was beyond the project's time budget. The emphasis is therefore on reliable implementations of the environment and interface, the data pipeline, and validation utilities. Future work at the intersection of reinforcement learning, language modeling, and interpretability can build on these components.
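The relative-Elo ladder mentioned in the abstract can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis's implementation. The names `expected_score`, `fit_rating`, and `EloLadder`, the standard 400-point logistic scale, and the choice to anchor the first checkpoint at rating 0 are all assumptions. Each new checkpoint is rated by maximum likelihood against checkpoints that are already frozen, and a rating never changes once it is assigned.

```python
"""Minimal relative-Elo ladder sketch (hypothetical; not the thesis code).

Each new checkpoint is rated only against frozen, already-rated checkpoints,
so all ratings are relative to an anchor checkpoint fixed at 0 (assumption).
"""
from __future__ import annotations

from dataclasses import dataclass, field


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation of the score a player rated r_a gets vs r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def fit_rating(results: list[tuple[float, int, float]],
               lo: float = -4000.0, hi: float = 4000.0,
               iters: int = 100) -> float:
    """Maximum-likelihood rating against opponents with fixed ratings.

    results: (opponent_rating, games_played, points_scored) triples.
    Solves sum(points) == sum(games * expected_score) by bisection; this is
    the stationary point of the Elo (Bradley-Terry) log-likelihood.
    """
    total_points = sum(p for _, _, p in results)
    total_games = sum(n for _, n, _ in results)
    if total_games == 0:
        raise ValueError("no games played")
    # A zero or perfect score has no finite MLE: clamp to the search bounds.
    if total_points <= 0:
        return lo
    if total_points >= total_games:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        predicted = sum(n * expected_score(mid, r) for r, n, _ in results)
        if predicted < total_points:
            lo = mid  # model under-predicts the score, so the rating is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


@dataclass
class EloLadder:
    """Ladder of frozen checkpoints; the anchor checkpoint is rated 0."""
    ratings: dict[str, float] = field(default_factory=dict)

    def add_anchor(self, name: str) -> None:
        self.ratings[name] = 0.0

    def add_checkpoint(self, name: str,
                       record: dict[str, tuple[int, float]]) -> float:
        """record maps frozen opponent name -> (games, points won by `name`)."""
        results = [(self.ratings[opp], n, pts) for opp, (n, pts) in record.items()]
        rating = fit_rating(results)
        self.ratings[name] = rating  # frozen from now on
        return rating


if __name__ == "__main__":
    ladder = EloLadder()
    ladder.add_anchor("ckpt_000")
    # Hypothetical record: 76 points from 100 games against the anchor.
    r = ladder.add_checkpoint("ckpt_010", {"ckpt_000": (100, 76.0)})
    print(round(r, 1))  # 200.2
```

Freezing earlier ratings keeps the ladder stable as new checkpoints are added, but early estimates are never revised. A joint Bradley-Terry fit over all recorded games is the usual alternative when that matters.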