Publication: Optimizing for Interpretable Phutball Policies
Date
2025-04-17
Abstract
Phutball is an impartial, rules-light game with an arbitrarily scalable board, which makes it an appealing testbed for human-inspectable multi-step reasoning. This thesis introduces PhutballEnv, a turn-based Markov game environment that is fully compatible with AlphaZero-style self-play. The system ships with a logging and visual-diagnostic stack. This stack records every board position and action and produces gradient-based saliency maps that highlight the board features driving each decision. These traces can be exported automatically as text corpora. Language models can then be fine-tuned on plain moves, on saliency-tagged positions, or on synthetic rationales generated post hoc.
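The following minimal sketch illustrates what a turn-based Markov game interface for AlphaZero-style self-play involves. It is not PhutballEnv itself. Every name in it is hypothetical, including `initial_state`, `legal_actions`, `step` and the integer action encoding.

The sketch makes these assumptions:

- Row 0 and the last row are the goal lines.
- A ball that ends on or beyond a player's goal line loses the game for that player.
- As a deliberate simplification, each single jump is one action that ends the turn. In real Phutball a player may chain several jumps in one move.

The board size is a parameter, which reflects the game's scalable board.

```python
from dataclasses import dataclass

EMPTY, MAN = 0, 1
# The eight compass directions along which the ball may jump.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


@dataclass
class PhutballState:
    rows: int
    cols: int
    grid: list      # grid[r][c] is EMPTY or MAN
    ball: tuple     # (row, col); row becomes -1 or rows if the ball crosses a goal line
    to_play: int = 0


def initial_state(rows=7, cols=5):
    """Empty board of any size with the ball on the centre point."""
    grid = [[EMPTY] * cols for _ in range(rows)]
    return PhutballState(rows, cols, grid, (rows // 2, cols // 2))


def jump_path(state, d):
    """Landing point and jumped men for a single jump in direction d, or None if illegal."""
    dr, dc = d
    r, c = state.ball[0] + dr, state.ball[1] + dc
    jumped = []
    while 0 <= r < state.rows and 0 <= c < state.cols and state.grid[r][c] == MAN:
        jumped.append((r, c))
        r, c = r + dr, c + dc
    if not jumped:
        return None  # a jump must pass over at least one man
    on_board = 0 <= r < state.rows and 0 <= c < state.cols
    past_goal = r in (-1, state.rows) and 0 <= c < state.cols
    if not (on_board or past_goal):
        return None  # the ball may not leave through a side edge
    return (r, c), jumped


def legal_actions(state):
    """Actions 0..n-1 place a man on that point; n + k jumps along DIRECTIONS[k]."""
    n = state.rows * state.cols
    places = [r * state.cols + c
              for r in range(state.rows) for c in range(state.cols)
              if state.grid[r][c] == EMPTY and (r, c) != state.ball]
    jumps = [n + k for k, d in enumerate(DIRECTIONS) if jump_path(state, d) is not None]
    return places + jumps


def winner(state):
    """Row 0 is player 0's goal line and the last row is player 1's."""
    if state.ball[0] <= 0:
        return 1
    if state.ball[0] >= state.rows - 1:
        return 0
    return None


def step(state, action):
    """Return (next_state, reward for the player who moved, done)."""
    if action not in legal_actions(state):
        raise ValueError(f"illegal action {action}")
    n = state.rows * state.cols
    grid = [row[:] for row in state.grid]
    ball = state.ball
    if action < n:
        r, c = divmod(action, state.cols)
        grid[r][c] = MAN
    else:
        ball, jumped = jump_path(state, DIRECTIONS[action - n])
        for r, c in jumped:
            grid[r][c] = EMPTY  # jumped men are removed from the board
    nxt = PhutballState(state.rows, state.cols, grid, ball, 1 - state.to_play)
    w = winner(nxt)
    reward = 0.0 if w is None else (1.0 if w == state.to_play else -1.0)
    return nxt, reward, w is not None
```

`step` copies the grid instead of mutating it, so states behave as immutable values. A tree search such as MCTS can therefore expand several children of the same node without any undo logic. `legal_actions` supplies the mask for a fixed-size policy head.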
The thesis also describes a lightweight evaluation protocol based on relative-Elo ladders against frozen checkpoints, along with a small user study assessing explanation clarity. Full training was beyond the project's time budget. The emphasis is therefore on reliable implementations of the environment and its interface, the data pipeline, and validation utilities. Future work at the intersection of reinforcement learning, language modeling, and interpretability can build on these components.
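One possible reading of "relative-Elo ladders against frozen checkpoints" is sketched below. Its assumptions are not stated in the abstract:

- Each new checkpoint plays a batch of games against the previous frozen checkpoint.
- Checkpoint 0 is anchored at 0 Elo.
- Draws, such as games cut off by a move limit, count as half a win.
- A small symmetric prior keeps a clean sweep from producing an infinite rating gap.

The function names `elo_diff` and `ladder_ratings` are hypothetical.

```python
import math


def elo_diff(score):
    """Rating gap implied by an expected score in (0, 1) under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def ladder_ratings(results, prior=0.5):
    """Chain head-to-head results into ratings relative to checkpoint 0.

    results[k] is (wins, draws, losses) for checkpoint k + 1 playing frozen
    checkpoint k. `prior` adds that many pseudo-wins and pseudo-losses per
    match so that a clean sweep still yields a finite rating gap.
    """
    ratings = [0.0]  # checkpoint 0 is the fixed anchor
    for wins, draws, losses in results:
        games = wins + draws + losses + 2 * prior
        score = (wins + 0.5 * draws + prior) / games
        ratings.append(ratings[-1] + elo_diff(score))
    return ratings
```

For example, with `prior=0` a 15-5 result corresponds to a score of 0.75. That score places the newer checkpoint about 191 Elo above the older one. Each gap comes from a single head-to-head match, so estimation error accumulates along the ladder. Playing each checkpoint against several earlier ones would tighten the estimates.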