Princeton University users: To view a senior thesis while you are away from campus, you will need to connect to the campus network remotely via the Global Protect virtual private network (VPN). If you are not a member of the University and are requesting a copy of a thesis, please note that all requests are processed manually by staff and will require additional time.
 

Publication:

Bayesian Adaptive Clinical Trials: A Soft Actor-Critic Reinforcement Learning Approach


Files

Matthew_Willer_Thesis_Absolutely_Final_Version.pdf (4.25 MB)

Date

2025-04-13


Abstract

Adaptive clinical trial designs aim to improve efficiency and enhance ethical considerations by dynamically allocating patients to treatments based on accruing evidence. In this thesis, we formulate an adaptive clinical trial as a finite-horizon Markov Decision Process (MDP). The trial state comprises patient outcomes and Bayesian-updated treatment success probabilities, and is updated sequentially at each decision point. To solve the resulting treatment-allocation problem, we implement a Soft Actor-Critic (SAC) framework that leverages maximum-entropy reinforcement learning to balance exploration and exploitation. To further capture this balance, we add a weight-adjusted Total Variation Distance (TVD) component to the reward function, quantifying the value of the information gathered between decision points. We conducted numerical simulations under two training schemes: one in which outcomes were generated using the true treatment success probabilities, and another in which outcomes were based on the agent's estimated probabilities. Across diverse hypothetical scenarios varying in cohort size, trial length, and prior knowledge, our SAC-based policy consistently approximated the ideal (oracle) policy in the true-probability setting: the agent achieved success proportions close to those of the optimal policy while judiciously allocating more patients to the superior treatment. When the model was trained on estimated probabilities, performance degraded under high uncertainty or poorly specified priors, sometimes making a fixed, non-adaptive approach preferable. These results underscore both the potential and the limitations of SAC in adaptive trial design. Our proposed model provides a foundation for applying reinforcement learning in a clinical trial setting, highlighting the need for accurate prior information to fully realize its benefits. The framework establishes a rigorous testbed for adaptive patient allocation, providing both theoretical insights and practical guidelines for future clinical trial designs.
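The two ingredients the abstract describes — a conjugate Beta-Bernoulli update of the treatment success probabilities at each decision point, and a TVD term rewarding the information gained between successive posteriors — can be sketched as below. This is a minimal illustrative sketch, not code from the thesis: the weight `w`, the grid discretization, the uniform prior, and the cohort numbers are all assumptions chosen for the example.

```python
import numpy as np

def posterior_update(alpha, beta, successes, failures):
    """Conjugate Beta-Bernoulli update after observing one cohort's outcomes."""
    return alpha + successes, beta + failures

def discretized_beta(a, b, grid):
    """Beta(a, b) density evaluated on a grid and normalized, so that the
    TVD between successive posteriors can be computed numerically."""
    pdf = grid ** (a - 1) * (1 - grid) ** (b - 1)
    return pdf / pdf.sum()

def tvd(p, q):
    """Total Variation Distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

# One decision point for a single arm: 10 patients treated, 7 successes.
grid = np.linspace(1e-3, 1 - 1e-3, 500)
a0, b0 = 1.0, 1.0                      # uniform Beta(1, 1) prior (assumed)
a1, b1 = posterior_update(a0, b0, 7, 3)

# Information gained between decision points, measured by TVD.
info_gain = tvd(discretized_beta(a0, b0, grid),
                discretized_beta(a1, b1, grid))

# Weight-adjusted reward: observed successes plus a scaled information term.
w = 0.5                                 # illustrative weight, not from the thesis
reward = 7 + w * info_gain
```

In this sketch the TVD term is largest when a cohort's outcomes move the posterior substantially, so the weight `w` trades off immediate treatment successes against learning about the arms, mirroring the exploration-exploitation balance the SAC agent is trained under.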
