Towards Better Policies in Sequential Decision Making: A Robust Test for Stationarity
Presented by Zhenke Wu
Department of Biostatistics
University of Michigan
Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy that maximizes the expected return. The optimality of many RL algorithms relies on the stationarity assumption, which requires the state transition and reward functions to be time-invariant. However, deviations from stationarity over extended periods often occur in real-world applications such as robotic control, health care, and digital marketing, so policies learned under the stationarity assumption can be sub-optimal. We propose a doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings, e.g., using data obtained from a completed sequentially randomized trial. The proposed testing procedure is robust to model misspecification and effectively controls the type I error while achieving high statistical power, especially in high-dimensional settings. I will use an interventional mobile health study, the largest to date in the US, to illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
A seminar tea will be held at 11:00 a.m. in University Office Plaza, Room 116.