Poster
Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes
HyunJi Alex Nam · Scotty Fleming · Emma Brunskill

Fri Dec 10 08:30 AM -- 10:00 AM (PST)

Many real-world problems that require making optimal sequences of decisions under uncertainty involve costs when the agent wishes to obtain information about its environment. We design and analyze algorithms for reinforcement learning (RL) in Action-Contingent Noiselessly Observable MDPs (ACNO-MDPs), a special class of POMDPs in which the agent can choose to either (1) fully observe the state at a cost and then act; or (2) act without any immediate observation information, relying on past observations to infer the underlying state. ACNO-MDPs arise frequently in important real-world application domains like healthcare, in which clinicians must balance the value of information gleaned from medical tests (e.g., blood-based biomarkers) with the costs of gathering that information (e.g., the costs of labor and materials required to administer such tests). We develop a PAC RL algorithm for tabular ACNO-MDPs that provides substantially tighter bounds, compared to generic POMDP-RL algorithms, on the total number of episodes exhibiting worse than near-optimal performance. For continuous-state, continuous-action ACNO-MDPs, we propose a novel method of incorporating observation information that, when coupled with modern RL algorithms, yields significantly faster learning compared to other POMDP-RL algorithms in several simulated environments.
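
The observe-or-act choice described in the abstract can be illustrated with a toy tabular environment. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `TabularACNOMDP`, the `step` signature, the array layouts `P[s, a, s']` and `R[s, a]`, and the convention that an observation reveals the current state before the action is taken are all hypothetical choices made for illustration.

```python
import numpy as np


class TabularACNOMDP:
    """Toy tabular ACNO-MDP (hypothetical sketch, not the paper's code)."""

    def __init__(self, P, R, obs_cost, start_state=0, seed=0):
        self.P = np.asarray(P, dtype=float)   # assumed layout: P[s, a, s']
        self.R = np.asarray(R, dtype=float)   # assumed layout: R[s, a]
        self.obs_cost = float(obs_cost)       # price of one noiseless observation
        self.n_states = self.P.shape[0]
        self.rng = np.random.default_rng(seed)
        self.state = int(start_state)         # hidden true state
        self.belief = self._one_hot(self.state)  # agent's distribution over states

    def _one_hot(self, s):
        b = np.zeros(self.n_states)
        b[s] = 1.0
        return b

    def step(self, action, observe):
        """Optionally pay to observe the current state, then act.

        Returns (observation, reward); observation is None if not observed.
        """
        reward = 0.0
        obs = None
        if observe:
            # Option (1): pay the cost and collapse the belief to the true state.
            reward -= self.obs_cost
            obs = self.state
            self.belief = self._one_hot(self.state)
        # Option (2) skips the block above and relies on the propagated belief.
        reward += self.R[self.state, action]
        self.state = int(self.rng.choice(self.n_states, p=self.P[self.state, action]))
        # Push the belief through the known transition model for this action.
        self.belief = self.belief @ self.P[:, action, :]
        return obs, reward
```

In this sketch the belief stays a point mass as long as transitions are deterministic, so paying for an observation only becomes worthwhile once stochastic transitions have spread the belief over several states.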

Author Information

HyunJi Alex Nam (Stanford University)
Scotty Fleming (Stanford University)
Emma Brunskill (Stanford University)

More from the Same Authors