Representation Balancing MDPs for Off-policy Policy Evaluation
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill

Wed Dec 05 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #123

We study the problem of off-policy policy evaluation (OPPE) in reinforcement learning (RL). In contrast to prior work, we consider how to estimate both the individual policy value and the average policy value accurately. We draw inspiration from recent work in causal reasoning and propose a new finite-sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop an algorithm for learning an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE on common synthetic benchmarks and an HIV treatment simulation domain.
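
To make the distinction between individual and average policy value concrete, the minimal sketch below computes both error measures for a toy set of initial states. It is an illustration only, not the authors' algorithm: the function names, the toy values, and the choice of squared error averaged over initial states are assumptions made for this example.

```python
from statistics import fmean

def individual_mse(v_true, v_hat):
    """Mean squared error of per-initial-state value estimates (assumed metric)."""
    return fmean([(h - t) ** 2 for t, h in zip(v_true, v_hat)])

def average_value_sq_error(v_true, v_hat):
    """Squared error of the value averaged over initial states (assumed metric)."""
    return (fmean(v_hat) - fmean(v_true)) ** 2

# Hypothetical true values of the evaluation policy at three initial states.
v_true = [1.0, 3.0, 5.0]
# A hypothetical estimator that predicts the same value everywhere.
v_hat = [3.0, 3.0, 3.0]

print(individual_mse(v_true, v_hat))
print(average_value_sq_error(v_true, v_hat))
```

In this toy case the constant estimate matches the average value exactly, yet it is wrong for two of the three initial states, so the average error is zero while the individual error is not. This is why the two quantities are treated as separate targets.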

Author Information

Yao Liu (Stanford University)
Omer Gottesman (Harvard University)
Aniruddh Raghu (Massachusetts Institute of Technology)
Matthieu Komorowski (Imperial College London / MIT)

I hold full board certification in anesthesiology and critical care in both France and the UK. A former medical research fellow at the European Space Agency, I completed a Master of Research in Biomedical Engineering at Imperial College London. I currently pursue a PhD at Imperial College and a research fellowship in intensive care at Charing Cross Hospital in London, supervised by Professor Anthony Gordon and Dr Aldo Faisal. A visiting scholar at the Laboratory for Computational Physiology at MIT, I collaborate with the MIT Critical Data group (Professor Leo Celi) on numerous projects involving secondary analysis of healthcare records. My research brings together my expertise in machine learning and critical care to generate new medical evidence and build decision support systems. My particular interest is sepsis, the number one killer in intensive care and the single most expensive condition treated in hospitals.

Aldo Faisal (Imperial College London)
Finale Doshi-Velez (Harvard University)
Emma Brunskill (Stanford University)

More from the Same Authors