Timezone: »

Pareto-Optimal Diagnostic Policy Learning in Clinical Applications via Semi-Model-Based Deep Reinforcement Learning
zheng Yu · Yikuan Li · Joseph Kim · Kaixuan Huang · Yuan Luo · Mengdi Wang

Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects labtest panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the F1 score directly instead of the error rate. However, the F1 score cannot be written as a cumulative sum of rewards, which invalidates standard RL methods. To remedy this issue, we develop a reward-shaping approach, leveraging properties of the F1 score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained F1 score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on clinical tasks: ferritin prediction, sepsis prevention, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identify all Pareto-front solutions. Across all three tasks, SM-DDPO is able to achieve state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to 85% reduction in testing cost.

Author Information

zheng Yu (Princeton University)
Yikuan Li (Northwestern University)
Joseph Kim (Cornell University)
Kaixuan Huang (Princeton University)
Yuan Luo (Northwestern University)
Mengdi Wang (Princeton University)

Mengdi Wang is interested in data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years).

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors