Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab-test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the F1 score directly rather than minimize the error rate. However, the F1 score cannot be written as a cumulative sum of rewards, which invalidates standard RL methods. To remedy this issue, we develop a reward-shaping approach that leverages properties of the F1 score and the duality of policy optimization to provably find the set of all Pareto-optimal policies for budget-constrained F1-score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on three clinical tasks: ferritin prediction, sepsis prevention, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all three tasks, SM-DDPO achieves state-of-the-art diagnosis accuracy (in some cases higher than that of conventional methods) with up to 85% reduction in testing cost.
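The claim that the F1 score is not a cumulative sum of rewards can be seen with a small numerical sketch (illustrative only, not code from the paper): F1 is a nonlinear function of the aggregate confusion counts, so the F1 of pooled predictions differs from the average of per-batch F1 scores, unlike a standard RL return, which is additive across episodes.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Two disjoint batches of patients, as (TP, FP, FN) confusion counts
# (numbers are made up for illustration).
batch_a = (1, 0, 9)   # few true positives, many missed cases
batch_b = (9, 9, 0)   # many true positives, many false alarms

# F1 computed on the pooled counts...
pooled = tuple(a + b for a, b in zip(batch_a, batch_b))
f1_pooled = f1(*pooled)

# ...is not the average of the per-batch F1 scores,
# so F1 cannot be decomposed into additive per-episode rewards.
f1_avg = (f1(*batch_a) + f1(*batch_b)) / 2

print(f1_pooled, f1_avg)  # the two values differ
```

This non-additivity is exactly what rules out maximizing F1 with an off-the-shelf cumulative-reward objective and motivates the reward-shaping construction described above.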