Despite promising performance, reinforcement learning (RL) is rarely applied when a high level of risk is involved. Glycemia control in type I diabetes is one such example: a variety of RL agents have been shown to accurately regulate insulin delivery, yet no real-life application has emerged. For such applications, managing risk is key. In this paper, we use the evolution strategies algorithm to train a policy network for glycemia control: it achieves state-of-the-art results and recovers, without any a priori knowledge, the basics of insulin therapy and blood sugar management. We propose a way to equip the policy network with an epistemic uncertainty measure that requires no further model training. We illustrate how this epistemic uncertainty estimate can be used to improve the safety of the device, paving the way for real-life clinical trials.