We present a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, we show that several policy learning approaches are competitive with or outperform classical baseline approaches. In order to train these algorithms, we develop novel techniques to convert historical data into a simulator and present a collection of results that motivate this approach. We also present a model-based reinforcement learning procedure (Direct Backprop) to solve the dynamic periodic review inventory control problem by constructing a differentiable simulator. Under a variety of metrics Direct Backprop outperforms model-free RL and newsvendor baselines, in both simulations and real-world deployments.
Bio: Dhruv Madeka is a Principal Machine Learning Scientist at Amazon. His current research focuses on applying Deep Reinforcement Learning to supply chain problems. Dhruv has also worked on developing generative and supervised deep learning models for probabilistic time series forecasting. In the past - Dhruv worked in the Quantitative Research team at Bloomberg LP, developing open source tools for the Jupyter ecosystem and conducting advanced mathematical research in derivatives pricing, quantitative finance and election forecasting.