Personalization for Web-based Services using Offline Reinforcement Learning
Pavlos A Apostolopoulos · Zehui Wang · Hanson Wang · Chad Zhou · Kittipat Virochsiri · Norm Zhou · Igor Markov
Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We investigate both sequential and non-sequential formulations of this problem, highlighting their benefits and drawbacks. In the sequential setting, we address the challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training. Deployed in a production system for user authentication in a major social network, our approach significantly improves long-term objectives. We articulate practical challenges, compare several ML techniques, provide insights on training and evaluation of RL models, and discuss generalizations.
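The abstract does not spell out the training algorithm. As a rough illustration of what "model-free offline RL with off-policy training" means, the sketch below runs tabular Q-learning repeatedly over a fixed batch of logged transitions and never queries the environment. This is a generic textbook technique, not the authors' production method. The states, actions, rewards and hyperparameters (`gamma`, `lr`, `epochs`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A logged transition: (state, action, reward, next_state, done).
Transition = Tuple[str, str, float, str, bool]


def offline_q_learning(
    logged: List[Transition],
    actions: List[str],
    gamma: float = 0.9,
    lr: float = 0.5,
    epochs: int = 200,
) -> Dict[Tuple[str, str], float]:
    """Tabular Q-learning swept repeatedly over a fixed batch of logged data.

    No new interaction happens: the policy that produced `logged` may differ
    from the greedy policy being learned, which makes this off-policy.
    """
    q: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next, done in logged:
            # Bootstrapped target uses the best action in the next state.
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions)
            target = r + gamma * best_next
            q[(s, a)] += lr * (target - q[(s, a)])
    return dict(q)


def greedy_policy(
    q: Dict[Tuple[str, str], float], states: List[str], actions: List[str]
) -> Dict[str, str]:
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


# Hypothetical login-flow log: showing a prompt first pays off one step later.
ACTIONS = ["show_prompt", "skip"]
LOGGED: List[Transition] = [
    ("start", "show_prompt", 0.0, "prompted", False),
    ("prompted", "skip", 1.0, "end", True),
    ("start", "skip", 0.2, "end", True),
    ("prompted", "show_prompt", 0.0, "end", True),
]

q = offline_q_learning(LOGGED, ACTIONS)
policy = greedy_policy(q, ["start", "prompted"], ACTIONS)
print(policy)
```

Here the greedy policy prefers `show_prompt` at `start` even though its immediate reward (0.0) is lower than that of `skip` (0.2), because the discounted long-term value (about 0.9) is higher. This is the sense in which a sequential formulation can favor long-term objectives. A realistic system would likely replace the lookup table with a function approximator over user features.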
Author Information
Pavlos A Apostolopoulos (Meta)
Zehui Wang (Facebook)
Hanson Wang (Facebook)
Chad Zhou (Facebook)
Kittipat Virochsiri (Facebook)
Norm Zhou (Facebook)
Igor Markov (Facebook)
More from the Same Authors
- 2021: Practical Policy Optimization with Personalized Experimentation
Mia Garrard · Hanson Wang · Ben Letham · Zehui Wang · Yin Huang · Yichun Hu · Chad Zhou · Norm Zhou · Eytan Bakshy