Wang Zhang: What's Now and Next for veRL?
Abstract
The veRL framework is a high-performance, open-source infrastructure for scalable RL post-training of Large Language Models (LLMs). Its core HybridFlow architecture tackles the complexity of RLHF through a separation of concerns: algorithm researchers express training logic in the single-controller paradigm for rapid development, while ML engineers rely on the multi-controller system for efficient, large-scale distributed execution. By decoupling distributed-systems intricacies from the core research loop, this design delivers state-of-the-art training throughput while accelerating iteration and deployment.
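To make the separation of concerns concrete, the sketch below shows what a single-controller RLHF driver looks like in spirit. This is a hypothetical, simplified illustration: the class and method names (`RolloutWorker`, `RewardWorker`, `TrainerWorker`, `rlhf_step`) are not the actual veRL API, and real worker groups would dispatch to many GPUs rather than run locally.

```python
# Hypothetical sketch of the single-controller pattern; these names are
# illustrative only and do not reflect the real veRL API.

class RolloutWorker:
    """Stands in for a distributed inference group (e.g., sharded generation)."""
    def generate(self, prompts):
        return [p + " <response>" for p in prompts]

class RewardWorker:
    """Stands in for a reward-model or rule-based scoring group."""
    def score(self, responses):
        return [float(len(r)) for r in responses]  # toy reward: response length

class TrainerWorker:
    """Stands in for a distributed training group (e.g., FSDP/Megatron backends)."""
    def __init__(self):
        self.steps = 0
    def update(self, responses, rewards):
        self.steps += 1
        return sum(rewards) / len(rewards)  # toy metric in place of a real loss

def rlhf_step(prompts, rollout, reward, trainer):
    # Single-controller driver: the whole RLHF dataflow reads as one
    # sequential function, even though each call could fan out across
    # a cluster in a multi-controller backend.
    responses = rollout.generate(prompts)
    rewards = reward.score(responses)
    return trainer.update(responses, rewards)

if __name__ == "__main__":
    metric = rlhf_step(["hello"], RolloutWorker(), RewardWorker(), TrainerWorker())
    print(metric)
```

The point of the pattern is that only the worker classes know about placement and parallelism; the driver function stays a plain, readable loop that a researcher can modify without touching distributed code.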
To enhance efficiency and developer experience, several features are planned for next year. These include structural improvements, such as extracting verl-core and adding a service mode for key components, alongside experimental advanced infrastructure: RDMA-native communication (e.g., integrating Monarch from PyTorch) and JAX as a unified compute option. Finally, we plan to develop a universal asynchronous policy framework for customizable, high-efficiency RL.