Towards Generalizable Multi-Policy Optimization with Self-Evolution for Job Scheduling
Abstract
Reinforcement Learning (RL) has shown promising results in solving Job Scheduling Problems (JSPs), automatically deriving powerful dispatching rules from data without relying on expert knowledge. However, most RL-based methods train only a single decision-maker, which limits exploration and leaves substantial room for performance improvement. Moreover, designing reward functions for different JSP variants remains a challenging and labor-intensive task. To address these limitations, we introduce a novel and generic learning framework that optimizes multiple policies sharing a common objective and a single neural network, while enabling each policy to learn a specialized and diverse strategy. Model optimization is guided entirely in a self-supervised manner, eliminating the need for handcrafted reward functions. In addition, we develop a training scheme that adaptively controls the imitation intensity to reflect the quality of the self-generated labels. Experimental results show that our method effectively addresses these challenges and significantly outperforms state-of-the-art RL methods across six JSP variants. Furthermore, our approach also performs strongly on other combinatorial optimization problems, highlighting its versatility beyond JSPs.
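To make the two core ideas in the abstract concrete, the sketch below is a minimal, illustrative PyTorch example (not the paper's implementation): several policy heads share one encoder, every head is trained to imitate the actions of the currently best head (the self-labels), and a scalar quality score scales the imitation loss, standing in for the adaptive imitation intensity. All names here (SharedMultiPolicy, self_imitation_loss, the quality argument) are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only, not the authors' code: K policy heads share one encoder,
# and an imitation loss toward self-generated labels is scaled by a quality weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedMultiPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, n_policies=4, hidden=128):
        super().__init__()
        # One shared encoder (single neural network) ...
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # ... with K lightweight heads, each free to learn a specialized strategy.
        self.heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(n_policies)])

    def forward(self, obs):
        h = self.encoder(obs)
        return torch.stack([head(h) for head in self.heads], dim=1)  # (B, K, n_actions)

def self_imitation_loss(logits, best_actions, quality):
    """Cross-entropy of every head toward the best head's actions (self-labels),
    scaled by a quality score in [0, 1] acting as the imitation intensity."""
    B, K, A = logits.shape
    targets = best_actions.unsqueeze(1).expand(B, K)
    ce = F.cross_entropy(logits.reshape(B * K, A), targets.reshape(B * K))
    return quality * ce

# Toy usage: pretend head 0 produced the best schedule and imitate its greedy actions.
model = SharedMultiPolicy(obs_dim=16, n_actions=8)
obs = torch.randn(32, 16)
logits = model(obs)
best_actions = logits[:, 0].argmax(dim=-1).detach()
loss = self_imitation_loss(logits, best_actions, quality=0.7)
loss.backward()
```

In this toy setup, no reward function appears anywhere: the training signal comes solely from the labels the model generates for itself, with the quality weight controlling how strongly they are imitated.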