ParetoMIL: Early Risk Detection in Dialogue under Weak Supervision
Abstract
Large Language Models (LLMs) increasingly operate in multi-turn interactions where the cost of failure grows with delay, creating a need for turn-level risk assessment and timely alerts. Existing approaches fall short: process reward modeling presumes step-wise labels; multi-instance learning (MIL) overlooks earliness; and early classification of time series (ECTS) neglects the complex relationship between turn-level events and dialogue-level risk. We propose ParetoMIL, which integrates MIL and ECTS to deliver controllable early alerts from weak dialogue-level supervision. A soft-MIL scorer with prefix-conditioned encodings and monotone pooling produces a non-decreasing prefix risk, while a reinforcement-learning trigger, conditioned on a control parameter, balances earliness and accuracy with a single policy that traces the Pareto frontier without retraining. Empirically, ParetoMIL improves the earliness–accuracy trade-off on multi-turn dialogues compared to strong baselines.
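The core mechanism in the abstract, monotone pooling of per-turn scores into a non-decreasing prefix risk that a controllable trigger then thresholds, can be illustrated with a toy sketch. All names below are hypothetical, the running-max pooling is one simple monotone choice, and the fixed threshold is a stand-in for the paper's learned reinforcement-learning trigger:

```python
def monotone_prefix_risk(turn_scores):
    """Pool per-turn risk scores with a running maximum, so the
    prefix risk can never decrease as the dialogue grows."""
    risks, current = [], float("-inf")
    for s in turn_scores:
        current = max(current, s)
        risks.append(current)
    return risks


def first_alert_turn(prefix_risks, threshold):
    """Index of the first turn whose prefix risk crosses `threshold`,
    or None if the dialogue ends without an alert. Lowering the
    threshold trades accuracy for earlier alerts."""
    for t, r in enumerate(prefix_risks):
        if r >= threshold:
            return t
    return None


scores = [0.1, 0.4, 0.3, 0.7, 0.2]   # per-turn scores from a scorer
risks = monotone_prefix_risk(scores)
print(risks)                          # [0.1, 0.4, 0.4, 0.7, 0.7]
print(first_alert_turn(risks, 0.5))   # 3 (more conservative, later)
print(first_alert_turn(risks, 0.3))   # 1 (more eager, earlier)
```

Sweeping the threshold here plays the role of the paper's control parameter: each value yields a different earliness–accuracy operating point without retraining anything.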