PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Yuhua Jiang ⋅ Yuwen Xiong ⋅ Yufeng Yuan ⋅ Chao Xin ⋅ Wenyuan Xu ⋅ Yu Yue ⋅ Qianchuan Zhao ⋅ Lin Yan

Abstract
