Poster in Affinity Workshop: Women in Machine Learning

Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models

Yik Siu Chan · Yong Zheng-Xin · Stephen Bach

Abstract
