Talk Wed, Dec 3, 2025 • 9:45 AM – 9:57 AM PST

The Importance and Fragility of CoT Monitorability

Bowen Baker

Abstract

AI systems that verbalize their “thinking” in human language offer a new, yet possibly fragile, opportunity for AI safety. We’ve found that monitoring chains of thought (CoT) can be highly effective for catching misbehavior during frontier reasoning model training and deployment. However, chain-of-thought monitorability may prove fragile in the face of further scaling and algorithmic advances. In this talk, we’ll discuss OpenAI’s existing work in this area and where we plan to go next.
