Invited Talk in Workshop: Workshop on Multi-Turn Interactions in Large Language Models
Sat, Dec 6, 2025 • 10:30 AM – 11:00 AM PST

Invited Talk 4 - Peter Henderson


Abstract

Title: Building Better Rules and Finding Shallow Alignment in AI Agents

Abstract: We increasingly rely on natural-language specifications, or on human feedback, to align agents. But getting long-horizon behavior right under ambiguous specifications can be challenging. In this talk, we take an interdisciplinary perspective on how to lint rule specifications for potential ambiguities. We also examine how fine-tuning can induce (or undo) "shallow" alignment in AI agents.

Speaker

Peter Henderson
