Search All 2024 Events
 

1 Result

Workshop
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Leo McKee-Reid · Joe Needham · Maria Martinez · Christoph Sträter · Mikita Balesni