

Search All 2024 Events
 

5 Results

Workshop
AI Sandbagging: Language Models can Selectively Underperform on Evaluations
Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
Workshop
The Elicitation Game: Stress-Testing Capability Elicitation Techniques
Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward
Workshop
AI Sandbagging: Language Models can Selectively Underperform on Evaluations
Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
Workshop
Sandbag Detection through Model Impairment
Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Workshop
Sandbag Detection through Model Impairment
Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes