5 Results
Workshop | AI Sandbagging: Language Models can Selectively Underperform on Evaluations | Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
Workshop | The Elicitation Game: Stress-Testing Capability Elicitation Techniques | Felix Hofstätter · Jayden Teoh · Teun van der Weij · Francis Ward
Workshop | AI Sandbagging: Language Models can Selectively Underperform on Evaluations | Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward
Workshop | Sandbag Detection through Model Impairment | Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes
Workshop | Sandbag Detection through Model Impairment | Cameron Tice · Philipp Kreer · Nathan Helm-Burger · Prithviraj Singh Shahani · Fedor Ryzhenkov · Teun van der Weij · Felix Hofstätter · Jacob Haimes