Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer ⋅ Olivia Watkins ⋅ Ethan Mendes ⋅ Justin Svegliato ⋅ Luke Bailey ⋅ Tiffany Wang ⋅ Isaac Ong ⋅ Karim Elmaaroufi ⋅ Pieter Abbeel ⋅ Trevor Darrell ⋅ Alan Ritter ⋅ Stuart J Russell

Abstract
