Skip to yearly menu bar Skip to main content


Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell

Abstract

Chat is not available.