4 Results
Workshop
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell

Workshop
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell

Workshop · Fri 13:45
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan · Vikas Yadav · Shiyang Li · Lichang Chen · Zheng Tang · Hai Wang · Vijay Srinivasan · Xiang Ren · Hongxia Jin

Workshop
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer · Olivia Watkins · Ethan Mendes · Justin Svegliato · Luke Bailey · Tiffany Wang · Isaac Ong · Karim Elmaaroufi · Pieter Abbeel · Trevor Darrell · Alan Ritter · Stuart J Russell