
Poster in Workshop: NeurIPS 2022 Workshop on Score-Based Methods

Discovering the Hidden Vocabulary of DALLE-2

Giannis Daras · Alex Dimakis


Abstract:

We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, "Apoploe vesrreaitais" seems to mean birds, and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts often behave consistently in isolation, and sometimes also in combination. We present a black-box method for discovering words that seem random but correspond to visual concepts. This raises important security and interpretability challenges.
