Intuitive Image Descriptions are Context-Sensitive
Shayan Hooshmand · Elisa Kreiss · Christopher Potts
Consumers of image descriptions want them to be context-sensitive, but previous crowdsourced efforts to create text from images have presented the images in isolation. We tested whether untrained crowdworkers naturally take context into account when writing image descriptions by asking them to describe images that we embedded in the first paragraph of a Wikipedia article. Our analysis shows that the resulting descriptions were statistically significantly more likely to reflect the contents of the article they were presented with than those of mismatched articles. These findings have implications for the extent and usefulness of training crowdworkers when developing large-scale context-sensitive description corpora, as well as for the development of deep learning models for automatic description generation.

Shayan Hooshmand (Columbia University)
Elisa Kreiss (Stanford University)
Christopher Potts (Stanford University)

