Wed Dec 6th 07:00 -- 10:30 PM @ Pacific Ballroom Concourse #D9
Interactive-Length Multi-Task Video Captioning with Cooperative Feedback
Han Guo · Ramakanth Pasunuru · Mohit Bansal

We present a fast and accurate demo system for our state-of-the-art multi-task video captioning model, extended with interactive-length paragraph generation and cooperative user feedback. Automatic video captioning has applications such as assisting visually impaired users and improving the quality of online visual content search and retrieval. Our recent multi-task model uses two auxiliary generation tasks, temporal video-to-video and logical premise-to-entailment, to achieve the best results on three popular community datasets. To address the lack of useful online demo systems for video captioning, our demo lets users upload any video file or YouTube link. A novel aspect is the generation of multi-sentence, paragraph-style captions based on redundancy filtering, which is especially useful for lengthy real-world videos; the user can ask for longer captions on the fly. The system also supports cooperative user feedback: the user can click on a displayed alternative from the top-k beam options or type corrections directly, providing us with valuable data for discriminative retraining.
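The abstract does not say how redundancy filtering is carried out, so the following is only an illustrative sketch of one way it could work, not the authors' method. It assumes that captions are produced per video segment, that redundancy is measured by Jaccard word overlap, and that the user's requested length is a maximum sentence count. The names `token_set`, `jaccard`, `build_paragraph`, and the `threshold` value are all hypothetical.

```python
# Illustrative sketch only: the overlap measure, the threshold, and all names
# are assumptions, not details taken from the demo system.

def token_set(sentence):
    """Lowercase a caption, drop periods, and return its set of words."""
    return set(sentence.lower().replace(".", "").split())

def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def build_paragraph(segment_captions, max_sentences, threshold=0.5):
    """Keep captions in order, skipping any that overlap too much with a kept one."""
    kept = []
    for caption in segment_captions:
        if len(kept) >= max_sentences:
            break
        tokens = token_set(caption)
        if all(jaccard(tokens, token_set(k)) < threshold for k in kept):
            kept.append(caption)
    return " ".join(kept)

# Hypothetical per-segment captions for one long video.
captions = [
    "A man is playing a guitar.",
    "A man is playing the guitar.",
    "A woman is singing on stage.",
    "The crowd is clapping.",
]

short = build_paragraph(captions, max_sentences=1)
longer = build_paragraph(captions, max_sentences=3)
print(short)
print(longer)
```

In this sketch, raising `max_sentences` plays the role of the user asking for a longer caption on the fly: the near-duplicate second caption is dropped, and later, distinct segments fill the extra length.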