

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

Yazhou Zhang · Yang Yu · Qing Guo · Benyou Wang · Dongming Zhao · Sagar Uprety · Dawei Song · Qiuchi Li · Jing Qin

Great Hall & Hall B1+B2 (level 1) #504


Human communication is multi-modal and multi-affective by nature. The inter-relatedness of different emotions and sentiments makes it challenging to jointly detect multiple human affections from multi-modal clues. Recent advances in this field have employed multi-task learning paradigms to model the inter-relatedness across tasks, but the scarcity of publicly available resources limits the potential of such work. To fill this gap, we build the first Chinese Multi-modal Multi-Affection conversation (CMMA) dataset, which contains 3,000 multi-party conversations and 21,795 multi-modal utterances collected from TV series of various styles. CMMA contains a wide variety of affection labels, including sentiment, emotion, sarcasm and humor, as well as novel inter-correlation values between certain pairs of tasks. Moreover, it provides topic and speaker information for each conversation, which promotes better modeling of conversational context. On the dataset, we empirically analyze the influence of different data modalities and conversational contexts on different affection analysis tasks, and demonstrate the practical benefit of inter-task correlations. The full dataset will be publicly available for research.
