Workshop: Meaning in Context: Pragmatic Communication in Humans and Machines

The gap between QUD-based topic determination and learning-based topic extraction for NLG

Maurice Langner · Ralf Klabunde


Generated texts should not be limited to conveying facts, but should also realize the many pragmatic aspects that make a text cohesive and coherent. Since present natural language generation (NLG) systems use learning-based methods for generation, the question arises whether and how linguistic pragmatics, which provides elaborate theories and detailed analyses of pragmatic phenomena based on these theories, could be incorporated into learning-based NLG. Using topic determination as an example, we show that question-under-discussion (QUD) based theories of information structure provide deep insights into the discourse structure of texts, but that they cannot be mapped directly onto learning approaches. The main problem is the data sparseness of QUD-based corpora. This ultimately goes back to the fact that content selection and discourse planning, the first two steps in an NLG pipeline from content determination to final linguistic realization, concern non-linguistic content and its preparation, whereas deep learning methods require texts to learn the correspondences between user requests and target texts.