Interaction-Aware Video Narrative Generation for Short-Form Gaming Content
Abstract
The rapid growth of short-form video consumption underscores the need for a next-generation paradigm in video generation. A key challenge in this paradigm is to design models that can identify dense interaction segments and generate coherent narratives. However, existing video understanding models remain limited in capturing complex interactions, and the narratives they produce often lack coherence. Game videos in particular, where multiple agents interact in real time to create non-linear storylines, demand a deeper understanding of both interaction dynamics and narrative coherence. To address this challenge, we introduce an Interaction-Aware Video Narrative Generation (IaVNG) model. Our approach first extracts key interaction segments via kernel density estimation and then produces coherent narratives for short-form video generation. In experiments, IaVNG selects key interactions and generates coherent short-form narratives, showing promise as a generalizable model for next-generation video generation in non-linear domains.
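As a rough illustration of the segment-extraction step mentioned above, the sketch below applies a one-dimensional Gaussian KDE over interaction timestamps and keeps contiguous windows where the estimated density is high. The abstract does not specify the actual formulation, so the event representation, bandwidth, quantile threshold, and the helper name `dense_interaction_segments` are all assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: bandwidth, threshold, and event encoding are assumptions.
import numpy as np
from scipy.stats import gaussian_kde

def dense_interaction_segments(event_times, video_len, bandwidth=5.0,
                               threshold_quantile=0.75):
    """Return (start, end) windows where interaction density is high.

    event_times: 1-D array of interaction timestamps in seconds.
    video_len:   total video duration in seconds.
    """
    events = np.asarray(event_times, dtype=float)
    # Fit a 1-D Gaussian KDE over the timestamps; dividing by the sample
    # std converts the desired bandwidth (in seconds) into scipy's factor.
    kde = gaussian_kde(events, bw_method=bandwidth / events.std(ddof=1))
    grid = np.linspace(0.0, video_len, num=int(video_len) + 1)
    density = kde(grid)

    # Keep time points above an assumed density quantile, then merge
    # consecutive points into contiguous segments.
    mask = density >= np.quantile(density, threshold_quantile)
    segments, start = [], None
    for t, hot in zip(grid, mask):
        if hot and start is None:
            start = t
        elif not hot and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, grid[-1]))
    return segments

# Example: synthetic interaction events clustered near t=30s and t=75s.
events = np.concatenate([np.random.normal(30, 3, 40),
                         np.random.normal(75, 2, 25)])
print(dense_interaction_segments(events, video_len=120))
```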