

Efficient Generative Multimodal Integration (EGMI): Enabling Audio Generation from Text-Image Pairs through Alignment with Large Language Models

Taemin Kim ⋅ Wooyeol Baek ⋅ Heeseok Oh

Abstract

