

Poster

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Xinwang Chen · Ning Liu · Yichen Zhu · Feifei Feng · Jian Tang

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical application. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight diffusion model architecture and a training-free Attention Modulation Matrix, whose alternating arrangement within EDT is inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT: the framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, achieving a significant overall improvement. For instance, EDT-S achieves an FID of 34.27, lower than the 39.5 of MDTv2-S, while training 3.93× faster and running inference 2.29× faster than MDTv2-S.
