Exploring Diffusion Transformer Designs via Grafting
Juan Carlos Niebles
Abstract
In this session, Juan Carlos Niebles presents grafting, a method for exploring new diffusion transformer (DiT) architectures by directly editing pretrained models. The approach enables systematic investigation of operator and structural variations, such as replacing attention with convolution or reconfiguring block depth, without full pretraining. Experiments show that grafted models retain strong generative quality (e.g., FID 2.38–2.64 vs. 2.27 for DiT-XL/2) while using under 2% of the pretraining compute. The results demonstrate that pretrained DiTs can serve as a foundation for efficient architectural design and analysis.
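To make the operator-swapping idea concrete, below is a minimal sketch of one grafting edit, replacing the attention operator in selected blocks of a pretrained DiT-style model with a depthwise-convolution token mixer, then adapting only the new operators. This is an illustrative sketch, not the presented method: the `DepthwiseConvMixer` and `graft_attention_to_conv` names are hypothetical, and it assumes a PyTorch model whose blocks expose an `attn` module with a fused `qkv` linear projection.

```python
import torch
import torch.nn as nn

class DepthwiseConvMixer(nn.Module):
    """Convolutional token mixer used as a drop-in replacement for attention.

    Operates on patch tokens of shape (B, N, C); assumes N is a perfect
    square so tokens can be reshaped to a 2D grid.
    """
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        y = x.transpose(1, 2).reshape(b, c, h, w)   # tokens -> 2D grid
        y = self.conv(y)                            # local spatial mixing
        y = y.reshape(b, c, n).transpose(1, 2)      # grid -> tokens
        return self.proj(y)

def graft_attention_to_conv(model: nn.Module, block_indices: list[int]) -> list[nn.Parameter]:
    """Swap attention for a conv mixer in selected blocks of a pretrained model.

    All pretrained weights are frozen; only the grafted operators' parameters
    are returned for a short adaptation phase.
    """
    for p in model.parameters():
        p.requires_grad_(False)

    new_params = []
    for i in block_indices:
        block = model.blocks[i]               # assumes blocks expose an .attn module
        dim = block.attn.qkv.in_features      # assumes a fused qkv Linear gives the width
        block.attn = DepthwiseConvMixer(dim)
        new_params += list(block.attn.parameters())
    return new_params

# Hypothetical usage: graft a few blocks, then briefly tune only the new operators.
# new_params = graft_attention_to_conv(dit_xl_2, block_indices=[14, 15, 16, 17])
# optimizer = torch.optim.AdamW(new_params, lr=1e-4)
```

Keeping the rest of the network frozen is what makes the search cheap: only the grafted operators need to be trained, which is consistent with the small fraction of pretraining compute reported above.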