Expo Demonstration
SwiftEdit: Fast Text-guided Image Editing via One-step Diffusion on a Mobile Device
Ron Tindall
Upper Level Room 29A-D
In this demo, we show on-device inference of our one-step diffusion image-editing model, SwiftEdit [1], which performs interactive image editing from the user's source image and text prompt, running on an Android smartphone powered by Qualcomm Technologies' latest Snapdragon Mobile Platform. On A100 GPUs, this technique runs in real time at 0.23 s per edit operation. We expect SwiftEdit to complete each edit operation in seconds on the smartphone, demonstrating efficient and responsive on-device diffusion inference.

Scientific Challenge that we tackle

Existing text-guided image editing methods fall short of the speed demands of real-world and on-device applications because of their costly multi-step inversion and sampling processes. In response, we developed SwiftEdit, which performs image editing with just one-step inversion and one-step image reconstruction.

Efficiently running SwiftEdit requires loading and running multiple deep models concurrently, including IP-Adapter (Vision Encoder and Image Projection), SwiftBrush (U-Net, VAE, Text Encoder), and a SwiftBrush-based Inversion Network. This poses significant challenges for efficient execution and inter-module communication while still delivering an interactive image-editing experience for the user, with all computation performed entirely on the edge device.
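
The data flow across these modules can be illustrated with a minimal sketch of the one-step edit described above. The class names, tensor shapes, and the `swiftedit_one_step_edit` helper below are hypothetical stand-ins chosen for illustration only; the actual demo uses the pretrained SwiftEdit networks executed on the Snapdragon platform rather than these toy layers.

```python
# Illustrative sketch of the SwiftEdit one-step editing data flow.
# All modules here are stand-ins (assumptions), not the released SwiftEdit code.
import torch
import torch.nn as nn


class InversionNet(nn.Module):
    """Stand-in for the SwiftBrush-based inversion network: source image -> latent in one pass."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Conv2d(3, 4, kernel_size=8, stride=8)  # 512x512 image -> 4x64x64 latent

    def forward(self, image):
        return self.encode(image)


class SwiftBrushGenerator(nn.Module):
    """Stand-in for the one-step SwiftBrush U-Net + VAE decoder, conditioned on text and image features."""
    def __init__(self, cond_dim=768):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, 4)
        self.decode = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # latent -> image

    def forward(self, latent, text_emb, ip_emb):
        # Fuse the prompt embedding and IP-Adapter image features into a per-channel conditioning signal.
        cond = self.cond_proj(text_emb + ip_emb).unsqueeze(-1).unsqueeze(-1)
        return self.decode(latent + cond)


def swiftedit_one_step_edit(src_image, text_emb, ip_emb, inverter, generator):
    """One inversion pass plus one generation pass: no iterative sampling loop."""
    with torch.no_grad():
        latent = inverter(src_image)                  # one-step inversion of the source image
        edited = generator(latent, text_emb, ip_emb)  # one-step reconstruction guided by the edit prompt
    return edited


if __name__ == "__main__":
    inverter, generator = InversionNet(), SwiftBrushGenerator()
    src = torch.rand(1, 3, 512, 512)   # user's source image
    text_emb = torch.randn(1, 768)     # pooled prompt embedding (from the text encoder)
    ip_emb = torch.randn(1, 768)       # projected image embedding (from IP-Adapter's vision encoder)
    out = swiftedit_one_step_edit(src, text_emb, ip_emb, inverter, generator)
    print(out.shape)  # torch.Size([1, 3, 512, 512])
```

Because both the inversion and the reconstruction are single forward passes, latency is dominated by two network evaluations plus feature hand-offs between modules, which is what the on-device demo has to schedule efficiently.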