Workshop
NeurIPS 2023 Workshop on Machine Learning for Creativity and Design
Yingtao Tian · Tom White · Lia Coleman · Hannah Johnston
Room 252 - 254
Machine co-creativity continues to grow with advances in machine learning, especially with the recent surge of generative models across multiple domains. This workshop, a continuation of a long-running series, explores these topics, including state-of-the-art algorithms for creation, the accessibility of these models for artists, social and cultural impact, and actual artistic applications. The workshop consists of presentations by invited speakers, presentations of selected papers and artworks, two panels, and an art showcase (in collaboration with the chairs of the NeurIPS Creative AI track). The goal of this workshop is to bring together researchers and artists interested in exploring the intersection of human creativity and machine learning, and to look beyond technical issues to better understand the needs of artists and creators.
Schedule
Sat 6:15 a.m. - 6:30 a.m. | Welcome and Introduction (Opening/Closing Remarks)
Sat 6:30 a.m. - 7:00 a.m. | Invited Talk 1 - Tianwei Yin (Invited Talk)
Sat 7:00 a.m. - 7:25 a.m. | Invited Talk 2 - Misha Konstantinov & Daria Bakshandaeva (Invited Talk)
Sat 7:30 a.m. - 7:55 a.m. | Invited Talk 3 - Alexander Mordvintsev & Ettore Randazzo (Invited Talk)
Sat 7:50 a.m. - 8:30 a.m. | Art gallery / Coffee Break / Social (Break)
Sat 8:30 a.m. - 9:00 a.m. | Panel Discussion (Panel)
Sat 9:00 a.m. - 10:00 a.m. | Paper & Artwork Spotlight (Spotlight)
Sat 9:00 a.m. - 9:05 a.m. | Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application (Spotlight)
In this paper, we present a character texture generation system tailored to the Minecraft video game for in-game application. Our system generates face-focused textures for texture mapping onto 3D virtual characters with a cube-manifold geometry. While existing projects only generate textures, the proposed system can invert a user-provided real image or generate a mean/random appearance from the learned distribution. The result can then be manipulated with text guidance using StyleGAN and StyleCLIP, providing an extended user experience with greater freedom as a user-friendly AI tool. The project page can be found at https://gh-bumsookim.github.io/Minecraft-ify/
Bumsoo Kim · Sanghyun Byun · Yonghoon Jung · Wonseop Shin · Sareer Ul Amin · Sanghyun Seo
Sat 9:05 a.m. - 9:10 a.m. | SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions (Spotlight)
While ControlNet has enabled pretrained image diffusion models (e.g., Stable Diffusion) to take additional condition inputs, it is still limited to generating images of certain sizes. To this end, we demonstrate an application of SyncDiffusion for conditional image generation, which takes advantage of synchronized joint diffusions to accept conditions of arbitrary size as input and in turn generate globally coherent images.
Yuseung Lee · Kunho Kim · Hyunjin Kim · Minhyuk Sung
Sat 9:10 a.m. - 9:15 a.m. | Real-time Animation Generation and Control on Rigged Models via Large Language Models (Spotlight)
We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate the LLM's potential to enable flexible state transitions between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
Han Huang · Fernanda De La Torre · Cathy Mengying Fang · Andrzej Banburski · Judith Amores · Jaron Lanier
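As a rough illustration of the "structured texts that can be parsed into animations" step described in the Real-time Animation Generation abstract above, the following minimal Python sketch parses a hypothetical JSON-formatted LLM reply into animation commands. The schema and field names are illustrative assumptions; the paper embeds the LLM in Unity, and its actual format may differ.

```python
import json

# Hypothetical structured reply from an LLM asked to animate "walk, then wave".
# The fields "clip", "speed", and "crossfade_seconds" are assumptions for illustration.
llm_reply = """
[
  {"clip": "Walk", "speed": 1.0, "crossfade_seconds": 0.25},
  {"clip": "Wave", "speed": 0.8, "crossfade_seconds": 0.40}
]
"""

def parse_animation_plan(reply: str) -> list[dict]:
    """Validate the LLM output and keep only well-formed animation steps."""
    steps = []
    for item in json.loads(reply):
        if {"clip", "speed", "crossfade_seconds"} <= item.keys():
            steps.append({
                "clip": str(item["clip"]),
                "speed": float(item["speed"]),
                "crossfade_seconds": float(item["crossfade_seconds"]),
            })
    return steps

for step in parse_animation_plan(llm_reply):
    # A game engine would call its animator here, e.g. a cross-fade to the named clip.
    print(f"play {step['clip']} at {step['speed']}x, blend {step['crossfade_seconds']}s")
```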
Sat 9:15 a.m. - 9:20 a.m. | Envisioning Distant Worlds: Training a Latent Diffusion Model with NASA's Exoplanet Data (Spotlight)
There are some 5,500 confirmed exoplanets beyond our solar system. Though we know these planets exist, most of them are too far away for us to know what they look like. In this paper, we develop an algorithm and a model to translate any given exoplanet's numeric data into a text prompt that can be input into a trained latent diffusion model to generate a predictive visualization of that exoplanet. This paper describes a novel approach of translating numeric data into textual descriptors formulated from prior accepted astrophysical research. These textual descriptions are paired with photographs and artistic visualizations from NASA's public archives to build a training set for a latent diffusion model, which can produce new visualizations of unseen distant worlds.
Marissa Beaty · Terence Broad
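The numeric-data-to-prompt translation described in the Envisioning Distant Worlds abstract above can be pictured with a minimal sketch like the one below. The thresholds, wording, and example values are illustrative assumptions, not the descriptor rules derived in the paper.

```python
# Minimal sketch: turn an exoplanet's numeric record into a text prompt for a
# latent diffusion model. Thresholds and phrasing are illustrative assumptions.
def exoplanet_prompt(name: str, radius_earth: float, eq_temp_k: float) -> str:
    if radius_earth > 6:
        size = "a gas giant"
    elif radius_earth > 2:
        size = "a mini-Neptune"
    else:
        size = "a rocky planet"
    if eq_temp_k > 1000:
        climate = "a molten, glowing surface"
    elif eq_temp_k > 200:
        climate = "a temperate, possibly cloud-covered surface"
    else:
        climate = "a frozen, icy surface"
    return f"{name}, {size} with {climate}, seen from orbit, space photography"

# Example usage with made-up values:
print(exoplanet_prompt("Kepler-22b", radius_earth=2.4, eq_temp_k=262))
```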
Sat 9:20 a.m. - 9:25 a.m. | LEDITS++: Limitless Image Editing using Text-to-Image Models (Spotlight)
Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. Subsequent research efforts aim to exploit the capabilities of these models and leverage them for intuitive, textual image editing. However, existing methods often require time-consuming fine-tuning and lack native support for performing multiple edits simultaneously. To address these issues, we introduce LEDITS++, an efficient yet versatile technique for image editing using text-to-image models. LEDITS++ requires neither tuning nor optimization, runs in a few diffusion steps, natively supports multiple simultaneous edits, inherently limits changes to relevant image regions, and is architecture agnostic.
Manuel Brack · Linoy Tsban · Katharina Kornmeier · Apolinário Passos · Felix Friedrich · Patrick Schramowski · Kristian Kersting
Sat 9:25 a.m. - 9:30 a.m. | Personalized Comic Story Generation (Spotlight)
We introduce PCSG, a diffusion-based text-to-image synthesis framework for supporting comic story generation, a domain in which authors require control over the consistency, composition, and diversity of content. To support these three requirements, PCSG has controllable plugins for (1) character consistency, (2) scene layout specification, and (3) character pose specification. The novel combination of these plugins enables users to exert fine-grained control and manifest their envisioned comic narratives with personalized characters. In our user study, this flexibility greatly improved satisfaction over existing approaches such as Midjourney or Stable Diffusion. To further advance this field and facilitate community engagement, we will open-source our code soon.
WENXUAN PENG · Peter Schaldenbrand · Jean Oh
Sat 9:30 a.m. - 9:35 a.m. | WordArt Designer API: User-Driven Artistic Typography with Large Language Models on ModelScope (Spotlight)
Driven by GPT-3.5-turbo and a suite of specialized visual models, WordArt Designer seamlessly transmutes user text prompts into stunning, semantically resonant multilingual typographic masterpieces. With user-friendly interactions, it democratizes the artistic process, inviting all, irrespective of their design expertise, to realize their creative dreams. We welcome you to explore the online demo at https://www.modelscope.cn/studios/WordArt/WordArt
JUN-YAN HE · Zhi-Qi Cheng · Chenyang Li · Jingdong Sun · Wangmeng Xiang · Xianhui Lin · Xiaoyang Kang · Zengke Jin · Yusen Hu · Bin Luo · Yifeng Geng · Xuansong Xie · Jingren Zhou
Sat 9:40 a.m. - 10:00 a.m. | Artworks spotlight (for multiple artworks) (Spotlight)
Yingtao Tian · Lia Coleman
Sat 10:00 a.m. - 11:00 a.m. | Lunch (Break)
Sat 11:00 a.m. - 11:30 a.m. | Invited Talk 4 - Aleksander Holynski (Invited Talk)
Sat 11:30 a.m. - 11:55 a.m. | Invited Talk 5 - Richard Zhang (Invited Talk)
Sat 12:00 p.m. - 12:30 p.m. | Invited Talk 6 - Cristóbal Valenzuela (Invited Talk)
Sat 12:30 p.m. - 1:00 p.m. | Panel/Open Discussion (Panel)
Sat 1:00 p.m. - 1:30 p.m. | Art / Coffee Break / Social (Break)
Sat 1:30 p.m. - 2:30 p.m. | Latent Painter (Poster)
Latent diffusion models have revolutionized generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas and the diffuser predictions as the plan to generate painting animations. Latent Painter can also transition one generated image into another, including between images from two different sets of checkpoints.
Shih-Chieh Su
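The "latent as the canvas, diffuser predictions as the plan" idea in the Latent Painter abstract above can be sketched roughly as follows: given the predicted original image at successive denoising steps, update the canvas a few patches at a time, largest differences first, to obtain painting-like frames. The patch-based stroke ordering is an illustrative assumption, not Latent Painter's actual stroke planner.

```python
import numpy as np

def painting_frames(predictions, patch=16, strokes_per_step=8):
    """Turn a list of per-step predicted images into painting-animation frames."""
    canvas = np.zeros_like(predictions[0])
    frames = []
    h, w = canvas.shape[:2]
    for target in predictions:
        diff = np.abs(target - canvas).mean(axis=-1)
        # Rank patches by how far they still are from the current prediction.
        scores = []
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                scores.append((diff[y:y + patch, x:x + patch].mean(), y, x))
        scores.sort(reverse=True)
        for _, y, x in scores[:strokes_per_step]:
            canvas[y:y + patch, x:x + patch] = target[y:y + patch, x:x + patch]
            frames.append(canvas.copy())
    return frames

# Random stand-ins for the predicted originals at four denoising steps.
preds = [np.random.rand(64, 64, 3) for _ in range(4)]
print(len(painting_frames(preds)), "animation frames")
```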
Sat 1:30 p.m. - 2:30 p.m. | Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application (Poster)
In this paper, we present a character texture generation system tailored to the Minecraft video game for in-game application. Our system generates face-focused textures for texture mapping onto 3D virtual characters with a cube-manifold geometry. While existing projects only generate textures, the proposed system can invert a user-provided real image or generate a mean/random appearance from the learned distribution. The result can then be manipulated with text guidance using StyleGAN and StyleCLIP, providing an extended user experience with greater freedom as a user-friendly AI tool. The project page can be found at https://gh-bumsookim.github.io/Minecraft-ify/
Bumsoo Kim · Sanghyun Byun · Yonghoon Jung · Wonseop Shin · Sareer Ul Amin · Sanghyun Seo
Sat 1:30 p.m. - 2:30 p.m. | BioSpark: An End-to-end Generative System for Biological-Analogical Inspirations and Ideation (Poster)
Nature provides a valuable source of inspiration for novel design solutions to challenging engineering problems. Yet, achieving the full potential of biological-analogical inspiration in engineering and design domains has proven difficult due to the difficulty of discovering relevant analogies and understanding them well enough to synthesize novel insights. Here, we introduce an end-to-end system that combines a scalable pipeline for generating biological-analogical mechanisms from nature with an interactive interface that facilitates users' understanding and synthesis of them. Our dataset generation pipeline starts from a small seed mechanism from human experts and expands it using breadth- and depth-focused expansion prompts based on iteratively constructed taxonomic hierarchies. This approach mitigates the sparsity of data due to the high cost of expert curation and the limited conceptual diversity of automated analogy generation using Large Language Models (LLMs). Furthermore, the interactive interface assists designers in recognizing and understanding the applicability of analogs to design problems through four interaction features: Explain, Compare, Combine, and Critique. Our case studies showcase the potential value of our system. We end with avenues for future research.
Hyeonsu Kang · David Chuan-En Lin · Nikolas Martelaro · Aniket Kittur · Yin-Ying Chen · Matthew Hong
Sat 1:30 p.m. - 2:30 p.m. | Interactive Machine Learning for Generative Models (Poster)
Effective control of generative media models remains a challenge for specialised generation tasks where no suitable dataset exists to train a contrastive language model. We describe a new approach that enables users to interactively create bespoke text-to-media mappings for arbitrary media generation models, using small numbers of examples. This approach facilitates new strategies, very distinct from contrastive language pretraining approaches, for using language (e.g., high-level descriptors and modal properties) to drive media creation in creative contexts. These controls are not well served by existing methods, which commonly depend on attributes (e.g., genre, style, description) to generate and steer creative outputs.
Junichi Shimizu · Ireti Olowe · Terence Broad · Gabriel Vigliensoni · Prashanth Thattai Ravikumar · Rebecca Fiebrink
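One way to picture the "bespoke text-to-media mappings from small numbers of examples" described in the Interactive Machine Learning abstract above is a tiny supervised mapping from descriptor features to generator parameters. The vocabulary, featurizer, and two-dimensional "generator parameters" below are illustrative assumptions; the paper targets arbitrary media generation models.

```python
import numpy as np

# Toy descriptor vocabulary and a bag-of-words featurizer (assumed for illustration).
vocab = ["bright", "dark", "soft", "harsh"]

def featurize(text: str) -> np.ndarray:
    words = text.lower().split()
    return np.array([float(w in words) for w in vocab])

# A handful of (description, generator-parameter) pairs supplied interactively.
examples = [
    ("bright and soft", np.array([0.9, 0.2])),
    ("dark and harsh",  np.array([0.1, 0.8])),
    ("bright",          np.array([0.8, 0.4])),
]
X = np.stack([featurize(t) for t, _ in examples])
Y = np.stack([p for _, p in examples])

# Ridge-regularized least squares maps new descriptions to generator parameters.
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(X.shape[1]), X.T @ Y)
print(featurize("soft and dark") @ W)  # predicted parameters for an unseen description
```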
Sat 1:30 p.m. - 2:30 p.m. | SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions (Poster)
While ControlNet has enabled pretrained image diffusion models (e.g., Stable Diffusion) to take additional condition inputs, it is still limited to generating images of certain sizes. To this end, we demonstrate an application of SyncDiffusion for conditional image generation, which takes advantage of synchronized joint diffusions to accept conditions of arbitrary size as input and in turn generate globally coherent images.
Yuseung Lee · Kunho Kim · Hyunjin Kim · Minhyuk Sung
Sat 1:30 p.m. - 2:30 p.m. | Real-time Animation Generation and Control on Rigged Models via Large Language Models (Poster)
We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate the LLM's potential to enable flexible state transitions between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
Han Huang · Fernanda De La Torre · Cathy Mengying Fang · Andrzej Banburski · Judith Amores · Jaron Lanier
Sat 1:30 p.m. - 2:30 p.m. | The Interface for Symbolic Music Loop Generation Conditioned on Musical Metadata (Poster)
We develop a web interface that generates multi-track music loop sequences. Our system takes musical metadata from users as input conditions and generates MIDI token events that can be played seamlessly. The core component, the loop generation model, is trained with loop sets that have been extracted by observing the repetitive structure of music. Also, the metadata tokens are randomly dropped to ensure flexible controllability during training. Our interface is available at https://github.com/sjhan91/loop-demo.
Sangjun Han · Hyeongrae Ihm · Woohyung Lim
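The random metadata-token dropout mentioned in the loop generation abstract above can be sketched in a few lines. The token names and the 10% drop rate are illustrative assumptions; the idea is simply that the model sees training sequences with and without each piece of metadata, so it remains controllable when conditions are missing.

```python
import random

def drop_metadata(metadata_tokens, p_drop=0.1):
    """Independently drop each conditioning token with probability p_drop."""
    return [tok for tok in metadata_tokens if random.random() >= p_drop]

metadata = ["<genre:rock>", "<bpm:120>", "<key:Cmaj>", "<track:bass>"]
midi_events = ["<bar>", "<note_on:40>", "<dur:8>", "<note_off:40>"]

training_sequence = drop_metadata(metadata) + midi_events
print(training_sequence)
```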
Sat 1:30 p.m. - 2:30 p.m. | Weaving ML with Human Aesthetic Assessments to Augment Design Space Exploration (Poster)
People's semantic connection with product design is an important signal that drives purchase decisions and overall satisfaction. In the concept design phase, capturing and responding to this signal is an important part of a product designer's job. Yet in automotive design, where online experimentation is not a viable option, this process is driven by speculation about consumers' aesthetic preferences, drawing from designers' intuition, prior experience, and domain knowledge. Our goal is to reduce the psychological distance between designers and consumers in the automotive concept design process and address potential biases (e.g., design fixation) that could emerge from it. In this work, we developed a novel framework and system that combines machine learning, human aesthetic assessments, and interface design to support designers in organizing a large space of automotive wheel designs. We hope our demo can stimulate discussions around using this framework for professional product design practice.
Youngseung Jeon · Matthew Hong · Yin-Ying Chen · Kalani Murakami · Jonathan Li · Xiang 'Anthony' Chen · Matt Klenk
Sat 1:30 p.m. - 2:30 p.m. | Envisioning Distant Worlds: Training a Latent Diffusion Model with NASA's Exoplanet Data (Poster)
There are some 5,500 confirmed exoplanets beyond our solar system. Though we know these planets exist, most of them are too far away for us to know what they look like. In this paper, we develop an algorithm and a model to translate any given exoplanet's numeric data into a text prompt that can be input into a trained latent diffusion model to generate a predictive visualization of that exoplanet. This paper describes a novel approach of translating numeric data into textual descriptors formulated from prior accepted astrophysical research. These textual descriptions are paired with photographs and artistic visualizations from NASA's public archives to build a training set for a latent diffusion model, which can produce new visualizations of unseen distant worlds.
Marissa Beaty · Terence Broad
Sat 1:30 p.m. - 2:30 p.m. | Zero2Story: Novel Generation Framework for Anyone (Poster)
This paper explores a novel approach to collaborative storytelling where an AI generates paragraphs and provides branching options for human authors to shape the narrative in a turn-based style. It also discusses the integration of three generative AI technologies for text, image, and audio within a unified platform, offering new creative possibilities across multiple media formats.
Chansung Park · Youngbin Lee · Sangjoon Han · Jungue Lee
Sat 1:30 p.m. - 2:30 p.m. | On the Distillation of Stories for Transferring Narrative Arcs in Collections of Independent Media (Poster)
The act of telling stories is a fundamental part of what it means to be human. This work introduces the concept of narrative information, which we define to be the overlap in information space between a story and the items that compose the story. Using contrastive learning methods, we show how modern artificial neural networks can be leveraged to distill stories and extract a representation of the narrative information. We then demonstrate how evolutionary algorithms can leverage this to extract a set of narrative templates and how these templates, in tandem with a novel curve-fitting algorithm we introduce, can reorder music albums to automatically induce stories in them. In the process of doing so, we give strong statistical evidence that these narrative information templates are present in existing albums. While we experiment only with music albums here, the premises of our work extend to any form of (largely) independent media.
Dylan Ashley · Vincent Herrmann · Zachary Friggstad · Jürgen Schmidhuber
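The album-reordering step in the narrative-arc abstract above can be pictured as fitting a per-track scalar to a target curve. The brute-force permutation search and the made-up "narrative scores" below are illustrative assumptions; the paper uses learned representations and a dedicated curve-fitting algorithm.

```python
import numpy as np
from itertools import permutations

def reorder_to_curve(track_values, template_curve):
    """Return the track ordering whose values best match the template curve."""
    best, best_err = None, np.inf
    for perm in permutations(range(len(track_values))):
        ordered = np.array([track_values[i] for i in perm])
        err = float(np.sum((ordered - template_curve) ** 2))
        if err < best_err:
            best, best_err = perm, err
    return best

tracks = [0.2, 0.9, 0.5, 0.1, 0.7]          # hypothetical per-track narrative scores
arc = np.array([0.1, 0.4, 0.8, 0.6, 0.3])   # rise-and-fall narrative template
print(reorder_to_curve(tracks, arc))
```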
Sat 1:30 p.m. - 2:30 p.m. | Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface (Poster)
Text-to-image generation models have grown in popularity due to their ability to produce high-quality images from a text prompt. One use for this technology is to enable the creation of more accessible art creation software. In this paper, we document the development of an alternative user interface that reduces the typing effort needed to enter image prompts by providing suggestions from a large language model, developed through iterative design and testing within the project team. The results of this testing demonstrate how generative text models can support the accessibility of text-to-image models, enabling users with a range of abilities to create visual art.
Atieh Taheri · Mohammad Izadi · Gururaj Shriram · Negar Rostamzadeh · Shaun Kane
Sat 1:30 p.m. - 2:30 p.m. | Multi-Subject Personalization (Poster)
Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to alleviate some of these challenges. We implement MSP using Stable Diffusion and assess our approach against other text-to-image models, showcasing its consistent generation of good-quality images representing intended subjects and interactions.
Arushi Jain · Shubham Paliwal · Monika Sharma · Vikram Jamwal · Lovekesh Vig
Sat 1:30 p.m. - 2:30 p.m. | CalliPaint: Chinese Calligraphy Inpainting with Diffusion Model (Poster)
Chinese calligraphy can be viewed as a unique form of visual art. Recent advancements in computer vision hold significant potential for the future development of generative models in the realm of Chinese calligraphy. Nevertheless, methods of Chinese calligraphy inpainting, which can be effectively used in the art and education fields, remain relatively unexplored. In this paper, we introduce a new model that harnesses recent advancements in both Chinese calligraphy generation and image inpainting. We demonstrate that our proposed model CalliPaint can produce convincing Chinese calligraphy.
Qisheng Liao · Zhinuo Wang · Muhammad Abdul-Mageed · Gus Xia
Sat 1:30 p.m. - 2:30 p.m. | CAD-LLM: Large Language Model for CAD Generation (Poster)
Parametric Computer-Aided Design (CAD) is the dominant paradigm for modern mechanical design. Training generative models to reason about and generate parametric CAD can dramatically speed up design workflows. Pre-trained foundation models have shown great success in natural language processing and computer vision. The cross-domain knowledge embedded in these models holds significant potential for understanding geometry and performing complex reasoning about design. In this work, we develop generative models for CAD by leveraging pre-trained language models and apply them to manipulate engineering sketches. Our results demonstrate that models pre-trained on natural language can be fine-tuned on engineering sketches and achieve remarkable performance in various CAD generation scenarios.
Sifan Wu · Amir Khasahmadi · Mor Katz · Pradeep Kumar Jayaraman · Yewen Pu · Karl Willis · Bang Liu
Sat 1:30 p.m. - 2:30 p.m. | Unrolling Virtual Worlds for Immersive Experiences (Poster)
This research pioneers a method for generating immersive worlds, drawing inspiration from elements of vintage adventure games like Myst and employing modern text-to-image models. We explore the intricate conversion of 2D panoramas into 3D scenes using equirectangular projections, addressing the distortions in perception that occur as observers navigate within the encompassing sphere. Our approach employs a technique similar to "inpainting" to rectify distorted projections, enabling the smooth construction of locally coherent worlds. This provides extensive insight into the interrelation of technology, perception, and experiential reality within human-computer interaction.
Aleksei Tikhonov · Anton Repushko
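The equirectangular-to-perspective step underlying the Unrolling Virtual Worlds abstract above follows standard panorama geometry: cast a ray per output pixel, convert it to longitude and latitude, and sample the panorama there. The sketch below uses nearest-neighbour sampling and a fixed field of view as simplifying assumptions; it is not the paper's pipeline.

```python
import numpy as np

def perspective_view(equirect, out_h=256, out_w=256, fov_deg=90.0, yaw=0.0):
    """Sample a pinhole view (rotated by yaw) out of an equirectangular panorama."""
    H, W = equirect.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)           # focal length in pixels
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2,
                         np.arange(out_h) - out_h / 2)
    dirs = np.stack([xs, ys, np.full_like(xs, f, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    lon = np.arctan2(dirs[..., 0], dirs[..., 2]) + yaw          # rotate about the vertical axis
    lat = -np.arcsin(dirs[..., 1])                              # image y points down
    u = ((lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = np.clip(((np.pi / 2 - lat) / np.pi * H).astype(int), 0, H - 1)
    return equirect[v, u]

pano = np.random.rand(512, 1024, 3)     # stand-in panorama
print(perspective_view(pano).shape)     # (256, 256, 3)
```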
Sat 1:30 p.m. - 2:30 p.m. | Setting Switcher: Changing genre-settings in text-based game environments populated by generative agents (Poster)
We have developed an LLM-based agent for manipulation of text-based game environments, and generative agents within them, to convincingly alter the genre-setting of a game with respect to pre-existing lore and in-game mechanics. We contribute a novel, tested, LLM-based agent for this purpose: a 'Setting-Switcher' agent. This agent opens a range of creative applications and possibilities: our agent can be used as an ideation and productivity tool, deployed within a player-focused in-game feature, and used in tandem with other state-of-the-art technologies for application in visual game environments. Our investigation has highlighted the effectiveness of LLM-powered agents beyond conventional text generation and task completion: showcasing their value in crafting coherent narratives, portraying complex characters, and facilitating emergent storytelling within game settings.
Oliver Wood · Rebecca Fiebrink
Sat 1:30 p.m. - 2:30 p.m. | LEDITS++: Limitless Image Editing using Text-to-Image Models (Poster)
Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. Subsequent research efforts aim to exploit the capabilities of these models and leverage them for intuitive, textual image editing. However, existing methods often require time-consuming fine-tuning and lack native support for performing multiple edits simultaneously. To address these issues, we introduce LEDITS++, an efficient yet versatile technique for image editing using text-to-image models. LEDITS++ requires neither tuning nor optimization, runs in a few diffusion steps, natively supports multiple simultaneous edits, inherently limits changes to relevant image regions, and is architecture agnostic.
Manuel Brack · Linoy Tsban · Katharina Kornmeier · Apolinário Passos · Felix Friedrich · Patrick Schramowski · Kristian Kersting
Sat 1:30 p.m. - 2:30 p.m. | V2Meow: Meowing to the Visual Beat via Music Generation (Poster)
We propose a video-to-music generation system called V2Meow that can generate high-quality music audio for a diverse range of video input types based on a multi-stage autoregressive model, without the need to explicitly model the rhythmic or semantic video-music correspondence. Compared to previous video-to-music generation work, the video and text prompts are incorporated as a single stream of embedding inputs and fed into the Transformer with feature-specific adaptors. Trained on O(100K) music audio clips paired with video frames mined from in-the-wild music videos, V2Meow is competitive with previous domain-specific models when evaluated in a zero-shot manner. V2Meow can synthesize high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames, with optional style control via text prompts. Through both qualitative and quantitative evaluations, we verify that our model outperforms various existing music generation systems in terms of visual-audio correspondence and audio quality. We would like to present a demo during the workshop.
Yue Li · Kun Su · Qingqing Huang · Dima Kuzmin · Joonseok Lee · Chris Donahue · Fei Sha · Aren Jansen · Yu Wang · Mauro Verzetti · Timo I. Denk
Sat 1:30 p.m. - 2:30 p.m. | DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models (Poster)
This paper presents DiffuseBot, a first step toward efficient automatic robotic and virtual creature content creation. We propose using physical simulation to guide the generative process of pretrained large-scale 3D diffusion models. Diffusion models pretrained for 3D shapes provide an expressive base distribution that can effectively propose reasonable candidate geometries for soft robots. In order to sample robots in a physics-aware and performance-driven manner, we first optimize the embeddings that condition the diffusion model, skewing the sampling distribution toward better-performing robots as evaluated by our simulator. Then, we reformulate the sampling process to incorporate co-optimization over structure and control.
Tsun-Hsuan Johnson Wang · Juntian Zheng · Pingchuan Ma · Yilun Du · Byungchul Kim · Andrew Spielberg · Josh Tenenbaum · Chuang Gan · Daniela Rus
Sat 1:30 p.m. - 2:30 p.m. | An Ontology of Co-Creative AI Systems (Poster)
The term co-creativity has been used to describe a wide variety of human-AI assemblages in which human and AI are both involved in a creative endeavor. In order to assist with disambiguating research efforts, we present an ontology of co-creative systems, focusing on how responsibilities are divided between human and AI system and the information exchanged between them. We extend Lubart's original ontology of creativity support tools with three new categories emphasizing artificial intelligence: computer-as-subcontractor, computer-as-critic, and computer-as-teammate, some of which have sub-categorizations.
Zhiyu Lin · Mark Riedl
Sat 1:30 p.m. - 2:30 p.m. | ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning (Poster)
Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to consistently generate the same objects in different contexts with the same appearance. Consistent object generation is important to many downstream tasks like generating comic book illustrations with consistent characters and setting. Numerous approaches attempt to solve this problem by extending the vocabulary of diffusion models through fine-tuning. However, even lightweight fine-tuning approaches can be prohibitively expensive to run at scale and in real-time. We introduce a method called ObjectComposer for generating compositions of multiple objects that resemble user-specified images. Our approach is training-free, leveraging the abilities of preexisting models. We build upon the recent BLIP-Diffusion model, which can generate images of single objects specified by reference images. ObjectComposer enables the consistent generation of compositions containing multiple specific objects simultaneously, all without modifying the weights of the underlying models.
Alec Helbling · Evan Montoya · Duen Horng Chau
Sat 1:30 p.m. - 2:30 p.m. | HARP: Bringing Deep Learning to the DAW with Hosted, Asynchronous, Remote Processing (Poster)
Deep learning models have the potential to transform how artists interact with audio across a range of creative applications. While digital audio workstations (DAWs) like Logic or Pro Tools are the most popular software environment for producing audio, state-of-the-art deep learning models are typically available as Python repositories or web demonstrations (e.g., Gradio apps). Attempts to bridge this divide have focused on deploying lightweight models as DAW plug-ins that run in real time, locally on the CPU. This often requires significant modifications to the models, and precludes large compute-heavy models and alternative interaction paradigms (e.g., text-to-audio). To bring state-of-the-art models into the hands of artistic creators, we release HARP, a free Audio Random Access (ARA) plug-in for DAWs. HARP supports [h]osted, [a]synchronous, [r]emote [p]rocessing with deep learning models by routing audio from the DAW through Gradio endpoints. Through HARP, Gradio-compatible models hosted on the web (e.g., on Hugging Face Spaces) can become directly usable within the DAW. Using our API, developers can define interactive controls and audio processing logic within their Gradio endpoint. A sound artist can then enter the model's URL into a dialog box in the HARP plug-in, and the plug-in interface will automatically populate controls, prepare routing, and render any processed audio. Thus, sound artists can create and modify audio using deep learning models in-DAW, maintaining an unbroken creative workflow.
Hugo Flores Garcia · Christodoulos Benetatos · Patrick O'Reilly · Aldo Aguilar · Zhiyao Duan · Bryan Pardo
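For readers unfamiliar with the Gradio endpoints that HARP routes audio through, the snippet below is a minimal, stand-alone Gradio audio-to-audio app with one control. It uses only plain gradio calls; HARP's own API for declaring plug-in controls is not shown here, so treat this as an illustrative assumption rather than the plug-in's actual interface.

```python
import numpy as np
import gradio as gr

def apply_gain(audio, gain_db):
    """Scale the incoming audio by a user-chosen gain in decibels."""
    sample_rate, samples = audio                  # gr.Audio(type="numpy") yields (sr, array)
    out = samples.astype(np.float32) * (10.0 ** (gain_db / 20.0))
    if np.issubdtype(samples.dtype, np.integer):  # preserve integer formats, with clipping
        info = np.iinfo(samples.dtype)
        out = np.clip(out, info.min, info.max).astype(samples.dtype)
    return sample_rate, out

demo = gr.Interface(
    fn=apply_gain,
    inputs=[gr.Audio(type="numpy", label="Input audio"),
            gr.Slider(-24.0, 24.0, value=0.0, label="Gain (dB)")],
    outputs=gr.Audio(type="numpy", label="Processed audio"),
)

if __name__ == "__main__":
    demo.launch()   # hosting this (e.g., on Hugging Face Spaces) exposes the endpoint
```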
Sat 1:30 p.m. - 2:30 p.m. | Contextual Alchemy: A Framework for Enhanced Readability through Cross-Domain Entity Alignment (Poster)
Prior to the development of Large Language Models (LLMs), the pursuit of creative writing or content adjustment mainly focused on tailoring tonality, style, and lexicon to suit reader preferences. In addition, there have been frameworks aimed at simplification, like 'Explain it to me like I'm five', and targeted explanation, like 'Explain it to me like I'm a scientist'. In this work, we present Contextual Alchemy, a framework that identifies examples and their context in a document and suggests alternate examples for different topics of interest, times, and regions. Consider that you are reading a document that mentions the Magnavox Odyssey. Such an example does not resonate with all readers, and it may lose relevance over time. Our framework aims to retrieve other replaceable entities in a similar context; for example, in the sports domain, Reebok has faced a similar outcome to the Magnavox Odyssey. In this manner, our work utilises LLMs to enhance readability by adapting entities and context within a document to align closely with varied reader interests, ensuring reading is more engaging, relatable, and factually consistent for diverse readers.
Simra Shahid · Nikitha Srikanth · Surgan Jandial · Balaji Krishnamurthy
Sat 1:30 p.m. - 2:30 p.m. | JamSketch Deep α: Towards Musical Improvisation based on Human-machine Collaboration (Poster)
This paper describes an improvisation support system called JamSketch Deep α. Its basic concept is to allow users to specify the macro structure of melodies while the micro structure is automatically generated. Once the user draws a melodic outline as the macro structure, the system generates a melody according to the outline.
Tetsuro Kitahara · Akio Yonamine
Sat 1:30 p.m. - 2:30 p.m. | Hacking Generative Models with Differentiable Network Bending (Poster)
In this work, we propose a method to 'hack' generative models, pushing their outputs away from the original training distribution towards a new objective. We inject a small-scale trainable module between the intermediate layers of the model and train it for a low number of iterations, keeping the rest of the network frozen. The resulting output images display an uncanny quality, given by the tension between the original and new objectives, that can be exploited for artistic purposes.
Giacomo Aldegheri · Alina Rogalska · Ahmed Youssef · Eugenia Iofinova
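A minimal PyTorch sketch of the network-bending recipe described above: freeze a generator, splice a small trainable module between two of its layers, and optimize only that module for a few iterations toward a new objective. The toy generator and the "push the red channel up" objective are stand-ins for illustration; they are not the models or objectives used in the paper.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in for a pretrained generator with an injection point mid-network."""
    def __init__(self):
        super().__init__()
        self.early = nn.Sequential(nn.ConvTranspose2d(16, 32, 4, 2, 1), nn.ReLU())
        self.late = nn.Sequential(nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, z, bend=None):
        h = self.early(z)
        if bend is not None:
            h = bend(h)                 # injected trainable module
        return self.late(h)

gen = ToyGenerator()
for p in gen.parameters():
    p.requires_grad_(False)             # keep the original network frozen

bend = nn.Conv2d(32, 32, kernel_size=1) # small module spliced between layers
opt = torch.optim.Adam(bend.parameters(), lr=1e-2)

for step in range(20):                  # a low number of iterations, as in the abstract
    z = torch.randn(8, 16, 8, 8)
    img = gen(z, bend=bend)
    loss = -img[:, 0].mean()            # new objective: make outputs redder
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```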
Sat 1:30 p.m. - 2:30 p.m. | Lasagna: Layered Score Distillation for Disentangled Image Editing (Poster)
Recent text-guided image editing methods achieve great results on a variety of edit types, however, they fail to perform edits that are underrepresented in the training data, such as relighting. Methods that involve finetuning on paired supervised data often fail to preserve the input semantics on out-of-distribution examples, especially if the amount of training data is scarce. In this paper, we propose Lasagna, a method for disentangled image editing that distills the prior of a finetuned diffusion model in a separate visual layer. Lasagna uses score distillation to learn a plausible edit and preserves the semantics of the input by restricting the layer composition function. We show that Lasagna achieves superior shading quality compared to the state-of-the-art text-guided editing methods.
Dina Bashkirova · Arijit Ray · Rupayan Mallick · Sarah Bargal · Jianming Zhang · Ranjay Krishna · Kate Saenko
Sat 1:30 p.m. - 2:30 p.m. | CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images (Poster)
We assemble a dataset of Creative Commons (CC) licensed images and train a set of open diffusion models on that dataset that are competitive with Stable Diffusion 2. This task presents two challenges: high-resolution CC images 1) lack the captions necessary to train text-to-image generative models, and 2) are relatively scarce (∼70 million, compared to LAION's ∼2 billion). In turn, we first describe telephoning, a type of transfer learning, which we use to produce a dataset of high-quality synthetic captions paired with curated CC images. Second, we propose a more efficient training recipe to explore this question of data scarcity. Third, we implement a variety of ML-systems optimizations that achieve ∼3X training speed-ups. We train multiple versions of Stable Diffusion 2 (SD2), each on a differently sized subset of LAION-2B, and find we can successfully train using <3% of LAION-2B. Our largest model, dubbed CommonCanvas, achieves performance comparable to SD2 on human evaluation, even though we only use a CC dataset that is <3% the size of LAION and synthetic captions for training.
Aaron Gokaslan · A. Feder Cooper · Jasmine Collins · Landan Seguin · Austin Jacobson · Mihir Patel · Jonathan Frankle · Cory Stephenson · Volodymyr Kuleshov
Sat 1:30 p.m. - 2:30 p.m. | Personalized Comic Story Generation (Poster)
We introduce PCSG, a diffusion-based text-to-image synthesis framework for supporting comic story generation, a domain in which authors require control over the consistency, composition, and diversity of content. To support these three requirements, PCSG has controllable plugins for (1) character consistency, (2) scene layout specification, and (3) character pose specification. The novel combination of these plugins enables users to exert fine-grained control and manifest their envisioned comic narratives with personalized characters. In our user study, this flexibility greatly improved satisfaction over existing approaches such as Midjourney or Stable Diffusion. To further advance this field and facilitate community engagement, we will open-source our code soon.
WENXUAN PENG · Peter Schaldenbrand · Jean Oh
Sat 1:30 p.m. - 2:30 p.m. | Combating the "Sameness" in AI Art: Reflections on the Interactive AI Installation Fencing Hallucination (Poster)
The article summarizes three types of "sameness" issues in Artificial Intelligence (AI) art, each occurring at a different stage of development in AI image creation tools. Through the Fencing Hallucination project, the article reflects on the design of AI art production in alleviating the sense of uniformity, maintaining the uniqueness of images from an AI image synthesizer, and enhancing the connection between the artworks and the audience. This paper endeavors to stimulate the creation of distinctive AI art by recounting the efforts and insights derived from the Fencing Hallucination project, all dedicated to addressing the issue of "sameness".
Weihao Qiu · george legrady
Sat 1:30 p.m. - 2:30 p.m. | SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration (Poster)
Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe, a fullstack system that uses multimodal deep learning to let users express their intentions at a much higher level. We implement features which address a number of difficulties, namely 1) searching through existing sounds, 2) creating completely new sounds, and 3) making meaningful modifications to a given sound. This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected given the user's preferences; and a sound-editing support feature which highlights and gives examples for key control parameters with respect to a text- or audio-based query. The combination of these features creates a novel workflow for musicians, exemplifying the usefulness of systems developed with a foundation of multimodal deep learning.
Stephen Brade · · Bryan Wang · Mauricio Sousa · Tovi Grossman · Gregory Lee Newsome
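The user-centered genetic algorithm in the SynthScribe abstract above can be sketched as follows: synth patches are parameter vectors, the user picks favourites each generation, and the next generation is bred from those picks. The parameter count, mutation rate, and the simulated "user preference" are illustrative assumptions.

```python
import random

N_PARAMS = 8  # hypothetical number of synthesizer parameters

def random_patch():
    return [random.random() for _ in range(N_PARAMS)]

def breed(a, b, mutation=0.1):
    """Uniform crossover of two parent patches plus clamped Gaussian mutation."""
    child = [random.choice(pair) for pair in zip(a, b)]
    return [min(1.0, max(0.0, v + random.gauss(0, mutation))) for v in child]

population = [random_patch() for _ in range(8)]
for generation in range(5):
    # In the real system the user auditions sounds and selects favourites; here
    # we simulate a preference for patches with a high first parameter.
    favourites = sorted(population, key=lambda p: p[0], reverse=True)[:3]
    population = [breed(random.choice(favourites), random.choice(favourites))
                  for _ in range(8)]

print("best patch:", max(population, key=lambda p: p[0]))
```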
Sat 1:30 p.m. - 2:30 p.m. | WordArt Designer API: User-Driven Artistic Typography with Large Language Models on ModelScope (Poster)
Driven by GPT-3.5-turbo and a suite of specialized visual models, WordArt Designer seamlessly transmutes user text prompts into stunning, semantically resonant multilingual typographic masterpieces. With user-friendly interactions, it democratizes the artistic process, inviting all, irrespective of their design expertise, to realize their creative dreams. We welcome you to explore the online demo at https://www.modelscope.cn/studios/WordArt/WordArt
JUN-YAN HE · Zhi-Qi Cheng · Chenyang Li · Jingdong Sun · Wangmeng Xiang · Xianhui Lin · Xiaoyang Kang · Zengke Jin · Yusen Hu · Bin Luo · Yifeng Geng · Xuansong Xie · Jingren Zhou
Sat 2:30 p.m. - 2:40 p.m. | Conclusion (Opening/Closing Remarks)