Creative AI Session
Creative AI Session 1
Upper Level Room 29A-D
Marcelo Coelho · Luba Elliott · Priya Prakash · Yingtao Tian
AI as intermediary in modern-day ritual: An immersive, interactive production of the roller disco musical Xanadu at UCLA
Mira Winick · Naisha Agarwal · Chiheb Boussema · Ingrid Lee · Camilo Vargas · Jeff Burke
Interfaces for contemporary large language, generative media, and perception AI models are often engineered for single-user interaction. We investigate ritual as a design scaffold for developing collaborative, multi-user human–AI engagement. We consider the specific case of an immersive staging of the musical Xanadu performed at UCLA in Spring 2025. During a two-week run, over five hundred audience members contributed sketches and jazzercise moves that vision–language models translated into virtual scenery elements and choreographic prompts. This paper discusses four facets of interaction-as-ritual within the show: audience input as offerings that AI transforms into components of the ritual; performers as ritual guides, demonstrating how to interact with technology and sorting audience members into cohorts; AI systems as instruments "played" by the humans, in which sensing, generative components, and stagecraft create systems that can be mastered over time; and reciprocity of interaction, in which the show's AI machinery guides human behavior as much as it is guided by humans, completing a human–AI feedback loop that visibly reshapes the virtual world. Ritual served as a frame for integrating linear narrative, character identity, music, and interaction. The production explored how AI systems can support group creativity and play, addressing a critical gap in prevailing single-user AI design paradigms.
AI is Misled by GenAI: Stylistic Bias in Automated Assessment of Creativity in Large Language Models
Marek Urban · Petra Kmoníčková · Kamila Urban
Outputs from large language models (LLMs) are often rated as highly original yet show low variability (i.e., greater homogeneity) compared to human responses, a pattern we refer to as the \textit{LLM creativity paradox}. However, prior work suggests that assessments of originality and variability may reflect stylistic features of LLM outputs rather than underlying conceptual novelty. The goal of the present study was to investigate this issue using outputs from seven distinct LLMs on a modified Alternative Uses Task. We scored verbatim and "humanized" LLM responses—reworded to reduce verbosity but maintain core ideas—using four automated metrics (supervised OCSAI and CLAUS models, and two unsupervised semantic-distance tools) and compared them with responses from 30 human participants. As expected, verbatim LLM responses were rated as substantially more original than human responses (median $d = 1.46$) but showed markedly lower variability (median $d = 0.85$). Humanizing the responses strongly decreased originality and weakly increased variability, indicating that part of the LLM creativity paradox is driven by stylistic cues. Nevertheless, even after humanization, originality scores of LLM responses remained higher (median $d = 0.80$) and their variability lower ($d = 0.57$) than those of human responses. These findings suggest that automated assessment tools can be partially misled by the style of LLM outputs, highlighting the need for caution when using automated methods to evaluate machine-generated ideas, particularly in real-world applications such as providing feedback or guiding creative workflows.
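For reference, the $d$ values reported above are Cohen's $d$ effect sizes; a standard two-sample formulation (our gloss, which may differ in detail from the exact estimator used in the paper) is
\[
d = \frac{\bar{x}_{\mathrm{LLM}} - \bar{x}_{\mathrm{human}}}{s_{\mathrm{pooled}}}, \qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_{1}-1)\,s_{1}^{2} + (n_{2}-1)\,s_{2}^{2}}{n_{1}+n_{2}-2}},
\]
where $\bar{x}$ and $s^{2}$ are the group means and variances and $n_{1}, n_{2}$ the group sizes.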
CoTextor: Training-Free Modular Multilingual Text Editing via Layered Disentanglement and Depth-Aware Fusion
Zhenyu Yu · Mohd Idris · Pei Wang · Rizwan Qureshi
We introduce \textit{CoTextor}, a modular and training-free framework for multilingual text editing in images, designed to support human-AI co-creation through a user-controllable and reversible workflow. Unlike diffusion-based systems that operate as black boxes, \textit{CoTextor} separates the editing process into transparent layers—foreground extraction, background inpainting, semantic rewriting, and depth-aware reintegration—allowing precise user-guided operations such as rotation, translation, scaling, and warping. To ensure realism, we introduce a perceptually guided integration module that enhances photometric and geometric coherence during text reinsertion. Built entirely from publicly available pretrained components, \textit{CoTextor} is accessible to non-technical, multilingual users, requiring no retraining or annotation. Through real-world scenarios in poster localization, street art remixing, and educational content creation, we demonstrate how \textit{CoTextor} enables inclusive and expressive visual storytelling across cultural and linguistic contexts.
The Digital Plankton is a physical object installation exploring the life of virtual plankton creatures deployed on board a small computing device. The installation shows generated footage on a Raspberry Pi with an embedded display. It is a spiritual continuation of the so-called Evolved Virtual Creatures presented by Karl Sims in 1994; there, a genetic evolutionary algorithm was used to create rule-based, grid-based controls for simple virtual creature bodies that were conditioned to move. Our work uses neural networks and learning from data rather than depending on explicitly written rules (such as the goal to "move the furthest" in a simulated environment). As such, the inhabited space is continuous rather than discrete (up to the resolution of float32). We present a play on hybridisation, as our synthetic plankton shapes live on a physical real-world object, which is exhibited and can be interacted with. As for the theme, humanity could learn from plankton, as one of the oldest forms of life, which has lived through evolving climate conditions over billions of years; speculatively, if we consider more-than-human intelligence, it may even outlive us. This work is the conclusion of a year-long art residency with the Inspiration Forum Lab of the Ji.hlava International Documentary Film Festival, which resulted in a month-long exhibition at the Display photo gallery named Bodies of Water. We want to thank Albert Calbet and the Marine Zooplankton Ecology Lab of CSIC for their data. Finally, for the installation details, this piece contains a custom-adapted Raspberry Pi with a display, attached to a wall segment via USB.
Echoes of Humanity: Exploring the Perceived Humanness of AI Music
Flavio Figueiredo · Giovanni Martinelli · Henrique Sousa · Pedro Rodrigues · Frederico Pedrosa · Lucas Ferreira
Recent advances in AI music (AIM) generation services are currently transforming the music industry. Given these advances, understanding how humans perceive AIM is crucial both to educate users on identifying AIM songs and, conversely, to improve current models. We present results from a listener-focused experiment aimed at understanding how humans perceive AIM. In a blind, Turing-like test, participants were asked to distinguish, within a pair, the AIM song from the human-made one. In contrast to other studies, we use a randomized controlled crossover trial that controls for pairwise similarity and allows for a causal interpretation. Ours is also the first study to employ a novel, author-uncontrolled dataset of AIM songs from real-world usage of commercial models (i.e., Suno). We establish that listeners' reliability in distinguishing AIM causally increases when pairs are similar. Lastly, we conduct a mixed-methods content analysis of listeners’ free-form feedback, revealing a focus on vocal and technical cues in their judgments.
We present EmoNest, a generative AI framework for creating interactive, emotionally adaptive storytelling experiences. By integrating advanced language and vision models with user profiling and real-time narrative adaptation, EmoNest generates personalized stories that reflect each user’s emotional state and personal background. Our approach empowers users to co-create immersive narratives that promote engagement and emotional resonance. We demonstrate EmoNest’s potential for delivering personalized artistic experiences and discuss its implications for the development of emotionally intelligent AI.
EMPATHIA: Multi-Faceted Human-AI Collaboration for Refugee Integration
Mohamed Rayan Barhdadi · Mehmet Tuncel · Erchin Serpedin · Hasan Kurban
Current AI approaches to refugee integration optimize narrow objectives such as employment and fail to capture the cultural, emotional, and ethical dimensions critical for long-term success. We introduce EMPATHIA (Enriched Multimodal Pathways for Agentic Thinking in Humanitarian Immigrant Assistance), a multi-agent framework addressing the central Creative AI question: how do we preserve human dignity when machines participate in life-altering decisions? Grounded in Kegan's Constructive Developmental Theory, EMPATHIA decomposes integration into three modules: SEED (Socio-cultural Entry and Embedding Decision) for initial placement, RISE (Rapid Integration and Self-sufficiency Engine) for early independence, and THRIVE (Transcultural Harmony and Resilience through Integrated Values and Engagement) for sustained outcomes. SEED employs a selector–validator architecture with three specialized agents—emotional, cultural, and ethical—that deliberate transparently to produce interpretable recommendations. Experiments on the UN Kakuma dataset (15,026 individuals, 7,960 eligible adults 15+ per ILO/UNHCR standards) and implementation on 6,359 working-age refugees (15+) with 150+ socioeconomic variables achieved 87.4% validation convergence and explainable assessments across five host countries. EMPATHIA's weighted integration of cultural, emotional, and ethical factors balances competing value systems while supporting practitioner–AI collaboration. By augmenting rather than replacing human expertise, EMPATHIA provides a generalizable framework for AI-driven allocation tasks where multiple values must be reconciled.
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions
Vivek Veeriah · Federico Barbero · Marcus Chiam · Xidong Feng · Michael Dennis · Ryan Pachauri · Thomas Tumiel · Johan Obando Ceron · Jiaxin Shi · Shaobo Hou · Satinder Singh · Nenad Tomasev · Tom Zahavy
The rapid advancement of Generative AI has raised significant questions regarding its ability to produce creative and novel outputs. Our recent work investigates this question within the domain of chess puzzles and presents an AI system designed to generate puzzles characterized by aesthetic appeal, novelty, and counter-intuitive, unique solutions. We briefly outline our methodology, with a detailed discussion in the technical paper. To assess our system's creativity, we presented a curated booklet of AI-generated puzzles to three world-renowned experts: International Master for chess compositions Amatzia Avni, Grandmaster Jonathan Levitt, and Grandmaster Matthew Sadler. All three are noted authors on chess aesthetics and the evolving role of computers in the game. They were asked to select their favorites and explain what made them appealing, considering qualities such as their creativity, level of challenge, or aesthetic design. This paper compiles these selected puzzles, integrating expert analysis to explore the elements that render them counter-intuitive and beautiful.
How Foundation Models are Reshaping Non-Invasive Brain–Computer Interfaces: A Case for Novel Human Expression and Alignment
Albert Barque Duran · Ada M. Llauradó Crespo
SYNAPTICON is a research prototype at the intersection of neuro-hacking, non-invasive brain–computer interfaces (BCIs), and foundation models, probing new territories of human expression, neuroaesthetics, and AI alignment. Envisioning a cognitive “Panopticon” where biological and advanced synthetic intelligent systems converge, it enables a pipeline that couples temporal neural dynamics with pre-trained language models and operationalizes them in a closed loop for expression. At its core lies a live “Brain Waves-to-Natural Language-to-Aesthetics” system that translates neural states (measured via electroencephalography, EEG) into decoded speech, and then into immersive audiovisual output and content, shaping altered perceptual experiences and inviting audiences to directly engage with the user’s mind. SYNAPTICON provides a reproducible reference for foundation-model-assisted BCIs, suitable for advanced studies of human–machine interaction (HMI).
Interactive Artistic Text-To-Voice: Tungnaá and Bla Blavatar vs Jaap Blonk
Victor Shepardson · Jonathan Reus · Thor Magnusson
Advances in deep learning have enabled speech synthesis to rival human speech in realism. While many artists have experimented with these technologies, real-time applications have been limited. We define a new task, interactive artistic text-to-voice (IATV), in order to bridge this gap. We also present a novel IATV system which achieves low-latency synthesis, interactivity, and controllability while allowing for exploration of unconventional vocal expressions. It leverages a character-level text encoder, Tacotron2-based streaming alignment, and a RAVE streaming vocoder. Tungnaá is our open source Python package implementing IATV training and real-time inference, plus a graphical interface for experimental music performance with IATV models. We report on strategies for low-resource training on artist-created datasets, and on an artistic application of Tungnaá in collaboration with sound poet Jaap Blonk.
Large-Scale Training Data Attribution for Music Generative Models via Unlearning
Woosung Choi · Junghyun Koo · Kin Wai Cheuk · Joan Serrà · Marco Martínez-Ramírez · Yukara Ikemiya · Naoki Murata · Yuhta Takida · Wei-Hsiang Liao · Yuki Mitsufuji
This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed the most to the generation of a particular output from a specific model. This is crucial in the context of AI-generated music, where proper recognition and credit for original artists are generally overlooked. By enabling white-box attribution, our work supports a fairer system for acknowledging artistic contributions and addresses pressing concerns related to AI ethics and copyright. We apply unlearning-based attribution to a text-to-music diffusion model trained on a large-scale dataset and investigate its feasibility and behavior in this setting. To validate the method, we perform a grid search over different hyperparameter configurations and quantitatively evaluate the consistency of the unlearning approach. We then compare attribution patterns from unlearning with non-counterfactual approaches. Our findings suggest that unlearning-based approaches can be effectively adapted to music generative models, introducing large-scale TDA to this domain and paving the way for more ethical and accountable AI systems for music creation.
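Schematically (and not necessarily with the paper's exact estimator), unlearning-based attribution scores a training example $z_i$ for a generated output $x$ by the counterfactual change in the model's loss on $x$ once $z_i$ has been unlearned:
\[
\tau(z_i, x) \;=\; \mathcal{L}\!\left(x;\, \theta_{\setminus i}\right) - \mathcal{L}\!\left(x;\, \theta\right),
\]
where $\theta$ denotes the trained model, $\theta_{\setminus i}$ the model after approximately removing $z_i$ via unlearning, and larger $\tau$ indicates a stronger contribution of $z_i$ to $x$.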
Melody Slot Machine III for Automatic Fingering Saxophone using Servomotors
Masatoshi Hamanaka · Gou Koutaki
Melody Slot Machine displays the staves of a melody as dials on an iPad; rotating a dial changes that segment to another melody variation. The variations are generated with an AI-based melody-morphing method, so the melody can be partially switched to another variation without any significant change to the overall melody structure and with no musical breakdown. The microcomputer on the Automatic Fingering Saxophone receives MIDI notes from Melody Slot Machine and moves the servomotors so that the fingering corresponds to each note number.
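As a rough illustration of the note-to-fingering step described above (not the authors' firmware; the key layout and servo angles below are placeholder assumptions), a microcontroller-side mapping might look like:

    # Illustrative sketch: map incoming MIDI note numbers to a saxophone-like
    # fingering, then to servo target angles. The fingering chart and angles
    # are hypothetical placeholders, not the instrument's actual chart.

    # Each fingering is a tuple of key states (1 = closed) for six main keys.
    FINGERINGS = {
        60: (1, 1, 1, 1, 1, 1),  # C: all six keys closed (placeholder)
        62: (1, 1, 1, 1, 1, 0),  # D
        64: (1, 1, 1, 1, 0, 0),  # E
        65: (1, 1, 1, 0, 0, 0),  # F
        67: (1, 1, 0, 0, 0, 0),  # G
        69: (1, 0, 0, 0, 0, 0),  # A
        71: (0, 0, 0, 0, 0, 0),  # B: all keys open
    }

    OPEN_ANGLE, CLOSED_ANGLE = 30, 90  # servo angles in degrees (assumed)

    def servo_angles_for_note(note_number):
        """Return one target angle per key servo for a MIDI note, or None."""
        fingering = FINGERINGS.get(note_number)
        if fingering is None:
            return None  # note outside the placeholder fingering chart
        return [CLOSED_ANGLE if pressed else OPEN_ANGLE for pressed in fingering]

    if __name__ == "__main__":
        # A note-on for MIDI note 67 (G) would drive the servos to:
        print(servo_angles_for_note(67))  # [90, 90, 30, 30, 30, 30]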
Orchestrating Emergent Storytelling with Embodied Multi-Agent Systems
Parag Mital · Seth Rosetter · Arturo Prieto · Jacobo Heredia · Breanna Browning
We present a novel approach to emergent storytelling through multi-agent systems powered by large language models (LLMs), advancing beyond current approaches to game AI and interactive storytelling, which rely on heavily scripted dialogue systems, and moving closer towards genuinely emergent narrative ecosystems. Through two artworks/video games, Conflicts and The Game of Whispers, we demonstrate how LLM-driven agents with persistent memory, behavioral models, and coordination capabilities generate coherent narratives from simulated social dynamics. Our architecture introduces: (1) a hierarchical memory system integrating working memory, episodic buffers, and consolidated narrative storage; (2) a conversation graph that tracks topic centroids, engagement, and unresolved questions; (3) a hybrid orchestrator that directs autonomy by fusing LLM reasoning with the conversation graph; and (4) their integration within embodied agents with a streaming multimodal action-perception loop that enables spatial awareness and environmental responsiveness. Experiments reveal emergent behaviors including strategic deception, coalition formation, the spread of misinformation, and meta-narrative awareness. Our contributions include several architectural patterns for producing stable emergent narrative systems.
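A minimal sketch of the kind of hierarchical memory named in contribution (1) appears below; the class names, buffer size, and consolidation rule are illustrative assumptions rather than the authors' implementation (which would summarize episodes with an LLM).

    # Hedged sketch of a hierarchical agent memory: working memory (recent
    # turns), an episodic buffer, and consolidated narrative storage.
    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class Event:
        speaker: str
        text: str
        topic: str

    @dataclass
    class AgentMemory:
        working: deque = field(default_factory=lambda: deque(maxlen=8))  # recent turns
        episodic: list = field(default_factory=list)    # buffered episode events
        narrative: list = field(default_factory=list)   # consolidated summaries

        def observe(self, event: Event):
            self.working.append(event)
            self.episodic.append(event)
            # Toy consolidation rule: every 20 events, fold the episode into a
            # one-line narrative entry (a real system would summarize via an LLM).
            if len(self.episodic) >= 20:
                topics = {e.topic for e in self.episodic}
                self.narrative.append(f"Episode about: {', '.join(sorted(topics))}")
                self.episodic.clear()

    memory = AgentMemory()
    memory.observe(Event("guard", "Did you hear the rumor?", "rumor"))
    print(len(memory.working), len(memory.narrative))  # 1 0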
This work introduces an innovative “painting-with-paintings” artform, in which each brushstroke contains fragments of historical artworks. The method allows existing paintings to be reimagined as fluid mosaics, preserving the essence of past masterpieces while forming a new visual narrative. By embedding humanity’s artistic heritage into every stroke, the approach connects past and present, links diverse artistic traditions, and transforms art history into a living medium for contemporary creation.
We present a framework designed to generate and execute writing plans for LLMs. This framework decomposes the writing task into multiple sub-tasks, including novel reflection steps for iterative plan refinement, facilitating a structured approach to content generation. It can be applied both to execute writing plans through multiple LLM calls and to generate training data for supervised fine-tuning. To evaluate the effectiveness of our framework, we developed automated raters trained on guideline-based, human-filtered questions to assess writing quality. We show that the writing outputs produced using our framework are superior to those generated by LLMs without a planning component. Additionally, human raters corroborated these results, with 79.4% of participants ranking our framework’s outputs as more interesting.
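To make the plan/reflect/execute decomposition concrete, here is a hedged sketch of such a loop; `call_llm` is a hypothetical stand-in for any model client, and the prompts are illustrative rather than taken from the paper.

    # Sketch of a plan -> reflect -> execute writing loop of the kind described above.
    def call_llm(prompt: str) -> str:
        """Placeholder for a model call; replace with any LLM client."""
        return f"[model output for: {prompt[:40]}...]"

    def write_with_plan(task: str, reflection_rounds: int = 2) -> str:
        plan = call_llm(f"Draft a section-by-section writing plan for: {task}")
        # Iterative plan refinement via reflection steps.
        for _ in range(reflection_rounds):
            critique = call_llm(f"Critique this plan for gaps and ordering:\n{plan}")
            plan = call_llm(f"Revise the plan using this critique:\n{critique}\n\n{plan}")
        # Execute the plan step by step, carrying the accumulating draft forward.
        draft = ""
        for step in plan.splitlines():
            if step.strip():
                draft += call_llm(f"Write the next passage for step '{step}', "
                                  f"continuing this draft:\n{draft}") + "\n"
        return draft

    print(write_with_plan("an essay on tidal energy"))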
Real-Time Intuitive AI Drawing System for Collaboration: Enhancing Human Creativity through Formal and Contextual Intent Integration
Jookyung Song · Mookyoung Kang · Nojun Kwak
This paper presents a real-time generative drawing system that interprets and integrates both formal intent—the structural, compositional, and stylistic attributes of a sketch—and contextual intent—the semantic and thematic meaning inferred from its visual content—into a unified transformation process. Unlike conventional text-prompt-based generative systems, which primarily capture high-level contextual descriptions, our approach simultaneously analyzes ground-level, intuitive geometric features such as line trajectories, proportions, and spatial arrangement, and high-level semantic cues extracted via vision–language models. These dual intent signals are jointly conditioned in a multi-stage generation pipeline that combines contour-preserving structural control with style- and content-aware image synthesis. Implemented with a touchscreen-based interface and a distributed inference architecture, the system achieves low-latency, two-stage transformation while supporting multi-user collaboration on shared canvases. The resulting platform enables participants, regardless of artistic expertise, to engage in synchronous, co-authored visual creation, redefining human–AI interaction as a process of co-creation and mutual enhancement.
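As a loose illustration of how formal and contextual intent might be merged into a single conditioning payload for a two-stage generator (the field names, weights, and structure are our assumptions, not the system's interface):

    # Sketch: combine structural ("formal") and semantic ("contextual") intent
    # into one conditioning dict a downstream generation pipeline could consume.
    from dataclasses import dataclass

    @dataclass
    class FormalIntent:           # structural cues read from the sketch itself
        contours: list            # line trajectories as point lists
        aspect_ratio: float
        stroke_density: float

    @dataclass
    class ContextualIntent:       # semantic cues, e.g. from a vision-language model
        caption: str
        theme_tags: list

    def build_conditioning(formal: FormalIntent, contextual: ContextualIntent,
                           structure_weight: float = 0.7) -> dict:
        """Merge both intent signals into a single conditioning payload."""
        return {
            "control_image_contours": formal.contours,   # contour-preserving control
            "control_strength": structure_weight,
            "prompt": f"{contextual.caption}, {', '.join(contextual.theme_tags)}",
        }

    cond = build_conditioning(
        FormalIntent(contours=[[(0, 0), (1, 1)]], aspect_ratio=1.5, stroke_density=0.2),
        ContextualIntent(caption="a lighthouse at dusk", theme_tags=["storm", "watercolor"]),
    )
    print(cond["prompt"])  # "a lighthouse at dusk, storm, watercolor"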
We present a novel, regression-based method for artistically styling images. Unlike recent neural style transfer or diffusion-based approaches, our method allows for explicit control over the stroke composition and level of detail in the rendered image through the use of an extensible set of stroke patches. The stroke patch sets are procedurally generated by small programs that control the shape, size, orientation, density, color, and noise level of the strokes in the individual patches. Once trained on a set of stroke patches, a U-Net based regression model can render any input image in a variety of distinct, evocative and customizable styles.
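The following is a small sketch of procedurally generating one stroke patch of the kind described above; the exact parameterization of shape, size, orientation, density, color, and noise is illustrative rather than the paper's.

    # Sketch: procedurally generate a single stroke patch as an RGB image array.
    import numpy as np

    def stroke_patch(size=64, n_strokes=12, angle_deg=30.0,
                     color=(0.2, 0.3, 0.8), noise=0.02, seed=0):
        rng = np.random.default_rng(seed)
        patch = np.ones((size, size, 3), dtype=np.float32)       # white canvas
        theta = np.deg2rad(angle_deg)
        direction = np.array([np.cos(theta), np.sin(theta)])
        for _ in range(n_strokes):
            start = rng.uniform(0, size, 2)
            length = rng.uniform(0.3, 0.7) * size
            for t in np.linspace(0.0, 1.0, int(length)):
                x, y = (start + t * length * direction).astype(int)
                if 0 <= x < size and 0 <= y < size:
                    patch[y, x] = color                            # draw a 1-px stroke
        patch += rng.normal(0.0, noise, patch.shape)               # per-pixel noise
        return np.clip(patch, 0.0, 1.0)

    print(stroke_patch().shape)  # (64, 64, 3)

A U-Net regressor would then be trained to reproduce such patches from image crops, so that the learned style can be controlled by swapping in a different patch-generating program.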
Surface Tension is an experimental installation that explores the power of visual media in shaping our realities, probing the tension between visibility and authority in the systems through which we produce and share knowledge. At its center lies the installation’s kernel of truth: raw footage from a microscope capturing a meticulous process of animating individual neurons through physical manipulation. Using optical tweezers, neurons are lifted and orchestrated into movement by the energy of a light beam, drifting in and out of formations that attempt to spell out the word “THOUGHT.” Over the course of the installation, this raw footage undergoes continuous visual transformation, cycling through color inversions, latent textures generated by diffusion models, and other computational processes that conceal and reveal different aspects of the image. Images at various stages in the technological reconstruction of the absurd reality morph in and out, slide over and under each other, wearing the traces of their own making like a skin. In this layering of the real and the speculative, Surface Tension confronts the condition of being human in a world increasingly mediated by black-boxed machine perception. It traces the double-edged power of media to traverse boundaries between physical reality and representation, simulation and experiment, model and metaphor. In animating the very matter that constitutes consciousness, the work both literalizes and questions the future of agency, asking what it means to see, know, and be, amid increasingly autonomous systems.
Synthaesthetic Art: Human-Machine Creative Collaboration
Justin Baird · Ivy Chen · Jookyung Song · Mookyoung Kang · Richard J Savery
Synthaesthetic Art is a human-machine collaborative framework combining live perception, generative AI, and robotic execution. Presented as a live installation at the Super AI Conference 2025, the system captured portraits of attendees, transformed them into stylised caricatures using a diffusion model fine-tuned on artist-in-residence Ivy Chen's works, and rendered them on canvas with acrylic paint via a robotic arm. Unlike conventional AI-driven art systems, Synthaesthetic Art is designed to preserve and amplify artist authorship through stylistic training, selective curation, and real-time control. This work introduces the conceptual foundations of Synthaesthetic Art and explores how hybrid creativity can expand agency, re-frame authorship, and deepen emotional expression through machine augmentation.
The Anemoia Device: A Tangible AI System for the Co-creation of Synthetic Memories through Scent
Cyrus Clarke · Jianing Yu · Melo Chen · Yuen Zou · Hiroshi Ishii
We present the Anemoia Device, a synthetic memory generator that uses generative AI to provoke nostalgia for a time you have never experienced. The system transforms an archival photograph into a multi-sensory artifact through a novel synthesis of vision, language, and olfactory technologies. This multi-modal pipeline uses a tangible, dial-based interface to guide an LLM in performing a cross-modal, semantic-to-olfactory translation. The work is an inquiry into memory malleability in an age of AI, proposing an alternative to conventional screen-based interaction through an intentional, embodied ritual that positions the user as an active co-author, rather than a passive consumer.
The Ghost in the Keys: A Disklavier Demo for Human-AI Musical Co-Creativity
Louis Bradshaw · Alexander Spangher · Stella Biderman · Simon Colton
While generative models for music composition are increasingly capable, their adoption by musicians is hindered by text prompting, an asynchronous workflow disconnected from the embodied, responsive nature of instrumental performance. To address this, we introduce Aria-Duet, an interactive system facilitating a real-time musical duet between a human pianist and Aria, a state-of-the-art generative model, using a Yamaha Disklavier as a shared physical interface. The framework enables a turn-taking collaboration: the user performs, signals a handover, and the model generates a coherent continuation performed acoustically on the piano. Beyond describing the technical architecture enabling this low-latency interaction, we analyze the system's output from a musicological perspective, finding that the model can maintain stylistic semantics and develop coherent phrasal ideas. These results demonstrate that such embodied systems can engage in musically sophisticated dialogue and open a promising new path for human–AI co-creation.
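For intuition, the turn-taking protocol can be caricatured as follows; the handover signal and the generation call are placeholders, not the Aria-Duet implementation.

    # Toy sketch of the perform -> handover -> continuation loop described above.
    def generate_continuation(history):
        """Stand-in for the generative model; returns (note, duration) pairs."""
        return [(note + 2, dur) for note, dur in history[-4:]]

    def duet_turn(user_notes, handover_signalled):
        if not handover_signalled:
            return []                                 # keep listening while the human plays
        return generate_continuation(user_notes)      # model takes its turn on the piano

    played = [(60, 0.5), (64, 0.5), (67, 0.5), (72, 1.0)]
    print(duet_turn(played, handover_signalled=True))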
In this paper, we discuss a new physical interface to agentic models. We define AI Cohabitants as new interfaces for agentic models that act with an inherent character and have the ability to act autonomously without user inputs. An AI Cohabitant is more like a roommate or a smart house pet, with its own personality and narrative, reversing the subservient dynamic of current AI assistants. Unlike traditional voice agents that require users to actively open an app or issue a command, AI Cohabitants are inherently physical and exist alongside users continuously, occupying space in everyday life with a degree of autonomy. This persistent, ambient presence allows the AI to observe, learn, and subtly participate in the user’s world—not by maximizing engagement time or demanding attention, but by developing its own narrative and rhythm of interaction. To explore this concept, we developed a physical embodiment of such a cohabitant: a robotic parrot named the Stochastic Parrot. Through its physicality and contextual framing as a living presence in the user’s environment, we observed new patterns of interaction and more nuanced relational dynamics between the AI and its human counterpart. The framing of the AI as a characterful cohabitant, rather than a utilitarian assistant, invites a more spontaneous, expressive, and emotionally textured form of engagement.
The Sublime Ordinary: A Tool for Analysing Temporal City Soundscapes
Elina Oikonomaki · Lukas Debiasi
As cities hurtle toward ever more data-driven futures, The Sublime Ordinary offers an alternative perspective on how we record, understand, and ultimately design urban environments by examining the temporal and sensory dimensions of city life from a first-person perspective. Through a multimodal dataset of synchronized audio, video, and GPS recordings collected in Harvard Square, Cambridge, Massachusetts, the project analyzes how the rhythms of everyday urban life shift over time and in response to policy and environmental changes. By combining semantic segmentation (SegFormer), object detection (YOLO), and sound classification (YAMNet), our system generates linked spatial–temporal–acoustic representations that form the data foundation of a new notational language informed by graphic notation to visualize the interplay of sound, activity, and place. Presented as an interactive web interface, the work enables users to explore recurring sound profiles and similarities between locations—asking: which city block sounds most like another, and how does its acoustic identity change over time? Engaging the NeurIPS Creative AI theme of Humanity, it examines how human and machine perception complement one another in a shared authorship that enables a more sensory, human-centered understanding of urban environments.
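As an illustration of what a linked spatial-temporal-acoustic record might look like once the per-modality models have run (the record format and label sets below are assumptions, not the project's schema):

    # Sketch: one joint observation linking GPS position, scene segmentation,
    # object detections, and sound classification for a moment in the walk.
    from dataclasses import dataclass

    @dataclass
    class Observation:
        timestamp: float              # seconds into the recording
        lat: float
        lon: float
        scene_labels: dict            # e.g. segmentation class -> pixel fraction
        detections: list              # e.g. object classes from a detector
        sound_labels: dict            # e.g. sound class -> confidence

    def dominant_sound(obs: Observation) -> str:
        return max(obs.sound_labels, key=obs.sound_labels.get)

    obs = Observation(
        timestamp=12.5, lat=42.373, lon=-71.119,
        scene_labels={"road": 0.4, "building": 0.3, "sky": 0.3},
        detections=["person", "bicycle"],
        sound_labels={"Speech": 0.62, "Vehicle": 0.21, "Music": 0.05},
    )
    print(dominant_sound(obs))  # "Speech"

Sequences of such records can then be compared across locations and times to ask which blocks sound alike and how their acoustic identities drift.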