

Creative AI

Creative AI Videos

Jean Oh · Isabelle Guyon



Re·col·lec·tions. Sharing sonic memories through interactive machine learning and neural audio synthesis models.

Gabriel Vigliensoni · Rebecca Fiebrink

“Re·col·lec·tions” is a sound and music exploration wherein we engage with a neural audio synthesis model in real-time through gestural interaction and interactive machine learning. The neural audio synthesis model in the work was trained on part of the sound archive from the Museo de la Memoria y los Derechos Humanos based in Santiago de Chile.

In “Re·col·lec·tions,” we mapped the human-performance space to the high-dimensional, computer-generated latent space of a neural audio model using a regression model learned from a set of demonstrative actions. By applying this method to ideation, exploration, and sound and music performance, we have observed its efficiency, flexibility, and immediacy of control over generative audio processes.

Emphasizing real-time control in neural audio synthesis systems is crucial because it allows performers to introduce the long-term temporal coherence these systems often lack. Even if a generative model only produces audio signals with short-term temporal coherence, longer structures can emerge when appropriate control is applied during generation.

The technologies used in “Re·col·lec·tions” include MediaPipe Face Mesh (Kartynnik et al., 2019) for real-time face landmark acquisition; RAVE (Caillon and Esling, 2021) for sound modeling and synthesis; Wekinator (Fiebrink, Trueman, and Cook, 2009) and FluCoMa (Tremblay, Roma, and Green, 2021) for mapping the human-performance space to the computer-generated latent space; along with an interactive machine learning approach to steer latent audio models (Vigliensoni and Fiebrink, 2023).
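
As a rough sketch of this demonstration-based mapping (not the code used in the piece, which relies on Wekinator and FluCoMa), the following Python snippet trains a small regression from invented face-landmark features to invented RAVE latent coordinates:

```python
# A minimal sketch of the demonstration-based mapping idea, using scikit-learn's
# MLPRegressor as a stand-in for the Wekinator/FluCoMa regression used in the piece.
# Feature dimensions, latent size, and variable names are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Demonstrative actions: pairs of (face-landmark features, chosen RAVE latent point)
rng = np.random.default_rng(0)
landmark_features = rng.normal(size=(200, 12))   # e.g. distances between key landmarks
latent_targets = rng.normal(size=(200, 8))       # hand-picked points in the RAVE latent space

# Learn a regression from the human-performance space to the latent space
mapping = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
mapping.fit(landmark_features, latent_targets)

# At performance time: stream new landmark features and drive the synth's latent input
live_features = rng.normal(size=(1, 12))
latent_vector = mapping.predict(live_features)   # sent to the RAVE decoder in real time
```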


Childhood Dreams

Linoy Tsaban · Ezi (Ezinwanne) Ozoani · Apolinário Passos

“What do you want to be when you grow up?” Our installation aims to explore and expose biases in text-to-image models while simultaneously providing viewers with fine-grained control to mitigate these biases, acknowledging bias and striving for diversity and inclusion at the same time.

We believe our display both challenges preconceived notions and stereotypes about who can occupy certain professions and celebrates diverse aspirations and the idea that everyone should have the freedom and opportunity to be who they want to be, regardless of gender and ethnicity. Children and childhood represent both the future and a naive view of the world, in which dreams and aspirations transcend social norms and the answers to the question “What do you want to be when you grow up?” are limitless, embodying a true celebration of diversity and inclusion.

Participants are welcome to upload an image (or take a picture of themselves) in the online demo and provide a childhood dream or profession as a text prompt. The demo returns the final generated image along with a ‘comic strip’ that depicts the morphing journey from the original image to the generated one. Users have further control over the generated image by choosing its style, e.g. watercolour or oil painting. Participants can also query the timeline of text-to-image models at the bottom of the space to produce different outputs for their childhood dream/profession prompt. The ability to query different text-to-image models invites participants to see the incremental changes made to reduce biased image generation, which typically produces stereotypical depictions of professions; for example, an image of a woman combined with the text prompt ‘doctor’ would generate a male image. While users can see the changes in generated images, we also wish to highlight that we still have a long way to go in centring inclusivity to reduce bias in generative models.
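
As a purely illustrative sketch of how such a morphing ‘comic strip’ could be produced (not the demo’s actual implementation), the following snippet sweeps the strength parameter of an off-the-shelf image-to-image diffusion pipeline; the checkpoint, prompt, and file names are assumptions:

```python
# A sketch of one way to produce a gradual morph from an uploaded portrait toward a
# prompted profession: sweep the img2img strength from light to heavy edits.
# Checkpoint, prompt, and file names are illustrative assumptions, not the demo's code.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

portrait = Image.open("uploaded_portrait.png").convert("RGB").resize((512, 512))
prompt = "a portrait of an astronaut, watercolour style"

# Lower strength keeps more of the original photo; higher strength moves toward the prompt.
frames = [
    pipe(prompt=prompt, image=portrait, strength=s, guidance_scale=7.5).images[0]
    for s in (0.2, 0.4, 0.6, 0.8)
]
frames[-1].save("final_image.png")  # in practice all frames would be tiled into the comic strip
```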

By showcasing the results of these text-to-image driven edits, we prompt viewers to reflect on the potential consequences of these biases on young individuals as they grow up and pursue their aspirations. We wish to emphasise the importance of addressing and mitigating biases within AI technology to create a more diverse, inclusive and equitable future and highlight the need to ensure that AI systems don't perpetuate or amplify societal biases.

The installation aims to spark conversations and raise awareness among the conference attendees about the importance of diversity and inclusivity in AI. By displaying outputs with mitigated biases (that the users can have control over), in addition to the stereotypical outputs, we wish to give the stage to diverse portrayals of professions, as a hopeful glimpse of what could be.

We hope that through this installation we can foster dialogue about the need for inclusivity and diversity in the development of AI systems.


Diffusion Model for Chinese Calligraphy Generation and Inpainting

Qisheng Liao

Chinese calligraphy, the artistic writing of Chinese characters and a prominent form of East Asian calligraphy, is a distinctive form of visual art. Practised with brush and ink, it is not just a means of communication but a visual art with deep cultural and historical significance in China and other East Asian countries. Recent advances in computer vision hold significant potential for the future development of generative models of Chinese calligraphy, but most previous work has focused on traditional paintings or photographs. In our work, we train a conditional diffusion model that can generate high-quality Chinese calligraphy. We also apply recent frameworks, RePaint and LoRA, for Chinese calligraphy inpainting and one-shot style-transfer fine-tuning.
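
The core RePaint idea, re-imposing the known region of the image at each denoising step, can be sketched as follows; the model here is an untrained stand-in with placeholder shapes, and RePaint’s resampling/jump schedule is omitted for brevity:

```python
# A minimal sketch of RePaint-style inpainting with an unconditional diffusion model
# (diffusers UNet2DModel + DDPMScheduler). The model is untrained and the data are
# placeholders; this only illustrates the masked re-imposition step.
import torch
from diffusers import UNet2DModel, DDPMScheduler

unet = UNet2DModel(sample_size=64, in_channels=1, out_channels=1)  # stand-in for a trained calligraphy model
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

known = torch.randn(1, 1, 64, 64)     # the damaged calligraphy image (placeholder data)
mask = torch.ones_like(known)         # 1 = known strokes, 0 = region to inpaint
mask[..., 16:48, 16:48] = 0

x = torch.randn_like(known)           # start from pure noise
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(x, t).sample
    x = scheduler.step(noise_pred, t, x).prev_sample
    # RePaint's key step: re-impose the known region at the matching noise level
    noised_known = scheduler.add_noise(known, torch.randn_like(known), t)
    x = mask * noised_known + (1 - mask) * x
```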


Can a building have a heart? A durational slowly evolving AI artwork

Zaher Joukhadar

Good art binds communities together and bridges across diversity. ‘Living’ art, art that changes over time, is a relative rarity, especially when it involves AI. The Heart at Melbourne Connect is a site-responsive, slow Artificial Intelligence artwork to be lived with over decades. Created by artist Robert Walton, it was installed at the entrance of the University of Melbourne’s Melbourne Connect building in April 2023. The Heart reveals the pulse of a superorganism: the community visiting, living, and working in the building, a city-block-sized home to businesses, university departments, a kindergarten, dormitory-style accommodation, and a science gallery/museum.

The Heart beats indefinitely for and with the life of the building and its community. Responding to 4800 Building Information Modelling (BIM) sensors that monitor CO2, humidity, room occupancy, temperature, movement, light, and more, the building adjusts its interior environments to create the optimum conditions for human comfort and safety. Normally, the automated work of building sensors and systems is dispersed and imperceptible. The Heart externalises the building’s ‘sensations’ in a way people can perceive and begin to empathise with. Its form in the foyer of Melbourne Connect is a 10-metre-tall volume of brass droppers, reconstituted brick fragments, and LEDs in the shape of a giant human heart. The LEDs are driven by an AI algorithm that convolves periodic inputs from the 4800 building sensors into a spatially organised pulsating display that is constantly changing depending on the sensor readings. The convolution combines actual readings of the moment with a learned pattern reflecting the typical activity of the building at the same time on previous days, resulting in both a dynamic rhythm of light movement and a gradual change in the learned pattern for use in future days.
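
The abstract does not specify the exact algorithm, but the described behaviour can be illustrated with a simple blend-and-update rule; the weights, array shapes, and update scheme below are assumptions, not the installation’s code:

```python
# An illustrative sketch of the idea described above: blend the current sensor frame
# with a learned "typical day" pattern, and slowly update that pattern for future days.
# The blend weight, learning rate, and shapes are assumptions for illustration only.
import numpy as np

N_SENSORS = 4800
SLOTS_PER_DAY = 24 * 60          # one learned pattern per minute of the day

# learned_pattern[slot] holds the typical reading of each sensor at that time of day
learned_pattern = np.zeros((SLOTS_PER_DAY, N_SENSORS))

def heart_step(slot, current_readings, blend=0.6, learning_rate=0.02):
    """Return the values driving the LED display and update the learned pattern."""
    typical = learned_pattern[slot]
    # Dynamic rhythm: combine what is happening now with what usually happens now
    display = blend * current_readings + (1 - blend) * typical
    # Gradual change: nudge the learned pattern toward today's readings
    learned_pattern[slot] = (1 - learning_rate) * typical + learning_rate * current_readings
    return display

# Example: one minute's worth of (placeholder) sensor readings
led_drive = heart_step(slot=9 * 60, current_readings=np.random.rand(N_SENSORS))
```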

Presence, being there, underpins the collective corpus of the community and the ability of The Heart to respond. Simply arriving at the building, moving through its spaces, and even breathing impacts the environmental sensors and by extension, The Heart. In this sense it may provoke contemplation of ourselves within the superorganism, and the relation between our personal and collective conduct. It may draw attention to the plurality of experiences occurring simultaneously and the diversity of the community undertaking a collective endeavour. It may prompt mindfulness of the state of others: their quality of breath, their movement, tension or ease within the environment. By extension it may prompt self-awareness and contemplation of what we carry within us into the building, both as superorganisms ourselves, and as physically, culturally and linguistically diverse people.

The Heart invites visitors to donate their own pulse to the building by placing a finger on a monitor. This action connects an individual’s heart and somatic state represented by the pulse with that of the building. From one superorganism to another: we are unified by our diversity, perennial mysteriousness, and the quality of being greater than the sum of our parts.


Project MOSAIC: Using Generative AI for Societal Research and Public Engagement

Jeff Running · Asta Roseway · Matej Ciesko

Project Mosaic is a Generative AI & Art experience designed to capture and dynamically display public discourse around AI. It encourages the public to engage with it by answering survey questions and uses AI-based semantic analysis to interpret the nuance of individual thoughts and feelings and create individual portraits of responsive art. The goal of Mosaic is to provide new engagement models to help promote discourse around AI and reflect the uniqueness of each individual response to the survey, while also capturing sentiment across a broad population group.


Fencing Hallucination

Weihao Qiu

"Fencing Hallucination" is a multi-screen interactive installation that merges real-time human-AI interaction in a virtual fencing game with the co-creation of chronophotographs with image-generative AI, documenting audience movements and AI fencer responses. It effectively navigates the challenges of balancing interactivity, modality diversity, and computational constraints within creative AI tools.

The system uses audience pose data as input for a Multilayer Perceptron (MLP) to generate the virtual AI Fencer's pose data. It also leverages this audience pose data to synthesize chronophotographs. This process involves representing pose data as stick figures, using a Dreambooth-fine-tuned Stable Diffusion model for ControlNet-assisted image-to-image translations to transform these stick figures into a series of realistic fencing images, and finally, combining these images with an additive effect to create the final result. This multi-step approach ensures the preservation of overall motion patterns and delicate details when synthesizing a chronophotograph.
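
A hedged sketch of the two image stages, ControlNet-conditioned generation from stick figures followed by additive blending, might look like the following; it uses the standard ControlNet text-to-image pipeline as a stand-in for the installation’s image-to-image setup, and the checkpoints, prompt, file names, and weighting are illustrative assumptions:

```python
# Sketch: turn each stick-figure pose frame into a realistic fencing image with a
# ControlNet-conditioned Stable Diffusion pipeline, then combine the frames additively
# into a chronophotograph. Not the installation's exact pipeline.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")  # in the artwork the base model would be the DreamBooth-fine-tuned one

stick_figures = [Image.open(f"pose_{i}.png").convert("RGB") for i in range(8)]  # rendered pose frames (placeholders)
frames = [
    np.asarray(pipe("a fencer in white attire, lunging", image=fig).images[0], dtype=np.float32)
    for fig in stick_figures
]

# Additive combination: bright motion traces accumulate across frames, as in chronophotography
chrono = np.clip(sum(frames) / len(frames) * 1.8, 0, 255).astype(np.uint8)
Image.fromarray(chrono).save("chronophotograph.png")
```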

"Fencing Hallucination" explores diversity from various angles. Fencing is a unique sport where participants wear pure white attire, rendering their identities anonymous and transcending factors like gender, age, and ethnicity. This feature offers an equitable platform for male and female competitors and minimizes the impact of age compared to other sports.

Moreover, the project contributes to diversifying AI training datasets, addressing the issue of dataset monopolization by organizations in power. It makes fencing datasets more accessible and empowers individuals to create customized datasets.

Additionally, "Fencing Hallucination" fosters a culture of diverse AI creations. It draws inspiration from the tradition of Chronophotography in Photography and uses AI-based image generation to revive the visual aesthetics and photographic experiences of the past. By juxtaposing AI image synthesis and chronophotography, it invites the audience to reflect on the evolution of technology, echoing the innovative approaches of artists like Tom White and Anna Ridler, who blend AI tools with traditional techniques and cultural elements in their artwork.

This project, through its technical innovations and cultural exploration, contributes to the advancement of AI in the context of art and technology. It emphasizes the importance of diversity, inclusivity, and accessibility in this domain, furthering the dialogue on the intersection of AI and artistic expression.


The Guitarist and Aural Fauna: Duet for Human and Machine-Generated Creatures

Eunsu Kang · Donald Craig

This performance is the premiere of Donald D. Craig’s guitar duet with Aural Fauna, a family of unknown organisms imagined by AI. The Aural Fauna specimens’ forms and sounds are generated by machine learning algorithms developed by a team of artists and machine learning researchers, including Craig and his long-time collaborator Eunsu Kang. During this improvised duet, Craig and eight fauna entities listen and respond to each other by changing pitch or tempo, or by adding parts or harmony.


Latent Painter

Shih-Chieh Su

Latent diffusers have gained a lot of traction in generative AI for their efficiency, content diversity, and reasonable footprint. This work presents Latent Painter, which uses the latent as the canvas and the diffuser predictions as the plan to generate painting animations. The highly diverse family of latent diffusers can all be animated.

Latent Painter can also transition one image into another, providing an additional option to the existing interpolation-based method. It can even transition between images generated from two different sets of checkpoints.
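
One way to illustrate the “latent as canvas” idea (not the Latent Painter implementation itself) is to decode the intermediate latents at every denoising step into animation frames; the snippet below uses the older `callback` hook of diffusers’ StableDiffusionPipeline (newer versions expose `callback_on_step_end` instead):

```python
# Sketch: decode the intermediate latents at each denoising step into an image, so the
# sequence of decodes plays back as a painting-like animation. Checkpoint and prompt
# are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frames = []

def grab_frame(step, timestep, latents):
    # Decode the current latent "canvas" into pixel space and store it as a frame
    with torch.no_grad():
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    image = (image / 2 + 0.5).clamp(0, 1)[0].permute(1, 2, 0).float().cpu().numpy()
    frames.append(image)

pipe("an oil painting of a lighthouse at dusk", num_inference_steps=30,
     callback=grab_frame, callback_steps=1)
# `frames` now holds 30 progressively refined images that can be written out as an animation.
```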


Inspiration Terminals - A Short Play for Prompter and Language Model

Eyal Gruss

A lecture-performance exploring creative, ethical, and philosophical aspects of working with large language models. Through a playful dialogue, the presentation experiments with various modalities of interaction to challenge LLMs and showcase their potential for accomplishing complex tasks. This exploration invites the audience to contemplate the broader possibilities and implications of human-machine collaboration in the ever-evolving AI landscape.


New Orleans: An Adventure In Music

Stephen Hahn

Our exhibit shows an interactive video game with AI-generated music and visuals. Players can choose a collection of emotions that affect the game’s generated music. The game’s protagonist is an aspiring musician seeking to make their mark in the bustling New Orleans music scene. They navigate the city’s rich cultural tapestry, encountering iconic locations, hidden gems, and electrifying performances along the way.

The novelty of our game comes primarily from our music generation system, which consists of two major components: a sentiment-based harmonic progression generator and a music theory-informed generative framework for melody composition. The harmonic content, as well as orchestration, tempo, and dynamics, is determined using a novel model called SentimentComposer that is based on a mixture of emotions. This sentiment mixture is derived from the current game state. From a corpus of labeled music, the model learned a set of Markovian transition matrices, one for each emotion, and weighted each one by the proportions of the sentiment mixture. Sampling harmonies from the model is easy and highly interpretable.
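
A toy sketch of this sentiment-mixture mechanism, with invented chords, matrices, and weights, is shown below:

```python
# Toy illustration of the sentiment-mixture idea: one Markov transition matrix per
# emotion, blended by the current mixture weights, then sampled to get a chord
# progression. The chord set, matrices, and weights are invented placeholders.
import numpy as np

CHORDS = ["I", "ii", "IV", "V", "vi"]

def random_transition_matrix(rng):
    m = rng.random((len(CHORDS), len(CHORDS)))
    return m / m.sum(axis=1, keepdims=True)     # rows are probability distributions

rng = np.random.default_rng(42)
emotion_matrices = {e: random_transition_matrix(rng) for e in ("joy", "sadness", "tension")}

def sample_progression(sentiment_mixture, length=8, start="I"):
    # Blend the per-emotion matrices by the sentiment proportions from the game state
    blended = sum(w * emotion_matrices[e] for e, w in sentiment_mixture.items())
    blended /= blended.sum(axis=1, keepdims=True)
    progression, current = [start], CHORDS.index(start)
    for _ in range(length - 1):
        current = rng.choice(len(CHORDS), p=blended[current])
        progression.append(CHORDS[current])
    return progression

print(sample_progression({"joy": 0.6, "sadness": 0.1, "tension": 0.3}))
```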

Using the sentiment-driven harmonic progression, a second novel model, SchenkComposer, generates a melody probabilistically to play atop the harmony. SchenkComposer uses a novel probabilistic context-free grammar informed by musical form theory and Schenkerian analysis in order to generate music with a cohesive, long-term structure.
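
The sampling mechanism of a probabilistic context-free grammar can be illustrated with a toy grammar, far simpler than SchenkComposer’s form- and Schenkerian-analysis-informed rules:

```python
# Toy illustration of sampling melodic structure from a probabilistic context-free
# grammar. The rules and notes below are invented placeholders showing the mechanism.
import random

# Each nonterminal maps to a list of (production, probability) pairs
GRAMMAR = {
    "Phrase":     [(["Antecedent", "Consequent"], 1.0)],
    "Antecedent": [(["Motif", "Motif"], 0.7), (["Motif", "Cadence"], 0.3)],
    "Consequent": [(["Motif", "Cadence"], 1.0)],
    "Motif":      [(["C4", "E4", "G4"], 0.5), (["D4", "F4", "A4"], 0.5)],
    "Cadence":    [(["G4", "C4"], 1.0)],
}

def expand(symbol):
    if symbol not in GRAMMAR:                      # terminal: an actual note
        return [symbol]
    productions, weights = zip(*GRAMMAR[symbol])
    chosen = random.choices(productions, weights=weights)[0]
    return [note for s in chosen for note in expand(s)]

print(expand("Phrase"))   # e.g. ['C4', 'E4', 'G4', 'D4', 'F4', 'A4', 'G4', 'C4']
```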

Beyond the music, the visual backdrops of the game are generated using Microsoft Bing’s image generator powered by DALL·E 3. The text narrative was written in collaboration with ChatGPT.

New Orleans: An Adventure in Music celebrates New Orleans’ diverse cultural legacy through its music and iconic scenery. New Orleans is known for its musical fusion, blending jazz with genres such as blues, gospel, classical, and more; to pay homage to this multitude of influences, our model is trained on music ranging from sultry blues to energetic ragtime. Through our musical model, players are able to create their own songs with a cohesive long-term structure influenced by New Orleans’ musical history. The game’s intuitive mechanics make playing and creating music a joyous and rewarding experience for anyone. We believe that our project is an innovative, interpretable, and interactive love letter to the city’s diverse musical heritage that anyone can enjoy.


The (m)Otherhood of Meep (the bat translator)

Alinta Krauth

The (m)Otherhood of Meep (the bat translator) is an audio interpreter for Grey-Headed Flying Fox vocalizations, drawing on previous scientific research on flying fox vocalizations to interpret their voices into poetic and artistic form in real time. It aims to provoke questions surrounding the diversity and inclusivity of AI corpora with regard to nonhuman species, and evokes an interspecies bridge between species at the center of human/wildlife conflicts. To make the work, Google’s Teachable Machine was trained on a corpus of collected and categorized vocalizations, resulting in a TensorFlow model that is given a visual display through JavaScript, connecting to an array of wordings in English.
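
As an illustrative sketch of the classify-then-translate step (the piece itself runs Teachable Machine’s TensorFlow export in the browser via JavaScript), the following Python snippet assumes a hypothetical Keras export, placeholder input features, and invented call types and wordings:

```python
# Sketch: run a trained vocalization classifier on an audio clip and map the predicted
# call type to an English wording. Model path, input shape, labels, and wordings are
# hypothetical placeholders, not the artwork's actual corpus or code.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("bat_vocalization_classifier.h5")  # hypothetical export

LABELS = ["distress", "contact", "territorial"]                        # placeholder call types
WORDINGS = {
    "distress": "something is wrong here, come closer",
    "contact": "I am here, where are you?",
    "territorial": "this branch is taken",
}

spectrogram = np.random.rand(1, 43, 232, 1).astype("float32")  # placeholder audio features
probabilities = model.predict(spectrogram)[0]
call_type = LABELS[int(np.argmax(probabilities))]
print(WORDINGS[call_type])
```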

In recent years, the use of open-source and/or easily accessible AI technology has swiftly raised significant questions surrounding the bias, diversity, inclusivity, and accessibility of AI-related products. Still left out of most current conversations about AI development and use are the ways in which these issues may also affect our relations with other species. Through this artwork, I assert that conversations about diversity and inclusivity must extend beyond the human and into questions of multispecies ethics and futures, inclusive of other conscious, intelligent, and pain-feeling beings with whom our lives intersect. This artwork points to a future for machine learning trained to highlight the voices and expressions of historically and culturally Othered beings, and shows one possible outcome when AI research is centered around planetary care.


Artificial Intelligence Improvisation

Piotr Mirowski

Improbotics is a transnational, interdisciplinary theatre company experimenting with improvised comedy in which human actors perform alongside artificial intelligence (AI)-powered chatbots and robots. It functions as a theatre laboratory that bridges the arts and sciences to develop AI technology, conduct academic research, and stage entertaining shows for general audiences. Co-design between actors and AI developers enables synergies in which the playful, creative, and exploratory interests of artists stimulate novel ludic applications of technology originally designed for pragmatic purposes, and the technology itself inspires novel forms of performance. The company explores aspects of AI deeply embedded within modern human culture, from chatbots and machine translation to video communication, and integrates them within traditional performances and cultural spaces.

Improbotics is based in four countries (UK, Canada, Belgium, and Sweden) and, during the pandemic, rehearsed and practiced together online in virtual reality. It has also designed a theatre show around real-time live translation and the comedy of (speech recognition) errors in an attempt to blur linguistic boundaries.

Integrating live AI within traditional performance practice fosters a co-creative ethos in which the AI is first experienced as a creativity-support tool and ultimately as an anthropomorphised stage partner that actors endow with personality through role-play. The cast members, in turn, find humour and inspiration in both the limitations and the possibilities of imperfect AI.

This experiential trajectory of the performer has become a design focus: we use AI to engage the public about the risks AI poses to human culture while also presenting a co-creative mindset toward technology as a tool for cultural creation. Our shows are co-created with audiences who act as allies for the actors and technologists, experimenting together on stage as they try to make sense, in real time, of the uncanny presence of a seemingly intelligent and creative technology.

The focus on playfulness and the celebration of failure encourages a different kind of exploration with the technology and lowers the barrier to entry for experimental work with AI. The success of our troupe, both in finding new forms and modalities of live performance through and with technology and in developing novel technologies with applications for live performance and other ludic domains, may rely on the core improvisational activity of accepting the offer, whatever it may be, from the other objects encountered on stage. Following this logic of acceptance and collaboration shifts the nature of interaction with AI from replacement to the empowerment and enhancement of human creativity.


Blind Photographer's Frame — reality and fantasy

Ziwei Chi · Zonghan Yang

Visual space has long dominated our perception of the world, with the other senses yielding to vision. Language is replaced by visual symbols; touch is enticed by the clear textures of images. We effortlessly access vast amounts of utopian visual information on screens, believing in its authenticity, especially in an era when generative technology can create convincing illusions.

What is the world like when vision vanishes?

Inspired by the film "Proof," directed by Jocelyn Moorhouse, we designed a device called the "Blind Photographer's Viewfinder." It utilizes image recognition and artificial intelligence-generated content (AIGC) to transform the captured world into diverse and rich expressions in real time. When worn, the viewfinder produces an experience analogous to blindness: the surrounding scenes lose their visual center and transform into poetry, natural sounds, subtle sensations, and emerging associations, offering a new mode of interpreting the world. It helps you mediate your sensorium through AI technology. This kind of sensory translation is a valuable way to inhabit other sensoria, other bodies—a form of diversity.

Moreover, this creation demonstrates care for individuals with disabilities. The audience can experience the world from the perspective of a blind person. If this artwork were to evolve towards practical applications, it could potentially provide assistance for the mobility of visually impaired individuals.

The second part of the project extracts a series of real-world images and location information captured by the blind photographer's viewfinder. Using this authentic data, a series of AI-generated scenes corresponding to these locations is created. We deliberately blend the real images with the AI-generated information, collaging and constructing an indistinguishable world that exists between reality and fantasy.

The emergence of AIGC brings both technological dependency and an anxiety of belief. In rebellious fashion, we organize such an experience—a questioning of visually dominant spaces, a deception of trust, and perhaps a journey into a new multidimensional world, akin to Zhuang Zhou dreaming of butterflies.