Creative AI

Creative AI Session 3

Hall D1 (level 1)

Jean Oh · Isabelle Guyon

Thu 14 Dec 8:45 a.m. PST — 12:15 p.m. PST


Table 1
Archi Guesser - An AI Art Architecture Educational Game

Diversity is part of our architectural history. Architectural styles developed in all cultures across the globe and represent historical values, local materials, and community structures. With generative AI art we can both teach these traditional cultural boundaries and break them up, remixing them to enable new ways of thinking about these styles. The game combines multiple AI art technologies. We use ChatGPT to select famous styles from various cultural areas, identify famous architects, and summarize style characteristics. From this we generate sample images of the styles in Midjourney, or poems about the styles with ChatGPT, and present them to the player. The player guesses the style, geographic location, epoch, and landmark by placing 3D-printed objects on a map. We use OpenCV to track these objects, creating an interactive and tactile learning experience. This approach allows players to explore a wide range of architectural styles, some of which they may not have heard of before. We hope to encourage players to recognize these styles as stepping stones in global architectural history, rather than just local trends within their cultural bubbles, all while enjoying a novel type of learning game.
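For illustration, the tracking step might look like the minimal sketch below. The abstract does not say how the pieces are detected, so fiducial (ArUco) markers attached to the 3D-printed objects are an assumption here.

```python
import cv2

# Hypothetical sketch: the abstract does not specify the tracking method,
# so this assumes ArUco fiducial markers on the 3D-printed game pieces.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

cap = cv2.VideoCapture(0)  # camera looking down at the game map
while True:
    ok, frame = cap.read()
    if not ok:
        break
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is not None:
        # Each marker id names one game piece; the mean of its four corner
        # points gives the piece's position on the map.
        for marker_id, quad in zip(ids.flatten(), corners):
            x, y = quad[0].mean(axis=0)
            print(f"piece {marker_id} at ({x:.0f}, {y:.0f})")
    cv2.imshow("board", cv2.aruco.drawDetectedMarkers(frame, corners, ids))
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```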


Table 10
Collaborative Synthscapes from Words

Modular synthesizers have long offered endless possibilities for sound design, but they involve a large number of components to patch together and parameters to tune, which makes them difficult for many people to explore effectively. The system we have developed, which we call CTAG (Creative Text-to-Audio Generation), invites everyone to explore these creative possibilities by imagining sounds and intuitively describing them in words; from these descriptions it controls the synthesizer's parameters to create diverse, artistic renderings.
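One way to picture text-driven synthesizer control, as a hedged sketch: search for parameter settings whose rendered audio best matches the description under a joint audio-text embedding. Every interface below (`synth`, `embed_text`, `embed_audio`) is a hypothetical stand-in, and random search stands in for whatever optimization CTAG actually performs.

```python
import numpy as np

# Hypothetical sketch of text-driven synthesizer control: search for
# parameter settings whose rendered audio best matches the text prompt
# under a joint audio-text embedding.
def text_to_patch(prompt, synth, embed_text, embed_audio, n_iters=200,
                  rng=np.random.default_rng()):
    target = embed_text(prompt)                     # unit-norm text embedding
    best_params, best_score = None, -np.inf
    for _ in range(n_iters):
        params = synth.random_parameters(rng)       # candidate patch settings
        audio = synth.render(params)
        score = float(embed_audio(audio) @ target)  # cosine-style similarity
        if score > best_score:
            best_params, best_score = params, score
    return synth.render(best_params), best_params
```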

For this project, we propose to invite attendees to co-create a set of soundscapes using CTAG. In alignment with the theme of celebrating diversity, each soundscape will be oriented around a simple but thought-provoking question. Possible prompts include, but are not limited to: What is a sound that reminds you of your childhood? What is a sound that you associate with your cultural identity? What do you hear when you think of home?

This project invites members of the public to provide their own answers to each of these questions as text inputs to the system. By enabling participants to explore and play with generated sounds, it also encourages users to consider the similarities and differences that animate this community, all through sound.


Table 3
Creating Playful Comics Together with AI

This project presents new forms of visual narratives, diversifying ideas and artistic possibilities by collaborating with AI (tools for image and language generation) in the field of comic creation. The project is based on the artist's approach to analogue comic creation, in which events and situations from her daily life are recorded as a visual diary through quick, spontaneous drawings that later serve as starting points for larger narratives (loosely based on the approach of Lynda Barry). The AI extends this approach by allowing the artist to depart from her own analogue comics and take the narrative further through a dialogue with the machine: switching between analogue and digital by varying the prompts, letting the AI caption the hand-drawn images, allowing it to participate in the drawing process (in the artist's style), and taking up its textual suggestions for continuing the story. The project showcases several comics, visual narratives, and artworks already created in this manner by the artist, all of them humorous, surreal, and playful. This project celebrates diversity of ideas and art forms, both through comics as an art form and through the numerous possibilities that emerge from artist-machine collaboration in this field. Such collaboration also diversifies the artistic palette by adding AI as another material, much as inks and pens are. Humor further adds to the celebration of difference and diversity. Members of the public will be invited to create their own comics using the same approach and tools in a Google Colab interface. In doing so, the public will be able to sample the diversity of ideas that comes from creating with an AI, and involving the public in using AI tools will itself add to the theme of diversity.
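The captioning step could, for example, use an off-the-shelf image-to-text model. The sketch below assumes the Hugging Face `transformers` pipeline and a BLIP captioning checkpoint; the abstract does not name the actual tools, and the file name is a placeholder.

```python
from transformers import pipeline

# Hypothetical sketch of captioning a hand-drawn diary image with an
# off-the-shelf image-to-text model; the abstract does not name the tools.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("diary_sketch.png")[0]["generated_text"]  # placeholder file
print(caption)  # the caption can seed a language model's story continuation
```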


Table 4
Q's Views #1-#3 (+1)

Q's Views #1-#3 are part of an ongoing series of digital still images that aim to provide a visual experience of the unbridgeable gap between believers and non-believers of conspiracy theories in today's surreal socio-political situations in the post-truth world, through hybrid-image portraits synthesized from AI-generated "fake" images of Hillary Clinton.

A hybrid image is an image perceived as one image from afar but as another image up close, an optical illusion based on the multi-scale way the human visual system processes images.

For each work, we use generative AI to create a normal portrait of Hillary Clinton and then transform it into a grotesque variant based on conspiracy theories. These images are then synthesized into a hybrid image to embed an optical illusion in which a normal portrait of her suddenly transforms into a grotesque variant when the viewer approaches.
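As a minimal sketch of hybrid-image synthesis (the classic low-pass/high-pass combination of Oliva, Torralba, and Schyns): the file names and the Gaussian cutoff below are illustrative assumptions, not the artists' actual settings.

```python
import cv2
import numpy as np

# Minimal hybrid-image sketch: low spatial frequencies from the portrait
# seen from afar plus high spatial frequencies from the portrait seen up
# close. File names and the cutoff sigma are illustrative assumptions.
far = cv2.imread("portrait_normal.png").astype(np.float32)    # visible from afar
near = cv2.imread("portrait_variant.png").astype(np.float32)  # visible up close

sigma = 8.0  # cutoff; tuned in practice to the installation's viewing distances
low = cv2.GaussianBlur(far, (0, 0), sigma)           # low-pass of the far image
high = near - cv2.GaussianBlur(near, (0, 0), sigma)  # high-pass of the near image

hybrid = np.clip(low + high, 0, 255).astype(np.uint8)
cv2.imwrite("hybrid_portrait.png", hybrid)
```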

Therefore, as viewers come closer to the threshold of visual perception of the hybrid image, the artwork triggers a sudden realization that conspiracy theorists perceive Hillary Clinton as an entirely different person, thus accentuating the insurmountable divergence in perceived realities between believers and non-believers of conspiracy theories. It is important to acknowledge that even the "normal" portraits themselves are AI-generated fabrications, rendering them as fake images devoid of authenticity. In this work, nothing is genuine, and everything is deliberately fabricated.

As artworks of visual sarcasm, Q's Views #1-#3 refer to our post-truth era, in which facts and reality must all be questioned, as generative AIs are used today to produce high-fidelity images for fake news and propaganda, significantly deepening alienation and division in perceived realities in the same world that we share.

Thus, the work refers to an uncanny diversity in perceived realities in our post-truth era. People see what they want to see; now they can even create what they want to see with generative AI.

NOTE: Since some people value politics over artistry in evaluating artwork, a 'sneak preview' of a prototype from another ongoing hybrid-image portrait series is also showcased at NeurIPS 2023 to satisfy their political preferences.


Table 5
Resonator: An AI-assisted Musical Experience for Human Connection

The “Resonator” project explores whether a global-youth-focused 3D game experience can provide a compelling way to discover new music while enabling players to express creativity (AI-illustrated playlists, music “song shapes”), resulting in greater direct engagement with music (exploration and discovery) and greater human understanding of AI.

Our spatial interface creates a 3D visualization for the MuLan joint embedding model. The software enables users to express creativity through the curation of music playlists while developing a more natural human understanding of how AI represents – and algorithmically navigates – the “space” of music.
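One plausible way to realize such a visualization, as a hedged sketch: project precomputed joint-embedding vectors into three dimensions and place each track at the resulting coordinates. The file name and the choice of PCA are illustrative assumptions; the abstract does not specify Resonator's actual projection.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical sketch: place tracks in a navigable 3D "space" of music by
# projecting high-dimensional MuLan-style embeddings down to 3 coordinates.
embeddings = np.load("track_embeddings.npy")  # assumed shape: (n_tracks, dim)
coords3d = PCA(n_components=3).fit_transform(embeddings)

# Each row of coords3d positions one track in the game scene; neighbours in
# this space are natural candidates for music discovery and playlist curation.
```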

The experience is created by a group of game development engineers and designers who specialize in making 3D and 2D experiences intrinsically engaging. We are working to leverage that intrinsic engagement for the visualization, understanding, and evaluation of large models.


Table 6
salad bowl

salad bowl is an interactive neural sound installation where audiences are invited to co-create and co-mingle with “the salad” — a neural network trained on a diverse, eclectic collection of sounds. the salad is a heterogeneous mix of sound elements, each with its unique character, all contributing to a vibrant whole. The salad is a collective memory of the past sonic experiences of people, places, and things throughout the world, all encoded in a fuzzy possibility space.

In salad bowl, you can sit down at the dinner table. There’s a salad bowl and a microphone in front of you. You pick a piece of paper from the salad bowl. The piece of paper prompts you to make a sound with your voice. You make the sound into the microphone. The salad picks up the sound. The sound becomes part of the salad. The salad becomes part of the sound. The sound comes out warped. Its perceptual identity has been transformed. The sound is no longer just your voice, but rather a view into the infinite possibilities that your sound could be, in the context of the salad.

To wildly transform the sounds put into the salad, the neural network takes the participant’s sound as input and destroys around 80-90% of it. It then looks at the missing pieces and creates its best guess of what was missing.
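The destroy-and-reconstruct step can be pictured with the minimal sketch below; the token representation, the mask ratio, and the `model.inpaint` interface are hypothetical stand-ins, as the abstract does not detail the network.

```python
import numpy as np

# Hypothetical sketch of the destroy-and-reconstruct step: drop roughly
# 80-90% of the input representation, then let a generative model guess
# what was removed. `model.inpaint` is an assumed interface, not salad
# bowl's actual API.
def mask_and_reconstruct(tokens: np.ndarray, model, mask_ratio: float = 0.85,
                         rng=np.random.default_rng()) -> np.ndarray:
    keep = rng.random(tokens.shape) >= mask_ratio  # retain ~15% of positions
    masked = np.where(keep, tokens, -1)            # -1 marks destroyed content
    return model.inpaint(masked)                   # the model fills the gaps
```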

More than a mere exploration of generative sound modeling and a model's internal representations, salad bowl is a celebration of spontaneous, shared, and diverse human interactions with sound, encouraging multiple people to sit down together at the dinner table and engage in colorful sonic conversation.

salad bowl encourages people to think of -- and design -- generative AI systems as "salad bowls" not "melting pots". Unlike in a melting pot, where the identities of each individual member are lost in favor of a uniform whole, a salad bowl's ingredients are able to shine together while preserving their identity, creating a shared collective entity that embraces diversity, showcases unique beauty, and fosters a richer, multifaceted experience for all who engage with it.


Table 8
NeuroView: Generative Visualization of the Diversity of Brain Responses to Jazz

How we perceive the world around us is intrinsically linked to the environments in which we live, the people with whom we interact, and the experiences we’ve had. This subjective reality explains in part why we like the music we do, which films make us cry, and how certain smells can so quickly bring us back to key moments in our lives. A monumental discovery in neuroscience is that these subjective experiences can in part be measured through electroencephalography (EEG). EEG is a non-invasive technique that uses electrodes placed on the surface of the head to measure the electric fields produced by collections of neurons acting in concert. These electrodes are positioned across the entire head, allowing measurement of neural structures involved in activities as diverse as auditory processing, volitional movement, and visual processing, among many more.

In this work, we present NeuroView, an AI-enabled, EEG-based brain-computer interface for visualizing the subjective experience of jazz music. Jazz represents a fusion of diverse cultures and experiences while also reflecting the general human experience. Originally formed in the African American communities of New Orleans, jazz strongly reflects the communities in which it is played, with unique forms arising in New York, Minneapolis, New Orleans, and Los Angeles. What unites all jazz is the drive of musicians to collectively form a voice through their instruments while listening to and supporting fellow musicians in improvisational jam sessions. As such, jazz takes influence from all people who are open to its form and creates from it something more.

Emotional processing recorded through EEG is processed by a VQGAN generative neural network co-trained on the ImageNet dataset to produce surrealist music videos that aim to enter the uncanny valley of thought and neural processes. NeuroView thus explores how differences in brain activity and in the perception and reception of music can be visualized using generative VQGAN models, in ways that highlight this diversity of emotional experience as something beautiful to be celebrated and a contribution to our collective humanity.
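As a rough illustration of how EEG features might drive a generative model, the minimal sketch below summarizes an EEG window as band power and nudges a latent vector accordingly. The band choices, the mapping, and the latent interface are all assumptions; the abstract does not specify NeuroView's actual pipeline.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical sketch: summarize an EEG window as band power and use it to
# steer a generative latent vector. NeuroView's actual features, mapping,
# and VQGAN interface are not specified in the abstract.
def band_power(eeg: np.ndarray, fs: float, lo: float, hi: float) -> float:
    """Mean power in [lo, hi) Hz for eeg of shape (channels, samples)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs * 2))
    band = (freqs >= lo) & (freqs < hi)
    return float(np.trapz(psd[:, band].mean(axis=0), freqs[band]))

def steer_latent(z: np.ndarray, eeg: np.ndarray, fs: float = 256.0) -> np.ndarray:
    """Nudge latent z in a fixed direction, scaled by relative beta power."""
    alpha = band_power(eeg, fs, 8.0, 13.0)  # alpha rhythm (relaxation-linked)
    beta = band_power(eeg, fs, 13.0, 30.0)  # beta rhythm (engagement-linked)
    drift = np.random.default_rng(0).standard_normal(z.shape)
    return z + 0.1 * (beta / (alpha + beta + 1e-9)) * drift
```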


Table 9
Fusion: Landscape and Beyond

Fusion: Landscape and Beyond is an interdisciplinary art project that explores the relationship between memory, imagination, and Artificial Intelligence (AI) embodied in the centuries-long practices and discourse of Shan-Shui-Hua – Chinese landscape painting. It draws inspiration from the concept of Cultural Memory, in which memories are selectively retrieved and updated based on present circumstances. The project considers text-to-image AI algorithms analogous to Cultural Memory, as they generate diverse and imaginative images using pre-existing knowledge. In response to this analogy, the project introduces the concept of "AI memory" and situates it in the culturally significant Chinese landscape painting — a synthetic embodiment of creativity derived from the artist's memory.

Diversity is both a driving force and a major inspiration for this project, which directly addresses bias and the need for cultural diversity in machine-learning generative models for creative art. Recognizing that machines inherently exhibit bias stemming from their design and predominant use, it becomes essential to acknowledge and rectify such prejudices, particularly from a cultural standpoint. The initial phase of this project involves fine-tuning the Stable Diffusion model. The necessity for fine-tuning stems from the imperative to infuse a deeper cultural resonance within the AI's creations, ensuring they are not just technically accurate but also emotionally and culturally attuned. The Stable Diffusion model, while proficient at image generation, reflects its training on a general, globally sourced dataset. By fine-tuning it, we delicately weave the intricacies of Shan-Shui-Hua's philosophy and aesthetic principles into the AI's fabric. This process not only counters prevailing Western-centric perspectives but also fosters a generative space where technology and traditional Chinese artistry coalesce, producing works that are genuinely reflective of and rooted in Chinese cultural heritage.
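For context, generating with such a fine-tuned model might look like the minimal sketch below, using the `diffusers` library. The checkpoint, adapter path, and prompt are placeholders; the abstract does not state which fine-tuning method (full fine-tuning, DreamBooth, LoRA, etc.) was used.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical sketch of sampling from a culturally fine-tuned model.
# The base checkpoint, adapter path, and prompt are placeholders.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/shan-shui-hua-adapter")  # assumed LoRA adapter

image = pipe(
    "a shan shui landscape of misty peaks and a river valley, ink wash style",
    num_inference_steps=30,
).images[0]
image.save("shan_shui_sample.png")
```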

The final output of this project, a video animation and a collection of scroll paintings generated using our fine-tuned models, demonstrates that incorporating more culturally diverse datasets significantly lessens bias in practical machine-learning creativity. The results not only show the possibility of increasing diversity in machine-learning generative models, and thus improving the performance of pre-trained models, but also capture the stylistic nuances of Chinese landscape painting fueled by AI’s unique synthetic ability. What’s more, this fusion of past, present, and future showcases another fundamental characteristic of AI: its inherent capacity to bypass time. Viewers are presented with a captivating experience of “traveling” through the river of time, seamlessly immersing them in a contemporary re-embodiment of the rich, centuries-old tradition of artistic creativity, encapsulating the timeless essence of human expression and experience.