We present a novel system that takes video frames of a musician playing the piano as input and generates the music for that video. Generating music from visual cues is a challenging problem, and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events. To achieve the transformation, we built a full pipeline named 'Audeo' containing three components. We first translate the video frames of the keyboard and the musician's hand movements into a raw mechanical symbolic musical representation, the Piano-Roll (Roll), which represents the keys pressed at each time step (one per video frame). We then adapt the Roll to be amenable to audio synthesis by including temporal correlations. This step turns out to be critical for meaningful audio generation. In the last step, we implement MIDI synthesizers to generate realistic music. Audeo converts video to audio smoothly and clearly with only a few setup constraints. We evaluate Audeo on piano performance videos collected from YouTube and find that the generated music is of reasonable audio quality and can be recognized with high precision by popular music identification software.
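To make the Piano-Roll representation concrete, the following is a minimal sketch of how a frame-level Roll could be turned into note events that a MIDI synthesizer can consume. It is not the Audeo pipeline, whose components are learned models; it only illustrates the data format. The function name `roll_to_notes`, the binary list-of-lists layout, the frame rate parameter `fps`, and the choice of MIDI pitch 21 (A0) for the lowest key are assumptions made for this example.

```python
from typing import List, Tuple

# A note event: (MIDI pitch, onset in seconds, duration in seconds).
Note = Tuple[int, float, float]


def roll_to_notes(roll: List[List[int]], fps: float, lowest_pitch: int = 21) -> List[Note]:
    """Convert a binary frame-level piano roll into note events.

    roll[t][k] is 1 if key k is pressed in video frame t, else 0.
    Consecutive pressed frames of the same key are merged into one note.
    (Hypothetical helper for illustration; not part of Audeo.)
    """
    notes: List[Note] = []
    if not roll:
        return notes
    n_frames = len(roll)
    n_keys = len(roll[0])
    for key in range(n_keys):
        start = None  # frame index where the current note began, if any
        for t in range(n_frames):
            pressed = bool(roll[t][key])
            if pressed and start is None:
                start = t  # key goes down: note onset
            elif not pressed and start is not None:
                # key comes up: close the note
                notes.append((lowest_pitch + key, start / fps, (t - start) / fps))
                start = None
        if start is not None:
            # key still held at the end of the clip
            notes.append((lowest_pitch + key, start / fps, (n_frames - start) / fps))
    # Order by onset time, then by pitch, as a synthesizer would expect.
    notes.sort(key=lambda n: (n[1], n[0]))
    return notes


# Usage: three keys, four frames, two frames per second.
example_roll = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
example_notes = roll_to_notes(example_roll, fps=2.0)
```

A raw frame-by-frame Roll carries no explicit onsets or durations, which is one reason a step that adds temporal structure matters before synthesis.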
Author Information
Kun Su (University of Washington Seattle)
Xiulong Liu (University of Washington)
Eli Shlizerman (Departments of Applied Mathematics and Electrical & Computer Engineering, University of Washington Seattle)
More from the Same Authors
- 2023 Poster: Learning Time-Invariant Representations for Individual Neurons from Population Dynamics
  Lu Mi · Trung Le · Tianxing He · Eli Shlizerman · Uygar Sümbül
- 2023 Poster: AMAG: Additive, Multiplicative and Adaptive Graph Neural Network For Forecasting Neuron Activity
  Jingyuan Li · Leo Scholl · Trung Le · Amy Orsborn · Eli Shlizerman
- 2022 Poster: STNDT: Modeling Neural Population Activity with Spatiotemporal Transformers
  Trung Le · Eli Shlizerman
- 2022 Poster: INRAS: Implicit Neural Representation for Audio Scenes
  Kun Su · Mingfei Chen · Eli Shlizerman
- 2021 Poster: How Does it Sound?
  Kun Su · Xiulong Liu · Eli Shlizerman
- 2019 : Opening Remarks
  Guillaume Lajoie · Jessica Thompson · Maximilian Puelma Touzel · Eli Shlizerman · Konrad Kording
- 2019 Workshop: Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI
  Guillaume Lajoie · Eli Shlizerman · Maximilian Puelma Touzel · Jessica Thompson · Konrad Kording