SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
Workshop: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design
Abstract
Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe, a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level. We implement features that address a number of difficulties, namely 1) searching through existing sounds, 2) creating completely new sounds, and 3) making meaningful modifications to a given sound. This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected given the user's preferences; and a sound-editing support feature that highlights and gives examples for key control parameters with respect to a text- or audio-based query. The combination of these features creates a novel workflow for musicians, exemplifying the usefulness of systems developed with a foundation of multimodal deep learning.
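As a rough illustration of the multimodal search idea, the sketch below ranks synthesizer presets by cosine similarity between a query embedding and precomputed preset embeddings. The abstract does not specify the embedding model or the ranking function, so this assumes a shared text-audio embedding space; the names `cosine_similarity` and `search_presets`, the preset names, and the three-dimensional vectors are all hypothetical and chosen only to keep the example small.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search_presets(query_embedding, preset_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in preset_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumption): in a real system these vectors would come from a
# multimodal encoder applied to rendered preset audio and to the user's
# text or audio query.
presets = {
    "warm_pad": [0.9, 0.1, 0.0],
    "bright_lead": [0.1, 0.9, 0.2],
    "sub_bass": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for an embedded query such as "soft warm pad"

results = search_presets(query, presets, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because text and audio queries would map into the same space under this assumption, the same ranking function could serve both query types; only the encoder producing `query` would differ.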