
Towards Learning Semantic Audio Representations from Unlabeled Data
Aren Jansen

Fri Dec 08 04:50 PM -- 05:10 PM (PST)
Event URL: http://media.aau.dk/smc/wp-content/uploads/2017/12/ML4AudioNIPS17_paper_18.pdf

(Joint work with Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, and Rif A. Saurous.) Our goal is to learn semantically structured audio representations without relying on categorically labeled data. We consider several class-agnostic semantic constraints that are inherent to non-speech audio: (i) sound categories are invariant to additive noise and translations in time, (ii) a mixture of two sound events inherits the categories of its constituents, and (iii) events in close temporal proximity within a single recording are likely to belong to the same or related categories. We apply these constraints to sample training triplets for triplet-loss embedding models from a large unlabeled dataset of YouTube soundtracks. The resulting low-dimensional representations both greatly improve query-by-example retrieval performance and reduce the labeled-data and model-complexity requirements for supervised sound classification.
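The sampling idea behind constraint (i) and the triplet objective can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the noise and shift parameters, and the use of raw waveform-like vectors are all assumptions for the sake of a runnable example.

```python
import random

random.seed(0)

def make_positive(anchor, noise_scale=0.1, max_shift=5):
    """Constraint (i): form a positive example by adding noise and a small
    circular time shift to the anchor (parameter values are illustrative)."""
    shift = random.randint(-max_shift, max_shift)
    shifted = anchor[-shift:] + anchor[:-shift] if shift else list(anchor)
    return [x + noise_scale * random.gauss(0.0, 1.0) for x in shifted]

def triplet_loss(emb_a, emb_p, emb_n, margin=0.5):
    """Standard hinge triplet loss: require the anchor-negative distance to
    exceed the anchor-positive distance by at least `margin`."""
    d_ap = sum((a - p) ** 2 for a, p in zip(emb_a, emb_p))
    d_an = sum((a - n) ** 2 for a, n in zip(emb_a, emb_n))
    return max(0.0, d_ap - d_an + margin)
```

In the paper's setup, triplets sampled this way (with negatives drawn from unrelated recordings) train an embedding network end to end; here the loss is shown directly on embedding vectors to keep the sketch self-contained.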

Author Information

Aren Jansen (Google, Inc.)
