Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which is then applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with only a few cloning audio samples. While speaker adaptation can achieve slightly better naturalness and similarity, the cloning time and required memory for the speaker encoding approach are significantly lower, making it more favorable for low-resource deployment.
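As a rough illustration of how the two approaches differ, the toy Python sketch below contrasts them. It is not the authors' system. The `generate` function is a hypothetical stand-in for a multi-speaker generative model that simply adds a speaker embedding to text features, and the names `adapt_speaker`, `encode_speaker`, `DIM`, and `text_mean` are assumptions made for this example. The adaptation routine fine-tunes only the new speaker's embedding, and the encoder is a fixed averaging rule rather than a separately trained network. Both are simplifications.

```python
import random

DIM = 4  # size of the toy speaker embedding (assumed for illustration)


def generate(text_feat, speaker_emb):
    """Hypothetical stand-in for a multi-speaker generative model:
    the toy 'audio' is the text features plus a per-speaker offset."""
    return [t + s for t, s in zip(text_feat, speaker_emb)]


def adapt_speaker(samples, steps=200, lr=0.1):
    """Speaker adaptation (toy): fit a new embedding by gradient descent
    on the mean squared error over the few (text, audio) cloning samples."""
    emb = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for text_feat, audio in samples:
            pred = generate(text_feat, emb)
            for i in range(DIM):
                grad[i] += 2.0 * (pred[i] - audio[i]) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


def encode_speaker(audios, text_mean):
    """Speaker encoding (toy): one pass over the cloning audio yields an
    embedding directly, with no per-speaker optimization loop.
    text_mean plays the role of what a trained encoder would have learned."""
    n = len(audios)
    return [sum(a[i] for a in audios) / n - text_mean[i] for i in range(DIM)]


# Build a handful of cloning samples for one unseen toy speaker.
random.seed(0)
true_emb = [0.5, -1.0, 0.25, 2.0]
texts = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
samples = [(t, generate(t, true_emb)) for t in texts]

adapted = adapt_speaker(samples)
encoded = encode_speaker([audio for _, audio in samples], text_mean=[0.0] * DIM)
```

In this toy setting, adaptation runs an optimization loop for every new speaker, whereas encoding needs only a single pass over the cloning audio. That structural difference is the kind of cost gap the abstract attributes to the two approaches, but the sketch says nothing about the naturalness or similarity results reported there.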
Author Information
Sercan Arik (Google)
Jitong Chen (ByteDance)
Kainan Peng (Baidu Research)
Wei Ping (Baidu Research)
Yanqi Zhou (Baidu Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Spotlight: Neural Voice Cloning with a Few Samples
  Tue. Dec 4th 08:30 -- 08:35 PM, Room 220 CD
More from the Same Authors
- 2022 Poster: Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
  Boxin Wang · Wei Ping · Chaowei Xiao · Peng Xu · Mostofa Patwary · Mohammad Shoeybi · Bo Li · Anima Anandkumar · Bryan Catanzaro
- 2022 Poster: Factuality Enhanced Language Models for Open-Ended Text Generation
  Nayeon Lee · Wei Ping · Peng Xu · Mostofa Patwary · Pascale N Fung · Mohammad Shoeybi · Bryan Catanzaro
- 2021 Poster: Long-Short Transformer: Efficient Transformers for Language and Vision
  Chen Zhu · Wei Ping · Chaowei Xiao · Mohammad Shoeybi · Tom Goldstein · Anima Anandkumar · Bryan Catanzaro
- 2017 Poster: Deep Voice 2: Multi-Speaker Neural Text-to-Speech
  Andrew Gibiansky · Sercan Arik · Gregory Diamos · John Miller · Kainan Peng · Wei Ping · Jonathan Raiman · Yanqi Zhou
- 2017 Spotlight: Deep Voice 2: Multi-Speaker Neural Text-to-Speech
  Andrew Gibiansky · Sercan Arik · Gregory Diamos · John Miller · Kainan Peng · Wei Ping · Jonathan Raiman · Yanqi Zhou
- 2016 Poster: Learning Infinite RBMs with Frank-Wolfe
  Wei Ping · Qiang Liu · Alexander Ihler
- 2015 Poster: Decomposition Bounds for Marginal MAP
  Wei Ping · Qiang Liu · Alexander Ihler