Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples with an unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic reference. It faces two challenges: 1) the highly dynamic style features in expressive voice are difficult to model and transfer; and 2) TTS models must be robust enough to handle diverse OOD conditions that differ from the source data. This paper proposes GenerSpeech, a text-to-speech model for high-fidelity zero-shot style transfer of OOD custom voices. GenerSpeech decomposes speech variation into style-agnostic and style-specific parts by introducing two components: 1) a multi-level style adaptor that efficiently models a wide range of style conditions, including global speaker and emotion characteristics and local (utterance-, phoneme-, and word-level) fine-grained prosodic representations; and 2) a generalizable content adaptor with Mix-Style Layer Normalization that eliminates style information from the linguistic content representation and thus improves model generalization. Our evaluations on zero-shot style transfer demonstrate that GenerSpeech surpasses state-of-the-art models in audio quality and style similarity. Extension studies on adaptive style transfer further show that GenerSpeech performs robustly in the few-shot data setting. Audio samples are available at https://GenerSpeech.github.io/.
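The abstract names Mix-Style Layer Normalization but does not describe how it works. The following is only a rough, minimal sketch under stated assumptions. It treats the layer as a style-conditional layer normalization: scale and bias are predicted from a style vector and, during training, are randomly interpolated with the scale and bias predicted from another utterance's style in the same batch. The mixing weight is drawn from Beta(alpha, alpha), in the spirit of MixStyle. The names `mix_style_layer_norm`, `project` and `alpha`, the per-sample Beta draw, and the default values are all hypothetical choices. This is not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(h: Sequence[float], eps: float = 1e-5) -> Vector:
    """Normalize one hidden vector to zero mean and unit variance (no affine part)."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


def project(w: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """Linear map from a style vector to per-channel parameters: weight @ w + bias."""
    return [sum(a * b for a, b in zip(row, w)) + c for row, c in zip(weight, bias)]


def mix_style_layer_norm(
    hidden: List[Vector],   # one hidden vector per utterance in the batch
    styles: List[Vector],   # one style vector per utterance (hypothetical style embedding)
    w_gamma: Matrix, b_gamma: Vector,
    w_beta: Matrix, b_beta: Vector,
    alpha: float = 0.1,     # hypothetical Beta(alpha, alpha) concentration
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """Hypothetical sketch of a style-conditional layer norm with style mixing.

    In training mode, each sample's scale/bias is interpolated with the scale/bias
    predicted from a randomly permuted sample's style. The assumed intent is to
    discourage later layers from relying on the exact style of the current
    utterance. With training=False it reduces to style-conditional layer norm.
    """
    rng = rng or random.Random()
    perm = list(range(len(hidden)))
    if training:
        rng.shuffle(perm)  # pair each sample with another sample's style
    out = []
    for i, h in enumerate(hidden):
        normed = layer_norm(h)
        gamma = project(styles[i], w_gamma, b_gamma)
        beta = project(styles[i], w_beta, b_beta)
        if training:
            lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
            gamma_j = project(styles[perm[i]], w_gamma, b_gamma)
            beta_j = project(styles[perm[i]], w_beta, b_beta)
            gamma = [lam * g + (1 - lam) * gj for g, gj in zip(gamma, gamma_j)]
            beta = [lam * b + (1 - lam) * bj for b, bj in zip(beta, beta_j)]
        out.append([g * n + b for g, n, b in zip(gamma, normed, beta)])
    return out


if __name__ == "__main__":
    # Toy batch: 2 utterances, hidden size 3, style size 2 (all values made up).
    hidden = [[1.0, 2.0, 3.0], [4.0, 0.0, -4.0]]
    styles = [[1.0, 0.0], [0.0, 1.0]]
    w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_beta = [[0.0, 0.0] for _ in range(3)]
    zero_bias = [0.0, 0.0, 0.0]
    args = (hidden, styles, w_gamma, zero_bias, w_beta, zero_bias)
    print(mix_style_layer_norm(*args, training=True, rng=random.Random(0)))
    print(mix_style_layer_norm(*args, training=False))
```

In this sketch, setting `training=False` skips both the shuffle and the Beta draw, so the layer falls back to ordinary style-conditional layer normalization. Whether GenerSpeech behaves this way at inference is an assumption.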
Author Information
Rongjie Huang (Zhejiang University)
Yi Ren (Sea AI Lab)
Jinglin Liu (Zhejiang University)
Chenye Cui (Zhejiang University)
Zhou Zhao (Zhejiang University)
More from the Same Authors
2022 Poster: Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
Yang Zhao · Chen Zhang · Haifeng Huang · Haoyuan Li · Zhou Zhao
2022 Poster: Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang · Zhe Su · Zhou Zhao · Qian Yang · Yi Ren · Jinglin Liu · 振辉 叶
2022 Poster: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
2022 Spotlight: Lightning Talks 4B-4
Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu
2022 Spotlight: Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang · Zhe Su · Zhou Zhao · Qian Yang · Yi Ren · Jinglin Liu · 振辉 叶
2022 Spotlight: GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang · Yi Ren · Jinglin Liu · Chenye Cui · Zhou Zhao
2022 Spotlight: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
2022 Poster: Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models
Zijian Zhang · Zhou Zhao · Zhijie Lin
2021 Poster: PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren · Jinglin Liu · Zhou Zhao
2021 Poster: Generalizable Multi-linear Attention Network
Tao Jin · Zhou Zhao
2020 Poster: Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding
Zhu Zhang · Zhou Zhao · Zhijie Lin · jieming zhu · Xiuqiang He
2019 Poster: FastSpeech: Fast, Robust and Controllable Text to Speech
Yi Ren · Yangjun Ruan · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu
2018 Poster: MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models
Boyuan Pan · Yazheng Yang · Hao Li · Zhou Zhao · Yueting Zhuang · Deng Cai · Xiaofei He