
Poster Session Speech: source separation, enhancement, recognition, synthesis
Shuayb Zarar · Rasool Fakoor · Sri Harsha Dumpala · Minje Kim · Paris Smaragdis · Mohit Dubey · Jong Hwan Ko · Sakriani Sakti · Yuxuan Wang · Lijiang Guo · Garrett T Kenyon · Andros Tjandra · Tycho Tax · Younggun Lee

Fri Dec 08 09:45 AM -- 11:00 AM (PST)
Event URL: http://media.aau.dk/smc/ml4audio/

Poster abstracts and full papers: http://media.aau.dk/smc/ml4audio/

SPEECH SOURCE SEPARATION
*Lijiang Guo and Minje Kim. Bitwise Source Separation on Hashed Spectra: An Efficient Posterior Estimation Scheme Using Partial Rank Order Metrics
*Minje Kim and Paris Smaragdis. Bitwise Neural Networks for Efficient Single-Channel Source Separation
*Mohit Dubey, Garrett Kenyon, Nils Carlson and Austin Thresher. Does Phase Matter For Monaural Source Separation?

SPEECH ENHANCEMENT
*Rasool Fakoor, Xiaodong He, Ivan Tashev and Shuayb Zarar. Reinforcement Learning To Adapt Speech Enhancement to Instantaneous Input Signal Quality
*Jong Hwan Ko, Josh Fromm, Matthai Phillipose, Ivan Tashev and Shuayb Zarar. Precision Scaling of Neural Networks for Efficient Audio Processing

AUTOMATIC SPEECH RECOGNITION
Marius Paraschiv, Lasse Borgholt, Tycho Tax, Marco Singh and Lars Maaløe. Exploiting Nontrivial Connectivity for Automatic Speech Recognition
*Brian Mcmahan and Delip Rao. Listening to the World Improves Speech Command Recognition
*Andros Tjandra, Sakriani Sakti and Satoshi Nakamura. End-to-End Speech Recognition with Local Monotonic Attention
Sri Harsha Dumpala, Rupayan Chakraborty and Sunil Kumar Kopparapu. A Novel Approach for Effective Learning in Low Resourced Scenarios

SPEECH SYNTHESIS
*Yuxuan Wang, Rj Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark and Rif A. Saurous. Uncovering Latent Style Factors for Expressive Speech Synthesis
*Younggun Lee, Azam Rabiee and Soo-Young Lee. Emotional End-to-End Neural Speech Synthesizer

Author Information

Shuayb Zarar (Microsoft AI and Research)
Rasool Fakoor (University of Texas At Arlington)
Sri Harsha Dumpala (TCS Research and Innovation)
Minje Kim (Indiana University)
Paris Smaragdis (University of Illinois Urbana-Champaign)
Mohit Dubey (Oberlin College)
Jong Hwan Ko (Georgia Institute of Technology)
Sakriani Sakti (Nara Institute of Science and Technology)

Sakriani Sakti received the DAAD-Siemens Program Asia 21st Century Award to study Communication Technology at the University of Ulm, Germany, and received her MSc degree in 2002. During her thesis work, she was with the Speech Understanding Department, DaimlerChrysler Research Center, Ulm, Germany. From 2003 to 2009, she worked as a researcher at ATR SLC Labs, Japan, and from 2006 to 2011, she worked as an expert researcher at NICT SLC Groups, Japan. While working with ATR-NICT, Japan, she continued her studies (2005-2008) with the Dialog Systems Group, University of Ulm, Germany, and received her Ph.D. degree in 2008. She has been actively involved in collaboration activities such as the Asian Pacific Telecommunity Project (2003-2007), A-STAR, and U-STAR (2006-2011). From 2009 to 2011, she served as a visiting professor in the Computer Science Department, University of Indonesia (UI), Indonesia. From 2011 to 2017, she was an assistant professor at the Augmented Human Communication Laboratory, NAIST, Japan. She also served as a visiting scientific researcher at INRIA Paris-Rocquencourt, France, in 2015-2016, under the JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation. Currently, she is a research associate professor at NAIST, as well as a research scientist at the RIKEN Center for Advanced Intelligence Project (AIP), Japan. She is a member of JNS, SFN, ASJ, ISCA, IEICE, and IEEE. She is also an officer of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL) and a Board Member of Spoken Language Technologies for Under-Resourced Languages (SLTU). Her research interests include statistical pattern recognition, graphical modeling frameworks, deep learning, multilingual speech recognition and synthesis, spoken language translation, affective dialog systems, and cognitive communication.

Yuxuan Wang (Google)
Lijiang Guo (Indiana University)
Garrett T Kenyon (Los Alamos National Laboratory)
Andros Tjandra (Nara Institute of Science and Technology)
Tycho Tax (Corti)
Younggun Lee (Korea Advanced Institute of Science and Technology)
