

Workshop

Speech and Language: Unsupervised Latent-Variable Models

Slav Petrov · Aria Haghighi · Percy Liang · Dan Klein

Hilton: Cheakamus

Sat 13 Dec, 7:30 a.m. PST

Natural language processing (NLP) models must deal with the complex structure and ambiguity present in human languages. Because labeled data is unavailable for many domains, languages, and tasks, supervised learning approaches only partially address these challenges. In contrast, unlabeled data is cheap and plentiful, making unsupervised approaches appealing. Moreover, in recent years, we have seen exciting progress in unsupervised learning for many NLP tasks, including unsupervised word segmentation, part-of-speech and grammar induction, discourse analysis, coreference resolution, document summarization, and topic induction.

The goal of this workshop is to bring together researchers from the unsupervised machine learning community and the natural language processing community to facilitate cross-fertilization of techniques, models, and applications. The workshop focus is on the unsupervised learning of latent representations for natural language and speech. In particular, we are interested in structured prediction models which are able to discover linguistically sophisticated patterns from raw data. To provide a common ground for comparison and discussion, we will provide a cleaned and preprocessed data set for the convenience of those who would like to participate. This data will contain part-of-speech tags and parse trees in addition to raw sentences. An exciting direction in unsupervised NLP is the use of parallel text in multiple languages to provide additional structure on unsupervised learning. To that end, we will provide a bilingual corpus with word alignments, and encourage the participants to push the state-of-the-art in unsupervised NLP.
