NIPS 2012


Workshop

xLiTe: Cross-Lingual Technologies

Achim Rettinger · Marko Grobelnik · Blaz Fortuna · Xavier Carreras · Juanzi Li

Tahoe D, Harrah’s Special Events Center 2nd Floor

Workshop Motivation:
Automatic text understanding has been an unsolved research problem for many years. This is partly due to the dynamic and diverging nature of human languages, which gives rise to many different varieties of natural language. These variations range from the individual level, through regional and social dialects, up to seemingly separate languages and language families.
However, in recent years there have been considerable achievements in data-driven approaches to computational linguistics that exploit the redundancy in the encoded information and the structures used. Most of those approaches are not language-specific, and some can even exploit redundancies across languages.
This progress in cross-lingual technologies is largely due to the increased availability of multilingual data in the form of static repositories or streams of documents. In addition, parallel and comparable corpora like Wikipedia are easily available and constantly updated. Finally, cross-lingual knowledge bases like DBpedia can be used as an interlingua to connect structured information across languages. This helps scale traditionally monolingual tasks, such as information retrieval and intelligent information access, to multilingual and cross-lingual applications.

From the application side, there is a clear need for such cross-lingual technology and services, for two reasons. a) There is a huge disparity between the mean size of languages and the median size: 389 (or nearly 6%) of the world’s languages have at least one million speakers and account for 94% of the world’s population. By contrast, the remaining 94% of languages are spoken by only 6% of the world’s people. b) In many regions people speak more than one language: in the EU member states, for example, 56% of citizens are able to hold a conversation in one language apart from their mother tongue.
Available systems on the market typically focus on multilingual tasks, such as machine translation, and do not deal with cross-linguality. A good example is Google News, one of the most popular news aggregators, which collects news separately for each language. The ability to cross language borders would help many users consume the full breadth of news reporting by combining information in their mother tongue with information from the rest of the world.

Workshop Objectives:
The XLT workshop is aimed at techniques that strive for flexibility, making them applicable across languages and language varieties with less manual effort and less manually labeled training data. Such approaches might also be beneficial for the pressing task of analyzing continuously evolving natural language varieties that are not well formed. Such data typically originates from social media, such as text messages, forum posts, or tweets, and is often highly domain dependent.

Workshop Contributions:
The Workshop on cross-lingual technologies (XLT) offers a platform for discussing algorithms and applications for statistical analysis of language resources covering many languages.

Ideal contributions cover one or more of the topics listed below:
• Unsupervised and weakly supervised learning methods for cross-lingual technologies
• Cross-lingual technologies beyond statistical machine translation
• Cross-lingual representations of linguistic structure
• Cross-lingual tasks, such as: XL document linking and comparison; XL topic modeling; XL information extraction; XL semantic distances; XL semantic parsing; XL disambiguation; XL semantic annotation;...
• Cross-lingual language resources and knowledge bases
• Information diffusion across languages
