Workshop Motivation:
Automatic text understanding has been an unsolved research problem for many years. This is partly due to the dynamic and diverging nature of human languages, which gives rise to many different varieties of natural language. These variations range from the individual level, through regional and social dialects, up to seemingly separate languages and language families.
However, in recent years there have been considerable achievements in data-driven approaches to computational linguistics that exploit the redundancy in the encoded information and in the structures used. Those approaches are mostly not language specific, and some can even exploit redundancies across languages.
This progress in cross-lingual technologies is largely due to the increased availability of multilingual data in the form of static repositories or streams of documents. In addition, parallel and comparable corpora like Wikipedia are easily available and constantly updated. Finally, cross-lingual knowledge bases like DBpedia can be used as an interlingua to connect structured information across languages. This helps scale traditionally monolingual tasks, such as information retrieval and intelligent information access, to multilingual and cross-lingual applications.
From the application side, there is a clear need for such cross-lingual technology and services, for two reasons. a) There is a huge disparity between the mean size of languages and the median size: 389 (or nearly 6%) of the world’s languages have at least one million speakers and account for 94% of the world’s population. By contrast, the remaining 94% of languages are spoken by only 6% of the world’s people. b) Multilingualism is widespread in many regions: in the EU member states, for example, 56% of citizens are able to hold a conversation in one language apart from their mother tongue.
Available systems on the market are typically focused on multilingual tasks, such as machine translation, and don’t deal with cross-linguality. A good example is Google News, one of the most popular news aggregators, which collects news separately for each individual language. The ability to cross the border of a particular language would help many users consume the full breadth of news reporting by joining information in their mother tongue with information from the rest of the world.
Workshop Objectives:
The XLT workshop is aimed at techniques that strive for flexibility, making them applicable across languages and language varieties with less manual effort and less manually labeled training data. Such approaches might also be beneficial for the pressing task of analyzing continuously evolving natural language varieties that are not well formed. Such data typically originates from social media, for example text messages, forum posts, or tweets, and is often highly domain dependent.
Workshop Contributions:
The Workshop on Cross-Lingual Technologies (XLT) offers a platform for discussing algorithms and applications for the statistical analysis of language resources covering many languages.
Ideal contributions cover one or more of the topics listed below:
• Unsupervised and weakly supervised learning methods for cross-lingual technologies
• Cross-lingual technologies beyond statistical machine translation
• Cross-lingual representations of linguistic structure
• Cross-lingual tasks, such as: XL document linking and comparison; XL topic modeling; XL information extraction; XL semantic distances; XL semantic parsing; XL disambiguation; XL semantic annotation;...
• Cross-lingual language resources and knowledge bases
• Information diffusion across languages
Author Information
Achim Rettinger (Karlsruhe Institute of Technology)
Marko Grobelnik (Jozef Stefan Institute)
Blaz Fortuna (Jozef Stefan Institute)
Xavier Carreras (Universitat Politècnica de Catalunya)
Juanzi Li (Tsinghua University)
More from the Same Authors
2017 Workshop: Workshop on Prioritising Online Content
John Shawe-Taylor · Massimiliano Pontil · Nicolò Cesa-Bianchi · Emine Yilmaz · Chris Watkins · Sebastian Riedel · Marko Grobelnik
2016: Extracting Templates from Media Event Sequences
Marko Grobelnik
2013 Workshop: Knowledge Extraction from Text (KET)
Marko Grobelnik · Blaz Fortuna · Estevam Hruschka · Michael J Witbrock
2013 Demonstration: Cross-Lingual Technologies: Text to Logic Mapping, Search and Classification over 100 Languages
Jan Rupnik · Andrej Muhic · Blaz Fortuna · Janez Starc · Marko Grobelnik · Michael J Witbrock
2013 Poster: Unsupervised Spectral Learning of Finite State Transducers
Raphael Bailly · Xavier Carreras · Ariadna Quattoni
2013 Spotlight: Unsupervised Spectral Learning of Finite State Transducers
Raphael Bailly · Xavier Carreras · Ariadna Quattoni
2013 Demonstration: Semi-supervised learning for multilingual text to logic mapping
Janez Starc · Marko Grobelnik · Michael J Witbrock
2006 Demonstration: OntoGen
Blaž Fortuna · Dunja Mladenic · Marko Grobelnik