In our continuously evolving world, entities change over time, and new entities, previously nonexistent or unknown, appear. We study how this evolutionary scenario impacts performance on a well-established entity linking (EL) task. For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities and the descriptions of their target entities. By capturing such temporal aspects, TempEL contrasts with existing entity linking datasets, which are composed of fixed mentions linked to a single static version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA). For each of our collected temporal snapshots, TempEL contains links to entities that are continual, i.e., occur in all of the years, as well as completely new entities that appear for the first time at some point. TempEL thus makes it possible to quantify the performance of current state-of-the-art EL models for: (i) entities whose Knowledge Base descriptions and mention contexts change over time, and (ii) newly created entities that did not previously exist (e.g., at the time the EL model was trained). Our experimental results show that, in terms of temporal performance degradation, (i) continual entities suffer a decrease of up to 3.1% in EL accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This highlights the challenge posed by the TempEL dataset and opens new research prospects in the area of time-evolving entity disambiguation.
Author Information
Klim Zaporojets (Ghent University)
Lucie-Aimée Kaffee (Copenhagen University)
Johannes Deleu (Universiteit Gent)
Thomas Demeester (Ghent University)
Chris Develder (Ghent University - imec)
Chris Develder is an associate professor with IDLab in the Department of Information Technology (INTEC) at Ghent University - imec, Ghent, Belgium. He received an MSc in computer science engineering in 1999 and a PhD in electrical engineering in 2003, both from Ghent University. He was a research visitor at UC Davis, CA, USA (Jul.-Oct. 2007) and at Columbia University, NY, USA (2013-2015). Chris leads two research teams within IDLab, one on converting text to knowledge (NLP, mostly information extraction using machine learning), the other on data analytics and machine learning for smart grids. With his teams, he has co-authored 200+ papers.
Isabelle Augenstein (University of Copenhagen)
More from the Same Authors
- 2022: Neural Bayesian Network Understudy
  Paloma Rabaey · Cedric De Boom · Thomas Demeester
- 2018 Poster: DeepProbLog: Neural Probabilistic Logic Programming
  Robin Manhaeve · Sebastijan Dumancic · Angelika Kimmig · Thomas Demeester · Luc De Raedt
- 2018 Spotlight: DeepProbLog: Neural Probabilistic Logic Programming
  Robin Manhaeve · Sebastijan Dumancic · Angelika Kimmig · Thomas Demeester · Luc De Raedt