Workshop: InterNLP: Workshop on Interactive Learning for Natural Language Processing
Aida Nematzadeh: On Evaluating Neural Representations
There has been increased interest in developing general-purpose pretrained models across domains such as language, vision, and multimodal settings. This approach is appealing because we can pretrain models on large datasets once, and then adapt them to various tasks using a smaller supervised dataset. Moreover, these models achieve impressive results on a range of benchmarks, often performing better than task-specific models. Finally, this pretraining approach processes the data passively and does not rely on actively interacting with humans. In this talk, I will first discuss what aspects of language children can learn passively and to what extent interacting with others might require developing theory of mind. Next, I will discuss the need for better evaluation pipelines to understand the shortcomings and strengths of pretrained models. In particular, I will talk about: (1) the necessity of directly measuring real-world performance (as opposed to relying on benchmark performance), (2) the importance of strong baselines, and (3) how to design probing datasets that measure specific capabilities of our models. I will focus on commonsense reasoning, verb understanding, and theory of mind as challenging domains for our existing pretrained models.