Timezone: »
Poster
FETA: Towards Specializing Foundational Models for Expert Task Applications
Amit Alfassy · Assaf Arbelle · Oshri Halimi · Sivan Harary · Roei Herzig · Eli Schwartz · Rameswar Panda · Michele Dolfi · Christoph Auer · Peter Staar · Kate Saenko · Rogerio Feris · Leonid Karlinsky
Foundational Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization. However, the parameter capacity of FMs is still limited, leading to poor out-of-the-box performance of FMs on many expert tasks (e.g. retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail part of the data distribution of the huge datasets used for FM pre-training. This underlines the necessity to explicitly evaluate and finetune FMs on such expert tasks, arguably ones that appear the most in practical real-world applications. In this paper, we propose a first of its kind FETA benchmark built around the task of teaching FMs to understand technical documentation, via learning to match their graphical illustrations to corresponding language descriptions. Our FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. FETA is equipped with a procedure for completely automatic annotation extraction (code would be released upon acceptance), allowing easy extension of FETA to more documentation types and application domains in the future. Our automatic annotation leads to an automated performance metric shown to be consistent with metrics computed on human-curated annotations (also released). We provide multiple baselines and analysis of popular FMs on FETA leading to several interesting findings that we believe would be very valuable to the FM community, paving the way towards real-world application of FMs for many practical expert tasks currently being `overlooked' by standard benchmarks focusing on common objects.
Author Information
Amit Alfassy (Technion, IBM Research)
Assaf Arbelle (International Business Machines)
Oshri Halimi (Technion, Technion)
Sivan Harary (IBM-Research)
Roei Herzig (Tel Aviv University)
Eli Schwartz (IBM Research AI)
Rameswar Panda (MIT-IBM Watson AI Lab)
Michele Dolfi (IBM Research Europe)
Christoph Auer (International Business Machines)
Peter Staar (IBM Research)
Kate Saenko (Boston University & MIT-IBM Watson AI Lab, IBM Research)
Rogerio Feris (MIT-IBM Watson AI Lab, IBM Research)
Leonid Karlinsky (Weizmann Institute of Science)
More from the Same Authors
-
2021 Spotlight: Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos »
Reuben Tan · Bryan Plummer · Kate Saenko · Hailin Jin · Bryan Russell -
2021 : Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation »
Aadarsh Sahoo · Rameswar Panda · Rogerio Feris · Kate Saenko · Abir Das -
2021 : Extending the WILDS Benchmark for Unsupervised Adaptation »
Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang -
2021 : Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency »
Samarth Mishra · Kate Saenko · Venkatesh Saligrama -
2022 : Fifteen-minute Competition Overview Video »
Kate Saenko · Samarth Mishra · Dina Bashkirova · Vitaly Ablavsky · Sarah Bargal · Rachel Lai · Piotr Teterwak · James Akl · Fadi Alladkani · Donghyun Kim · Berk Calli -
2022 Competition: VisDA 2022 Challenge: Sim2Real Domain Adaptation for Industrial Recycling »
Dina Bashkirova · Samarth Mishra · Piotr Teterwak · Donghyun Kim · Rachel Lai · Fadi Alladkani · James Akl · Vitaly Ablavsky · Sarah Bargal · Berk Calli · Kate Saenko -
2022 : Challenge Introduction »
Dina Bashkirova · Samarth Mishra · Piotr Teterwak · Donghyun Kim · Sarah Bargal · Diala Lteif · Kate Saenko -
2022 : Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark »
Vitali Petsiuk · Alexander E. Siemenn · Saisamrit Surbehera · Qi Qi Chin · Keith Tyser · Gregory Hunter · Arvind Raghavan · Yann Hicke · Bryan Plummer · Ori Kerret · Tonio Buonassisi · Kate Saenko · Armando Solar-Lezama · Iddo Drori -
2022 Poster: DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations »
Ximeng Sun · Ping Hu · Kate Saenko -
2022 Poster: Procedural Image Programs for Representation Learning »
Manel Baradad · Richard Chen · Jonas Wulff · Tongzhou Wang · Rogerio Feris · Antonio Torralba · Phillip Isola -
2022 Poster: Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens »
Elad Ben Avraham · Roei Herzig · Karttikeya Mangalam · Amir Bar · Anna Rohrbach · Leonid Karlinsky · Trevor Darrell · Amir Globerson -
2022 Poster: Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing »
Nataniel Ruiz · Sarah Bargal · Cihang Xie · Kate Saenko · Stan Sclaroff -
2022 Poster: How Transferable are Video Representations Based on Synthetic Data? »
Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris -
2021 Workshop: Distribution shifts: connecting methods and applications (DistShift) »
Shiori Sagawa · Pang Wei Koh · Fanny Yang · Hongseok Namkoong · Jiashi Feng · Kate Saenko · Percy Liang · Sarah Bird · Sergey Levine -
2021 Poster: Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data »
Ashraful Islam · Chun-Fu (Richard) Chen · Rameswar Panda · Leonid Karlinsky · Rogerio Feris · Richard J. Radke -
2021 Poster: OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization »
Kuniaki Saito · Donghyun Kim · Kate Saenko -
2021 Poster: IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers »
Bowen Pan · Rameswar Panda · Yifan Jiang · Zhangyang Wang · Rogerio Feris · Aude Oliva -
2021 Poster: Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos »
Reuben Tan · Bryan Plummer · Kate Saenko · Hailin Jin · Bryan Russell -
2021 : VisDA21: Visual Domain Adaptation + Q&A »
Kate Saenko · Kuniaki Saito · Donghyun Kim · Samarth Mishra · Ben Usman · Piotr Teterwak · Dina Bashkirova · Dan Hendrycks -
2021 Poster: Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing »
Aadarsh Sahoo · Rutav Shah · Rameswar Panda · Kate Saenko · Abir Das -
2020 Poster: Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment »
Ben Usman · Avneesh Sud · Nick Dufour · Kate Saenko -
2020 Poster: Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation »
Ping Hu · Stan Sclaroff · Kate Saenko -
2020 Poster: Universal Domain Adaptation through Self Supervision »
Kuniaki Saito · Donghyun Kim · Stan Sclaroff · Kate Saenko -
2020 Poster: Auxiliary Task Reweighting for Minimum-data Learning »
Baifeng Shi · Judy Hoffman · Kate Saenko · Trevor Darrell · Huijuan Xu -
2020 Poster: AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning »
Ximeng Sun · Rameswar Panda · Rogerio Feris · Kate Saenko -
2019 : Adaptive Multi-Task Neural Networks for Efficient Inference »
Rogerio Feris -
2019 Poster: Adversarial Self-Defense for Cycle-Consistent GANs »
Dina Bashkirova · Ben Usman · Kate Saenko -
2018 Poster: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction »
Roei Herzig · Moshiko Raboh · Gal Chechik · Jonathan Berant · Amir Globerson -
2018 Poster: Delta-encoder: an effective sample synthesis method for few-shot object recognition »
Eli Schwartz · Leonid Karlinsky · Joseph Shtok · Sivan Harary · Mattias Marder · Abhishek Kumar · Rogerio Feris · Raja Giryes · Alex Bronstein -
2018 Spotlight: Delta-encoder: an effective sample synthesis method for few-shot object recognition »
Eli Schwartz · Leonid Karlinsky · Joseph Shtok · Sivan Harary · Mattias Marder · Abhishek Kumar · Rogerio Feris · Raja Giryes · Alex Bronstein -
2018 Poster: Dialog-based Interactive Image Retrieval »
Xiaoxiao Guo · Hui Wu · Yu Cheng · Steven Rennie · Gerald Tesauro · Rogerio Feris -
2018 Poster: Speaker-Follower Models for Vision-and-Language Navigation »
Daniel Fried · Ronghang Hu · Volkan Cirik · Anna Rohrbach · Jacob Andreas · Louis-Philippe Morency · Taylor Berg-Kirkpatrick · Kate Saenko · Dan Klein · Trevor Darrell -
2018 Poster: Co-regularized Alignment for Unsupervised Domain Adaptation »
Abhishek Kumar · Prasanna Sattigeri · Kahini Wadhawan · Leonid Karlinsky · Rogerio Feris · Bill Freeman · Gregory Wornell -
2016 : Invited Talk: Domain Adaption for Perception and Action (Kate Saenko, Boston University) »
Kate Saenko -
2015 Workshop: Transfer and Multi-Task Learning: Trends and New Perspectives »
Anastasia Pentina · Christoph Lampert · Sinno Jialin Pan · Mingsheng Long · Judy Hoffman · Baochen Sun · Kate Saenko -
2010 Poster: Using body-anchored priors for identifying actions in single images »
Leonid Karlinsky · Michael Dinerstein · Shimon Ullman