Workshop
Table Representation Learning Workshop (TRL)
Madelon Hulsebos · Haoyu Dong · Laurel Orr · Qian Liu · Vadim Borisov
East Meeting Room 11, 12
Sat 14 Dec, 8:30 a.m. PST
Tables are a promising modality for representation learning and generative models, with too much application potential to ignore. However, tables have long been overlooked despite their dominant presence in the data landscape, e.g. in data management, data analysis, and ML pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSVs. Similarly, the top-3 most-used database management systems are all intended for relational data. Representation learning for tables, possibly combined with other modalities such as code and text, has shown impressive performance on tasks like semantic parsing, question answering, table understanding, data preparation, and data analysis (e.g. text-to-SQL). The pre-training paradigm has been shown to be effective for tabular ML (classification/regression) as well. More recently, we have also observed promising potential in applying and enhancing generative models (e.g. LLMs) for structured data, to improve how we process such data and derive insights from it.
The Table Representation Learning workshop has been the key venue driving this research vision and establishing a community around TRL. The goals of the third edition of TRL at NeurIPS 2024 are to:
1) showcase the latest impactful TRL research, with a particular focus on industry insights this year,
2) explore new applications, techniques and open challenges for representation learning and generative models for tabular data,
3) facilitate discussion and collaboration across the ML, NLP, and DB communities.
Schedule
Sat 8:30 a.m. - 8:40 a.m.
|
Opening notes
(
Opening/closing
)
>
SlidesLive Video |
Madelon Hulsebos |
Sat 8:40 a.m. - 9:20 a.m.
|
Gaël Varoquaux (Inria, Probabl): Tabular foundation models for analytics: challenges and progress
(
Invited talk
)
>
link
SlidesLive Video |
Gael Varoquaux |
Sat 9:20 a.m. - 9:30 a.m.
|
MotherNet: Fast Training and Inference via Hyper-Network Transformers
(
Oral
)
>
link
SlidesLive Video |
Andreas Mueller · Carlo Curino · Raghu Ramakrishnan |
Sat 9:30 a.m. - 9:40 a.m.
|
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning
(
Oral
)
>
link
SlidesLive Video |
Weihua Hu · Yiwen Yuan · Zecheng Zhang · Akihiro Nitta · Kaidi Cao · Vid Kocijan · Jinu Sunil · Jure Leskovec · Matthias Fey |
Sat 10:00 a.m. - 10:35 a.m.
|
Yasemin Altun (Google DeepMind): Advancements in Structure-Aware Reasoning for Tabular Data ( Invited talk ) > link | Yasemin Altun |
Sat 10:35 a.m. - 10:45 a.m.
|
Large Language Models Engineer Too Many Simple Features for Tabular Data
(
Oral
)
>
link
SlidesLive Video |
Jaris Küken · Lennart Purucker · Frank Hutter |
Sat 10:45 a.m. - 10:55 a.m.
|
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning ( Oral ) > link | Xinyuan Lu · Liangming Pan · Yubo Ma · Preslav Nakov · Min-Yen Kan |
Sat 10:45 a.m. - 10:55 a.m.
|
TabDiff: a Unified Diffusion Model for Multi-Modal Tabular Data Generation
(
Oral
)
>
link
SlidesLive Video |
Juntong Shi · Minkai Xu · Harper Hua · Hengrui Zhang · Stefano Ermon · Jure Leskovec |
Sat 10:55 a.m. - 11:05 a.m.
|
Expertise-Centric Prompting Framework for Financial Tabular Data Generation using Pre-trained Large Language Models
(
Oral
)
>
link
SlidesLive Video |
Subin Kim · Jungmin Son · Minyoung Jung · Youngjun Kwak |
Sat 11:05 a.m. - 11:15 a.m.
|
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes
(
Oral
)
>
link
SlidesLive Video |
Aamod Khatiwada · Harsha Kokel · Ibrahim Abdelaziz · Subhajit Chaudhury · Julian T Dolby · Oktie Hassanzadeh · Zhenhan Huang · Tejaswini Pedapati · Horst Samulowitz · Kavitha Srinivas |
Sat 11:15 a.m. - 12:00 p.m.
|
Poster session 1
(
Poster Session
)
>
|
|
Sat 1:30 p.m. - 2:10 p.m.
|
Matei Zaharia (UC Berkeley/Databricks): Lessons from building natural language query interfaces in Databricks AI/BI
(
Invited talk
)
>
SlidesLive Video |
Matei Zaharia |
Sat 2:10 p.m. - 2:20 p.m.
|
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
(
Oral
)
>
link
SlidesLive Video |
Satya Krishna Gorti · Ilan Gofman · Zhaoyan Liu · Jiapeng Wu · Noël Vouitsis · Guangwei Yu · Jesse Cresswell · Rasa Hosseinzadeh |
Sat 2:20 p.m. - 2:30 p.m.
|
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models
(
Oral
)
>
link
SlidesLive Video |
Karime Maamari · Fadhil Abubaker · Daniel Jaroslawicz · Amine Mhedhbi |
Sat 2:30 p.m. - 3:15 p.m.
|
Poster session 2
(
Poster Session
)
>
|
|
Sat 3:30 p.m. - 4:10 p.m.
|
Josh Gardner (Apple): Toward Robust, Reliable, and Generalizable Tabular Data Models
(
Invited talk
)
>
SlidesLive Video |
Josh Gardner |
Sat 4:10 p.m. - 4:50 p.m.
|
Panel TRL in Industry [tbc]
(
Panel
)
>
SlidesLive Video |
Xiao Ling · Shivam Singhal · Douwe Kiela · Maithra Raghu · Binyuan Hui |
Sat 4:50 p.m. - 5:00 p.m.
|
Closing notes
(
Opening/closing
)
>
SlidesLive Video |
Qian Liu |
-
|
On Short Textual Value Column Representation Using Symbol Level Language Models ( Poster ) > link | Ron Begleiter · Nathan Roll |
-
|
Lightweight Correlation-Aware Table Compression ( Poster ) > link | Mihail Stoian · Alexander van Renen · Jan Kobiolka · Ping-Lin Kuo · Josif Grabocka · Andreas Kipf |
-
|
AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler ( Poster ) > link | Changhun Kim · Taewon Kim · Seungyeon Woo · June Yong Yang · Eunho Yang |
-
|
RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph ( Poster ) > link | Lindsey Linxi Wei · Guorui Xiao · Magdalena Balazinska |
-
|
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data ( Poster ) > link | David Schnurr · Kai Helli · Noah Hollmann · Samuel Müller · Frank Hutter |
-
|
UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining ( Poster ) > link | ShengYun Peng · Aishwarya Chakravarthy · Seongmin Lee · Xiaojing Wang · Rajarajeswari Balasubramaniyan · Duen Horng Chau |
-
|
DynoClass: A Dynamic Table-Class Detection System Without the Need for Predefined Ontologies ( Poster ) > link | Haonan Wang · Eugene Wu · Kechen Liu · Jiaxiang Liu |
-
|
ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets ( Poster ) > link | Tomas Tokar · Scott Sanner |
-
|
Sparsely Connected Layers for Financial Tabular Data ( Poster ) > link | Mohammed Abdulrahman · Yin Wang · Hui Chen |
-
|
Automating Enterprise Data Engineering with LLMs ( Poster ) > link | Jan-Micha Bodensohn · Ulf Brackmann · Liane Vogel · Anupam Sanghi · Carsten Binnig |
-
|
Improving LLM Group Fairness on Tabular Data via In-Context Learning ( Poster ) > link | Valeriia Cherepanova · Chia-Jung Lee · Nil-Jana Akpinar · Riccardo Fogliato · Martin Bertran · Michael Kearns · James Zou |
-
|
Tabular Data Generation using Binary Diffusion ( Poster ) > link | Vitaliy Kinakh · Slava Voloshynovskiy |
-
|
AGATa: Attention-Guided Augmentation for Tabular Data in Contrastive Learning ( Poster ) > link | Moonjung Eo · Kyungeun Lee · Min-Kook Suh · Hyeseung Cho · Ye Seul Sim · Woohyung Lim |
-
|
Synthetic SQL Column Descriptions and Their Impact on Text-to-SQL Performance ( Poster ) > link | Niklas Wretblad · Oskar Holmström · Erik Larsson · Axel Wiksäter · Hjalmar Öhman · Oscar Söderlund · Ture Pontén · Martin Forsberg · Martin Sörme · Fredrik Heintz |
-
|
Enhancing Table Representations for Similar Table Recommendation with Synthetic Data Generation ( Poster ) > link | Dayu Yang · Natawut Monaikul · Amanda Ding · Bozhao Tan · Kishore Mosaliganti · Giridharan Iyengar |
-
|
RES-RAG: Residual-aware RAG for Realistic Tabular Data Generation ( Poster ) > link | Liancheng Fang · Aiwei Liu · Hengrui Zhang · Henry Zou · Weizhi Zhang · Philip S Yu |
-
|
Tabby: Tabular Adaptation for Language Models ( Poster ) > link | Sonia Cromp · Satya Sai Srinath Namburi · Catherine Cao · Mohammed Alkhudhayri · Samuel Guo · Nicholas Roberts · Frederic Sala |
-
|
Recurrent Interpolants for Probabilistic Time Series Prediction ( Poster ) > link | Yu Chen · Marin Biloš · Sarthak Mittal · Wei Deng · Kashif Rasul · Anderson Schneider |
-
|
TARGET: Benchmarking Table Retrieval for Generative Tasks ( Poster ) > link | Xingyu Ji · Aditya Parameswaran · Madelon Hulsebos |
-
|
Data-Centric Text-to-SQL with Large Language Models ( Poster ) > link | Zachary Huang · Shuo Zhang · Kechen Liu · Eugene Wu |
-
|
Relational Deep Learning: Graph Representation Learning on Relational Databases ( Poster ) > link |
Joshua Robinson · Rishabh Ranjan · Weihua Hu · Kexin Huang · Jiaqi Han · Alejandro Dobles · Matthias Fey · Jan Eric Lenssen · Yiwen Yuan · Zecheng Zhang · Xinwei He · Jure Leskovec |
-
|
TabFlex: Scaling Tabular Learning to Millions with Linear Attention ( Poster ) > link | Yuchen Zeng · Wonjun Kang · Andreas Mueller |
-
|
SynQL: Synthetic Data Generation for In-Domain, Low-Resource Text-to-SQL Parsing ( Poster ) > link | Denver Baumgartner · Tomasz Kornuta |
-
|
Augmenting Small-size Tabular Data with Class-Specific Energy-Based Models ( Poster ) > link | Andrei Margeloiu · Xiangjian Jiang · Nikola Simidjievski · Mateja Jamnik |
-
|
GAMformer: Exploring In-Context Learning for Generalized Additive Models ( Poster ) > link | Andreas Mueller · Julien Siems · Harsha Nori · David Salinas · Arber Zela · Rich Caruana · Frank Hutter |
-
|
Towards Optimizing SQL Generation via LLM Routing ( Poster ) > link | Mohammadhossein Malekpour · Nour Shaheen · Foutse Khomh · Amine Mhedhbi |
-
|
SALT: Sales Autocompletion Linked Business Tables Dataset ( Poster ) > link | Tassilo Klein · Clemens Biehl · Margarida Costa · Andre Sres · Jonas Kolk · Johannes Hoffart |
-
|
Learnable Numerical Input Normalization for Tabular Representation Learning based on B-splines ( Poster ) > link | Min-Kook Suh · Moonjung Eo · Ye Seul Sim · Woohyung Lim |
-
|
PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization ( Poster ) > link | Marco Spinaci · Marek Polewczyk · Johannes Hoffart · Markus Kohler · Sam Thelin · Tassilo Klein |
-
|
Multi-Stage QLoRA with Augmented Structured Dialogue Corpora: Efficient and Improved Conversational Healthcare AI ( Poster ) > link | Dasun Wickrama Arachchi Athukoralage · Thushari Atapattu |
-
|
Enhancing Biomedical Schema Matching with LLM-based Training Data Generation ( Poster ) > link | Yurong Liu · Aécio Santos · Eduardo Pena · Roque Lopez · Eden Wu · Juliana Freire |
-
|
Scalable Representation Learning for Multimodal Tabular Transactions ( Poster ) > link | Natraj Raman · Sumitra Ganesh · Manuela Veloso |
-
|
Benchmarking table comprehension in the wild ( Poster ) > link | Yikang Pan · Yi Zhu · Rand Xie · Yizhi Liu |
-
|
Relational Data Generation with Graph Neural Networks and Latent Diffusion Models ( Poster ) > link | Valter Hudovernik |
-
|
Towards Localization via Data Embedding for TabPFN ( Poster ) > link | Mykhailo Koshil · Thomas Nagler · Matthias Feurer · Katharina Eggensperger |
-
|
Unmasking Trees for Tabular Data ( Poster ) > link | Calvin McCarter |
-
|
Matchmaker: Self-Improving Compositional LLM Programs for Table Schema Matching ( Poster ) > link | Nabeel Seedat · Mihaela van der Schaar |
-
|
Towards Agentic Schema Refinement ( Poster ) > link | Agapi Rissaki · Ilias Fountalis · Nikolaos Vasiloglou · Wolfgang Gatterbauer |
-
|
Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection ( Poster ) > link | Chuhong Mai · Ro-ee Tal · Thahir Mohamed |
-
|
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features ( Poster ) > link | Shi Bin Hoo · Samuel Müller · David Salinas · Frank Hutter |
-
|
Scaling Generative Tabular Learning for Large Language Models ( Poster ) > link | Yiming Sun · Xumeng Wen · Shun Zheng · Xiaowei Jia · Jiang Bian |
-
|
TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data ( Poster ) > link | Suiyao Chen · Jing Wu · Yunxiao Wang · Cheng Ji · Tianpei Xie · Daniel Cociorva · Michael Sharps · Cecile Levasseur · Hakan Brunzell |
-
|
LLM Embeddings Improve Test-time Adaptation to Tabular -Shifts ( Poster ) > link | Yibo Zeng · Jiashuo Liu · Henry Lam · Hongseok Namkoong |
-
|
Unlearning Tabular Data Without a "Forget Set" ( Poster ) > link | Aviraj Newatia · Michael Cooper · Rahul Krishnan |
-
|
From One to Zero: RAG-IM Adapts Language Models for Interpretable Zero-Shot Predictions on Clinical Tabular Data ( Poster ) > link | Sazan Mahbub · Caleb Ellington · Sina Alinejad · Kevin Wen · Yingtao Luo · Ben Lengerich · Eric Xing |
-
|
Adaptivee: Adaptive Ensemble for Tabular Data ( Poster ) > link | Dawid Płudowski · Katarzyna Woźnica |
-
|
Distributionally robust self-supervised learning for tabular data ( Poster ) > link | Shantanu Ghosh · Tiankang Xie · Mikhail Kuznetsov |
-
|
Exploration of autoregressive models for in-context learning on tabular data ( Poster ) > link | Stefan Baur · Sohyeong Kim |
-
|
TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features ( Poster ) > link | Gleb Bazhenov · Oleg Platonov · Liudmila Prokhorenkova |
-
|
Adapting TabPFN for Zero-Inflated Metagenomic Data ( Poster ) > link | Giulia Perciballi · Federica Granese · Ahmad Fall · Farida ZEHRAOUI · Edi Prifti · Jean-Daniel Zucker |
-
|
HySem: A context length optimized LLM pipeline for unstructured tabular extraction ( Poster ) > link | Narayanan PP · Anantharaman Palacode Narayana Iyer |