Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning
David Vos · Till Döhmen · Sebastian Schelter
Event URL: https://openreview.net/forum?id=8kyYJs2YkFH

Data wrangling tasks for data integration and cleaning arise in virtually every data-driven application scenario nowadays. Recent research indicated the astounding potential of Large Language Models (LLMs) for such tasks. The automation of data wrangling with LLMs poses additional challenges, however, as hand-tuning task- and data-specific prompts for LLMs requires high expertise and manual effort. On the other hand, finetuning a whole LLM is more amenable to automation, but incurs high storage costs, as a copy of the LLM has to be maintained.

In this work, we explore the potential of a lightweight alternative to finetuning an LLM, which automatically learns a continuous prompt. This approach, called prefix-tuning, does not require updating the original LLM parameters, and can therefore re-use a single LLM instance across tasks. At the same time, it is amenable to automation, as continuous prompts can be automatically learned with standard techniques.

We evaluate prefix-tuning on common data wrangling tasks for tabular data such as entity matching, error detection, and data imputation, with promising results. We find that in six out of ten cases, prefix-tuning is within 2.3% of the performance of finetuning, even though it leverages only 0.39% of the parameter updates required for finetuning the full model. These results highlight the potential of prefix-tuning as a parameter-efficient alternative to finetuning for data integration and data cleaning with LLMs.
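To give a rough sense of why a learned prefix needs so few parameter updates, the following minimal sketch counts the trainable values of a prefix under one common formulation, in which each transformer layer receives a trainable key vector and value vector per prefix position while the LLM itself stays frozen. The layer count, hidden size, prefix length, and model size below are hypothetical values chosen for illustration; they are not the configuration used in this work, so the resulting fraction is not expected to reproduce the 0.39% reported above.

```python
def prefix_parameter_count(num_layers: int, hidden_dim: int, prefix_length: int) -> int:
    """Count trainable values when every layer gets `prefix_length`
    trainable key vectors and value vectors of size `hidden_dim`."""
    # The factor 2 accounts for one key and one value vector
    # per prefix position and layer.
    return num_layers * 2 * prefix_length * hidden_dim


def trainable_fraction(prefix_params: int, model_params: int) -> float:
    """Share of trainable prefix values relative to the frozen model's size."""
    return prefix_params / model_params


# Hypothetical configuration (assumption, not taken from the paper).
NUM_LAYERS = 24
HIDDEN_DIM = 1024
PREFIX_LENGTH = 50
MODEL_PARAMS = 770_000_000

prefix_params = prefix_parameter_count(NUM_LAYERS, HIDDEN_DIM, PREFIX_LENGTH)
fraction = trainable_fraction(prefix_params, MODEL_PARAMS)
print(f"{prefix_params:,} trainable prefix values, {fraction:.2%} of the model")
```

Because only the prefix is stored per task, each additional data wrangling task in this sketch adds a small set of prefix values rather than a full copy of the model.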

Author Information

David Vos (University of Amsterdam)

I am a recent MSc graduate in Artificial Intelligence, working as a visiting researcher at the Intelligent Data Engineering Lab at the University of Amsterdam. My interests include working on advanced data pipelines and doing research in natural language processing and machine learning.

Till Döhmen (University of Amsterdam)
Sebastian Schelter (University of Amsterdam)
