Poster
LLM Dataset Inference: Detect Datasets, not Strings
Pratyush Maini · Hengrui Jia · Nicolas Papernot · Adam Dziedzic
West Ballroom A-D #6205
Wed 11 Dec 4:30 p.m. PST to 7:30 p.m. PST
Abstract:
The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on *unlicensed data* from the internet. Recent works have presented methods to identify whether individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) that were created significantly later than members (e.g., much more recent Wikipedia articles than the ones used to train the model). This temporal shift makes membership inference appear successful. However, these MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution. Instead, we propose a new *dataset inference* method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM was trained on multiple documents (such as a book) written by them, rather than on one particular string. While dataset inference is also challenging, we solve it by selectively combining multiple membership inference metrics. Our approach successfully distinguishes between the train and test sets of different subsets of the Pile with statistically significant p-values $< 0.1$, and no false positives.
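The idea of combining several membership inference metrics and testing at the dataset level can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed weights, the convention that lower scores suggest membership, and the use of a permutation test are all assumptions made for illustration. The abstract states only that multiple MIA metrics are selectively combined and that the result is reported as a p-value.

```python
import random
from statistics import mean


def combine_scores(features, weights):
    """Weighted sum of per-sequence MIA metrics (one row per sequence).

    Assumption: the weights are supplied by the caller; how they are
    chosen ("selectively") is not specified here.
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def permutation_p_value(suspect, heldout, n_perm=2000, seed=0):
    """One-sided permutation test on the difference of means.

    Assumption: lower combined scores (e.g. lower loss) suggest that a
    sequence was seen during training. The null hypothesis is that the
    suspect and held-out scores come from the same distribution.
    """
    rng = random.Random(seed)
    observed = mean(heldout) - mean(suspect)
    pooled = list(suspect) + list(heldout)
    n = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-sequence metrics: two MIA scores per sequence.
    suspect_features = [[1.0 + 0.01 * i, 0.5] for i in range(30)]
    heldout_features = [[2.0 + 0.01 * i, 0.5] for i in range(30)]
    weights = [1.0, 0.0]  # hypothetical: keep the first metric only
    suspect = combine_scores(suspect_features, weights)
    heldout = combine_scores(heldout_features, weights)
    print(permutation_p_value(suspect, heldout))
```

Aggregating over many sequences is what lets a weak per-sequence signal become a statistically meaningful dataset-level decision. A small p-value here would suggest that the suspect set scores systematically lower than a held-out set drawn from the same distribution.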