Vatsal Sharan, "Sample Amplification: Increasing Dataset Size even when Learning is Impossible"
Vatsal Sharan
Given data drawn from an unknown distribution $D$, to what extent is it possible to ``amplify'' this dataset and faithfully output a larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ amplification procedure takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples'' which must be indistinguishable from $m$ samples drawn i.i.d. from $D$. We consider this sample amplification problem in two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean and fixed covariance matrix. Perhaps surprisingly, we show that a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, $n$, is significantly less than what would be necessary to learn $D$ to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we formalize a number of curious directions for future research in this vein.
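One way to make ``indistinguishable'' precise is sketched below. The abstract does not state the metric, so the total variation distance $d_{\mathrm{TV}}$, the tolerance $\epsilon$, the class $\mathcal{C}$ of candidate distributions, the domain $\mathcal{X}$, and the map $A$ are notation assumed here for illustration only. Under these assumptions, a (possibly randomized) map $A : \mathcal{X}^n \to \mathcal{X}^m$ with $m > n$ is an $(n,m)$ amplification procedure for $\mathcal{C}$ if

$$
d_{\mathrm{TV}}\bigl(A(X_n),\, D^m\bigr) \le \epsilon \quad \text{for every } D \in \mathcal{C}, \qquad X_n = (x_1, \dots, x_n) \sim D^n,
$$

where $D^m$ denotes the distribution of $m$ i.i.d. draws from $D$, and the distribution of $A(X_n)$ accounts for both the random input $X_n$ and any internal randomness of $A$. In the two settings above, $\mathcal{C}$ would be the discrete distributions supported on $k$ elements, or the $d$-dimensional Gaussians with unknown mean and fixed covariance matrix.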
Author Information
Vatsal Sharan (Stanford University)
More from the Same Authors
- 2019: Poster Session »
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2019 Poster: PIDForest: Anomaly Detection via Partial Identification »
  Parikshit Gopalan · Vatsal Sharan · Udi Wieder
- 2019 Spotlight: PIDForest: Anomaly Detection via Partial Identification »
  Parikshit Gopalan · Vatsal Sharan · Udi Wieder
- 2018 Poster: Efficient Anomaly Detection via Matrix Sketching »
  Vatsal Sharan · Parikshit Gopalan · Udi Wieder
- 2018 Poster: A Spectral View of Adversarially Robust Features »
  Shivam Garg · Vatsal Sharan · Brian Zhang · Gregory Valiant
- 2018 Spotlight: A Spectral View of Adversarially Robust Features »
  Shivam Garg · Vatsal Sharan · Brian Zhang · Gregory Valiant
- 2017 Poster: Learning Overcomplete HMMs »
  Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant