Entropy by Design: Synthetic Data at Scale
Marah Abdin
Abstract
At scale, under-constructed generation amplifies uniformity, causing synthetic data pipelines to plateau into self-similarity. We approach the problem through entropy-aware design: a system of variability levers, both structural and cognitive, that preserves quality and diversity at pretraining scale.