Machine learning systems deployed in the wild are often trained on a source distribution that differs from the target distribution on which it is deployed. Unlabeled data can be a powerful source of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect many scenarios that arise naturally in real-world applications. In this work, we introduce U-WILDS, which augments the WILDS benchmark of in-the-wild distribution shifts with curated unlabeled data that would be realistically obtainable in deployment. U-WILDS contains 8 datasets spanning a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark contemporary methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on the shifts in U-WILDS is limited. To facilitate the development of methods that can work reliably on real-world distribution shifts, we provide an open-source package containing all of the relevant data loaders, model architectures, and methods.