Despite the emergence of principled methods for domain adaptation under label shift (where only the class balance changes), the sensitivity of these methods to natural-seeming covariate shifts remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics, despite showing promise on benchmark datasets, tend to falter when faced with shifts in the class balance. Moreover, it is difficult to assess the state of the field owing to inconsistencies among relevant papers in evaluation criteria, datasets, and baselines. In this paper, we introduce RLSbench, a large-scale benchmark for such relaxed label shift settings, consisting of 11 vision datasets spanning more than 200 distribution shift pairs with different class proportions. We evaluate 12 popular domain adaptation methods, demonstrating a more widespread susceptibility to failure under extreme shifts in the class proportions than was previously known. We develop an effective meta-algorithm, compatible with most deep domain adaptation heuristics, that consists of the following two steps: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) the target label distribution. Furthermore, we discover that batch-norm adaptation of a model trained on source with the aforementioned corrections offers a strong baseline, largely missing from prior comparisons. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings.
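
The two steps of the meta-algorithm can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact procedure, so everything here is an assumption made for illustration: the function names `pseudo_balance_indices` and `reweight_predictions` are hypothetical, step (i) is rendered as oversampling with replacement according to pseudo-labels, and step (ii) as the standard prior-ratio correction that rescales source-trained probabilities by (estimated target prior) / (source prior) and renormalizes.

```python
import numpy as np


def pseudo_balance_indices(pseudo_labels, num_classes, rng):
    """Step (i), sketched: resample example indices so that every class
    that appears among the pseudo-labels is represented equally often.

    Assumption: balancing is done by oversampling with replacement up to
    the size of the largest pseudo-labeled class.
    """
    pseudo_labels = np.asarray(pseudo_labels)
    per_class = [np.flatnonzero(pseudo_labels == c) for c in range(num_classes)]
    present = [idx for idx in per_class if idx.size > 0]
    if not present:
        return np.array([], dtype=int)
    n_per_class = max(idx.size for idx in present)
    sampled = [rng.choice(idx, size=n_per_class, replace=True) for idx in present]
    return rng.permutation(np.concatenate(sampled))


def reweight_predictions(source_probs, source_prior, target_prior, eps=1e-12):
    """Step (ii), sketched: adjust class probabilities from a source-trained
    classifier using an estimate of the target label distribution.

    Each column is multiplied by target_prior / source_prior, then every
    row is renormalized to sum to one.
    """
    source_probs = np.asarray(source_probs, dtype=float)
    weights = np.asarray(target_prior, dtype=float) / np.maximum(
        np.asarray(source_prior, dtype=float), eps
    )
    adjusted = source_probs * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pseudo-labels for four unlabeled target examples.
    idx = pseudo_balance_indices([0, 0, 0, 1], num_classes=2, rng=rng)
    print(idx)
    # Hypothetical priors: balanced source, skewed target estimate.
    print(reweight_predictions([[0.5, 0.5]], [0.5, 0.5], [0.9, 0.1]))
```

In this sketch the target prior is passed in as a given; in practice it would come from a label shift estimator, which is outside the scope of the example.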