

Poster

Post-Hoc Reversal: Are We Selecting Models Prematurely?

Rishabh Ranjan · Saurabh Garg · Mrigank Raman · Carlos Guestrin · Zachary Lipton


Abstract: Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical study. In particular, we demonstrate a phenomenon that we call post-hoc reversal, where performance trends are reversed after applying these post-hoc transforms. This phenomenon is especially prominent in high-noise settings. For example, while base models overfit badly early in training, both conventional ensembling and SWA favor base models trained for more epochs. Post-hoc reversal can also suppress the appearance of double descent and mitigate mismatches between test loss and test error seen in base models. Based on our findings, we propose post-hoc selection, a simple technique whereby post-hoc metrics inform model development decisions such as early stopping, checkpointing, and broader hyperparameter choices. Our experimental analyses span real-world vision, language, tabular and graph datasets from domains like satellite imaging, language modeling, census prediction and social network analysis. On an LLM instruction tuning dataset, post-hoc selection results in $> 1.5 \times$ MMLU improvement compared to naive selection.
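To make the proposed post-hoc selection concrete, here is a minimal PyTorch sketch of the idea under one of the transforms named in the abstract, temperature scaling: each checkpoint is scored by its temperature-scaled validation loss rather than its raw loss, and the best post-hoc score picks the checkpoint. This is an illustrative assumption, not the authors' released implementation; the helper names (`temperature_scale`, `post_hoc_select`) and the caching of per-epoch validation logits are hypothetical.

```python
import torch
import torch.nn.functional as F

def temperature_scale(logits, labels):
    """Fit a single temperature T on (detached) validation logits by
    minimizing NLL -- standard temperature scaling. Returns T as a float."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T for positivity
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

def post_hoc_select(val_logits_per_epoch, val_labels):
    """Post-hoc selection sketch: choose the checkpoint whose
    *temperature-scaled* validation loss is lowest, instead of the one
    with the lowest raw validation loss (naive selection)."""
    best_epoch, best_loss = None, float("inf")
    for epoch, logits in enumerate(val_logits_per_epoch):
        t = temperature_scale(logits, val_labels)
        loss = F.cross_entropy(logits / t, val_labels).item()
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss
```

Temperature scaling is used here only because it is the lightest-weight transform; the same selection loop applies with SWA or ensemble metrics in place of the scaled loss, which is where the abstract reports the strongest reversals.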
