Skip to yearly menu bar Skip to main content

Workshop: Machine Learning for Autonomous Driving

Improving Predictive Performance and Calibration by Weight Fusion in Semantic Segmentation

Timo Saemann · Ahmed Hammam · Andrei Bursuc · Christoph Stiller · Horst-Michael Gross


Averaging predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance and calibration in various benchmarks and Kaggle competitions.However, the runtime and training cost of deep ensembles grow linearly with the size of the ensemble, making them unsuitable for many applications.Averaging ensemble weights instead of predictions circumvents this disadvantage during inference and is typically applied to intermediate checkpoints of a model to reduce training cost. Albeit effective, only few works have improved the understanding and the performance of weight averaging.Here, we revisit this approach and show that a simple weight fusion (WF) strategy can lead to a significantly improved predictive performance and calibration. We describe what prerequisites the weights must meet in terms of weight space, functional space and loss. Furthermore, we present a new test method (called oracle test) to measure the functional space between weights. We demonstrate the versatility of our WF strategy across state of the art segmentation CNNs and Transformers as well as real world datasets such as BDD100K and Cityscapes. We compare WF with similar approaches and show our superiority for in- and out-of-distribution data in terms of predictive performance and calibration.

Chat is not available.