Detecting Backdoors with Meta-Models
Lauro Langosco · Neel Alex · William Baker · David Quarel · Herbie Bradley · David Krueger
2023 Poster in Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly
Abstract
It is widely known that backdoors can be implanted into neural networks, allowing an attacker to craft inputs that produce a particular undesirable output (e.g., cause an image to be misclassified). We propose using meta-models, neural networks that take another network's parameters as input, to detect backdoors directly from model weights. To this end, we present a meta-model architecture and train it on a dataset of approximately 4,000 clean and backdoored CNNs trained on CIFAR-10. Our approach is simple and scalable. It detects the presence of a backdoor with >99% accuracy when test-time trigger patterns are drawn i.i.d. from the same distribution as the training triggers, and it achieves some success even on out-of-distribution backdoors.
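As a rough illustration of the meta-model idea (a classifier whose input is another network's parameters), the Python sketch below flattens each base model's weights into a fixed-order vector and trains a logistic-regression meta-model to output the probability that the base model is backdoored. This is not the authors' architecture. The linear meta-model, the `Params` dict layout, and the names `flatten_params`, `LinearMetaModel`, `toy_base_model`, and `train_toy_detector` are all hypothetical. The toy "base models" are synthetic parameter vectors with an artificially planted signature, not CNNs trained on CIFAR-10.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a base model is a dict from parameter name
# to a flat list of floats (weights or biases).
Params = Dict[str, List[float]]


def flatten_params(params: Params) -> List[float]:
    """Concatenate all parameters in sorted-name order, so base models
    with the same architecture map to vectors with the same layout."""
    flat: List[float] = []
    for name in sorted(params):
        flat.extend(params[name])
    return flat


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearMetaModel:
    """Toy meta-model: logistic regression over a base model's flattened
    parameters, returning P(base model is backdoored)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.gauss(0.0, 0.01) for _ in range(dim)]
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def fit(self, xs: List[List[float]], ys: List[int],
            lr: float = 0.1, epochs: int = 200) -> None:
        """Plain SGD on binary cross-entropy, whose gradient with respect
        to the logit is (p - y)."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                g = self.predict_proba(x) - y
                self.w = [wi - lr * g * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * g


def toy_base_model(rng: random.Random, backdoored: bool) -> Params:
    """SYNTHETIC stand-in for a trained network: backdoored toys get a
    shifted bias. Real backdoors leave no such obvious signature."""
    weight = [rng.gauss(0.0, 1.0) for _ in range(4)]
    bias = [rng.gauss(3.0 if backdoored else -3.0, 0.5)]
    return {"fc.weight": weight, "fc.bias": bias}


def make_dataset(rng: random.Random, n: int) -> Tuple[List[List[float]], List[int]]:
    """Alternate clean (label 0) and backdoored (label 1) toy base models."""
    xs: List[List[float]] = []
    ys: List[int] = []
    for i in range(n):
        label = i % 2
        xs.append(flatten_params(toy_base_model(rng, bool(label))))
        ys.append(label)
    return xs, ys


def train_toy_detector(seed: int = 0) -> float:
    """Train on one toy set and return accuracy on a held-out toy set."""
    rng = random.Random(seed)
    train_x, train_y = make_dataset(rng, 40)
    test_x, test_y = make_dataset(rng, 40)
    meta = LinearMetaModel(dim=len(train_x[0]), seed=seed)
    meta.fit(train_x, train_y)
    correct = sum((meta.predict_proba(x) > 0.5) == bool(y)
                  for x, y in zip(test_x, test_y))
    return correct / len(test_y)


print(f"held-out accuracy on toy data: {train_toy_detector():.2f}")
```

Flattening in a fixed name order only works when every base model shares one architecture, and a linear model on raw weights ignores symmetries such as permuting hidden units. Both limitations suggest that a richer meta-model architecture is likely needed for real CNN weights.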