Poster

Beyond Slow Signs in High-fidelity Model Extraction

Hanna Foerster · Robert Mullins · I Shumailov · Jamie Hayes

2024 Poster

Project Page [ Paper] [ Slides] [ OpenReview]

Abstract

Deep neural networks, costly to train and rich in intellectual property value, areincreasingly threatened by model extraction attacks that compromise their confiden-tiality. Previous attacks have succeeded in reverse-engineering model parametersup to a precision of float64 for models trained on random data with at most threehidden layers using cryptanalytical techniques. However, the process was identifiedto be very time consuming and not feasible for larger and deeper models trained onstandard benchmarks. Our study evaluates the feasibility of parameter extractionmethods of Carlini et al. [1] further enhanced by Canales-Martínez et al. [2] formodels trained on standard benchmarks. We introduce a unified codebase thatintegrates previous methods and reveal that computational tools can significantlyinfluence performance. We develop further optimisations to the end-to-end attackand improve the efficiency of extracting weight signs by up to 14.8 times com-pared to former methods through the identification of easier and harder to extractneurons. Contrary to prior assumptions, we identify extraction of weights, notextraction of weight signs, as the critical bottleneck. With our improvements, a16,721 parameter model with 2 hidden layers trained on MNIST is extracted withinonly 98 minutes compared to at least 150 minutes previously. Finally, addressingmethodological deficiencies observed in previous studies, we propose new ways ofrobust benchmarking for future model extraction attacks.

Video

Chat is not available.