Many engineers wish to deploy modern neural networks in memory-limited settings, but the development of flexible methods for reducing memory use is in its infancy, and little is known about the resulting cost-benefit trade-off. We propose structural model distillation for memory reduction, using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves and tables for the distillation of residual networks on four benchmark datasets, indicating the trade-off between memory and accuracy. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation yields a student network that performs better than the same student architecture trained directly on the data.
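The abstract names attention transfer as the distillation signal. The sketch below is a minimal, pure-Python illustration of an attention-transfer loss in the style of Zagoruyko and Komodakis, not the authors' code. It assumes each feature map is given as a list of channels of flattened spatial activations; the attention map is the channel-wise sum of squared activations, normalised to unit L2 norm, and the loss is beta/2 times the summed L2 distances over matched layer pairs. The function names `attention_map` and `attention_transfer_loss` and the default weight `beta=1.0` are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

# Illustrative representation (an assumption): a feature map is a list of
# channels, each channel a flat list of spatial activations of equal length.
FeatureMap = Sequence[Sequence[float]]


def attention_map(features: FeatureMap) -> List[float]:
    """Sum squared activations over channels, then L2-normalise the result."""
    n_positions = len(features[0])
    att = [0.0] * n_positions
    for channel in features:
        if len(channel) != n_positions:
            raise ValueError("all channels must have the same spatial size")
        for i, a in enumerate(channel):
            att[i] += a * a
    norm = math.sqrt(sum(v * v for v in att))
    # Leave an all-zero map unchanged to avoid dividing by zero.
    return [v / norm for v in att] if norm > 0 else att


def attention_transfer_loss(student_maps: Sequence[FeatureMap],
                            teacher_maps: Sequence[FeatureMap],
                            beta: float = 1.0) -> float:
    """beta/2 times the sum of L2 distances between normalised attention maps.

    Hypothetical helper: student_maps[j] and teacher_maps[j] are feature maps
    taken from matched points (e.g. ends of residual groups) of each network.
    """
    if len(student_maps) != len(teacher_maps):
        raise ValueError("student and teacher need the same number of maps")
    total = 0.0
    for s_feat, t_feat in zip(student_maps, teacher_maps):
        q_s = attention_map(s_feat)
        q_t = attention_map(t_feat)
        if len(q_s) != len(q_t):
            raise ValueError("matched maps must have the same spatial size")
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(q_s, q_t)))
    return 0.5 * beta * total
```

Because channels are summed away, a matched student map may have fewer (or cheaper) channels than the teacher's; only the spatial sizes must agree. That property is what lets this signal be used when the student is a structural transformation of the teacher. In training, the term would typically be added to the usual cross-entropy loss on the labels, with `beta` tuned as a hyperparameter.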
Author Information
Elliot Crowley (University of Edinburgh)
Gavia Gray (University of Edinburgh)
Amos Storkey (University of Edinburgh)
More from the Same Authors
2021: Hamiltonian prior to Disentangle Content and Motion in Image Sequences
Asif Khan · Amos Storkey
2022: Parity in predictive performance is neither necessary nor sufficient for fairness
Justin Engelmann · Miguel Bernabeu · Amos Storkey
2022: Deep Class-Conditional Gaussians for Continual Learning
Thomas Lee · Amos Storkey
2022: Sequence Modeling Motion-Captured Dance
Emily Napier · Gavia Gray · Sageev Oore
2022 Poster: Hamiltonian Latent Operators for content and motion disentanglement in image sequences
Asif Khan · Amos Storkey
2021 Poster: Gradient-based Hyperparameter Optimization Over Long Horizons
Paul Micaelli · Amos Storkey
2020 Poster: Self-Supervised Relational Reasoning for Representation Learning
Massimiliano Patacchiola · Amos Storkey
2020 Spotlight: Self-Supervised Relational Reasoning for Representation Learning
Massimiliano Patacchiola · Amos Storkey
2020 Poster: Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels
Massimiliano Patacchiola · Jack Turner · Elliot Crowley · Michael O'Boyle · Amos Storkey
2020 Spotlight: Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels
Massimiliano Patacchiola · Jack Turner · Elliot Crowley · Michael O'Boyle · Amos Storkey
2019 Poster: Zero-shot Knowledge Transfer via Adversarial Belief Matching
Paul Micaelli · Amos Storkey
2019 Spotlight: Zero-shot Knowledge Transfer via Adversarial Belief Matching
Paul Micaelli · Amos Storkey
2019 Poster: Learning to Learn By Self-Critique
Antreas Antoniou · Amos Storkey
2015 Poster: Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling
Xiaocheng Shang · Zhanxing Zhu · Benedict Leimkuhler · Amos Storkey
2014 Workshop: NIPS Workshop on Transactional Machine Learning and E-Commerce
David Parkes · David H Wolpert · Jennifer Wortman Vaughan · Jacob D Abernethy · Amos Storkey · Mark Reid · Ping Jin · Nihar Bhadresh Shah · Mehryar Mohri · Luis E Ortiz · Robin Hanson · Aaron Roth · Satyen Kale · Sebastien Lahaie
2012 Poster: Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Zoubin Ghahramani · Yichuan Zhang · Charles Sutton · Amos Storkey
2012 Spotlight: Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Zoubin Ghahramani · Yichuan Zhang · Charles Sutton · Amos Storkey
2012 Poster: The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
Simon Lyons · Amos Storkey · Simo Sarkka
2011 Poster: Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
David Reichert · Peggy Series · Amos Storkey
2011 Spotlight: Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
David Reichert · Peggy Series · Amos Storkey
2010 Poster: Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model
David Reichert · Peggy Series · Amos Storkey
2010 Poster: Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
Felix V Agakov · Paul McKeigue · Jon Krohn · Amos Storkey
2007 Poster: Continuous Time Particle Filtering for fMRI
Lawrence Murray · Amos Storkey
2007 Poster: Modelling motion primitives and their timing in biologically executed movements
Ben H Williams · Marc Toussaint · Amos Storkey
2006 Poster: Learning Structural Equation Models for fMRI
Amos Storkey · Enrico Simonotto · Heather Whalley · Stephen Lawrie · Lawrence Murray · David McGonigle
2006 Poster: Mixture Regression for Covariate Shift
Amos Storkey · Masashi Sugiyama