Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters "help" or "hurt" the network's learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training. (1) We find that barely over 50% of parameters help during any given iteration. (2) Some entire layers hurt overall, moving on average against the training gradient, a phenomenon we hypothesize may be due to phase lag in an oscillatory training process. (3) Finally, increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations.
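To make the idea concrete, here is a minimal sketch of computing LCA with a first-order approximation of the path integral: at each step, the change in loss is allocated to parameter i as the gradient with respect to that parameter times its movement, and these per-step allocations are summed over training. This is a simplified illustration only; the paper itself uses a Runge-Kutta integrator for the decomposition, and the toy model, synthetic data, and SGD setup below are assumptions for the sake of a self-contained example.

```python
import torch

# First-order LCA sketch: allocate each step's loss change to parameters as
# grad_i(theta_t) * (theta_{t+1,i} - theta_{t,i}), then sum over steps.
# (The paper uses a higher-order Runge-Kutta integrator; this is a simplification.)

torch.manual_seed(0)
x = torch.randn(256, 10)   # synthetic inputs (placeholder data)
y = torch.randn(256, 1)    # synthetic targets (placeholder data)
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

params = list(model.parameters())
lca = [torch.zeros_like(p) for p in params]  # per-parameter loss-change allocation

for step in range(100):
    theta_before = [p.detach().clone() for p in params]
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    opt.step()
    # Credit this step's loss change to each parameter: gradient * movement.
    for a, g, p, p0 in zip(lca, grads, params, theta_before):
        a += g * (p.detach() - p0)

total_allocated = sum(a.sum() for a in lca)
helped = sum((a < 0).sum() for a in lca) / sum(a.numel() for a in lca)
print(f"summed LCA (approximates total loss change): {total_allocated:.4f}")
print(f"fraction of parameters with negative summed LCA (helped): {helped:.2f}")
```

Because the allocations are conservative, summing LCA over all parameters recovers (approximately) the total change in loss, and summing over neurons, channels, layers, or iterations gives the coarser views described above.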
Author Information
Janice Lan (Uber AI)
Rosanne Liu (Uber AI Labs)
Hattie Zhou (Uber)
Jason Yosinski (Uber AI; Recursion)
Dr. Jason Yosinski is a machine learning researcher, was a founding member of Uber AI Labs, and is a scientific adviser to Recursion Pharmaceuticals and several other companies. His work focuses on building more capable and more understandable AI. As scientists and engineers build increasingly powerful AI systems, the abilities of these systems increase faster than does our understanding of them, motivating much of his work on AI Neuroscience: an emerging field of study that investigates fundamental properties and behaviors of AI systems. Dr. Yosinski completed his PhD as a NASA Space Technology Research Fellow working at the Cornell Creative Machines Lab, the University of Montreal, Caltech/NASA Jet Propulsion Laboratory, and Google DeepMind. His work on AI has been featured by NPR, Fast Company, the Economist, TEDx, XKCD, and the BBC. Prior to his academic career, Jason cofounded two web technology companies and started a program in the Los Angeles school district that teaches students algebra via hands-on robotics. In his free time, Jason enjoys cooking, sailing, motorcycling, reading, paragliding, and sometimes pretending he's an artist.
More from the Same Authors
- 2022 : Chemistry Insights for Large Pretrained GNNs »
  Janice Lan · Katherine Xu
- 2022 Poster: Spherical Channels for Modeling Atomic Interactions »
  Larry Zitnick · Abhishek Das · Adeesh Kolluru · Janice Lan · Muhammed Shuaibi · Anuroop Sriram · Zachary Ulissi · Brandon Wood
- 2020 Poster: Supermasks in Superposition »
  Mitchell Wortsman · Vivek Ramanujan · Rosanne Liu · Aniruddha Kembhavi · Mohammad Rastegari · Jason Yosinski · Ali Farhadi
- 2019 : Panel - The Role of Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez »
  Aparna Lakshmiratan · Finale Doshi-Velez · Surya Ganguli · Zachary Lipton · Michela Paganini · Anima Anandkumar · Jason Yosinski
- 2019 Poster: Hamiltonian Neural Networks »
  Sam Greydanus · Misko Dzamba · Jason Yosinski
- 2019 Poster: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask »
  Hattie Zhou · Janice Lan · Rosanne Liu · Jason Yosinski
- 2018 : Jason Yosinski, "Good and bad assumptions in model design and interpretability" »
  Jason Yosinski
- 2018 Poster: Faster Neural Networks Straight from JPEG »
  Lionel Gueguen · Alex Sergeev · Ben Kadlec · Rosanne Liu · Jason Yosinski
- 2018 Poster: An intriguing failing of convolutional neural networks and the CoordConv solution »
  Rosanne Liu · Joel Lehman · Piero Molino · Felipe Petroski Such · Eric Frank · Alex Sergeev · Jason Yosinski
- 2017 Symposium: Interpretable Machine Learning »
  Andrew Wilson · Jason Yosinski · Patrice Simard · Rich Caruana · William Herlands
- 2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
  Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein
- 2016 Demonstration: Adventures with Deep Generator Networks »
  Jason Yosinski · Anh Nguyen · Jeff Clune · Douglas K Bemis
- 2016 Poster: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks »
  Anh Nguyen · Alexey Dosovitskiy · Jason Yosinski · Thomas Brox · Jeff Clune
- 2014 Poster: How transferable are features in deep neural networks? »
  Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson
- 2014 Demonstration: Playing with Convnets »
  Jason Yosinski · Hod Lipson
- 2014 Oral: How transferable are features in deep neural networks? »
  Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson