
Computing the Bayes-optimal classifier and exact maximum likelihood estimator with a semi-realistic generative model for jet physics
Kyle Cranmer · Matthew Drnevich · Lauren Greenspan · Sebastian Macaluso · Duccio Pappadopulo

Deep learning techniques have proven to be extremely effective in studying jets, the complicated, collimated sprays of particles found in high-energy particle collisions. As with most realistic classification tasks, the Bayes-optimal classifier is unknown or intractable, even when the training data are simulated. Here we consider Ginkgo, a semi-realistic simulator for jets that captures the essential physics and produces data with similar features and format. By using a recently developed hierarchical trellis data structure and dynamic programming algorithm, we are able to exactly marginalize over the combinatorially large space of latent variables associated with this generative model. This allows us to compute the Bayes-optimal classifier and the exact maximum likelihood estimator for this model, which can serve as a powerful benchmarking tool for studying the performance of machine learning approaches to these problems.
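
The idea of marginalizing exactly over tree-structured latent variables can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' trellis implementation and not Ginkgo's splitting likelihood: the function names, the exponential `split_weight`, and the rate parameter `lam` are all hypothetical. The recursion sums over every binary tree on a set of leaves by memoizing partial sums per subset of leaves, then forms an equal-prior posterior and a grid-based maximum likelihood estimate from the resulting marginals.

```python
import math
from functools import lru_cache
from itertools import combinations


def marginalize_over_trees(n_leaves, split_weight):
    """Sum, over every binary tree on n_leaves labeled leaves,
    the product of split_weight(left, right) over the tree's internal nodes."""

    @lru_cache(maxsize=None)
    def Z(subset):
        if len(subset) == 1:
            return 1.0  # a single leaf has no further splits
        items = sorted(subset)
        anchor, rest = items[0], items[1:]
        total = 0.0
        # Enumerate each unordered split once: the left child always
        # contains the anchor, and the right child is never empty.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((anchor,) + extra)
                right = subset - left
                total += split_weight(left, right) * Z(left) * Z(right)
        return total

    return Z(frozenset(range(n_leaves)))


def toy_likelihood(values, lam):
    """Hypothetical, unnormalized toy marginal: each split is weighted by
    an exponential in the imbalance between its two children."""

    def split_weight(left, right):
        gap = abs(sum(values[i] for i in left) - sum(values[i] for i in right))
        return lam * math.exp(-lam * gap)

    return marginalize_over_trees(len(values), split_weight)


def bayes_optimal_posterior(values, lam0, lam1):
    """Posterior probability of hypothesis lam1, assuming equal priors."""
    p0 = toy_likelihood(values, lam0)
    p1 = toy_likelihood(values, lam1)
    return p1 / (p0 + p1)


def max_likelihood_on_grid(values, grid):
    """Grid-search stand-in for an exact maximum likelihood estimator."""
    return max(grid, key=lambda lam: toy_likelihood(values, lam))


def count_binary_trees(n_leaves):
    """With unit weights, the marginal simply counts the trees."""
    return marginalize_over_trees(n_leaves, lambda left, right: 1.0)
```

With unit weights the same recursion counts the latent trees, so `count_binary_trees(n)` grows as (2n - 3)!!, which gives a sense of how quickly the space grows. Memoizing on subsets means each subset's partial sum is computed once rather than once per tree, although this naive subset recursion is itself exponential in the number of leaves.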

Author Information

Kyle Cranmer (University of Wisconsin-Madison)

Kyle Cranmer is an Associate Professor of Physics at New York University and affiliated with NYU's Center for Data Science. He is an experimental particle physicist working primarily on the Large Hadron Collider, based in Geneva, Switzerland. He was awarded the Presidential Early Career Award for Science and Engineering in 2007 and the National Science Foundation's Career Award in 2009. Professor Cranmer developed a framework that enables collaborative statistical modeling, which was used extensively for the discovery of the Higgs boson in July 2012. His current interests are at the intersection of physics and machine learning and include inference in the context of intractable likelihoods, development of machine learning models imbued with physics knowledge, adversarial training for robustness to systematic uncertainty, the use of generative models in the physical sciences, and integration of reproducible workflows in the inference pipeline.

Matthew Drnevich (New York University)
Lauren Greenspan (New York University)
Sebastian Macaluso (New York University)
Duccio Pappadopulo (Bloomberg)
