Limited access to policing data has severely restricted researchers’ ability to conduct thorough quantitative analyses of police activity and behavior, particularly with regard to predicting and explaining police violence. In the present work, we provide a new dataset that contains information on the personnel, activities, use of force, and complaints in the Chicago Police Department (CPD). The raw data, obtained from the CPD via a series of requests under the Freedom of Information Act (FOIA), consist of 35 unlinked, inconsistent, and undocumented spreadsheets. Our paper provides a cleaned, linked, and documented version of these data that can be reproducibly generated via open source code. We provide a detailed description of the dataset contents, the procedures for cleaning the data, and summary statistics. The data have a rich variety of uses, such as prediction (e.g., predicting misconduct from officer traits, experience, and assigned units), network analysis (e.g., detecting communities within the social network of officers co-listed on complaints), spatiotemporal data analysis (e.g., investigating patterns of officer shooting events), and causal inference (e.g., tracking the effects of new disciplinary practices, new training techniques, and new oversight on complaints and use of force), among others. Access to this dataset will enable the machine learning community to meaningfully engage with the problem of police violence.
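To make the network-analysis use case concrete, the following is a minimal sketch of how a co-complaint network might be built from linked records. It assumes the linked data can be reduced to (complaint, officer) pairs; the record layout, the IDs, and the function names `co_complaint_edges` and `connected_components` are hypothetical and not taken from the dataset or its code. Connected components are used here only as a simple stand-in for a real community-detection method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (complaint_id, officer_id) pairs.
# These IDs are invented for illustration and do not come from the dataset.
records = [
    ("c1", "A"), ("c1", "B"),
    ("c2", "B"), ("c2", "C"),
    ("c3", "D"), ("c3", "E"),
    ("c4", "F"),
]


def co_complaint_edges(records):
    """Count how often each pair of officers is listed on the same complaint."""
    officers_by_complaint = defaultdict(set)
    for complaint_id, officer_id in records:
        officers_by_complaint[complaint_id].add(officer_id)

    weights = defaultdict(int)
    for officers in officers_by_complaint.values():
        # Every unordered pair of officers on one complaint shares an edge.
        for u, v in combinations(sorted(officers), 2):
            weights[(u, v)] += 1
    return dict(weights)


def connected_components(records, edges):
    """Group officers into connected components of the co-complaint graph."""
    adjacency = defaultdict(set)
    for _, officer_id in records:
        adjacency[officer_id]  # register officers with no co-listed peers
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


edges = co_complaint_edges(records)
groups = connected_components(records, edges)
print(edges)
print(groups)
```

Edge weights count shared complaints, so a weighted community-detection algorithm could be substituted for the component search without changing how the graph is built.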
Author Information
Thibaut Horel (Massachusetts Institute of Technology)
Lorenzo Masoero (MIT)
Raj Agrawal (MIT)
Daria Roithmayr (University of Southern California)
Trevor Campbell (UBC)
More from the Same Authors
- 2020 Poster: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond
  Charles Margossian · Aki Vehtari · Daniel Simpson · Raj Agrawal
- 2018: Poster Session
  Lorenzo Masoero · Tammo Rukat · Runjing Liu · Sayak Ray Chowdhury · Daniel Coelho de Castro · Claudia Wehrhahn · Feras Saad · Archit Verma · Kelvin Hsu · Irineo Cabreros · Sandhya Prabhakaran · Yiming Sun · Maxime Rischard · Linfeng Liu · Adam Farooq · Jeremiah Liu · Melanie F. Pradier · Diego Romeres · Neill Campbell · Kai Xu · Mehmet M Dundar · Tucker Keuter · Prashnna Gyawali · Eli Sennesh · Alessandro De Palma · Daniel Flam-Shepherd · Takatomi Kubo
- 2018 Workshop: All of Bayesian Nonparametrics (Especially the Useful Bits)
  Diana Cai · Trevor Campbell · Michael Hughes · Tamara Broderick · Nick Foti · Sinead Williamson
- 2016 Workshop: Practical Bayesian Nonparametrics
  Nick Foti · Tamara Broderick · Trevor Campbell · Michael Hughes · Jeffrey Miller · Aaron Schein · Sinead Williamson · Yanxun Xu
- 2016 Poster: Coresets for Scalable Bayesian Logistic Regression
  Jonathan Huggins · Trevor Campbell · Tamara Broderick
- 2016 Poster: Edge-exchangeable graphs and sparsity
  Diana Cai · Trevor Campbell · Tamara Broderick
- 2015 Poster: Streaming, Distributed Variational Inference for Bayesian Nonparametrics
  Trevor Campbell · Julian Straub · John Fisher III · Jonathan How
- 2013 Poster: Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
  Trevor Campbell · Miao Liu · Brian Kulis · Jonathan How · Lawrence Carin