Workshop: Advances and Opportunities: Machine Learning for Education

Kumar Garg, Neil Heffernan, Kayla Meyers

2020-12-11T05:30:00-08:00 - 2020-12-11T14:10:00-08:00
Abstract: This workshop will explore how advances in machine learning could be applied to improve educational outcomes.

Such an exploration is timely given: the growth of online learning platforms, which have the potential to serve as testbeds and data sources; a growing pool of CS talent hungry to apply their skills towards social impact; and the chaotic shift to online learning globally during COVID-19, and the many gaps it has exposed.

The opportunities for machine learning in education are substantial, from uses of NLP to power automated feedback for the large volume of student work that currently gets no review, to advances in voice recognition that diagnose errors by early readers.

As with the rise of computational biology, recognizing and realizing these opportunities will require a community of researchers and practitioners who are bilingual: technically adept at the cutting-edge advances in machine learning, and conversant in the most pressing challenges and opportunities in education.

With senior representatives from industry, academia, government, and education, this workshop is a step in that community-building process, with a focus on three things:
1. identifying what learning platforms are of a size and instrumentation that the ML community can leverage,
2. building a community of experts bringing rigorous theoretical and methodological insights across academia, industry, and education, to facilitate combinatorial innovation,
3. scoping potential Kaggle competitions and “ImageNets for Education,” where benchmark datasets fine-tuned to an education goal can fuel goal-driven algorithmic innovation.

In addition to bringing speakers across verticals and issue areas, the talks and small group conversations in this workshop will be designed for a diverse audience, from researchers to industry professionals to teachers and students. This interdisciplinary approach promises to generate new connections and high-potential partnerships, and to inspire novel applications for machine learning in education.

This workshop is not the first Machine Learning for Education workshop; there have been several, and the existence of these others speaks to recognition of the obvious importance that ML will have for education moving forward!





2020-12-11T05:25:00-08:00 - 2020-12-11T05:30:00-08:00
Welcome address
Kumar Garg
2020-12-11T05:30:00-08:00 - 2020-12-11T05:40:00-08:00
Opening Remarks from National Science Foundation Director Sethuraman Panchanathan
Sethuraman Panchanathan
2020-12-11T05:45:00-08:00 - 2020-12-11T06:45:00-08:00
Panel discussion on effective partnerships to leverage machine learning and improve education
Kumar Garg, Steve Ritter, Heejae Lim, Jeremy Roschelle
Moderator: Kumar Garg, Managing Director and Head of Partnerships, Schmidt Futures
Panelists: Steve Ritter, Founder & Chief Scientist, Carnegie Learning; Heejae Lim, Founder & CEO, TalkingPoints; Jeremy Roschelle, Executive Director, Digital Promise
2020-12-11T06:45:00-08:00 - 2020-12-11T07:15:00-08:00
Carolyn Rosé, Professor of Human-Computer Interaction at Carnegie Mellon University, The power of intelligent conversation systems in collaborative learning
Carolyn Rosé
2020-12-11T07:15:00-08:00 - 2020-12-11T07:30:00-08:00
Jacob Whitehill, Assistant Professor of Computer Science at Worcester Polytechnic Institute, Using machine learning to create scientific instruments for classroom observation
Jacob Whitehill
2020-12-11T07:30:00-08:00 - 2020-12-11T07:50:00-08:00
Sidney D'Mello, Associate Professor in the Institute of Cognitive Sciences at the University of Colorado Boulder, Towards Natural Social Interaction: Multiparty, Multimodal Machine Learning for Education
Sidney D'Mello
2020-12-11T07:50:00-08:00 - 2020-12-11T08:50:00-08:00
Panel discussion on ImageNets for education
Kumar Garg, John Whitmer, Aigner Picou, Scott Crossley
Moderator: Kumar Garg, Managing Director and Head of Partnerships, Schmidt Futures
Panelists: John Whitmer, Former Senior Director of Data Science & Analytics, ACTnext; Aigner Picou, Program Director, The Learning Agency Lab
2020-12-11T09:00:00-08:00 - 2020-12-11T09:30:00-08:00
Spotlight on ImageNets for Education
In 2007, Professor Fei-Fei Li started assembling a massive dataset of 14 million pictures, labeled with the objects that appeared in those images. This dataset, dubbed ImageNet, spurred dramatic progress over the next decade in computer vision, the field of artificial intelligence that trains computers to understand images and videos. Such datasets can serve as “benchmark” challenges that researchers compete on, and they incentivize advancements in fundamental and domain-specific fields. We solicited ideas for a potential dataset that could drive a similarly transformative impact in education. Applicants submitted 300-word abstracts, and we selected a few to showcase. Please use this time to listen to the recordings at the bottom of the schedule to learn more about the benchmark dataset ideas.
2020-12-11T09:30:00-08:00 - 2020-12-11T09:40:00-08:00
Joon Suh Choi, PhD Candidate at Georgia State University, on research on ARTE
JoonSuh Choi
2020-12-11T09:40:00-08:00 - 2020-12-11T10:10:00-08:00
Zachary Pardos, Associate Professor, Graduate School of Education, University of California, Berkeley, "Neural course embedding for recommendation"
Zachary Pardos
2020-12-11T10:10:00-08:00 - 2020-12-11T10:30:00-08:00
Alina von Davier, Chief of Assessment, Duolingo, Machine learning and next generation assessments
Alina A von Davier
2020-12-11T10:30:00-08:00 - 2020-12-11T11:30:00-08:00
Panel discussion of talent pipeline into education research and the learning engineering field
Kumar Garg, Richard Tang, Ajoy Vase, Ken Koedinger
Moderator: Kumar Garg
Panelists: Richard Tang, Student, University of California, Berkeley; Ajoy Vase, COO, the Learning Collider at Teachers College, Columbia University; Ken Koedinger, Professor of Human Computer Interaction and Psychology, Carnegie Mellon University
Q&A to follow
2020-12-11T11:40:00-08:00 - 2020-12-11T12:00:00-08:00
Remarks from Burr Settles, Research Director, Duolingo
Burr Settles
2020-12-11T12:00:00-08:00 - 2020-12-11T12:10:00-08:00
Remarks from Candace Marie Thille, Director of Learning Sciences,
Candace Marie Thille
2020-12-11T12:15:00-08:00 - 2020-12-11T12:30:00-08:00
Ryan Baker, Assistant Professor of Economics and Education at the University of Pennsylvania, Predicting students’ affect and motivation through meta-cognitive data
Ryan S. Baker
2020-12-11T12:30:00-08:00 - 2020-12-11T12:40:00-08:00
Discussion on how young technologists can contribute to learning engineering
Kumar Garg, Michelle Park, Katherine Binney, Jonathan J Mak
2020-12-11T12:40:00-08:00 - 2020-12-11T13:00:00-08:00
Remarks from Bryan Richardson, Senior Program Officer, the Bill & Melinda Gates Foundation’s K-12 program
Bryan Richardson
2020-12-11T13:00:00-08:00 - 2020-12-11T14:00:00-08:00
Panel discussion on minimizing bias in machine learning in education
Neil Heffernan, Ope A. Osoba, Emma Brunskill, Kathryn Fisler
Moderator: Neil Heffernan, William Smith Dean's Professor of Computer Science at Worcester Polytechnic Institute, and Co-Founder of ASSISTments
Panelists: Osonde Osoba, Senior Information Scientist, RAND Corporation; Emma Brunskill, Assistant Professor in the Computer Science Department, Stanford University; Kathi Fisler, Research Professor, Brown University
2020-12-11T14:00:00-08:00 - 2020-12-11T14:15:00-08:00
Closing remarks from Fei-Fei Li, Sequoia Professor of Computer Science, Stanford University & Co-Director of Stanford’s Human-Centered AI Institute
Li Fei-Fei
Will be followed by a 10-minute Q&A
ImageNets for Math Handwriting Recognition
Zac Hancock, Chase Thomas
Authors: Zac Hancock, Michael Chifala, Callie Federer, Jiamin He, & Quinn N Lathrop
One of the best ways to learn and practice math is by hand on paper. Digital math applications can take advantage of this natural interaction by including a handwriting recognition capability. We introduce a dataset that can be used to create such models to bridge math learners and digital applications. Given the importance of mathematical expressions across all scientific branches, including physics, engineering, and economics, this dataset can become an important resource for advancing the use of machine learning for the benefit of education. Our dataset consists of 100,000 images of handwritten math expressions within calculus. The images are synthetically generated, which affords 100% correct pixel-level tagging and results in realistic images capable of training models whose performance generalizes to real images. It has a very permissive license, the full collaboration tools of Kaggle, and standard data formats that increase generalizability and usability. The dataset offers something for all skill levels, from beginners building simple character recognition models to experts who wish to predict pixel-by-pixel masks with object detection models and decode the complex structure of math expressions. The most similar dataset is CROHME, which provides digital ink with stroke data. Our dataset differs in that it focuses on images of math and covers a targeted scope of limit expressions. Also, because our dataset is generated, this scope of math could be changed as needed, and the size of the dataset is limited only by practicalities. The ease of use and richness of the dataset will hopefully excite ML researchers within education and draw new ML researchers to the field. Applications beyond handwriting recognition include translating students’ math to PDFs and automated grading for instructors. ML capabilities built using this dataset would benefit many educational institutions by helping to connect math learners' natural mode of working with digital educational applications.
ImageNets for Reading
John Gabrieli, Perpetual Baffour
ImageNets for Teaching CS
Tiffany Barnes, Thomas Price, Jim Larimore
Abstract: In this breakout session, we propose the idea of an "ImageNet for Teaching Computer Science." The proposed idea involves collecting a large set of labeled programming datasets from classrooms, using a shared format, and developing a set of benchmarks and challenges that will facilitate research for K-20 computing education. This data would benefit a growing research community at the intersection of computing education and learning analytics, with implications for students across many fields that teach computing.
Rationale: Programming data is ideal for learning analytics and educational data mining, since it is rich, capturing students' every state and action as they work, and the data is structured by syntax rules. Advances in programming analysis techniques for open-ended, sequential, and semi-structured data will have broad applications across educational domains. However, recent advances in deep learning require larger datasets and more meaningful labels than those typically available from individual classrooms, necessitating cross-institutional data collection and labeling efforts.
Background and Progress: A series of workshops from CS-SPLICE and CSEDM has brought together the research community to develop infrastructure and analysis techniques for programming data. The community has developed the shared ProgSnap2 format for programming log data, which is already used by 10+ datasets, comprising 750,000+ program snapshots in various languages (many of the datasets are available online). Researchers have used this data to develop automated support (e.g. hints, feedback, curated examples), predict student success, and personalize interventions. The CSEDM Data Challenge is a recurring data mining competition (held 2019, planned 2021) to gain insight from classroom programming data, which has helped to define shared machine learning benchmarks on common datasets.
Next Steps: The key challenges will be collecting diverse existing datasets and creating infrastructure to support collecting and labeling new data. This will allow us to tackle novel research challenges, such as generalizing algorithms and labels across problems, for example detecting knowledge components, strategies, or misconceptions on one problem using data from others. The CS-SPLICE and CSEDM communities include developers of many widely-used educational programming platforms and will be important stakeholders in driving the work forward.
Acknowledgements: This reflects joint work by Thomas Price, Tiffany Barnes, Min Chi, Samiha Marwan, Yang Shi, Preya Shabrina, and Ye Mao at NC State University. It presents and builds on ideas and foundational work by the CS-SPLICE and CSEDM teams.
ImageNets for the Whole Child
Daniel Jarratt, Paola Martinez
ImageNets for Math Errors
Nishchal Shukla, Sam Ching