Toloka AI

Expo Workshop

Room 290

Why do researchers need to deploy their machine learning (ML) models? What is the difference between the models prototyped by researchers and the ones running in production? By rough estimates, around 87% of data science projects never make it into production (1). One reason is the gap between the skill set needed to design a new model and the skill set needed to deploy that model in production for end users. The former includes the ability to come up with new ideas and prototype them quickly, while the latter focuses on stability, scalability and, importantly, integration with existing processes. As a result, training and deploying machine learning models becomes a major challenge for many companies, big and small. In this workshop, we will focus on the key challenges that most researchers have to overcome on the way to production. We will start with a panel discussion about different perspectives on how research findings should be used in production. Then, working together in groups, we will discuss the common steps of a pipeline for implementing an ML model, such as data collection, exploratory data analysis, feature engineering, model selection, model deployment, and model serving. We intend to form mixed groups that connect academic researchers with industrial software engineers. This way, participants will be able to share their experience in optimizing each step of the ML pipeline and exchange best practices. We will conclude with a discussion on using research findings to build better products and applications.

KEY TAKE-AWAYS
- What components are needed to bring research models into production
- How you can set up reusable and easy-to-upgrade pipelines
- How to optimize each step of the ML pipeline