Lightning Talk
Workshop: Data Centric AI

YMIR: A Rapid Data Development Platform for Long-tailed Vision Applications


This paper introduces an open-source platform for the rapid development of long-tailed computer vision applications. The platform puts efficient dataset development at the center of the machine learning development process, integrates active learning methods and data and model version control, and uses concepts such as projects to enable fast iteration on multiple task-specific datasets in parallel. We make it an open platform by abstracting the development process into core states and operations, and by designing open APIs that integrate third-party tools as implementations of those operations. This open design reduces our development cost and, at the same time, reduces the adoption cost for ML teams that already have tools for part of the development process. The platform is scheduled to be open-sourced in the coming weeks and is already used internally to meet the ever-increasing demand for custom computer vision applications from customers.
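
The idea of core states and pluggable operations can be illustrated with a minimal sketch. This is not YMIR's actual API: the names `DatasetState`, `OPERATIONS`, `register`, `apply_operation`, and the toy `label` operation are all hypothetical, chosen only to show how a third-party tool might be plugged in as the implementation of an operation that moves a versioned dataset from one state to the next.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class DatasetState:
    """Hypothetical immutable snapshot of one dataset version in a project."""
    project: str
    version: int
    labeled: Tuple[str, ...] = ()
    unlabeled: Tuple[str, ...] = ()


# An operation maps one dataset state to the next one.
Operation = Callable[[DatasetState], DatasetState]

# Hypothetical registry: third-party tools register under an operation name.
OPERATIONS: Dict[str, Operation] = {}


def register(name: str) -> Callable[[Operation], Operation]:
    """Decorator that registers a function as the implementation of `name`."""
    def decorator(fn: Operation) -> Operation:
        OPERATIONS[name] = fn
        return fn
    return decorator


@register("label")
def label_first_two(state: DatasetState) -> DatasetState:
    """Stand-in for an external labeling tool: labels the first two assets."""
    batch, rest = state.unlabeled[:2], state.unlabeled[2:]
    return replace(state, labeled=state.labeled + batch, unlabeled=rest)


def apply_operation(name: str, state: DatasetState) -> DatasetState:
    """Run a registered operation and record the result as a new version."""
    new_state = OPERATIONS[name](state)
    return replace(new_state, version=state.version + 1)


v0 = DatasetState("demo-project", 0, unlabeled=("a.jpg", "b.jpg", "c.jpg"))
v1 = apply_operation("label", v0)
```

Because each state is immutable and every operation yields a new version, earlier versions such as `v0` remain available, which is one simple way to picture how version control and swappable tools could coexist in such a design.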