Workshop: Data Centric AI

Unleashing the Power of Industrial Big Data through Scalable Manual Labeling


Big Data plays a central role in the remarkable results achieved by Machine Learning (ML), and especially Deep Learning (DL), in recent years. However, the difficulty of obtaining a reasonable number of labeled samples limits the application of ML/DL in various domains, including industrial equipment and system monitoring. This paper highlights the need for methods that turn manual labeling into a scalable process. It analyzes a real-world problem for which weak supervision methods, successfully employed in other domains, did not produce acceptable results. An alternative approach based on clustering ensembles is described and tested, achieving good performance.