
Workshop: Machine Learning for Autonomous Driving

DriveCLIP: Zero-shot transfer for distracted driving activity understanding using CLIP

Md Zahid Hasan · Ameya Joshi · Mohammed Shaiqur Rahman · Venkatachalapathy Archana · Anuj Sharma · Chinmay Hegde · Soumik Sarkar


Recognizing distracted driving actions in naturalistic driving data is crucial for the safety of both drivers and pedestrians. However, traditional computer vision techniques often require heavy supervision, in the form of large amounts of annotated training data, to detect distracted driving activities. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to unsupervised, task-specific learning such as distracted activity recognition. Contrastive image-text pretraining models like CLIP have shown significant promise in learning natural language-guided visual representations. In this paper, we propose a CLIP-based driver activity recognition framework that predicts whether or not a driver is distracted while driving. CLIP's vision embedding enables zero-shot transfer, which can identify the driver's distracted activities from driving videos. Our results suggest that this framework offers state-of-the-art performance on zero-shot transfer for predicting the driver's state on three public datasets. We also develop DriveCLIP, a classifier built on top of CLIP's visual representation for distracted driving detection tasks, and report its results.
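As a rough illustration of the zero-shot transfer idea described in the abstract, the sketch below scores each video frame against a set of text prompts by cosine similarity and picks the best-matching label. It is not the authors' implementation: the embeddings are toy 3-dimensional stand-ins for real CLIP image and text embeddings, the prompt wording is hypothetical, the function names are invented for this example, and averaging per-frame probabilities over a video is an assumed aggregation rule. The logit scale of 100 mirrors the temperature commonly used with CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(frame_embedding, prompt_embeddings, logit_scale=100.0):
    """Class probabilities for one frame from scaled cosine similarities."""
    logits = [
        logit_scale * cosine_similarity(frame_embedding, prompt)
        for prompt in prompt_embeddings
    ]
    return softmax(logits)


def classify_video(frame_embeddings, prompt_embeddings, labels):
    """Average per-frame probabilities (an assumed rule) and take the argmax."""
    n_frames = len(frame_embeddings)
    mean_probs = [0.0] * len(labels)
    for frame in frame_embeddings:
        probs = zero_shot_probs(frame, prompt_embeddings)
        for i, p in enumerate(probs):
            mean_probs[i] += p / n_frames
    best = max(range(len(labels)), key=mean_probs.__getitem__)
    return labels[best], mean_probs


# Hypothetical prompts; in practice these would be encoded by a CLIP text encoder.
labels = ["driving attentively", "texting on a phone"]
# Toy stand-ins for text embeddings (one per prompt).
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Toy stand-ins for image embeddings of three video frames.
frame_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.2, 0.9, 0.1]]

predicted, probs = classify_video(frame_embeddings, prompt_embeddings, labels)
print(predicted, probs)
```

No labeled driving data is used here: the only task-specific input is the text of the prompts. A supervised variant in the spirit of DriveCLIP would instead train a classifier on the frame embeddings, which this sketch does not attempt.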
