MTL-TransMODS: Cascaded Multi-Task Learning for Moving Object Detection and Segmentation with Unified Transformers

Recently, transformer-based networks have achieved state-of-the-art performance in computer vision tasks. In this paper, we propose a new cascaded multi-task learning (MTL) transformer-based framework, termed MTL-TransMODS, that tackles moving object detection and moving object segmentation, two tasks that are important for autonomous driving. A critical problem in these tasks is how to model the spatial correlation within each frame and the temporal relationship across multiple frames in order to capture motion cues. MTL-TransMODS introduces a vision transformer to exploit these temporal and spatial associations and tackles both tasks using a single fully shared transformer architecture with unified queries. Extensive experiments demonstrate the superiority of MTL-TransMODS over state-of-the-art methods on the KittiMoSeg dataset \cite{rashed2019fusemodnet}. Results show a 0.3\% mAP improvement for moving object detection and a 5.7\% IoU improvement for moving object segmentation over state-of-the-art techniques.

Author Information

Eslam MOHAMED-ABDELRAHMAN (Valeo (F22 Building, Smart Village, Cairo - Alexandria Desert Rd, Giza))
Ahmad El Sallab (Valeo)
