

ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment

Joseph DelPreto · Chao Liu · Yiyue Luo · Michael Foshey · Yunzhu Li · Antonio Torralba · Wojciech Matusik · Daniela Rus

Hall J (level 1) #1025

Keywords: [ motion tracking ] [ learning pipelines ] [ robot assistants ] [ kitchen activities ] [ activities of daily living ] [ tactile sensing ] [ muscle activity ] [ cameras ] [ Wearable sensors ] [ eye tracking ] [ multimodal dataset ] [ microphones ] [ EMG ] [ gaze ] [ body tracking ] [ multimodal recording ] [ Attention ] [ depth ] [ audio ] [ experimental design ] [ Neural Networks ] [ Video ] [ human subjects ] [ machine learning ] [ RGBD ] [ recording software ] [ open-source ] [ joint angles ]


This paper introduces ActionSense, a multimodal dataset and recording framework with an emphasis on wearable sensing in a kitchen environment. It provides rich, synchronized data streams along with ground truth data to facilitate learning pipelines that could extract insights about how humans interact with the physical world during activities of daily living, and help lead to more capable and collaborative robot assistants. The wearable sensing suite captures motion, force, and attention information; it includes eye tracking with a first-person camera, forearm muscle activity sensors, a body-tracking system using 17 inertial sensors, finger-tracking gloves, and custom tactile sensors on the hands that use a matrix of conductive threads. This is coupled with activity labels and with externally-captured data from multiple RGB cameras, a depth camera, and microphones. The specific tasks recorded in ActionSense are designed to highlight lower-level physical skills and higher-level scene reasoning or action planning. They include simple object manipulations (e.g., stacking plates), dexterous actions (e.g., peeling or cutting vegetables), and complex action sequences (e.g., setting a table or loading a dishwasher). The resulting dataset and underlying experiment framework are available online. Preliminary networks and analyses explore modality subsets and cross-modal correlations. ActionSense aims to support applications including learning from demonstrations, dexterous robot control, cross-modal predictions, and fine-grained action segmentation. It could also help inform the next generation of smart textiles that may one day unobtrusively send rich data streams to in-home collaborative or autonomous robot assistants.
