

Qualcomm AI Research

Expo Demonstration

Reasoning through Multimodal End-to-End Decision Transformer Networks and Vision Language Action (VLA) models

Ron Tindall

Upper Level Room 29A-D
Tue 2 Dec, noon – 3 p.m. PST

Abstract:

This demonstration showcases the live output and visualization capabilities of an edge-integrated VLA model for path planning in automated driving scenarios. Harnessing raw multimodal sensor inputs, including visual and voice data, the VLA model processes information in real time to generate safe, explainable, and repeatable driving trajectories. The system runs on a Snapdragon Ride Elite SoC platform and incorporates safety guardrails, enabling robust decision-making and transparent reasoning. Attendees will observe how end-to-end AI networks interpret complex environmental cues to deliver actionable driving paths, with a particular focus on challenging use cases involving vulnerable road users and other actors on the road. The demonstration highlights advances in multimodal reasoning and edge deployment for next-generation intelligent mobility solutions.
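As a rough illustration of the data flow described in the abstract, the sketch below shows a hypothetical loop in which one tick of multimodal input (camera frame plus voice command) is passed to a VLA planner that returns trajectory waypoints together with a natural-language rationale, and a simple guardrail check gates execution. This is not Qualcomm's implementation; all names, types, and thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SensorFrame:
    # Hypothetical container for one tick of multimodal input.
    camera_rgb: List[List[float]]   # placeholder for an image tensor
    voice_command: str              # e.g. "keep to the right lane"

@dataclass
class Trajectory:
    waypoints: List[Tuple[float, float]]  # (x, y) in the ego frame, metres
    rationale: str                        # natural-language explanation of the plan

def vla_plan(frame: SensorFrame) -> Trajectory:
    """Stand-in for the VLA model's forward pass (illustrative only)."""
    # A real model would fuse vision and language tokens end to end;
    # here we return a fixed straight-ahead path for demonstration.
    waypoints = [(0.0, float(i)) for i in range(1, 11)]
    return Trajectory(waypoints, f"Proceeding straight; heard: '{frame.voice_command}'")

def guardrail_check(traj: Trajectory, max_lateral_m: float = 3.5) -> bool:
    """Illustrative safety guardrail: reject paths that swerve too far laterally."""
    return all(abs(x) <= max_lateral_m for x, _ in traj.waypoints)

if __name__ == "__main__":
    frame = SensorFrame(camera_rgb=[[0.0]], voice_command="keep to the right lane")
    traj = vla_plan(frame)
    if guardrail_check(traj):
        print("Executing path:", traj.waypoints)
        print("Reasoning:", traj.rationale)
    else:
        print("Path rejected by guardrail; falling back to a safe stop.")
```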
