

Qualcomm AI Research

Expo Demonstration

Soft Prompts for On-Device Content Moderation

Ron Tindall

Upper Level Room 29A-D
Tue 2 Dec noon PST — 3 p.m. PST

Abstract:

We demonstrate the first on-device integration of a safety-aligned large language model (LLM) using soft prompt distillation, powered by our proposed TV-DiSP framework. Our system showcases how a mobile device can run a quantized LLM equipped with learned soft prompts to moderate harmful or toxic content in real time. The demo highlights the difference in LLM outputs with and without our soft prompts when subjected to adversarial or unsafe inputs, enabling efficient and safe deployment of LLMs on edge devices.
LLMs are known to produce unsafe or toxic outputs when prompted harmfully. Traditional safety mechanisms rely on dual-model architectures that pair a base LLM with a separate guard model; these are memory- and compute-intensive and unsuitable for deployment on resource-constrained devices such as smartphones. The challenge is to achieve robust safety alignment without compromising latency, memory, or model utility in edge environments.
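For readers unfamiliar with the general mechanism, the sketch below illustrates how learned soft prompts can condition a frozen causal LLM: a small set of trainable embedding vectors is prepended to the token embeddings before generation. This is a minimal illustration only, not the TV-DiSP framework or Qualcomm's implementation; the model name, soft prompt length, and random initialization (standing in for vectors learned via distillation) are all assumptions.

```python
# Minimal sketch: prepend learned soft prompt embeddings to a frozen LLM's
# input embeddings. NOT the TV-DiSP implementation; model choice ("gpt2"),
# prompt length, and random init are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any Hugging Face causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

num_soft_tokens = 20  # assumption: length of the learned soft prompt
embed_dim = model.get_input_embeddings().embedding_dim
# In practice these vectors are learned (e.g., distilled from a guard model);
# here they are randomly initialized placeholders.
soft_prompt = torch.nn.Parameter(torch.randn(num_soft_tokens, embed_dim) * 0.02)

def generate_with_soft_prompt(text: str, max_new_tokens: int = 40) -> str:
    """Prepend the soft prompt to the token embeddings and generate."""
    inputs = tokenizer(text, return_tensors="pt")
    token_embeds = model.get_input_embeddings()(inputs["input_ids"])
    # Concatenate soft prompt embeddings in front of the real token embeddings.
    full_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)
    attention_mask = torch.ones(full_embeds.shape[:2], dtype=torch.long)
    with torch.no_grad():
        # When generating from inputs_embeds, the returned ids contain only
        # the newly generated continuation.
        out_ids = model.generate(
            inputs_embeds=full_embeds,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)

print(generate_with_soft_prompt("How do I respond to a hostile message?"))
```

Because only the soft prompt vectors differ between the "with" and "without" configurations, the same quantized base model can be reused for both, which is what makes this approach attractive for memory-constrained edge deployment compared with running a separate guard model.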
