

Poster

Hybrid Mamba: A Promising In-Context RL for Long-Term Decision

Sili Huang · Jifeng Hu · Zhejian Yang · Liwei Yang · Tao Luo · Hechang Chen · Lichao Sun · Bo Yang

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where decision-making is formulated as sequential generation. Given task contexts such as multiple past trajectories, transformer-based agents can self-improve in online environments, a paradigm known as in-context RL. However, because attention in transformers has quadratic computational complexity, current in-context RL methods incur huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for efficiently processing long-term dependencies, which offers in-context RL an opportunity to solve tasks that require long-term memory. To this end, we propose Hybrid Mamba (HM), which combines the merits of transformers and Mamba: high-quality prediction and long-term memory. Specifically, HM first uses the Mamba model to generate high-value sub-goals from long-term memory; these sub-goals then prompt the transformer, yielding high-quality predictions. Experimental results demonstrate that HM achieves state-of-the-art performance on both long- and short-term tasks, including the D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, online testing of HM on the long-term task is 28$\times$ faster than transformer-based baselines.
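To make the two-stage design concrete, below is a minimal, self-contained PyTorch sketch of the pipeline the abstract describes: a linear-time sequence model compresses the long trajectory context into sub-goal tokens, which then prompt a short-context transformer that predicts the next action. This is not the authors' implementation; the module choices, shapes, and names (e.g., `HybridMambaSketch`, `n_subgoals`, the action dimension) are illustrative assumptions, and a GRU stands in for a Mamba selective state-space block purely because both scale linearly with sequence length.

```python
import torch
import torch.nn as nn


class HybridMambaSketch(nn.Module):
    """Illustrative sketch of the Hybrid Mamba (HM) two-stage pipeline:
    long-term memory -> sub-goals -> transformer prediction.
    All hyperparameters here are assumptions, not the paper's values."""

    def __init__(self, d_model: int = 128, n_heads: int = 4,
                 n_subgoals: int = 4, action_dim: int = 6):
        super().__init__()
        self.n_subgoals = n_subgoals
        # Stand-in for a Mamba block: any O(T) recurrent/state-space model.
        # A real implementation would use a selective state-space layer.
        self.long_memory = nn.GRU(d_model, d_model, batch_first=True)
        # Project the compressed history into n_subgoals sub-goal tokens.
        self.subgoal_head = nn.Linear(d_model, n_subgoals * d_model)
        # Short-context transformer, prompted with the sub-goal tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decision_tf = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, long_context: torch.Tensor,
                recent_tokens: torch.Tensor) -> torch.Tensor:
        # long_context:  (B, T_long, d)  -- many past trajectories
        # recent_tokens: (B, T_short, d) -- the current episode so far
        _, h = self.long_memory(long_context)        # compress long history
        subgoals = self.subgoal_head(h[-1])          # (B, n_subgoals * d)
        subgoals = subgoals.view(-1, self.n_subgoals,
                                 long_context.size(-1))
        # Prepend sub-goals as a prompt; attention now runs only over a
        # short sequence, avoiding quadratic cost on the full horizon.
        prompted = torch.cat([subgoals, recent_tokens], dim=1)
        out = self.decision_tf(prompted)
        return self.action_head(out[:, -1])          # action for last step


# Usage: the long context can grow without quadratic attention cost,
# since only the second stage uses attention.
model = HybridMambaSketch()
actions = model(torch.randn(2, 1000, 128), torch.randn(2, 20, 128))
print(actions.shape)  # torch.Size([2, 6])
```

The key design point the sketch captures is the split of responsibilities: the linear-time first stage absorbs the long horizon, so the transformer's quadratic attention is applied only to a short, sub-goal-prompted window.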
