
Contributed Talk
Workshop: Generalization in Planning (GenPlan '23)

Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines

Yu-An Lin · Chen-Tao Lee · Guan-Ting Liu · Pu-Jen Cheng · Shao-Hua Sun

Keywords: [ Reinforcement Learning ] [ State Machines ] [ Programmatic Reinforcement Learning ]

Sat 16 Dec 12:05 p.m. PST — 12:15 p.m. PST


Deep reinforcement learning excels in various domains but lacks generalizability and interpretability. Programmatic RL methods (Trivedi et al., 2021; Liu et al., 2023) reformulate solving RL tasks as synthesizing interpretable programs that can be executed in the environments. Despite encouraging results, these methods are limited to short-horizon tasks. On the other hand, representing RL policies as state machines (Inala et al., 2020) allows them to inductively generalize to long-horizon tasks; however, this representation struggles to scale up to diverse and complex behaviors and is difficult for human users to interpret. This work proposes Program Machine Policies (POMPs), which combine the advantages of programmatic RL and state machine policies, allowing them to represent complex behaviors and address long-horizon tasks. Specifically, we introduce a method that retrieves a set of effective, diverse, and compatible programs. We then use these programs as the modes of a state machine and learn a transition function that switches among the mode programs, which captures long-horizon repetitive behaviors. Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks and demonstrates the ability to inductively generalize to even longer horizons without any fine-tuning. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programs as modes.
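
To make the idea of "programs as modes of a state machine" concrete, the following is a minimal illustrative sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the toy `Corridor` environment, the two hand-written mode programs (`collect_here`, `step_forward`), and above all the `transition` function, which is hard-coded here, whereas the abstract describes a transition function that is learned. The mode programs are also written by hand rather than retrieved by a search procedure.

```python
from typing import Callable, List


class Corridor:
    """Hypothetical toy 1-D world: one marker in every cell, agent starts at cell 0."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.pos = 0
        self.markers = [True] * length

    def front_is_clear(self) -> bool:
        return self.pos < self.length - 1

    def marker_present(self) -> bool:
        return self.markers[self.pos]

    def move(self) -> None:
        if self.front_is_clear():
            self.pos += 1

    def pick_marker(self) -> None:
        self.markers[self.pos] = False

    def done(self) -> bool:
        # The task is solved once every marker has been collected.
        return not any(self.markers)


# Two short "mode programs" (hand-written stand-ins for retrieved programs).
def collect_here(env: Corridor) -> None:
    if env.marker_present():
        env.pick_marker()


def step_forward(env: Corridor) -> None:
    if env.front_is_clear():
        env.move()


MODES: List[Callable[[Corridor], None]] = [collect_here, step_forward]
TERMINATE = -1  # sentinel index for the terminal mode


def transition(mode: int, env: Corridor) -> int:
    """Hard-coded stand-in for a learned transition function.

    It alternates between the two modes and terminates once the task is done.
    """
    if env.done():
        return TERMINATE
    return 1 if mode == 0 else 0


def run_machine(env: Corridor, max_mode_calls: int = 10_000) -> int:
    """Execute mode programs until termination; return the number of mode calls."""
    mode, calls = 0, 0
    while mode != TERMINATE and calls < max_mode_calls:
        MODES[mode](env)  # run the current mode program to completion
        calls += 1
        mode = transition(mode, env)  # pick the next mode from the new state
    return calls
```

Because the machine repeats the same two short programs until the task is complete, the unchanged machine also solves a longer corridor, for example `run_machine(Corridor(50))`. This is only a toy analogue of the inductive generalization to longer horizons described in the abstract.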
