
Workshop: Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)

Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization

Róbert Csordás · Kazuki Irie · Jürgen Schmidhuber


Despite their success across a broad range of applications, Transformers have limited capability for systematic generalization. The problem is especially frustrating on algorithmic tasks, where they often fail to find intuitive solutions that could be expressed simply in terms of attention patterns. Here we propose two modifications to the Transformer architecture, the copy gate and geometric attention, which facilitate learning such intuitive and interpretable solutions to algorithmic problems. Our novel Transformer, called Transformer Control Flow (TCF), achieves 100% length generalization accuracy on the classic compositional table lookup task. The resulting attention and gating patterns are interpretable, demonstrating that the model implements adaptive control flow.
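The abstract only names the two modifications. The sketch below illustrates one plausible reading of each in plain Python. The modeling choices are assumptions and are not taken from this abstract.

- **Copy gate.** It is modeled as an elementwise sigmoid gate that interpolates between a layer's candidate update and its unchanged input, so a closed gate copies the input through the layer.
- **Geometric attention.** It is modeled as a query that scans keys from nearest to farthest. Each key receives its sigmoid match probability times the probability that no closer key has already matched.

The function names `copy_gate` and `geometric_attention_row` are hypothetical. So are the left-first tie-breaking and the exclusion of the query's own position.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def copy_gate(h, u, gate_logits):
    """Hypothetical copy gate: blend candidate update u with unchanged state h.

    Each element is g * u + (1 - g) * h with g = sigmoid(gate logit),
    so a gate near 0 passes h through the layer unchanged (a "copy").
    """
    return [sigmoid(z) * ui + (1.0 - sigmoid(z)) * hi
            for hi, ui, z in zip(h, u, gate_logits)]


def geometric_attention_row(scores, i):
    """Hypothetical geometric attention weights for query position i.

    scores[j] is a raw match logit between query i and key j. Keys are
    visited from nearest to farthest; ties at equal distance go to the
    left key first (an assumption). Each key gets P_j = sigmoid(scores[j])
    times the probability that no closer key was selected, so the nearest
    strongly matching key dominates. Position i itself is skipped (also an
    assumption), and the weights sum to at most 1.
    """
    n = len(scores)
    order = sorted((j for j in range(n) if j != i),
                   key=lambda j: (abs(i - j), j))
    weights = [0.0] * n
    not_taken = 1.0  # probability that no closer key has been chosen yet
    for j in order:
        p = sigmoid(scores[j])
        weights[j] = p * not_taken
        not_taken *= 1.0 - p
    return weights


if __name__ == "__main__":
    # Gate closed on element 0 (keep h), open on element 1 (take u).
    print(copy_gate([1.0, 2.0], [5.0, 5.0], [-20.0, 20.0]))
    # Query at position 2: keys 0, 1, 3 match, but key 1 is nearest
    # (tied with key 3, left first) and takes almost all the weight.
    print(geometric_attention_row([10.0, 10.0, -10.0, 10.0], 2))
```

With the gated residual form, a layer can leave a position untouched when nothing should happen there. The nearest-first scan makes "attend to the closest matching element" easy to express. Both properties fit the abstract's emphasis on interpretable, adaptive control flow. They are offered here only as an illustrative reading, not as the authors' implementation.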