Humans are exposed to a continuous stream of sensory data, yet understand the world in terms of discrete concepts. A large body of work has focused on chunking sensory data in time, i.e. finding event boundaries, typically identified by model prediction errors. Similarly, chucking sensory data in space is the problem at hand when building spatial maps for navigation. In this work, we argue that a single mechanism underlies both, which is building a hierarchical generative model of perception and action, where chunks at a higher level are formed by segments surpassing a certain information distance at the level below. We demonstrate how this can work in the case of robot navigation, and discuss how this could relate to human cognition in general.