Reinforcement Learning (RL) algorithms typically utilize learning and/or planning techniques to derive effective policies. The integration of both approaches has proven highly successful in addressing complex sequential decision-making challenges, as evidenced by algorithms such as AlphaZero and MuZero, which consolidate the planning process into a parametric search-policy. AIXI, the most powerful theoretical universal agent, leverages planning through comprehensive search as its primary means of finding an optimal policy. Here we define an alternative universal agent, which we call Self-AIXI, that, in contrast to AIXI, maximally exploits learning to obtain good policies. It does so by self-predicting its own stream of action data, which is generated, similarly to other TD(0) agents, by taking an action-maximization step over the current on-policy (universal mixture-policy) Q-value estimates. We prove that Self-AIXI converges to AIXI, and inherits a series of properties such as maximal Legg-Hutter intelligence and the self-optimizing property.
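The action-generation step described above (greedy maximization over current on-policy Q-value estimates, followed by a TD(0)-style bootstrapped update) can be sketched in a toy tabular form. This is an illustrative assumption, not the paper's actual algorithm: Self-AIXI operates over universal mixture policies, whereas the `td0_greedy_step` helper and the dictionary-based Q-table below are simplifications for finite state and action spaces.

```python
def td0_greedy_step(Q, state, actions, env_step, alpha=0.1, gamma=0.9):
    """One TD(0)-style step: act greedily w.r.t. current Q, then update.

    Q        -- dict mapping (state, action) -> value estimate (toy Q-table)
    env_step -- callable (state, action) -> (reward, next_state); hypothetical
                stand-in for sampling a percept from the environment.
    """
    # Action-maximization step over the current Q-value estimates.
    action = max(actions, key=lambda a: Q.get((state, a), 0.0))
    reward, next_state = env_step(state, action)
    # Bootstrapped one-step TD(0) target and update.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return action, reward, next_state
```

In this simplification the agent's own greedy choices form the action stream it learns from; Self-AIXI additionally predicts that stream with a universal mixture, which is what drives the convergence-to-AIXI result.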
Author Information
Elliot Catt (Google DeepMind)
Jordi Grau-Moya (DeepMind)
Marcus Hutter (DeepMind)
Matthew Aitchison (DeepMind)
Tim Genewein (DeepMind)
Grégoire Delétang (DeepMind)
Kevin Li (Google DeepMind)
Joel Veness (DeepMind)
More from the Same Authors
-
2021 : Blindness of score-based methods to isolated components and mixing proportions »
Kevin Li · Heishiro Kanagawa -
2022 : Score-based generative models learn manifold-like structures with constrained mixing »
Kevin Li · Ben Moran -
2023 : Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models »
Amal Rannen-Triki · Jorg Bornschein · Razvan Pascanu · Alexandre Galashov · Michalis Titsias · Marcus Hutter · András György · Yee Whye Teh -
2023 Poster: Bounded rationality in structured density estimation »
Tianyuan Teng · Kevin Li · Hang Zhang -
2020 Poster: COT-GAN: Generating Sequential Data via Causal Optimal Transport »
Tianlin Xu · Kevin Li · Michael Munn · Beatrice Acciaio -
2020 Poster: A Combinatorial Perspective on Transfer Learning »
Jianan Wang · Eren Sezener · David Budden · Marcus Hutter · Joel Veness -
2020 Poster: Meta-trained agents implement Bayes-optimal agents »
Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega -
2020 Spotlight: Meta-trained agents implement Bayes-optimal agents »
Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega -
2020 Poster: Online Learning in Contextual Bandits using Gated Linear Networks »
Eren Sezener · Marcus Hutter · David Budden · Jianan Wang · Joel Veness -
2020 Poster: Gaussian Gated Linear Networks »
David Budden · Adam Marblestone · Eren Sezener · Tor Lattimore · Gregory Wayne · Joel Veness -
2020 Poster: Logarithmic Pruning is All You Need »
Laurent Orseau · Marcus Hutter · Omar Rivasplata -
2020 Spotlight: Logarithmic Pruning is All You Need »
Laurent Orseau · Marcus Hutter · Omar Rivasplata -
2019 Poster: A neurally plausible model for online recognition and postdiction in a dynamical environment »
Kevin Li · Maneesh Sahani -
2018 : Panel discussion »
Max Welling · Tim Genewein · Edwin Park · Song Han -
2018 : TBC 12 »
Tim Genewein -
2018 : Neural network compression in the wild: why aiming for high compression factors is not enough »
Tim Genewein