Behavioural characterizations (BCs) of decision-making agents, or of their policies, are used to study the outcomes of training algorithms and, as part of the algorithms themselves, to encourage unique policies, match expert policies, or restrict how much a policy changes per update. However, previously presented solutions are not generally applicable, due to a lack of expressive power, computational constraints, or constraints on the policy or environment. Furthermore, many BCs rely on the actions of policies. We discuss and demonstrate how these BCs can be misleading, especially in stochastic environments, and propose a novel solution based on which states policies visit. We run experiments to evaluate the quality of the proposed BC against baselines and evaluate its use in studying training algorithms, novelty search, and trust-region policy optimization.
Anssi Kanervisto (University of Eastern Finland)
Ville Hautamäki (National University of Singapore)
Ville HAUTAMÄKI (PhD), Associate Professor, received the M.Sc. degree in Computer Science from the University of Joensuu (currently known as the University of Eastern Finland), Finland, in 2005. He received the Ph.D. degree in Computer Science from the same university in 2008. From 2009 to 2011 he worked as a research fellow at the Institute for Infocomm Research, A*STAR, Singapore. From 2011 to 2014 he worked as a PI on an Academy of Finland funded project at the University of Eastern Finland. In 2013 he worked for six months as a visiting scholar at the Georgia Institute of Technology, USA. In 2014, he won a Nokia Visiting Professor grant to visit the Georgia Institute of Technology. In 2015, he was the PI of an automatic foreign accent recognition project funded by the Finnish Ministry of Defence. In 2016, his team won 3rd place in an international AI reinforcement challenge, VizDoom. Four of his PhD students have graduated. He has served in the COST Action IC1206. He is currently on leave of absence from a senior researcher position at the University of Eastern Finland and is with the National University of Singapore. He has served as an associate editor of Digital Speech Processing and currently serves as an associate editor of IEEE Signal Processing Letters. He has been a member of the IEEE Speech and Language Processing Technical Committee (SLTC). Recently he was the PI of an Academy of Finland funded deep learning project and of a DeepFake detection project funded by the Finnish Ministry of Defence, and a co-PI of a CZI funded deep learning project. He supervises and co-supervises three PhD students.
More from the Same Authors
2022: Fifteen-minute Competition Overview Video
Byron Galbraith · Anssi Kanervisto · Steven Wang · Stephanie Milani · Sharada Mohanty · Rohin Shah · Karolis Ramanauskas · Brandon Houghton
2022: Imitating Human Behaviour with Diffusion Models
Tim Pearce · Tabish Rashid · Anssi Kanervisto · David Bignell · Mingfei Sun · Raluca Georgescu · Sergio Valcarcel Macua · Shan Zheng Tan · Ida Momennejad · Katja Hofmann · Sam Devlin
2022 Competition: The MineRL BASALT Competition on Fine-tuning from Human Feedback
Anssi Kanervisto · Stephanie Milani · Karolis Ramanauskas · Byron Galbraith · Steven Wang · Brandon Houghton · Sharada Mohanty · Rohin Shah