Workshop
Information-Theoretic Principles in Cognitive Systems (InfoCog)
Noga Zaslavsky · Rava Azeredo da Silveira · Ronit Bustin · Ron M. Hecht
Room 215–216
Information theory provides a mathematical framework for formulating and quantifying the basic limitations of data compression and communication. The notions of data compression and communication, rooted in analog and digital communication, are also relevant to other domains; as such, information theory spans a number of research fields. The aim to formulate, understand, and quantify the storage and processing of information is a thread that ties these disparate fields together, and especially the study of cognition in humans and machines. Specifically, attempts to reach an integrative computational theory of human and artificial cognition often leverage information-theoretic principles as bridges between various cognitive functions and neural representations. Insights from information-theoretic formalizations have also led to tangible outcomes that have influenced the operation of artificial intelligent systems. One example is the information bottleneck (IB) approach, which has yielded insights into learning in neural networks (NNs), as well as tools for slow feature analysis and speech recognition. A central application of the IB approach to NNs views the data transfer between layers as an autoencoder. The approach then uses a variational approximation of the IB to produce an objective for minimization that is feasible and results in efficient training, known as the variational IB (VIB). In the other direction, the variational autoencoder (VAE) framework has also been used to explain cognitive functions. The IB approach has also been applied to emergent communication (EC) in both humans and machines, using a vector-quantized VIB (VQ-VIB) method that extends the aforementioned VIB method. Another example is the tradeoff between information and value in the context of sequential decision making.
The corresponding formalism has led to tangible methods for solving sequential decision-making problems and has even been used in experimental studies of mouse navigation, of drivers' eye-gaze patterns, and of drivers' language models. In aiming to understand machine learning (ML), specifically in the context of NNs, or cognition, we need theoretical principles (hypotheses) that can be tested. To quote Shannon: "I personally believe that many of the concepts of information theory will prove useful in these other fields – and, indeed, some results are already quite promising – but the establishing of such applications is not a trivial matter of translating words to a new domain, but rather the slow tedious process of hypothesis and experimental verification. If, for example, the human being acts in some situations like an ideal decoder, this is an experimental and not a mathematical fact, and as such must be tested under a wide variety of experimental situations." Today, both ML and cognition research can draw on huge amounts of data, and establishing quantitative theories and corresponding methods for computation can have a massive impact on progress in these fields. Broadly, this workshop aims to further the understanding of information flow in cognitive processes and in neural network models of cognition. More concretely, this year's workshop goals are twofold. On the one hand, we wish to provide a fruitful platform for discussions of formulations of the storage and processing of information in human or artificial cognitive systems via information-theoretic measures, such as the formalisms mentioned above. Specifically, the workshop is designed to allow information theory researchers to take part in such discussions, enabling first-hand sharing of knowledge and ideas.
On the other hand, we hope this workshop can advance and sharpen the research on computing information-theoretic quantities, specifically for the needs and benefit of cognition research. The two aims of the workshop are not independent of one another: any information-theoretic formalism that we wish to verify experimentally has to be, in some sense, computationally feasible. Moreover, we wish computation and estimation methods to be developed in a way that is tailored to the open questions in human and artificial cognition. The workshop therefore focuses on bringing together researchers interested in integrating information-theoretic approaches with researchers focused on the computation and estimation of information-theoretic quantities, with the aim of tightening the collaboration between the two communities. Researchers interested in integrating information-theoretic approaches come from cognitive science, neuroscience, linguistics, economics, and beyond. Efforts in the computation and estimation of information-theoretic quantities are pursued for many reasons; this line of research is gaining increasing attention due to advances in ML, and researchers in it have created new methods in recent years to measure information-related quantities.
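The VIB objective mentioned above can be made concrete with a small numerical sketch. This is purely illustrative (the function name, toy batch, and standard-normal prior are our assumptions, not code from any cited work): the loss is a decoder cross-entropy term plus a beta-weighted KL divergence from the Gaussian encoder to the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def vib_loss(mu, log_var, log_q_y_given_z, beta=1e-3):
    """Variational IB objective sketch: E[-log q(y|z)] + beta * KL(p(z|x) || r(z)),
    with a standard-normal prior r(z) so the KL has a closed form."""
    # KL between N(mu, sigma^2) and N(0, I), summed over latent dimensions.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
    return np.mean(-log_q_y_given_z + beta * kl)

# Toy batch: 4 samples, 2 latent dimensions, hypothetical decoder log-likelihoods.
mu = rng.normal(size=(4, 2))
log_var = np.zeros((4, 2))
log_q = np.full(4, -0.7)
print(vib_loss(mu, log_var, log_q, beta=1e-3))
```

Sweeping beta traces out the IB-style tradeoff between prediction accuracy and the rate of the latent code.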
Schedule
Fri 6:15 a.m. – 6:30 a.m.
Poster Organization (please hang your posters)

Fri 6:30 a.m. – 6:40 a.m.
Opening Remarks
Noga Zaslavsky

Fri 6:40 a.m. – 7:10 a.m.
The Physics of Science (Invited talk)
What are the principles that underwrite sentient behaviour? This presentation uses the free energy principle to furnish an account in terms of active inference. First, we will try to understand sentience from the point of view of physics; in particular, the properties that self-organising systems, which distinguish themselves from their lived world, must possess. We then rehearse the same story from the point of view of a neurobiologist, trying to understand functional brain architectures. The narrative starts with a heuristic proof suggesting that life, or biological self-organization, is an inevitable and emergent property of any dynamical system that possesses a Markov blanket. This conclusion is based on the following arguments: if a system can be differentiated from its external milieu, then its internal and external states must be conditionally independent. These independencies induce a Markov blanket that separates internal and external states. Crucially, this equips internal states with an information geometry, pertaining to probabilistic beliefs about something, namely, external states. The dynamics of internal states can then be cast as minimizing a variational free energy; this free energy is the same quantity that is optimized in Bayesian inference and machine learning (where it is known as an evidence lower bound). In short, internal states will appear to infer, and act on, their world to preserve their integrity. This leads to a Bayesian mechanics, which can be neatly summarised as self-evidencing. In the second half of the talk, we will unpack these ideas using simulations of Bayesian belief updating in the brain and relate them to predictive processing and sentient behaviour. Key words: active inference ∙ autopoiesis ∙ cognitive ∙ dynamics ∙ free energy ∙ epistemic value ∙ self-organization.
Karl Friston

Fri 7:10 a.m. – 7:20 a.m.
States as goal-directed concepts: an epistemic approach to state-representation learning (Oral)
Our goals shape how we represent our experience. For example, when we are hungry, we tend to view objects in our environment according to whether or not they are edible (or tasty). Alternatively, when we are cold, we may view the very same objects according to their ability to produce heat. Computational theories of learning in cognitive systems, such as reinforcement learning, use the notion of "state representation" to describe how agents selectively construe and focus on behaviorally relevant features of their environment. However, these approaches typically assume "ground-truth" state representations that are known by the agent, and reward functions that need to be learned. Here we suggest an alternative approach in which state representations are not assumed to be veridical, or even predefined, but rather emerge from the agent's goals through interaction with its environment. We illustrate this novel perspective by inferring the goals driving rat behavior in an odor-guided choice task and discuss potential implications for developing, from first principles, an information-theoretic account of goal-directed state-representation learning.
Nadav Amir · Yael Niv · Angela Langdon

Fri 7:20 a.m. – 7:50 a.m.
Human Information Processing in Complex Networks (Invited talk)
Danielle S Bassett

Fri 7:50 a.m. – 8:00 a.m.
Discrete, compositional, and symbolic representations through attractor dynamics (Oral)
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins corresponding to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.
Andrew Nam · Eric Elmoznino · Nikolay Malkin · Chen Sun · Yoshua Bengio · Guillaume Lajoie

Fri 8:00 a.m. – 8:30 a.m.
Coffee Break + Posters

Fri 8:30 a.m. – 9:00 a.m.
Resource-rational prediction in real and artificial neural networks (Invited talk)
Sensory prediction is vital to organisms, and humans have engineered complex neural networks to predict, but it is difficult to benchmark how well real and artificial agents predict given that ground truth is often unknown. We utilize so-called epsilon-machines, a special type of hidden Markov model, to calibrate how well real and artificial agents predict. First, we show that large random epsilon-machines produce output that artificial agents do not predict very well, though they come close to limits set by Fano's inequality. We then note that newly collected data show that neurons in a dish and humans are resource-rational predictors, meaning that they predict as well as possible given their limited memory, an outgrowth of rate-distortion theory. This gives us insight into artificial neural networks as well: we find that LSTMs predict as well as possible given limited memory in challenging (undersampled) conditions. Altogether, we advance the idea that epsilon-machines can be used to benchmark the performance of predictive agents, and that these agents might be only boundedly optimal at prediction because they are subject to limitations on memory.
Sarah Marzen

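The abstract above invokes the limits that Fano's inequality places on any predictor. As a self-contained sketch (the function and numbers below are our own illustration, not the speaker's code), the inequality lower-bounds the error probability of predicting a symbol given the predictor's residual conditional entropy:

```python
import numpy as np

def fano_error_lower_bound(cond_entropy_bits, alphabet_size):
    """Smallest error probability P_e consistent with Fano's inequality:
    H(X | prediction) <= h(P_e) + P_e * log2(|X| - 1).
    Found here by a simple grid search over P_e."""
    def h(p):  # binary entropy in bits
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    for pe in np.linspace(0.0, 1.0, 10001):
        if h(pe) + pe * np.log2(alphabet_size - 1) >= cond_entropy_bits:
            return pe
    return 1.0

# A predictor left with 1 bit of uncertainty over a 4-letter alphabet
# must err at least this often:
print(fano_error_lower_bound(1.0, 4))
```

Any agent, biological or artificial, whose conditional entropy stays above zero is pinned away from perfect prediction by this bound.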
Fri 9:00 a.m. – 9:10 a.m.
Lossy Compression and the Granularity of Causal Representation (Oral)
A given causal system can be represented in a variety of ways. How do agents determine which variables to include in their causal representations, and at what level of granularity? Using techniques from information theory, we develop a formal theory according to which causal representations reflect a tradeoff between compression and informativeness. We then show, across three studies (N=1,391), that participants’ choices over causal models demonstrate a preference for more compressed causal models when all other factors are held fixed, with some further tolerance for lossy compressions.
David Kinney · Tania Lombrozo

Fri 9:10 a.m. – 9:40 a.m.
Information Theory for Representation Learning (Invited talk)
I'll give an overview of how information-theoretic principles have been used to motivate and advance representation learning. By combining variational bounds on information-theoretic quantities like mutual information with the expressiveness and learnability of modern deep neural networks, information theory can guide the search for useful representations in a wide array of settings, including unsupervised learning, supervised learning, Bayesian inference, and prediction. The emphasis will be on how the modern tools of deep learning can now turn principled, information-theoretically motivated objectives across a broad range of interdisciplinary fields into a reality.
Alemi

Fri 9:40 a.m. – 9:43 a.m.
What can AI Learn from Human Exploration? Intrinsically-Motivated Humans and Agents in Open-World Exploration (Spotlight)
What drives exploration? Understanding intrinsic motivation is a longstanding question in both cognitive science and artificial intelligence (AI); numerous exploration objectives have been proposed and tested in human experiments and used to train reinforcement learning (RL) agents. However, experiments in the former are often set in simplistic environments that do not capture the complexity of real-world exploration. On the other hand, experiments in the latter use more complex environments, yet the trained RL agents fail to come close to human exploration efficiency. To study this gap, we propose a framework for directly comparing human and agent exploration in an open-ended environment, Crafter. We study how well commonly proposed information-theoretic intrinsic objectives relate to actual human and agent behaviors, finding that human and intrinsically-motivated RL agent exploration success consistently shows a positive correlation with Entropy and Empowerment. However, only human exploration shows a significant correlation with Information Gain. In a preliminary analysis of verbalizations, we find that children's verbalizations of goals show a strong positive correlation with Empowerment, suggesting that goal-setting may be an important aspect of efficient exploration.
Alison Gopnik · Pieter Abbeel · Maria Rufova · Alyssa L Dayan · Eliza Kosoy · Yuqing Du

Fri 9:43 a.m. – 9:46 a.m.
Active Vision with Predictive Coding and Uncertainty Minimization (Spotlight)
We present an end-to-end procedure for embodied visual exploration based on two biologically inspired computations: predictive coding and uncertainty minimization. The procedure can be applied in a task-independent and intrinsically driven manner. We evaluate our approach on an active vision task, where an agent must actively sample its visual environment to gather information. We show that our model is able to build unsupervised representations that allow it to actively sample and efficiently categorize sensory scenes. We further show that using these representations as input for downstream classification leads to superior data efficiency and learning speed compared to other baselines, while also maintaining lower parameter complexity. Finally, the modularity of our model allows us to analyze its internal mechanisms and to draw insight into the interactions between perception and action during exploratory behavior.
Abdelrahman Sharafeldin · Nabil Imam · Hannah Choi

Fri 9:46 a.m. – 9:49 a.m.
Natural Language Systematicity from a Constraint on Excess Entropy (Spotlight)
Natural language is systematic: utterances are composed of individually meaningful parts which are typically concatenated together. We argue that natural-language-like systematicity arises in codes when they are constrained by excess entropy, the mutual information between the past and the future of a process. In three examples, we show that codes with natural-language-like systematicity have lower excess entropy than matched alternatives.
Richard Futrell

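Excess entropy, as used in the abstract above, is the mutual information between a process's past and future. For a stationary first-order Markov chain it reduces to the single-step mutual information I(X_t; X_{t+1}), which the following toy sketch (our own illustration, not the paper's code) computes from a transition matrix:

```python
import numpy as np

def excess_entropy_markov(T):
    """Excess entropy of a stationary first-order Markov chain with
    transition matrix T: E = I(X_t; X_{t+1}) = H(pi) - entropy rate,
    where pi is the stationary distribution (left eigenvector of T)."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    def H(p):  # Shannon entropy in bits
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    h_rate = np.sum(pi * np.array([H(row) for row in T]))  # H(X_{t+1} | X_t)
    return H(pi) - h_rate

# A "sticky" binary chain has strong past-future dependence;
# a fair-coin chain has none.
sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
fair = np.array([[0.5, 0.5], [0.5, 0.5]])
print(excess_entropy_markov(sticky), excess_entropy_markov(fair))
```

Codes with low excess entropy, in this sense, are those whose future is cheap to predict from a short summary of the past.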
Fri 9:49 a.m. – 9:52 a.m.
The Perception-Uncertainty Tradeoff in Generative Restoration Models (Spotlight)
Generative models have achieved remarkable performance in restoration tasks, producing results nearly indistinguishable from real data. However, they are prone to generating artifacts or hallucinations not present in the original input, inducing estimation uncertainty. Notably, the extent of hallucination seems to increase with the perceptual quality of the generative model. This paper explores this phenomenon using information-theoretic tools to uncover an inherent tradeoff between perception and uncertainty. Our mathematical analysis shows that the uncertainty of the restoration algorithm, as measured by error entropy, grows in tandem with the improvement in perceptual quality. Employing Rényi divergence as a perception measure, we derive lower and upper bounds for the tradeoff, locating estimators in distinct performance categories. Furthermore, we establish a relationship between estimation distortion and uncertainty, through which we provide a fresh perspective on the perception-distortion tradeoff. Our work presents a principled analysis of uncertainty, emphasizing its interplay with perception and distortion, and the limitations of generative models in restoration tasks.
Regev Cohen · Ehud Rivlin · Daniel Freedman

Fri 9:52 a.m. – 9:55 a.m.
An Information-Theoretic Understanding of Maximum Manifold Capacity Representations (Spotlight)
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting for at least two reasons. Firstly, MMCR is an oddity in the zoo of MVSSL methods: it is not (explicitly) contrastive, applies no masking, performs no clustering, leverages no distillation, and does not (explicitly) reduce redundancy. Secondly, while many self-supervised learning (SSL) methods originate in information theory, MMCR distinguishes itself by claiming a different origin: a statistical-mechanical characterization of the geometry of linear separability of data manifolds. However, given the rich connections between statistical mechanics and information theory, and given recent work showing how many SSL methods can be understood from an information-theoretic perspective, we conjecture that MMCR can be similarly understood from an information-theoretic perspective. In this paper, we leverage tools from high-dimensional probability and information theory to demonstrate that an optimal solution to MMCR's nuclear-norm-based objective function is the same optimal solution that maximizes a well-known lower bound on mutual information.
Rylan Schaeffer · Berivan Isik · Victor Lecomte · Mikail Khona · Yann LeCun · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo

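For intuition about the nuclear-norm objective named in the abstract above, here is a deliberately simplified sketch (our own toy, not the authors' implementation): the negative nuclear norm of a matrix of normalized mean embeddings decreases (improves, when minimized) as the embeddings spread across more directions.

```python
import numpy as np

def mmcr_style_objective(mean_embeddings):
    """Toy MMCR-style loss: the negative nuclear norm (sum of singular
    values) of the matrix whose rows are view-averaged embeddings,
    L2-normalized per row. Maximizing the nuclear norm spreads the
    embeddings across many directions of the representation space."""
    z = mean_embeddings / np.linalg.norm(mean_embeddings, axis=1, keepdims=True)
    return -np.linalg.norm(z, ord="nuc")

# Orthogonal embeddings achieve a lower (better) loss than collinear ones.
orthogonal = np.eye(2)
collinear = np.array([[1.0, 0.0], [1.0, 0.0]])
print(mmcr_style_objective(orthogonal), mmcr_style_objective(collinear))
```

The paper's claim is that optimizing this kind of objective coincides with maximizing a standard lower bound on mutual information between views.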
Fri 9:55 a.m. – 9:58 a.m.
Cognitive Information Filters: Algorithmic Choice Architecture for Boundedly Rational Choosers (Spotlight)
We introduce cognitive information filters as an algorithmic approach to mitigating information overload using choice architecture: we develop an information-theoretic model of boundedly rational multi-attribute choice and leverage it to programmatically select information that is effective in inducing desirable behavioral outcomes. By inferring preferences and cognitive constraints from boundedly rational behavior, our methodology can optimize for revealed welfare and hence promises better alignment with boundedly rational users than recommender systems optimizing for imperfect welfare proxies such as engagement. This has implications beyond economics, for example for alignment research in artificial intelligence.
Stefan Bucher · Peter Dayan

Fri 10:00 a.m. – 11:30 a.m.
Lunch Break

Fri 11:30 a.m. – 12:00 p.m.
An information perspective on language, cumulative culture, and human uniqueness (Invited talk)
Noah Goodman

Fri 12:00 p.m. – 12:10 p.m.
Information-theoretic study of the neural geometry induced by category learning (Oral)
Categorization is an important topic for both biological and artificial neural networks. Here, we take an information-theoretic approach to assess the efficiency of the representations induced by category learning. We show that one can decompose the relevant Bayesian cost into two components, one for the coding part and one for the decoding part. Minimizing this cost implies maximizing the mutual information between the set of categories and the neural activities. We analytically show that this mutual information can be written as the sum of two terms that can be interpreted as (i) finding an appropriate representation space, and (ii) building a representation with the appropriate metric, based on the neural Fisher information on this space. One main consequence is that category learning induces an expansion of neural space near decision boundaries. Finally, we provide numerical illustrations that show how the Fisher information of the coding neural population aligns with the boundaries between categories.
Laurent Bonnasse-Gahot · Jean-Pierre Nadal

Fri 12:10 p.m. – 12:40 p.m.
Clustering and phase transitions in self-attention dynamics (Invited talk)
Yury Polyanskiy

Fri 12:40 p.m. – 1:30 p.m.
CDR: An Information-Theoretic Framework for Cognitive Dimension Reduction (Poster)
We introduce Cognitive Dimension Reduction (CDR), a framework that sheds light on how individuals simplify the multidimensional world to guide decision-making and comprehension. Our proposal posits that cognitive limitations prompt the adoption of simplified models, reducing the environment to a subset of dimensions. Within these limitations, we propose that individuals exploit both environment structure and goal relevance. Employing information theory, we formalize these principles and develop a model that explains how environmental and cognitive factors influence dimension reduction. Furthermore, we present an experimental method for CDR assessment and initial findings that support it.
Maya Leshkowitz

Fri 12:40 p.m. – 1:30 p.m.
Balancing utility and cognitive cost in social representation (Poster)
To successfully navigate its environment, an agent must construct and maintain representations of the other agents that it encounters. Such representations are useful for many tasks, but they are not without cost. As a result, agents must make decisions regarding how much information they choose to represent about the agents in their environment. Using selective imitation as an example task, we motivate the problem of finding agent representations that optimally trade off between downstream utility and information cost, and illustrate two example approaches to resource-constrained social representation.
Max Taylor-Davies · Christopher G Lucas

Fri 12:40 p.m. – 1:30 p.m.
A Work in Progress: Tighter Bounds on the Information Bottleneck for Deep Learning (Poster)
The field of deep neural nets (DNNs) is still evolving, and new architectures are emerging to better extract information from available data. The information bottleneck (IB) offers an optimal information-theoretic framework for data modeling; however, the IB is intractable in most settings. In recent years, attempts were made to combine deep learning with the IB, both for optimization and to explain the inner workings of deep neural nets. VAE-inspired variational approximations such as VIB have become a popular method to approximate bounds on the required mutual-information computations. This work continues in this direction by introducing a new tractable variational upper bound for the IB functional which is empirically tighter than previous bounds. When used as an objective function, it enhances the performance of previous IB-inspired DNNs in terms of test accuracy and robustness to adversarial attacks across several challenging tasks. Furthermore, the utilization of information-theoretic tools allows us to analyze experiments and confirm theoretical predictions in real-world problems.
Nir Weingarten · Moshe Butman · Ran Gilad-Bachrach

Fri 12:40 p.m. – 1:30 p.m.
Finding Relevant Information in Saliency-Related Neural Networks (Poster)
Over the last few years, many saliency models have shifted to using deep learning (DL). DL models can be viewed in this context as a double-edged sword. On the one hand, they boost estimation performance, but at the same time they have less explanatory power than more explicit models. This drop in explanatory power is why DL models are often dubbed implicit models. Explainable AI (XAI) techniques have been formulated to address this shortfall. They work by extracting information from the network and explaining it. Here, we demonstrate the effectiveness of the relevant-information approach in accounting for saliency networks. We apply this approach to saliency models based on explicit algorithms when represented as neural networks. These networks are known to contain relevant information in their neurons. We estimate the relevant information of each neuron by capturing the relevant information with respect to first-layer features (intensity, red, blue) and their higher-level manipulations. We measure relevant information using the mutual information (MI) between quantified features and the label. These experiments were conducted on a subset of the CAT2000 dataset.
Ron M. Hecht · Gershon Celniker · Ronit Bustin · Dan Levi · Ariel Telpaz · Omer Tsimhoni · Ke Liu

Fri 12:40 p.m. – 1:30 p.m.
One if by land, two if by sea, three if by four seas, and more to come: values of perception, prediction, communication, and common sense in decision making (Poster)
This work is about rigorously defining the values of perception, prediction, communication, and common sense in decision making. The defined quantities are decision-theoretic, but have information-theoretic analogues; e.g., they share some simple but key mathematical properties with Shannon entropy and mutual information, and can reduce to these quantities in particular settings. One interesting observation is that the value of perception without prediction can be negative, while the value of perception together with prediction and the value of prediction alone are always nonnegative. The defined quantities suggest answers to practical questions arising in the design of autonomous decision-making systems. Example questions include: Do we need to observe and predict the behavior of a particular agent? How important is it? What is the best order in which to observe and predict the agents? The defined quantities may also provide insights for cognitive science and neuroscience, toward the understanding of how natural decision makers make use of information gained from different sources and operations.
Aolin Xu

Fri 12:40 p.m. – 1:30 p.m.
Information Flows Reveal Computational Mechanisms of RNNs in Contextual Decision-making (Poster)
Understanding the information flow of different task-relevant messages within recurrent circuits is crucial to comprehending how the brain works and, in turn, for diagnosing and treating brain disorders. While several information flow methods have focused on functional connectivity and modalities of communication, we do not yet have a principled approach for understanding what information flows can tell us about the effects of causal interventions. In this paper, we consider a measure called $M$-information flow, proposed by Venkatesh et al. (2020), within an artificial recurrent network trained on a contextual decision-making task studied by Mante et al. (2013). We show that $M$-information flow recapitulates the dynamics of information integration, showing specialization of individual units, and revealing how context information is incorporated to select the appropriate response without affecting the underlying circuit dynamics. We also show how $M$-information flow predicts the "behavioral outcome" of causal interventions within the network. This leads us to believe that understanding $M$-information flow within a recurrent network can inform the design of intervention studies and, in the future, of stimulation-based treatments for brain disorders.
Miles Mahon · Praveen Venkatesh

Fri 12:40 p.m. – 1:30 p.m.
The Distortion-Perception Tradeoff in Finite Channels with Arbitrary Distortion Measures (Poster)
Whenever inspected by humans, reconstructed signals should not be distinguishable from real ones. Typically, such high perceptual quality comes at the price of high reconstruction error. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, with the Wasserstein-1 distance as the perception index and an arbitrary distortion matrix. We show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We prove that the DP curve is a piecewise linear function of the perception index, and derive a closed-form expression for the case of binary sources.
Dror Freirich · Nir Weinberger · Ron Meir

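The linear-programming formulation described in the abstract above can be illustrated on a toy binary instance (all parameters below are hypothetical, and we enumerate the vertices of the small feasible polytope instead of calling an LP solver, since an LP optimum always lies at a vertex):

```python
import numpy as np
from itertools import combinations

def dp_point_binary(p_x0, flip, P):
    """Minimum expected Hamming distortion for estimating a binary source X
    from a binary-symmetric-channel observation Y, subject to the
    reconstruction marginal lying within Wasserstein-1 distance P of the
    source marginal (on {0,1}, W1 = |q_hat(0) - p(0)|). The problem is a
    tiny LP in (a, b) = (q(Xhat=0 | Y=0), q(Xhat=0 | Y=1))."""
    p_xy = np.array([[p_x0 * (1 - flip), p_x0 * flip],
                     [(1 - p_x0) * flip, (1 - p_x0) * (1 - flip)]])
    p_y = p_xy.sum(axis=0)
    # Expected distortion is affine in (a, b): base + ca * a + cb * b.
    base = p_xy[0, 0] + p_xy[0, 1]          # cost of always answering 1
    ca = p_xy[1, 0] - p_xy[0, 0]
    cb = p_xy[1, 1] - p_xy[0, 1]
    # Feasible polytope: 0 <= a, b <= 1 and |p_y . (a, b) - p(0)| <= P.
    A = np.array([[-1, 0], [1, 0], [0, -1], [0, 1],
                  [p_y[0], p_y[1]], [-p_y[0], -p_y[1]]], dtype=float)
    ub = np.array([0, 1, 0, 1, p_x0 + P, P - p_x0], dtype=float)
    best = np.inf
    for i, j in combinations(range(6), 2):  # enumerate polytope vertices
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < 1e-12:
            continue
        v = np.linalg.solve(M, ub[[i, j]])
        if np.all(A @ v <= ub + 1e-9):
            best = min(best, base + ca * v[0] + cb * v[1])
    return best

# Demanding perfect perceptual quality (P = 0) costs extra distortion
# relative to an effectively unconstrained estimator (large P):
print(dp_point_binary(0.7, 0.2, 0.0), dp_point_binary(0.7, 0.2, 1.0))
```

Tracing `P` from 0 upward reproduces, on this toy instance, the piecewise-linear DP curve the abstract proves in general.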
Fri 12:40 p.m. – 1:30 p.m.
Decision confidence reflects maximum entropy reinforcement learning (Poster)
Current computational models have not been able to account for the effect of reward on confidence reports in humans. Here we propose a mathematical framework of confidence that is able to generalize across various decision-making tasks involving varying prior and reward distributions. This framework proposes a formal definition of "decision confidence" through the concept of soft optimality. We further show that the objective function in this framework jointly maximizes the reward and the information entropy of the policy. We confirm the validity of our framework by testing it on data gathered under various task conditions.
Amelia Johnson · Michael Buice · Koosha Khalvati

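The soft-optimality notion in the abstract above corresponds to maximum-entropy policies. A minimal sketch (our own illustration, with toy Q-values): the policy pi(a) proportional to exp(Q(a)/alpha) maximizes expected reward plus alpha times policy entropy.

```python
import numpy as np

def soft_optimal_policy(q_values, alpha=1.0):
    """Maximum-entropy ('soft-optimal') policy: pi(a) ∝ exp(Q(a) / alpha).
    This is the maximizer of E_pi[Q] + alpha * H(pi); the max-subtraction
    is only for numerical stability."""
    z = np.exp((q_values - np.max(q_values)) / alpha)
    return z / z.sum()

q = np.array([1.0, 0.5, 0.0])
for alpha in (0.1, 1.0, 10.0):
    print(alpha, soft_optimal_policy(q, alpha))
```

As alpha approaches 0 the policy approaches the greedy argmax; as alpha grows it approaches uniform, trading reward for entropy.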
Fri 12:40 p.m. – 1:30 p.m.
Optimum Self-Random Generation Rate and Its Application to the Rate-Distortion-Perception Problem (Poster)
In this paper, we consider the rate-distortion-perception (RDP) problem with respect to $f$-divergences from the viewpoint of information-theoretic random number generation. First, we address the self-random number generation problem, which is a subproblem of the RDP problem, and derive the general formula for the optimum achievable rate. Then, we apply our findings to the RDP problem.
Ryo Nomura

Fri 12:40 p.m. – 1:30 p.m.
Aberrant High-Order Dependencies in Schizophrenia Resting-State Functional MRI Networks (Poster)
The human brain has a complex, intricate functional architecture. While many studies primarily emphasize pairwise interactions, delving into high-order associations is crucial for a comprehensive understanding of how functional brain networks interact beyond simple pairwise connections. Analyzing high-order statistics allows us to explore the nuanced and complex relationships across the brain, unraveling the heterogeneity and uncovering patterns of multilevel overlap on the psychosis continuum. Here, we employed high-order independent component analysis (ICA) together with multivariate information-theoretic metrics ($O$-information and $S$-information) to estimate high-order interactions in schizophrenia using resting-state fMRI. The results show that multiple brain-region networks may be altered in schizophrenia, including temporal, subcortical, and higher-cognitive brain regions, and that synergy carries more diagnostic information than redundancy for schizophrenia. All in all, we show that high-order dependencies are altered in schizophrenia; identification of these aberrant patterns offers a new window onto diagnosing the disorder.
Qiang Li · Vince Calhoun · Adithya Ram Ballem · Shujian Yu · Jesús Malo · Armin Iraji

Fri 12:40 p.m.  1:30 p.m.

Influence of the geometry of the feature space on curiosity-based exploration
(
Poster
)
link
In human spatial awareness, information appears to be represented according to 3D projective geometry, which structures information integration and action planning within an internal representation space. The way an agent's different first-person perspectives relate to each other, through transformations of a world model, defines a specific perception scheme for the agent. This collection of transformations forms a "group", and it characterizes a geometric space by acting on it. We propose that imbuing world models with a "geometric" structure, given by a group, is one way to capture the different perception schemes of agents. We explore how changing the geometric structure of a world model impacts the behavior of an agent. In particular, we focus on how such geometric operations transform the formal expression of epistemic value (mutual information) in active inference, which drives an agent's curiosity about its environment, and how they impact exploration behaviors accordingly. We use group actions as a special class of policies for perspective-dependent control and compare the Euclidean and projective groups. We formally demonstrate that the two groups induce distinct behaviors. 
Grégoire Sergeant-Perthuis · Nils Ruet · David Rudrauf · Dimitri Ognibene · Yvain Tisserand 🔗 
Fri 12:40 p.m.  1:30 p.m.

Large Language Models Behave (Almost) As Rational Speech Actors: Insights From Metaphor Understanding
(
Poster
)
link
What are the inner workings of large language models? Can they perform pragmatic inference? This paper attempts to characterize from a mathematical angle the cognitive processes of large language models involved in metaphor understanding. Specifically, we show that GPT models embody reasoning mechanisms that resemble the Rational Speech Act model for metaphors, which has already been used to grasp the principles of human pragmatic inference in dealing with figurative language. Our research contributes to the field of explainability and interpretability of large language models and highlights the usefulness of adopting a Bayesian model of human cognition to gain insights into the pragmatics of conversational agents. 
Gaia Carenini · Louis Bodot · Luca Bischetti · Walter Schaeken · Valentina Bambini 🔗 
Fri 12:40 p.m.  1:30 p.m.

Empowerment, Free Energy Principle and Maximum Occupancy Principle Compared
(
Poster
)
link
While the objective of reward maximization in reinforcement learning has led to impressive achievements in several games and artificial environments, animals seem to be driven also by intrinsic signals, such as curiosity, that are not purely extrinsic. Several reward-free approaches have emerged in cognitive neuroscience and artificial intelligence that primarily use signals other than extrinsic rewards to guide exploration and ultimately drive behavior, but a comparison between these approaches is lacking. Here we focus on two popular reward-free approaches, empowerment (MPOW) and the free energy principle (FEP), and a recently developed one, the maximum occupancy principle (MOP), and compare them in sequential problems and fully observable environments. We find that MPOW shows a preference for unstable fixed points of the dynamical system that defines the agent and environment. FEP is shown to be equivalent to reward maximization in certain cases. Neither of these two principles seems to consistently generate variable behavior: behavior collapses to a small repertoire of possible action-state trajectories or fixed points. Collapse to an optimal deterministic policy can be proved for specific, recent implementations of FEP, with the only exception of policy degeneracy due to ties. In contrast, MOP consistently generates variable action-state trajectories. In two simple environments, a balancing cart-pole and a grid world, we find that both MPOW and FEP agents stick to a relatively small set of states and actions, while MOP agents generate exploratory, dancing-like motions. 
Ruben Moreno Bote · Jorge Ramirez Ruiz 🔗 
Fri 12:40 p.m.  1:30 p.m.

Practical estimation of ensemble accuracy
(
Poster
)
link
Ensemble learning combines several individual models to obtain better generalization performance. In this work we present a method for estimating the joint power of several classifiers without jointly optimizing them. The essence of the method is a combinatorial bound on the number of mistakes the ensemble is likely to make. The bound can be efficiently approximated in time linear in the number of samples, allowing one, for example, to choose a combination of classifiers that is likely to produce higher joint accuracy. Moreover, the bound applies to unlabeled data, making it both accurate and practical in the modern setting of unsupervised learning. We demonstrate the method on popular large-scale face recognition datasets, which provide a useful playground for fine-grained classification tasks using noisy data over many classes. The proposed framework fits neatly into current practices of unsupervised learning: it is a measure of the inherent independence of a set of classifiers that does not rely on extra information such as another classifier or labeled data. 
Simi Haber · Yonatan Wexler 🔗 
Fri 12:40 p.m.  1:30 p.m.

Attention Schema in Neural Agents
(
Poster
)
link
Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static selection of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from the AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and the AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence. 
Dianbo Liu · Samuele Bolotta · Mike He Zhu · Zahra Sheikhbahaee · Yoshua Bengio · Guillaume Dumas 🔗 
Fri 12:40 p.m.  1:30 p.m.

Noisy Population Dynamics Lead to Efficiently Compressed Semantic Systems
(
Poster
)
link
Human languages have been argued to support efficient communication. In particular, cross-linguistic evidence suggests that semantic category systems optimally balance cognitive cost with communicative accuracy. In this paper, we show that very general population dynamics of signaling games lead to the emergence of information-theoretically efficient meaning systems. In numerical simulations, we observe that noisy perception of meaning can result in emergent meaning systems with higher efficiency. 
Nathaniel Imel · Noga Zaslavsky · Michael Franke · Richard Futrell 🔗 
Fri 12:40 p.m.  1:30 p.m.

On Complex Network Dynamics of an In-Vitro Neuronal System during Rest and Gameplay
(
Poster
)
link
In this study, we characterize complex network dynamics in live in vitro neuronal systems during two distinct activity states: a spontaneous rest state and engagement in a real-time (closed-loop) game environment using the DishBrain system. First, we embed the spiking activity of these channels in a lower-dimensional space using various representation learning methods and then extract a subset of representative channels. Next, by analyzing these low-dimensional representations, we explore the patterns of macroscopic neuronal network dynamics during learning. Remarkably, our findings indicate that the low-dimensional embedding of representative channels alone is sufficient to differentiate the neuronal culture between Rest and Gameplay. Notably, our investigation shows dynamic changes in connectivity patterns within the same region and across multiple regions of the multi-electrode array only during Gameplay. These findings underscore the plasticity of neuronal networks in response to external stimuli and highlight the potential for modulating connectivity in a controlled environment. The ability to distinguish between neuronal states using reduced-dimensional representations points to underlying patterns that could be pivotal for real-time monitoring and manipulation of neuronal cultures. Additionally, this provides insight into how biologically based information processing systems rapidly adapt and learn, and may lead to new, improved algorithms. 
Moein Khajehnejad · Forough Habibollahi · Alon Loeffler · Brett J. Kagan · Adeel Razi 🔗 
Fri 12:40 p.m.  1:30 p.m.

Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
(
Poster
)
link
Applying a machine learning model for decision-making in the real world requires distinguishing what the model knows from what it does not. A critical factor in assessing the knowledge of a model is quantifying its predictive uncertainty, which is commonly measured by the entropy of the Bayesian model average (BMA) predictive distribution. Yet, the properness of this current measure of predictive uncertainty was recently questioned. We provide new insights regarding its limitations. Our analyses show that the current measure erroneously assumes that the BMA predictive distribution is equivalent to the predictive distribution of the true model that generated the dataset. Consequently, we introduce a theoretically grounded measure that overcomes these limitations. We experimentally verify the benefits of the introduced measure, finding that it behaves more reasonably in controlled synthetic tasks. Moreover, our evaluations on ImageNet demonstrate that the introduced measure is advantageous in real-world applications that utilize predictive uncertainty. 
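The entropy-of-BMA measure discussed in this abstract, and its standard decomposition into aleatoric and epistemic parts, can be sketched as follows. This is a generic illustration of the measure being critiqued, not the authors' proposed replacement; the function names are ours:

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy (nats) along the class axis."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def uncertainty_decomposition(member_probs):
    """member_probs: (num_models, num_classes) per-model predictive distributions.
    Returns (total, aleatoric, epistemic) under the entropy-of-BMA measure."""
    bma = member_probs.mean(axis=0)           # Bayesian model average
    total = entropy(bma)                      # H of the BMA predictive
    aleatoric = entropy(member_probs).mean()  # expected per-model entropy
    epistemic = total - aleatoric             # mutual information I(y; theta)
    return total, aleatoric, epistemic

# Two confident but disagreeing ensemble members: low aleatoric, high epistemic.
members = np.array([[0.99, 0.01],
                    [0.01, 0.99]])
total, alea, epi = uncertainty_decomposition(members)
```

In this example the BMA is uniform, so total uncertainty is maximal even though each member is nearly certain; the disagreement shows up as epistemic uncertainty.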
Kajetan Schweighofer · Lukas Aichberger · Mykyta Ielanskyi · Sepp Hochreiter 🔗 
Fri 12:40 p.m.  1:30 p.m.

What can AI Learn from Human Exploration? Intrinsically-Motivated Humans and Agents in Open-World Exploration
(
Poster
)
link
What drives exploration? Understanding intrinsic motivation is a long-standing question in both cognitive science and artificial intelligence (AI); numerous exploration objectives have been proposed and tested in human experiments and used to train reinforcement learning (RL) agents. However, experiments in the former often use simplistic environments that do not capture the complexity of real-world exploration, while experiments in the latter use more complex environments, yet the trained RL agents fail to come close to human exploration efficiency. To study this gap, we propose a framework for directly comparing human and agent exploration in an open-ended environment, Crafter. We study how well commonly proposed information-theoretic objectives for intrinsic motivation relate to actual human and agent behaviours, finding that human exploration consistently shows a significant positive correlation with Entropy, Information Gain, and Empowerment. Surprisingly, we find that intrinsically-motivated RL agent exploration does not show the same significant correlation consistently, despite being designed to optimize objectives that approximate Entropy or Information Gain. In a preliminary analysis of verbalizations, we find that children's verbalizations of goals correlate strongly and positively with Empowerment, suggesting that goal-setting may be an important aspect of efficient exploration. 
Yuqing Du · Eliza Kosoy · Alyssa L Dayan · Maria Rufova · Pieter Abbeel · Alison Gopnik 🔗 
Fri 12:40 p.m.  1:30 p.m.

Variable Selection in GPDMs Using the Information Bottleneck Method
(
Poster
)
link
In computer graphics and robotics, there is an increasing need for real-time generative models of human motion. Neural networks are often the favored choice, yet their generalization properties are limited, especially on small data sets. This paper utilizes the Gaussian process dynamical model (GPDM) as an alternative. Despite their successes in various motion tasks, GPDMs face challenges such as high computational complexity and the need for many hyperparameters. This work addresses these issues by integrating the information bottleneck (IB) framework with GPDMs. The IB approach aims to optimally balance data fit and generalization through measures of mutual information. Our technique uses IB variable selection as a component of GP-LVM back-constraints to select features for the latent space, reduce the parameter count, and increase the model's robustness to changes in latent-space dimensionality. 
Jesse St. Amand · Martin Giese 🔗 
Fri 12:40 p.m.  1:30 p.m.

Information-Theoretic Generalization Error Bound of Deep Neural Networks
(
Poster
)
link
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for learning within the paradigm of information-theoretic generalization bounds. We derive two novel hierarchical bounds on the generalization error that capture the effect of the internal representations within each layer. The first bound demonstrates that the generalization bound shrinks as the layer index of the internal representation increases. The second bound aims to quantify the contraction of the relevant information measures when moving deeper into the network. To achieve this, we leverage the strong data processing inequality (SDPI) and employ a stochastic approximation of the DNN model for which we can explicitly control the SDPI coefficient. These results provide a new perspective for understanding generalization in deep models. 
Haiyun He · Christina Yu · Ziv Goldfeld 🔗 
Fri 12:40 p.m.  1:30 p.m.

Information theoretic study of the neural geometry induced by category learning
(
Poster
)
link
Categorization is an important topic both for biological and artificial neural networks. Here, we take an information theoretic approach to assess the efficiency of the representations induced by category learning. We show that one can decompose the relevant Bayesian cost into two components, one for the coding part and one for the decoding part. Minimizing this cost implies maximizing the mutual information between the set of categories and the neural activities. We analytically show that this mutual information can be written as the sum of two terms that can be interpreted as (i) finding an appropriate representation space, and, (ii) building a representation with the appropriate metrics, based on the neural Fisher information on this space. One main consequence is that category learning induces an expansion of neural space near decision boundaries. Finally, we provide numerical illustrations that show how Fisher information of the coding neural population aligns with the boundaries between categories. 
Laurent Bonnasse-Gahot · Jean-Pierre Nadal 🔗 
Fri 12:40 p.m.  1:30 p.m.

Lossy Compression and the Granularity of Causal Representation
(
Poster
)
link
A given causal system can be represented in a variety of ways. How do agents determine which variables to include in their causal representations, and at what level of granularity? Using techniques from information theory, we develop a formal theory according to which causal representations reflect a tradeoff between compression and informativeness. We then show, across three studies (N=1,391), that participants’ choices over causal models demonstrate a preference for more compressed causal models when all other factors are held fixed, with some further tolerance for lossy compressions. 
David Kinney · Tania Lombrozo 🔗 
Fri 12:40 p.m.  1:30 p.m.

The Perception-Uncertainty Tradeoff in Generative Restoration Models
(
Poster
)
link
Generative models have achieved remarkable performance in restoration tasks, producing results nearly indistinguishable from real data. However, they are prone to generating artifacts or hallucinations not present in the original input, inducing estimation uncertainty. Notably, the extent of hallucination seems to increase with the perceptual quality of the generative model. This paper explores this phenomenon using information-theoretic tools to uncover an inherent tradeoff between perception and uncertainty. Our mathematical analysis shows that the uncertainty of the restoration algorithm, as measured by error entropy, grows in tandem with the improvement in perceptual quality. Employing Rényi divergence as a perception measure, we derive lower and upper bounds on the tradeoff, placing estimators into distinct performance categories. Furthermore, we establish a relationship between estimation distortion and uncertainty, through which we provide a fresh perspective on the perception-distortion tradeoff. Our work presents a principled analysis of uncertainty, emphasizing its interplay with perception and distortion, and the limitations of generative models in restoration tasks. 
Regev Cohen · Ehud Rivlin · Daniel Freedman 🔗 
Fri 12:40 p.m.  1:30 p.m.

Cognitive Information Filters: Algorithmic Choice Architecture for Boundedly Rational Choosers
(
Poster
)
link
We introduce cognitive information filters as an algorithmic approach to mitigating information overload using choice architecture: we develop an information-theoretic model of boundedly rational multi-attribute choice and leverage it to programmatically select information that is effective in inducing desirable behavioral outcomes. By inferring preferences and cognitive constraints from boundedly rational behavior, our methodology can optimize for revealed welfare and hence promises better alignment with boundedly rational users than recommender systems optimizing for imperfect welfare proxies such as engagement. This has implications beyond economics, for example for alignment research in artificial intelligence. 
Stefan Bucher · Peter Dayan 🔗 
Fri 12:40 p.m.  1:30 p.m.

Discrete, compositional, and symbolic representations through attractor dynamics
(
Poster
)
link
Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and in AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractorsupported representation space of rich sensory inputs. Lastly, we argue that our model exhibits the process of an information bottleneck that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information. 
Andrew Nam · Eric Elmoznino · Nikolay Malkin · Chen Sun · Yoshua Bengio · Guillaume Lajoie 🔗 
Fri 12:40 p.m.  1:30 p.m.

Active Vision with Predictive Coding and Uncertainty Minimization
(
Poster
)
link
We present an end-to-end procedure for embodied visual exploration based on two biologically inspired computations: predictive coding and uncertainty minimization. The procedure can be applied in a task-independent and intrinsically driven manner. We evaluate our approach on an active vision task, where an agent must actively sample its visual environment to gather information. We show that our model is able to build unsupervised representations that allow it to actively sample and efficiently categorize sensory scenes. We further show that using these representations as input for downstream classification leads to superior data efficiency and learning speed compared to other baselines, while also maintaining lower parameter complexity. Finally, the modularity of our model allows us to analyze its internal mechanisms and to draw insight into the interactions between perception and action during exploratory behavior. 
Abdelrahman Sharafeldin · Nabil Imam · Hannah Choi 🔗 
Fri 12:40 p.m.  1:30 p.m.

An Information-Theoretic Understanding of Maximum Manifold Capacity Representations
(
Poster
)
link
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting for at least two reasons. Firstly, MMCR is an oddity in the zoo of MVSSL methods: it is not (explicitly) contrastive, applies no masking, performs no clustering, leverages no distillation, and does not (explicitly) reduce redundancy. Secondly, while many self-supervised learning (SSL) methods originate in information theory, MMCR distinguishes itself by claiming a different origin: a statistical mechanical characterization of the geometry of linear separability of data manifolds. However, given the rich connections between statistical mechanics and information theory, and given recent work showing how many SSL methods can be understood from an information-theoretic perspective, we conjecture that MMCR can be similarly understood from an information-theoretic perspective. In this paper, we leverage tools from high-dimensional probability and information theory to demonstrate that an optimal solution to MMCR's nuclear norm-based objective function is the same optimal solution that maximizes a well-known lower bound on mutual information. 
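A minimal sketch of the nuclear-norm quantity at the heart of MMCR's objective: the loss is the negative nuclear norm (sum of singular values) of the matrix of unit-normalized embeddings, which is minimized when the embeddings are maximally spread out. The toy data and function name below are our own illustration; MMCR itself trains a network on augmented views rather than operating on fixed vectors:

```python
import numpy as np

def mmcr_style_loss(embeddings):
    """Negative nuclear norm of row-normalized embeddings (sketch of MMCR's objective)."""
    Z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return -np.linalg.norm(Z, ord='nuc')  # 'nuc' = sum of singular values

# Orthogonal (maximally spread) embeddings achieve a lower loss than
# collapsed (nearly identical) embeddings:
orthogonal = np.eye(4)          # nuclear norm 4, loss -4
collapsed = np.ones((4, 4))     # rank-1 after normalization, nuclear norm 2
assert mmcr_style_loss(orthogonal) < mmcr_style_loss(collapsed)
```

The spread-out optimum is exactly the regime where the abstract's mutual-information lower bound is maximized, which is the connection the paper formalizes.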
Rylan Schaeffer · Berivan Isik · Victor Lecomte · Mikail Khona · Yann LeCun · Andrey Gromov · Ravid Shwartz-Ziv · Sanmi Koyejo 🔗 
Fri 12:40 p.m.  1:30 p.m.

States as goal-directed concepts: an epistemic approach to state-representation learning
(
Poster
)
link
Our goals shape how we represent our experience. For example, when we are hungry, we tend to view objects in our environment according to whether or not they are edible (or tasty). Alternatively, when we are cold, we may view the very same objects according to their ability to produce heat. Computational theories of learning in cognitive systems, such as reinforcement learning, use the notion of "state-representation" to describe how agents selectively construe and focus on behaviorally relevant features of their environment. However, these approaches typically assume "ground-truth" state representations that are known by the agent, and reward functions that need to be learned. Here we suggest an alternative approach in which state-representations are not assumed veridical, or even predefined, but rather emerge from the agent's goals through interaction with its environment. We illustrate this novel perspective by inferring the goals driving rat behavior in an odor-guided choice task and discuss potential implications for developing, from first principles, an information-theoretic account of goal-directed state-representation learning. 
Nadav Amir · Yael Niv · Angela Langdon 🔗 
Fri 12:40 p.m.  1:30 p.m.

Natural Language Systematicity from a Constraint on Excess Entropy
(
Poster
)
link
Natural language is systematic: utterances are composed of individually meaningful parts which are typically concatenated together. We argue that natural-language-like systematicity arises in codes when they are constrained by excess entropy, the mutual information between the past and the future of a process. In three examples, we show that codes with natural-language-like systematicity have lower excess entropy than matched alternatives. 
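Excess entropy, the constraining quantity in this abstract, is the mutual information between a process's past and future; for a stationary first-order Markov chain it reduces to $I(X_t; X_{t+1})$, since past and future are conditionally independent given the current symbol. A small illustrative computation (our own sketch, not the paper's code):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def excess_entropy_markov(T, pi):
    """Excess entropy of a stationary first-order Markov chain with
    transition matrix T and stationary distribution pi: I(X_t; X_{t+1})."""
    joint = pi[:, None] * T  # P(x_t, x_{t+1})
    return entropy(pi) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

pi = np.array([0.5, 0.5])  # stationary distribution for both chains below
# A near-deterministic alternating chain carries much past-future information ...
T_alternating = np.array([[0.05, 0.95], [0.95, 0.05]])
# ... while an i.i.d. chain carries none.
T_iid = np.array([[0.5, 0.5], [0.5, 0.5]])
```

Here `excess_entropy_markov(T_iid, pi)` is 0 bits while the alternating chain exceeds 0.7 bits, illustrating how different codes over the same symbols can have very different excess entropy.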
Richard Futrell 🔗 
Fri 12:40 p.m.  1:30 p.m.

Learning Causally Emergent Representations
(
Poster
)
link
Cognitive processes usually take place at a macroscopic scale in systems characterised by emergent properties, which make the whole "more than the sum of its parts." While recent proposals have provided quantitative, information-theoretic metrics to detect emergence in time series data, it is often highly non-trivial to identify the relevant macroscopic variables a priori. In this paper we leverage recent advances in representation learning and differentiable information estimators to put forward a data-driven method for finding emergent variables. The proposed method successfully detects emergent variables and recovers the ground-truth emergence values in a synthetic dataset. This proof-of-concept paves the way for future analyses uncovering the emergent structure of cognitive representations in biological and artificial intelligence systems. 
Christos Kaplanis · Pedro A.M. Mediano · Fernando Rosas 🔗 
Fri 12:40 p.m.  1:30 p.m.

InfoCog Poster Session
(
poster session
)

🔗 
Fri 1:50 p.m.  2:00 p.m.

Information-Theoretic Generalization Error Bound of Deep Neural Networks
(
Oral
)
link
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for learning within the paradigm of information-theoretic generalization bounds. We derive two novel hierarchical bounds on the generalization error that capture the effect of the internal representations within each layer. The first bound demonstrates that the generalization bound shrinks as the layer index of the internal representation increases. The second bound aims to quantify the contraction of the relevant information measures when moving deeper into the network. To achieve this, we leverage the strong data processing inequality (SDPI) and employ a stochastic approximation of the DNN model for which we can explicitly control the SDPI coefficient. These results provide a new perspective for understanding generalization in deep models. 
Haiyun He · Christina Yu · Ziv Goldfeld 🔗 
Fri 2:00 p.m.  2:55 p.m.

Information theory, cognition, and deep learning: Challenges and opportunities
(
Panel discussion
)
Panelists: Sarah Marzen, Dani S. Bassett, Noah Goodman, Stephan Mandt · Moderators: Noga Zaslavsky, Rava A. da Silveira, Ronit Bustin, Ron M. Hecht 
Sarah Marzen · Stephan Mandt · Noah Goodman · Danielle S Bassett · Noga Zaslavsky · Rava Azeredo da Silveira · Ron M. Hecht · Ronit Bustin 🔗 
Fri 3:00 p.m.  3:30 p.m.

Poster Organization
(
Removing posters (please remove your poster)
)

🔗 