Workshop
Symmetry and Geometry in Neural Representations
Sophia Sanborn · Christian A Shewmake · Simone Azeglio · Nina Miolane
La Nouvelle Orleans Ballroom A+B (level 2)
In recent years, there has been a growing appreciation for the importance of respecting the topological, algebraic, or geometric structure of data in machine learning models. In parallel, an emerging set of findings in computational neuroscience suggests that the preservation of this kind of mathematical structure may be a fundamental principle of neural coding in biology. The goal of this workshop is to bring together researchers from applied mathematics and deep learning with neuroscientists whose work reveals the elegant implementation of mathematical structure in biological neural circuitry. Group theory and differential geometry were instrumental in unifying the models of 20th-century physics. Likewise, they have the potential to unify our understanding of how neural systems form useful representations of the world.
Schedule
Sat 7:00 a.m. - 7:30 a.m.
Pre-structured low-dimensional manifolds for rapid and efficient learning, memory, and inference in the brain (Invited Talk)
Ila Fiete
The brain constructs and combines modular structures for flexible computation. I will describe recent progress in characterizing the rigid and low-dimensional nature of some of these representations, using theoretical approaches including fully unsupervised topological characterization of neural population codes. I will then discuss models of how these rigid and modular circuits can emerge, and how they can generate cognitive maps across different variables (e.g., spatial and non-spatial) and across varied input dimensions, with high capacity and high data-efficiency, without rewiring recurrent circuitry.
Sat 7:30 a.m. - 7:40 a.m.
Expressive dynamics models with nonlinear injective readouts enable reliable recovery of latent features from neural activity (Contributed Talk)
Christopher Versteeg
An emerging framework in neuroscience uses the rules that govern how a neural circuit's state evolves over time to understand the circuit's underlying computation. While these neural dynamics cannot be directly measured, new techniques attempt to estimate them by modeling observed neural recordings as a low-dimensional latent dynamical system embedded into a higher-dimensional neural space. How these models represent the readout from latent space to neural space can affect the interpretability of the latent representation: for example, a linear readout could make simple, low-dimensional dynamics unfolding on a nonlinear neural manifold appear excessively complex and high-dimensional. Additionally, standard readouts (both linear and nonlinear) often lack injectivity, meaning that changes in latent state need not produce changes in neural activity. During training, non-injective readouts incentivize the model to invent dynamics that misrepresent the underlying system and computation. To address the challenges presented by nonlinearity and non-injectivity, we combined a custom readout with a previously developed low-dimensional latent dynamics model to create the Ordinary Differential equations autoencoder with Injective Nonlinear readout (ODIN). We generated a synthetic spiking dataset by nonlinearly embedding activity from a low-dimensional dynamical system into higher-dimensional neural activity. We show that, in contrast to alternative models, ODIN is able to recover ground-truth latent activity from these data even when the nature of the system and embedding are unknown. Additionally, we show that ODIN enables the unsupervised recovery of underlying dynamical features (e.g., fixed points) and embedding geometry (e.g., the neural manifold) better than alternative models. Overall, ODIN's ability to recover ground-truth latent features with low dimensionality makes it a promising method for distilling interpretable dynamics that can explain neural computation.
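The failure mode this abstract targets can be seen in a toy sketch (our own construction, not the paper's code): a 1-D latent variable embedded nonlinearly into a 10-D "neural" space makes a linear view of the data (PCA) report more than one significant dimension.

```python
import numpy as np

# Hypothetical illustration: a 1-D latent trajectory embedded nonlinearly
# into 10-D. A linear readout/analysis of the embedded activity reports
# more dimensions than the true latent state has.
rng = np.random.default_rng(0)
z = np.linspace(-2, 2, 500)                       # true 1-D latent state

# Nonlinear embedding into 10-D via random mixtures of nonlinear lifts of z
W = rng.normal(size=(10, 3))
feats = np.stack([z, np.tanh(z), z**2], axis=1)   # nonlinear lifts of z
X = feats @ W.T                                   # shape (500, 10)

# PCA via SVD of the centered data
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
var_ratio = s**2 / np.sum(s**2)

# Count components carrying more than 1% of the variance
n_linear_dims = int(np.sum(var_ratio > 0.01))
print(n_linear_dims)   # greater than 1, even though the latent is 1-D
```

The even lift (z squared) is uncorrelated with the odd lifts on this symmetric grid, so at least two linear dimensions appear, which is the sense in which a linear readout can make simple latent dynamics look high-dimensional.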
Sat 7:40 a.m. - 7:50 a.m.
On Complex Network Dynamics of an In-Vitro Neuronal System during Rest and Gameplay (Contributed Talk)
Moein Khajehnejad
In this study, we characterise the complex network dynamics of an in vitro neuronal system of live biological cells during two distinct activity states: a spontaneous rest state and engagement in a real-time (closed-loop) game environment. We use DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multi-electrode array. First, we embed the spiking activity of these channels in a lower-dimensional space using various representation learning methods. We then extract a subset of representative channels that are consistent across all of the neuronal preparations. Next, by analyzing these low-dimensional representations, we explore the patterns of macroscopic neuronal network dynamics during the learning process. Remarkably, our findings indicate that the low-dimensional embedding of representative channels alone is sufficient to differentiate the neuronal culture during the Rest and Gameplay conditions. Furthermore, we characterise the evolving neuronal connectivity patterns within the DishBrain system over time during Gameplay in comparison to the Rest condition. Notably, our investigation shows dynamic changes in the overall connectivity within the same region and across multiple regions on the multi-electrode array only during Gameplay. These findings underscore the plasticity of these neuronal networks in response to external stimuli and highlight the potential for modulating connectivity in a controlled environment. The ability to distinguish between neuronal states using reduced-dimensional representations points to the presence of underlying patterns that could be pivotal for real-time monitoring and manipulation of neuronal cultures. Additionally, this provides insight into how biologically-based information processing systems rapidly adapt and learn, and may lead to new or improved algorithms.
Sat 7:50 a.m. - 8:00 a.m.
Geometry of abstract learned knowledge in deep RL agents (Contributed Talk)
James Mochizuki-Freeman
Data from neural recordings suggest that mammalian brains represent physical and abstract task-relevant variables through low-dimensional neural manifolds. In a recent electrophysiological study (Nieh et al., 2021), mice performed an evidence accumulation task while moving along a virtual track. Nonlinear dimensionality reduction of the population activity revealed that task-relevant variables were jointly mapped in an orderly manner in the low-dimensional space. Here we trained deep reinforcement learning (RL) agents on the same evidence accumulation task and found that their neural activity can be described with a low-dimensional manifold spanned by task-relevant variables. These results provide further insight into similarities and differences between neural dynamics in mammals and deep RL agents. Furthermore, we showed that manifold learning can be used to characterize the representational space of the RL agents, with the potential to improve the interpretability of decision-making in RL.
Sat 8:00 a.m. - 8:20 a.m.
Coffee Break
Sat 8:20 a.m. - 8:50 a.m.
Topological Deep Learning: Going Beyond Graph Data (Invited Talk)
Mustafa Hajij
Sat 8:50 a.m. - 9:00 a.m.
Spectral Maps for Learning on Subgraphs (Contributed Talk)
Marco Pegoraro
In graph learning, maps between graphs and their subgraphs frequently arise. For instance, when coarsening or rewiring operations are present along the pipeline, one needs to keep track of the corresponding nodes between the original and modified graphs. Classically, these maps are represented as binary node-to-node correspondence matrices, and used as-is to transfer node-wise features between the graphs. In this paper, we argue that simply changing this map representation can bring notable benefits to graph learning tasks. Drawing inspiration from recent progress in geometry processing, we introduce a spectral representation for maps that is easy to integrate into existing graph learning models. This spectral representation is a compact and straightforward plug-in replacement, and is robust to topological changes of the graphs. Remarkably, the representation exhibits structural properties that make it interpretable, drawing an analogy with recent results on smooth manifolds. We demonstrate the benefits of incorporating spectral maps in graph learning pipelines, addressing scenarios where a node-to-node map is not well defined, or in the absence of exact isomorphism. Our approach bears practical benefits in knowledge distillation and hierarchical learning, where we show comparable or improved performance at a fraction of the computational cost.
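The core move described here, replacing a binary node-to-node matrix with a compact spectral (functional-map style) representation, can be sketched in a few lines (a toy construction under our own assumptions, not the paper's implementation):

```python
import numpy as np

# Toy sketch: express a graph-to-subgraph node correspondence in truncated
# Laplacian eigenbases instead of as a binary matrix.

def laplacian(A):
    # Combinatorial graph Laplacian L = D - A
    return np.diag(A.sum(axis=1)) - A

# A 6-node path graph and the subgraph induced on its first 4 nodes
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_sub = A[:4, :4]

# Classical representation: binary node-to-node correspondence (4 x 6)
P = np.zeros((4, 6))
P[np.arange(4), np.arange(4)] = 1.0

k = 3                                        # spectral truncation
_, Phi = np.linalg.eigh(laplacian(A));     Phi = Phi[:, :k]
_, Psi = np.linalg.eigh(laplacian(A_sub)); Psi = Psi[:, :k]

# Compact k x k spectral map replacing the binary matrix
C = np.linalg.pinv(Psi) @ P @ Phi

# Transfer a smooth node signal from the graph to the subgraph through C
f = np.cos(np.linspace(0, 1, 6))
f_sub_spectral = Psi @ C @ (np.linalg.pinv(Phi) @ f)
print(f_sub_spectral.shape)   # (4,): a signal on the subgraph's nodes
```

The payoff is that C is a small dense k x k matrix regardless of graph size, and smooth (low-frequency) signals transfer well even when the exact node correspondence is noisy or undefined.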
Sat 9:00 a.m. - 9:10 a.m.
Data Augmentations in Deep Weight Spaces (Contributed Talk)
Aviv Shamsian
Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization. Recent works have designed architectures for effective learning in that space, which take into account its unique permutation-equivariant structure. Unfortunately, so far these architectures suffer from severe overfitting and have been shown to benefit from large datasets. This poses a significant challenge because generating data for this learning setup is laborious and time-consuming, since each data sample is a full set of network weights that has to be trained. In this paper, we address this difficulty by investigating data augmentations for weight spaces: a set of techniques that enable generating new data examples on the fly, without having to train additional input weight-space elements. We first review several recently proposed data augmentation schemes and divide them into categories. We then introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks, as well as on new benchmarks we generate, which can be valuable for future studies.
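The permutation symmetry that weight-space augmentations can exploit is easy to verify directly (a minimal sketch, not the paper's augmentation scheme): permuting an MLP's hidden units yields a new weight sample that computes exactly the same function.

```python
import numpy as np

# Minimal sketch of the hidden-unit permutation symmetry of an MLP:
# permute the rows of (W1, b1) and the matching columns of W2, and the
# network computes the same function, giving a "free" augmented sample.
rng = np.random.default_rng(0)

W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

perm = rng.permutation(8)                        # random hidden-unit permutation
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]  # augmented weight sample

x = rng.normal(size=4)
out_orig = mlp(x, W1, b1, W2, b2)
out_perm = mlp(x, W1p, b1p, W2p, b2)
print(np.allclose(out_orig, out_perm))   # True: same function, distinct weights
```

Each hidden layer of width n contributes n! such function-preserving relabelings, which is why augmentations of this kind can multiply a weight-space dataset without training any new networks.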
Sat 9:10 a.m. - 9:20 a.m.
From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication (Contributed Talk)
Irene Cannistraci
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weights initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundational models, eight benchmarks, and several architectures trained from scratch.
Sat 9:20 a.m. - 9:30 a.m.
Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers (Contributed Talk)
Pim de Haan
The Geometric Algebra Transformer (GATr) is a versatile architecture for geometric deep learning based on projective geometric algebra. We generalize this architecture into a blueprint that allows one to construct a scalable transformer architecture given any geometric (or Clifford) algebra. We study versions of this architecture for Euclidean, projective, and conformal algebras, all of which are suited to represent 3D data, and evaluate them in theory and practice. The simplest Euclidean architecture is computationally cheap, but has a smaller symmetry group and is not as sample-efficient, while the projective model is not sufficiently expressive. Both the conformal algebra and an improved version of the projective algebra define powerful, performant architectures.
Sat 9:30 a.m. - 10:00 a.m.
The Role of World Models in Intelligence (Discussion Panel)
Sat 10:00 a.m. - 11:20 a.m.
Lunch Break
Sat 11:20 a.m. - 11:50 a.m.
From Local Diffeomorphism Detection to Symbolic Representation (Invited Talk)
Doris Tsao
Sat 11:50 a.m. - 12:20 p.m.
Rotation-equivariant predictive modeling reveals the functional organization of primary visual cortex (Invited Talk)
Alexander Ecker
More than a dozen excitatory cell types have been identified in the mouse primary visual cortex (V1) based on transcriptomic, morphological, and in vitro electrophysiological features. However, little is known about the functional organization of visual cortex neurons and their response properties beyond orientation selectivity. Here, we combined large-scale two-photon imaging with predictive modeling of neural responses to study the functional organization of mouse V1. We developed a rotation-equivariant model architecture, followed by a rotation-invariant clustering pipeline, to map the landscape of neural function in V1. Clustering neurons based on their stimulus response functions revealed a continuum with around 30 modes. Each mode represented a group of neurons that exhibited a specific combination of stimulus selectivity and nonlinear response properties such as cross-orientation inhibition, size-contrast tuning, and surround suppression. Interestingly, these nonlinear properties were expressed independently, and all possible combinations were present in the population. Our study shows how building known symmetries into neural response models can reveal insights about the organization of the visual system.
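The rotation equivariance this talk builds into its model can be illustrated with a toy numpy check (our own sketch, not the talk's architecture): applying one filter at four 90-degree rotations gives a response vector that cyclically permutes when the input is rotated, which is exactly the structure an equivariant architecture guarantees by construction.

```python
import numpy as np

# Toy check of discrete rotation equivariance: correlate an image patch with a
# filter at rotations of 0/90/180/270 degrees. Rotating the input by 90 degrees
# cyclically shifts the four rotation channels rather than scrambling them.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))      # toy image patch
w = rng.normal(size=(5, 5))      # toy filter

def responses(img):
    # Inner product of the image with the filter at each of the 4 rotations
    return np.array([np.sum(np.rot90(w, k) * img) for k in range(4)])

r = responses(x)
r_rot = responses(np.rot90(x))   # rotate the input by 90 degrees

print(np.allclose(np.roll(r, 1), r_rot))   # True: channels cyclically permute
```

A rotation-invariant clustering step, as described in the abstract, can then operate on quantities (such as the sorted or pooled channel responses) that are unchanged by this cyclic shift.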
Sat 12:20 p.m. - 12:30 p.m.
Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds (Contributed Talk)
Henry Kvinge
While the last five years have seen considerable progress in understanding the internal representations of deep learning models, many questions remain. This is especially true when trying to understand the impact of model design choices, such as model architecture or training algorithm, on hidden representation geometry and dynamics. In this work we present a new approach to studying such representations inspired by the idea of a frame on the tangent bundle of a manifold. Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point, for example infinitesimal augmentations, noise perturbations, or perturbations produced by a generative model, and studying how these change as they pass through a network. Using neural frames, we make observations about the way that models process, layer-by-layer, specific modes of variation within a small neighborhood of a datapoint. Our results provide new perspectives on a number of phenomena, such as the manner in which training with augmentation produces model invariance or the proposed trade-off between adversarial training and model generalization.
Sat 12:30 p.m. - 1:00 p.m.
Physics Priors in Machine Learning (Invited Talk)
Max Welling
Good neural architectures are rooted in good inductive biases (a.k.a. priors). Equivariance under symmetries is a prime example of a successful physics-inspired prior which sometimes dramatically reduces the number of examples needed to learn predictive models. Diffusion-based models, among the most successful generative models, are rooted in nonequilibrium statistical mechanics. Conversely, ML methods have recently been used to solve PDEs, for example in weather prediction, and to accelerate MD simulations by learning the (quantum mechanical) interactions between atoms and electrons. In this work we will try to extend this thinking to more flexible priors in the hidden variables of a neural network. In particular, we will impose wavelike dynamics on hidden variables under transformations of the inputs, which relaxes the stricter notion of equivariance. We find that under certain conditions, wavelike dynamics naturally arise in these hidden representations. We formalize this idea in a VAE-over-time architecture where the hidden dynamics are described by a Fokker-Planck (a.k.a. drift-diffusion) equation. This in turn leads to a new definition of a disentangled hidden representation of input states that can easily be manipulated to undergo transformations.
Sat 1:00 p.m. - 1:30 p.m.
Coffee Break
Sat 1:30 p.m. - 1:40 p.m.
Symmetry Breaking and Equivariant Neural Networks (Contributed Talk)
Oumar Kaba
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of "relaxed equivariance" that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization, and equivariant decoding.
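The limitation the abstract analyzes, that equivariant functions cannot break symmetry at the sample level, can be checked concretely (an illustrative sketch of ours, not the paper's construction): if an input is fixed by a group element, the output of any equivariant map is forced to be fixed by it too.

```python
import numpy as np

# A permutation-equivariant DeepSets-style layer: f(x)_i = a*x_i + b*mean(x).
# Equivariance means f(g.x) = g.f(x) for every permutation g, which forces
# symmetric inputs to produce equally symmetric outputs.
rng = np.random.default_rng(0)
a, b = rng.normal(), rng.normal()

def equivariant_layer(x):
    return a * x + b * x.mean()

g = np.array([1, 0, 3, 2])              # a permutation of 4 elements

# Equivariance holds for a generic input
x = rng.normal(size=4)
print(np.allclose(equivariant_layer(x[g]), equivariant_layer(x)[g]))  # True

# Symmetry cannot be broken: an input fixed by g yields an output fixed by g
x_sym = np.array([1.0, 1.0, 2.0, 2.0])  # x_sym[g] == x_sym
y = equivariant_layer(x_sym)
print(np.allclose(y[g], y))             # True: the output shares the symmetry
```

Relaxed notions of equivariance, as proposed in the talk, aim to keep the first property approximately while escaping the second, so that a symmetric input can still map to an asymmetric output when the task demands it.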
Sat 1:40 p.m. - 1:50 p.m.
Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks (Contributed Talk)
Sho Sonoda
The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. In this study, we present a systematic method to induce a generalized neural network and its right inverse operator, called the ridgelet transform, from a joint group invariant function on the data-parameter domain. Since the ridgelet transform is an inverse, (1) it can describe the arrangement of parameters for the network to represent a target function, which is understood as the encoding rule, and (2) it implies the universality of the network. Based on the group representation theory, we present a new simple proof of the universality by using Schur's lemma in a unified manner covering a wide class of networks, for example, the original ridgelet transform, formal deep networks, and the dual voice transform. Since traditional universality theorems were demonstrated based on functional analysis, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
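For orientation, the classical objects that this abstract generalizes can be written down directly (these are the standard textbook forms from the ridgelet literature, not the paper's group-invariant construction):

```latex
% A shallow network as an integral transform over parameters (a, b),
% and the ridgelet transform as its right inverse (classical forms):
S[\gamma](x) = \int_{\mathbb{R}^d \times \mathbb{R}} \gamma(a, b)\,
  \sigma(\langle a, x \rangle - b)\, \mathrm{d}a\, \mathrm{d}b,
\qquad
R f(a, b) = \int_{\mathbb{R}^d} f(x)\,
  \overline{\psi(\langle a, x \rangle - b)}\, \mathrm{d}x,
% Under an admissibility condition on the pair (sigma, psi):
\qquad S[R f] = c_{\sigma,\psi}\, f .
```

Reading R as an explicit right inverse (up to the constant) is what lets one describe where parameters (a, b) must be placed for the network S to represent a given target f, which is the "encoding rule" the abstract refers to.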
Sat 1:50 p.m. - 2:00 p.m.
Towards Information Theory-Based Discovery of Equivariances (Contributed Talk)
Hippolyte Charvin
The presence of symmetries imposes a stringent set of constraints on a system. This constrained structure allows intelligent agents interacting with such a system to drastically improve the efficiency of learning and generalization, through the internalisation of the system's symmetries into their information-processing. In parallel, principled models of complexity-constrained learning and behaviour make increasing use of information-theoretic methods. Here, we wish to marry these two perspectives and understand whether and in which form the information-theoretic lens can "see" the effect of symmetries of a system. For this purpose, we propose a novel variant of the Information Bottleneck principle, which has served as a productive basis for many principled studies of learning and information-constrained adaptive behaviour. We show (in the discrete case) that our approach formalises a certain duality between symmetry and information parsimony: namely, channel equivariances can be characterised by the optimal mutual information-preserving joint compression of the channel's input and output. This information-theoretic treatment furthermore suggests a principled notion of "soft" equivariance, whose "coarseness" is measured by the amount of input-output mutual information preserved by the corresponding optimal compression. This new notion offers a bridge between the field of bounded rationality and the study of symmetries in neural representations. The framework may also allow (exact and soft) equivariances to be automatically discovered.
Sat 2:00 p.m. - 2:05 p.m.
Announcements & Closing Remarks (Closing Remarks)
Sat 2:05 p.m. - 3:00 p.m.
Poster Session
Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks (Oral)
Sho Sonoda · Hideyuki Ishi · Isao Ishikawa · Masahiro Ikeda
Towards Information Theory-Based Discovery of Equivariances (Oral)
Hippolyte Charvin · Nicola Catenacci Volpi · Daniel Polani
Expressive dynamics models with nonlinear injective readouts enable reliable recovery of latent features from neural activity (Oral)
Christopher Versteeg · Andrew Sedler · Jonathan McCart · Chethan Pandarinath
Data Augmentations in Deep Weight Spaces (Oral)
Aviv Shamsian · David Zhang · Aviv Navon · Yan Zhang · Miltiadis (Miltos) Kofinas · Idan Achituve · Riccardo Valperga · Gertjan Burghouts · Efstratios Gavves · Cees Snoek · Ethan Fetaya · Gal Chechik · Haggai Maron
Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds (Oral)
Henry Kvinge · Grayson Jorgenson · Davis Brown · Charles Godfrey · Tegan Emerson
Spectral Maps for Learning on Subgraphs (Oral)
Marco Pegoraro · Riccardo Marin · Arianna Rampini · Simone Melzi · Luca Cosmo · Emanuele Rodolà
Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers (Oral)
Pim de Haan · Taco Cohen · Johann Brehmer
-
|
On Complex Network Dynamics of an In-Vitro Neuronal System during Rest and Gameplay
(
Oral
)
>
link
In this study, we focus on characterising the complex network dynamics of an in vitro neuronal system of live biological cells during two distinct activity states: a spontaneous rest state and engagement in a real-time (closed-loop) game environment. We use DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multi-electrode array. First, we embed the spiking activity of these channels in a lower-dimensional space using various representation learning methods. We then extract a subset of representative channels that are consistent across all of the neuronal preparations. Next, by analyzing these low-dimensional representations, we explore the patterns of macroscopic neuronal network dynamics during the learning process. Remarkably, our findings indicate that the low-dimensional embedding of representative channels alone is sufficient to differentiate the neuronal culture during the Rest and Gameplay conditions. Furthermore, we characterise the evolving neuronal connectivity patterns within the DishBrain system over time during Gameplay in comparison to the Rest condition. Notably, our investigation shows dynamic changes in the overall connectivity within the same region and across multiple regions on the multi-electrode array only during Gameplay. These findings underscore the plasticity of these neuronal networks in response to external stimuli and highlight the potential for modulating connectivity in a controlled environment. The ability to distinguish between neuronal states using reduced-dimensional representations points to the presence of underlying patterns that could be pivotal for real-time monitoring and manipulation of neuronal cultures. Additionally, this provides insight into how biologically based information-processing systems rapidly adapt and learn, and may lead to new or improved algorithms. |
Moein Khajehnejad · Forough Habibollahi · Alon Loeffler · Brett J. Kagan · Adeel Razi 🔗 |
-
|
Symmetry Breaking and Equivariant Neural Networks
(
Oral
)
>
link
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization and equivariant decoding. |
Oumar Kaba · Siamak Ravanbakhsh 🔗 |
-
|
From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication
(
Oral
)
>
link
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weight initialization, training hyperparameters, or data modality). To address this, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundation models, eight benchmarks, and several architectures trained from scratch. |
Irene Cannistraci · Luca Moschella · Marco Fumero · Valentino Maiorca · Emanuele Rodolà 🔗 |
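One invariant component of the kind this abstract describes can be sketched with relative representations: encoding each sample by its cosine similarity to a set of anchor samples, which is unchanged by rotations and rescalings of the latent space. Function and variable names here are illustrative, not the paper's API.

```python
import numpy as np

def relative_representation(X, anchors):
    """Encode each sample by its cosine similarity to a set of anchors.
    This component is invariant to orthogonal transformations and
    positive rescalings of the latent space."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))        # 5 latent vectors in 4-D
A = rng.normal(size=(3, 4))        # 3 anchor vectors

# apply a random rotation plus a uniform rescaling to the whole latent space
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
R1 = relative_representation(X, A)
R2 = relative_representation(2.0 * X @ Q, 2.0 * A @ Q)
```

Several such components (different similarity functions, hence different invariances) could then be concatenated into the product space the abstract mentions.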
-
|
Learning Useful Representations of Recurrent Neural Network Weight Matrices
(
Poster
)
>
link
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. Its direct analysis, however, tends to be challenging. Is it possible to learn useful representations of RNN weights that facilitate downstream tasks? While the "Mechanistic Approach" directly 'looks inside' the RNN to predict its behavior, the "Functionalist Approach" analyzes its overall functionality---specifically, its input-output mapping. Our two novel Functionalist Approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. Our novel theoretical framework for the Functionalist Approach demonstrates conditions under which it can generate rich representations for determining the behavior of RNNs. RNN weight representations generated by Mechanistic and Functionalist approaches are compared by evaluating them in two downstream tasks. Our results show the superiority of Functionalist methods. |
Vincent Herrmann · Francesco Faccio · Jürgen Schmidhuber 🔗 |
-
|
Distance Learner: Incorporating Manifold Prior to Model Training
(
Poster
)
>
link
The manifold hypothesis (real-world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high-dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class), if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness and find that it not only outperforms standard classifiers by a large margin but also performs on par with classifiers trained via well-accepted standard adversarial training. |
Aditya Chetan · Nipun Kwatra 🔗 |
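The decision rule described above is simple to state in code (a sketch with assumed names and shapes, not the authors' implementation): pick the class whose predicted manifold distance is smallest, and flag a point as out of distribution when every distance exceeds a threshold.

```python
import numpy as np

def classify_with_distances(pred_dists, ood_threshold):
    """pred_dists: (n_samples, n_classes) predicted distances to each
    class manifold. Returns the closest class per sample, or -1 when
    every manifold is farther than ood_threshold (out of distribution)."""
    pred = np.asarray(pred_dists)
    labels = pred.argmin(axis=1)
    labels[pred.min(axis=1) > ood_threshold] = -1
    return labels

# three points: near class 0, near class 1, far from both
d = np.array([[0.1, 2.0],
              [1.5, 0.2],
              [4.0, 5.0]])
labels = classify_with_distances(d, ood_threshold=3.0)  # -> [0, 1, -1]
```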
-
|
An Information-Theoretic Understanding of Maximum Manifold Capacity Representations
(
Poster
)
>
link
Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is interesting for at least two reasons. Firstly, MMCR is an oddity in the zoo of MVSSL methods: it is not (explicitly) contrastive, applies no masking, performs no clustering, leverages no distillation, and does not (explicitly) reduce redundancy. Secondly, while many self-supervised learning (SSL) methods originate in information theory, MMCR distinguishes itself by claiming a different origin: a statistical mechanical characterization of the geometry of linear separability of data manifolds. However, given the rich connections between statistical mechanics and information theory, and given recent work showing how many SSL methods can be understood from an information-theoretic perspective, we conjecture that MMCR can be similarly understood from an information-theoretic perspective. In this paper, we leverage tools from high dimensional probability and information theory to demonstrate that an optimal solution to MMCR's nuclear norm-based objective function is the same optimal solution that maximizes a well-known lower bound on mutual information. |
Victor Lecomte · Rylan Schaeffer · Berivan Isik · Mikail Khona · Yann LeCun · Sanmi Koyejo · Andrey Gromov · Ravid Shwartz-Ziv 🔗 |
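The nuclear-norm objective at the heart of MMCR can be sketched as follows (a simplified illustration; the array shapes and names are assumptions): normalize each view embedding to the sphere, average the views of each sample into a centroid, and maximize the nuclear norm of the centroid matrix.

```python
import numpy as np

def mmcr_loss(views):
    """views: (V, N, D) embeddings of V augmented views for each of N samples.
    MMCR maximizes the nuclear norm of the matrix of per-sample centroids,
    so the training loss is its negation."""
    Z = views / np.linalg.norm(views, axis=-1, keepdims=True)  # project onto the sphere
    centroids = Z.mean(axis=0)                                 # (N, D)
    return -np.linalg.norm(centroids, ord='nuc')

# perfectly aligned views of two orthogonal unit samples achieve the optimum -N:
# the centroid matrix is the identity, whose nuclear norm is N = 2
aligned = np.stack([np.eye(2), np.eye(2)])
loss_val = mmcr_loss(aligned)
```

Intuitively, aligned views keep the centroids long while diverse samples keep them spread out, and the nuclear norm rewards both at once.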
-
|
Sample Efficient Modeling of Drag Coefficients for Satellites with Symmetry
(
Poster
)
>
link
Accurate knowledge of the atmospheric drag coefficient for a satellite in low Earth orbit is crucial to plan an orbit that avoids collisions with other spacecraft, but its calculation is highly uncertain and very expensive to compute numerically for long-horizon predictions. Previous work has improved coefficient modeling speed with data-driven approaches, but these models do not utilize domain symmetry. This work investigates enforcing the invariance of atmospheric particle deflections off certain satellite geometries, resulting in higher sample efficiency and theoretically more robustness for data-driven methods. We train $G$-equivariant MLPs to predict the drag coefficient, where $G$ defines invariances of the coefficient across different orientations of the satellite. We experiment on a synthetic dataset computed using the numerical Test Particle Monte Carlo (TPMC) method, where particles are fired at a satellite in the computational domain. We find that our method is more sample- and compute-efficient than unconstrained baselines, which is significant because TPMC simulations are extremely computationally expensive.
|
Neel Sortur · Linfeng Zhao · Robin Walters 🔗 |
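For a finite symmetry group like the satellite orientations described above, the simplest way to build an invariant predictor is to average an arbitrary model over the group (the Reynolds operator). This is a generic sketch of that idea with hypothetical names, not the authors' architecture.

```python
import numpy as np

def symmetrize(f, group):
    """Average a function over a finite group of orientation symmetries,
    producing a G-invariant predictor: f_inv(g @ x) == f_inv(x) for g in G."""
    def f_inv(x):
        return np.mean([f(g @ x) for g in group], axis=0)
    return f_inv

# hypothetical example: a geometry symmetric under 180-degree rotation about z
g0 = np.eye(3)
g1 = np.diag([-1.0, -1.0, 1.0])
group = [g0, g1]

f = lambda v: float(v[0] + v[1] ** 2)   # an arbitrary non-invariant "drag model"
f_inv = symmetrize(f, group)

v = np.array([0.3, -1.2, 0.5])
```

Equivariant MLP layers bake the same constraint into the weights instead of averaging at inference time, which is what gives the sample-efficiency gains the abstract reports.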
-
|
AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference
(
Poster
)
>
link
In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data, dynamically learning the necessary graph structure. These graphs are often derived from a latent embedding space, which can be modeled using Euclidean, hyperbolic, spherical, or product spaces. However, there is currently no principled differentiable method for determining the optimal embedding space. In this work, we introduce the Attentional Multi-Embedding Selection (AMES) framework, a differentiable method for selecting the best embedding space for latent graph inference through backpropagation, considering a downstream task. Our framework consistently achieves comparable or superior results compared to previous methods for latent graph inference across five benchmark datasets. Importantly, our approach eliminates the need for conducting multiple experiments to identify the optimal embedding space. Furthermore, we explore interpretability techniques that track the gradient contributions of different latent graphs, shedding light on how our attention-based, fully differentiable approach learns to choose the appropriate latent space. In line with previous works, our experiments emphasize the advantages of hyperbolic spaces in enhancing performance. More importantly, our interpretability framework provides a general approach for quantitatively comparing embedding spaces across different tasks based on their contributions, a dimension that has been overlooked in previous literature on latent graph inference. |
Yuan Lu · Haitz Sáez de Ocáriz Borde · Pietro Lió 🔗 |
-
|
Optimal packing of attractor states in neural representations
(
Poster
)
>
link
Animals' internal states reflect variables like their position in space, orientation, decisions, and motor actions—but how should these internal states be arranged? Internal states which frequently transition between one another should be close enough that transitions can happen quickly, but not so close that neural noise significantly impacts the stability of those states, and how reliably they can be encoded and decoded. In this paper, we study the problem of striking a balance between these two concerns, which we call an 'optimal packing' problem since it resembles mathematical problems like sphere packing. While this problem is generally extremely difficult, we show that symmetries in environmental transition statistics imply certain symmetries of the optimal neural representations, which allows us in some cases to exactly solve for the optimal state arrangement. We focus on two toy cases: uniform transition statistics, and cyclic transition statistics. |
John Vastola 🔗 |
-
|
Grokking in recurrent networks with attractive and oscillatory dynamics
(
Poster
)
>
link
Generalization is perhaps the most salient property of biological intelligence. In the context of artificial neural networks (ANNs), generalization has been studied through investigating the recently discovered phenomenon of "grokking" whereby small transformers generalize on modular arithmetic tasks. We extend this line of work to continuous time recurrent neural networks (CT-RNNs) to investigate generalization in neural systems. Inspired by the card game SET, we reformulated previous modular arithmetic tasks as a binary classification task to elicit interpretable CT-RNN dynamics. We found that CT-RNNs learned one of two dynamical mechanisms characterized by either attractive or oscillatory dynamics. Notably, both of these mechanisms displayed strong parallels to deterministic finite automata (DFA). In our grokking experiments, we found that attractive dynamics generalize more frequently in training regimes with few withheld data points while oscillatory dynamics generalize more frequently in training regimes with many withheld data points. |
Keith Murray 🔗 |
-
|
Quantifying Lie Group Learning with Local Symmetry Error
(
Poster
)
>
link
Despite increasing interest in using machine learning to discover symmetries, no quantitative measure has been proposed to compare the performance of different algorithms. Our proposal, both intuitively and theoretically grounded, is to compare Lie groups using a local symmetry error, based on the difference between their infinitesimal actions at any possible datapoint. Namely, we use a well-studied metric to compare the induced tangent spaces. We obtain an upper bound on this metric which is uniform across datapoints, under some conditions. We show that when one of the groups is a circle group, this bound is furthermore both tight and easily computable, thus globally characterizing the local errors. We demonstrate our proposal by quantitatively evaluating an existing algorithm. We note that our proposed metric has deficiencies in comparing tangent spaces of different dimensions, as well as distinct groups whose local actions are similar. |
Vasco Portilheiro 🔗 |
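A standard, well-studied way to compare two tangent spaces, possibly the kind of metric the abstract has in mind (shown here as an assumption, not the paper's exact definition), is via principal angles between the subspaces they span:

```python
import numpy as np

def subspace_distance(B1, B2):
    """Largest principal angle between two subspaces of R^n, given basis
    matrices whose columns span each subspace."""
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)
    # largest angle corresponds to the smallest singular value
    return float(np.arccos(s.min()))

# the tangent line of a planar rotation action at (1, 0) is the vertical axis
true_tangent = np.array([[0.0], [1.0]])
same = np.array([[0.0], [2.0]])   # same line, different basis scale
orth = np.array([[1.0], [0.0]])   # orthogonal line
```

Averaging such a distance over datapoints gives one concrete realization of a "local symmetry error" between a learned group action and a reference one.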
-
|
How do language models bind entities in context?
(
Poster
)
>
link
To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernibility. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in-context, providing a step towards understanding general in-context reasoning in large-scale LMs. |
Jiahai Feng · Jacob Steinhardt 🔗 |
-
|
Improving Convergence and Generalization Using Parameter Symmetries
(
Poster
)
>
link
In overparametrized models, different parameter values may result in the same loss. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we prove that teleportation leads to an overall faster time to convergence. Additionally, teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minima and generalization ability. Finally, we show that integrating teleportation into optimization-based meta-learning improves convergence over traditional algorithms that perform only local updates. Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization. |
Bo Zhao · Robert Gower · Robin Walters · Rose Yu 🔗 |
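The best-known parameter space symmetry is the per-neuron rescaling of a ReLU network: scaling a hidden neuron's incoming weights by g > 0 and its outgoing weights by 1/g leaves the loss unchanged, since relu(g z) = g relu(z). A minimal sketch (illustrative names, not the authors' teleportation algorithm):

```python
import numpy as np

def loss(W1, W2, X, y):
    """Squared loss of a two-layer ReLU network y_hat = W2 @ relu(W1 @ X)."""
    h = np.maximum(W1 @ X, 0.0)
    return float(((W2 @ h - y) ** 2).sum())

def teleport(W1, W2, g):
    """Apply a positive per-neuron rescaling symmetry. The loss is unchanged
    because relu(g * z) = g * relu(z) for g > 0."""
    return np.diag(g) @ W1, W2 @ np.diag(1.0 / g)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
X, y = rng.normal(size=(3, 5)), rng.normal(size=(2, 5))

g = np.array([0.5, 2.0, 3.0, 1.5])
W1t, W2t = teleport(W1, W2, g)
```

Teleportation searches within such a loss-level set for a point with, e.g., a larger gradient norm, so subsequent gradient steps make faster progress.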
-
|
Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus
(
Poster
)
>
link
Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the incredible properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic class associated with their electronic band structure, the first Chern class. Given their importance to next generation computing and the computational challenge of calculating them using first-principles approaches, there is a need to develop machine learning approaches to predict the characteristic classes associated with a material system. To aid in this program, we introduce the *Haldane bundle dataset*, which consists of synthetically generated complex line bundles on the $2$-torus. We envision this dataset, which is not as challenging as noisy and sparsely measured real-world datasets but (as we show) still difficult for off-the-shelf architectures, to be a testing ground for architectures that incorporate the rich topological and geometric priors underlying characteristic classes.
|
Cody Tipton · Elizabeth Coda · Davis Brown · Alyson Bittner · Caitlin Hutten · Grayson Jorgenson · Tegan Emerson · Henry Kvinge 🔗 |
-
|
How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks
(
Poster
)
>
link
Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by this, we aim to assess "how capable can a transformer become?". In this work, we train Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions; (3) the training data has a significant impact on the model's ability to compose functions; and (4) attention layers in the latter half of the model seem critical to compositionality. |
Rahul Ramesh · Mikail Khona · Robert Dick · Hidenori Tanaka · Ekdeep S Lubana 🔗 |
-
|
Structure-wise Uncertainty for Curvilinear Image Segmentation
(
Poster
)
>
link
Segmenting curvilinear structures like blood vessels and roads poses significant challenges due to their intricate geometry and weak signals. To expedite large-scale annotation, it is essential to adopt semi-automatic methods such as proofreading by human experts. In this abstract, we focus on estimating uncertainty for such tasks, so that highly uncertain, and thus error-prone structures can be identified for human annotators to verify. Unlike prior work that generates pixel-wise uncertainty maps, we believe it is essential to measure uncertainty in the units of topological structures, e.g., small pieces of connections and branches. To realize this, we employ tools from topological data analysis, specifically discrete Morse theory (DMT), to first extract the structures and then reason about their uncertainties. On multiple 2D and 3D datasets, our methodology generates superior structure-wise uncertainty maps compared to existing models. |
Saumya Gupta · Xiaoling Hu · Chao Chen 🔗 |
-
|
On the Varied Faces of Overparameterization in Supervised and Self-Supervised Learning
(
Poster
)
>
link
The quality of the representations learned by neural networks depends on several factors, including the loss function, learning algorithm, and model architecture. In this work, we use information geometric measures to assess the representation quality in a principled manner. We demonstrate that the sensitivity of learned representations to input perturbations, measured by the spectral norm of the feature Jacobian, provides valuable information about downstream generalization. On the other hand, measuring the coefficient of spectral decay observed in the eigenspectrum of feature covariance provides insights into the global representation geometry. First, we empirically establish an equivalence between these notions of representation quality and show that they are inversely correlated. Second, our analysis reveals the varying roles that overparameterization plays in improving generalization. Unlike supervised learning, we observe that increasing model width leads to higher discriminability and less smoothness in the self-supervised regime. Furthermore, we report that there is no observable double descent phenomenon in SSL with non-contrastive objectives for commonly used parameterization regimes, which opens up new opportunities for tight asymptotic analysis. Taken together, our results provide a loss-aware characterization of the different roles of overparameterization in supervised and self-supervised learning. |
Matteo Gamba · Arna Ghosh · Kumar Krishna Agrawal · Blake Richards · Hossein Azizpour · Mårten Björkman 🔗 |
-
|
Geometric Epitope and Paratope Prediction
(
Poster
)
>
link
Antibody-antigen interactions play a crucial role in identifying and neutralizing harmful foreign molecules. In this paper, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. Specifically, we compare different geometric deep learning methods applied to proteins’ inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that surface-based models are more efficient than other methods, and our O-GEP experiments have achieved state-of-the-art results with significant performance improvements. |
Marco Pegoraro · Clémentine Dominé · Emanuele Rodolà · Petar Veličković · Andreea-Ioana Deac 🔗 |
-
|
RelWire: Metric Based Graph Rewiring
(
Poster
)
>
link
Oversquashing is a major hurdle to the application of geometric deep learning and graph neural networks to real applications. Recent work has found connections between oversquashing and commute times, effective resistance, and the eigengap of the underlying graph. Graph rewiring is the most promising technique to alleviate this issue. Some prior work adds edges locally to highly negatively curved subgraphs. These local changes, however, have a small effect on global statistics such as commute times and the eigengap. Other prior work uses the spectrum of the graph Laplacian to target rewiring to increase the eigengap. These approaches, however, make large structural and topological changes to the underlying graph. We use ideas from geometric group theory to present RelWire, a rewiring technique based on the geometry of the graph. We derive topological connections for RelWire. We then rewire different real world molecule datasets and show that RelWire is Pareto optimal: it has the best balance between improvement in eigengap and commute times and minimizing changes in the topology of the underlying graph. |
Rishi Sonthalia · Anna Gilbert · Matthew Durham 🔗 |
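The two global statistics the abstract trades off are straightforward to compute from the graph Laplacian; the sketch below (generic definitions, not the RelWire algorithm) shows how adding one edge to a path graph raises the eigengap and lowers the effective resistance between its endpoints.

```python
import numpy as np

def laplacian(A):
    return np.diag(A.sum(1)) - A

def eigengap(A):
    """Second-smallest Laplacian eigenvalue (algebraic connectivity)."""
    return float(np.sort(np.linalg.eigvalsh(laplacian(A)))[1])

def effective_resistance(A, u, v):
    """Effective resistance between u and v via the Laplacian pseudoinverse;
    the commute time is 2 * |E| * R_eff(u, v)."""
    Lp = np.linalg.pinv(laplacian(A))
    e = np.zeros(len(A))
    e[u], e[v] = 1.0, -1.0
    return float(e @ Lp @ e)

# path graph 0-1-2, then the same graph rewired with the extra edge (0, 2)
P = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)
C3 = P.copy()
C3[0, 2] = C3[2, 0] = 1.0
```

A rewiring method is Pareto optimal in the abstract's sense when no alternative improves these quantities as much while perturbing the topology less.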
-
|
Sheaf-based Positional Encodings for Graph Neural Networks
(
Poster
)
>
link
Graph Neural Networks (GNNs) work directly with graph-structured data, capitalising on relational information among entities. One limitation of GNNs is their reliance on local interactions among connected nodes. GNNs may generate identical node embeddings for similar local neighbourhoods and fail to distinguish structurally distinct graphs. Positional encodings help to break the locality constraint by informing the nodes of their global positions in the graph. Furthermore, they are required by Graph Transformers to encode structural information. However, existing positional encodings based on the graph Laplacian only encode structural information and are typically fixed. To address these limitations, we propose a novel approach to design positional encodings using sheaf theory. The sheaf Laplacian can be learnt from node data, allowing it to encode both the structure and semantic information. We present two methodologies for creating sheaf-based positional encodings, showcasing their efficacy in node and graph tasks. Our work advances the integration of sheaves in graph learning, paving the way for innovative GNN techniques that draw inspiration from geometry and topology. |
Yu He · Cristian Bodnar · Pietro Lió 🔗 |
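The fixed graph-Laplacian positional encoding that this work generalizes can be sketched as follows; a sheaf Laplacian would replace the scalar entries with learned d x d linear maps between node stalks, letting the encoding depend on node features as well as structure. This is a sketch of the baseline, not the authors' sheaf construction.

```python
import numpy as np

def laplacian_pe(A, k):
    """Positional encodings from the k nontrivial low-frequency eigenvectors
    of the combinatorial graph Laplacian L = D - A. These are fixed and encode
    only structure; a learned sheaf Laplacian would also encode semantics."""
    L = np.diag(A.sum(1)) - A
    _, vecs = np.linalg.eigh(L)  # ascending eigenvalues
    return vecs[:, 1:k + 1]      # drop the constant (eigenvalue-0) eigenvector

# 4-cycle: each node receives a 2-dimensional positional encoding
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
pe = laplacian_pe(A, 2)
```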
-
|
Structural Similarities Between Language Models and Neural Response Measurements
(
Poster
)
>
link
Large language models have complicated internal dynamics, but induce representations of words and phrases whose geometry we can study. Human language processing is also opaque, but neural response measurements can provide (noisy) recordings of activations during listening or reading, from which we can extract similar representations of words and phrases. Here we study the extent to which the geometries induced by these representations share similarities in the context of brain decoding. We find that the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging. |
Jiaang Li · Antonia Karamolegkou · Yova Kementchedjhieva · Mostafa Abdou · Sune Lehmann · Anders Søgaard 🔗 |
-
|
INRFormer: Neuron Permutation Equivariant Transformer on Implicit Neural Representations
(
Poster
)
>
link
Implicit Neural Representations (INRs) have demonstrated both precision in continuous data representation and compactness in encapsulating high-dimensional data. Yet, much of contemporary research remains centered on data reconstruction using INRs, with limited exploration into processing INRs themselves. In this paper, we endeavor to develop a model tailored to process INRs explicitly for computer vision tasks. We conceptualize INRs as computational graphs with neurons as nodes and weights as edges. To process INR graphs, we propose INRFormer, consisting of alternating node blocks and edge blocks. Within the node block, we further propose SlidingLayerAttention (SLA), which performs attention on nodes of three sequential INR layers. This sliding mechanism of the SLA across INR layers enables each layer's nodes to access a broader scope of the entire graph's information. In terms of the edge block, every edge's feature vector gets concatenated with the features of its two linked nodes, followed by a projection via an MLP. Ultimately, we formulate visual recognition as INR-to-INR (inr2inr) translation. That is, INRFormer transforms the input INR, which maps coordinates to image pixels, to a target INR, which maps the coordinates to the labels. We demonstrate INRFormer on CIFAR10. |
Lei Zhou · Varun Belagali · Joseph Bae · Prateek Prasanna · Dimitris Samaras 🔗 |
-
|
From Charts to Atlas: Merging Latent Spaces into One
(
Poster
)
>
link
Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation (RLSA), a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that it is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging. |
Donato Crisostomi · Irene Cannistraci · Luca Moschella · Pietro Barbiero · Marco Ciccone · Pietro Lió · Emanuele Rodolà 🔗 |
-
|
Growing Brains in Recurrent Neural Networks for Multiple Cognitive Tasks
(
Poster
)
>
link
Recurrent neural networks (RNNs) trained on a diverse ensemble of cognitive tasks, as described by Yang et al. (2019) and Khona et al. (2023), have been shown to exhibit functional modularity, where neurons organize into discrete functional clusters, each specialized for specific shared computational subtasks. However, these RNNs do not demonstrate anatomical modularity, where these functionally specialized clusters also have a distinct spatial organization. This contrasts with the human brain which has both functional and anatomical modularity. Is there a way to train RNNs to make them more like brains in this regard? We apply a recent machine learning method, brain-inspired modular training (BIMT), to encourage neural connectivity to be local in space. Consequently, hidden neuron organization of the RNN forms spatial structures reminiscent of those of the brain: spatial clusters which correspond to functional clusters. Compared to standard $L_1$ regularization and absence of regularization, BIMT exhibits superior performance by optimally balancing between task performance and sparsity. This balance is quantified both in terms of the number of active neurons and the cumulative wiring length. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and enhancing the interpretability of neural network architectures.
|
Ziming Liu · Mikail Khona · Ila Fiete · Max Tegmark 🔗 |
-
|
Are “Hierarchical” Visual Representations Hierarchical?
(
Poster
)
>
link
Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim at modeling the underlying hierarchy of the visual world. In this work, we set out to investigate if hierarchical visual representations truly capture the human perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations but can assist in other aspects like search efficiency and interpretability. Our benchmark and the datasets are open-sourced at https://github.com/ethanlshen/HierNet. |
Ethan Shen · Ali Farhadi · Aditya Kusupati 🔗 |
-
|
Homological Convolutional Neural Networks
(
Poster
)
>
link
Deep learning methods have demonstrated outstanding performance on classification and regression tasks on homogeneous data types (e.g., image, audio, and text data). However, tabular data still pose a challenge, with classic machine learning approaches often being computationally cheaper than, and just as effective as, increasingly complex deep learning architectures. The challenge arises from the fact that, in tabular data, the correlation among features is weaker than that arising from spatial or semantic relationships in images or natural language, and the dependency structures need to be modeled without any prior information. In this work, we propose a novel deep learning architecture that exploits the data's structural organization through topologically constrained network representations to gain relational information from sparse tabular inputs. The resulting model leverages the power of convolution and is centered on a limited number of concepts from network topology to guarantee: (i) a data-centric and deterministic building pipeline; (ii) a high level of interpretability over the inference process; and (iii) adequate room for scalability. We test our model on $18$ benchmark datasets against $5$ classic machine learning and $3$ deep learning models, demonstrating that our approach achieves state-of-the-art performance on these challenging datasets. The code to reproduce all our experiments is provided at https://github.com/FinancialComputingUCL/HomologicalCNN.
|
Antonio Briola · Yuanrong Wang · Silvia Bartolucci · Tomaso Aste 🔗 |
-
|
Visual Scene Representation with Hierarchical Equivariant Sparse Coding
(
Poster
)
>
link
We propose a hierarchical neural network architecture for unsupervised learning of equivariant part-whole decompositions of visual scenes. In contrast to the global equivariance of group-equivariant networks, the proposed architecture exhibits equivariance to part-whole transformations throughout the hierarchy, which we term hierarchical equivariance. The model achieves such internal representations via hierarchical Bayesian inference, which gives rise to rich bottom-up, top-down, and lateral information flows, hypothesized to underlie the mechanisms of perceptual inference in visual cortex. We demonstrate these useful properties of the model on a simple dataset of scenes with multiple objects under independent rotations and translations. |
Christian A Shewmake · Domas Buracas · Hansen Lillemark · Jinho Shin · Erik Bekkers · Nina Miolane · Bruno Olshausen 🔗 |
-
|
Symmetry-based Learning of Radiance Fields for Rigid Objects
(
Poster
)
>
link
In this work, we present SymObjectRF, a symmetry-based method that learns object-centric representations for rigid objects from a single dynamic scene without hand-crafted annotations. SymObjectRF learns the appearance and surface geometry of all dynamic objects in their canonical poses, representing each object with a canonical object field (COF). SymObjectRF imposes group equivariance on the rendering pipeline by transforming 3D point samples from world coordinates to the objects' canonical poses. Subsequently, a permutation-invariant compositional renderer combines the color and density values queried from the learned COFs and reconstructs the input scene via volume rendering. SymObjectRF is then optimized by minimizing a scene reconstruction loss. We show the feasibility of SymObjectRF in learning object-centric representations both theoretically and empirically. |
Zhiwei Han · Stefan Matthes · Hao Shen · Yuanting Liu 🔗 |
-
|
Decorrelating neurons using persistence
(
Poster
)
>
link
We propose a novel way to regularise deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We explore their efficacy by performing a set of proof-of-concept experiments, for which our new regularisation terms outperform some popular ones. We demonstrate that, in these experiments, naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms. This suggests that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression. |
Rubén Ballester · Carles Casacuberta · Sergio Escalera 🔗 |
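The statistic behind this regularizer can be sketched directly: build the complete graph whose vertices are neurons, weight each edge by a correlation dissimilarity, and sum the minimum spanning tree's edge weights. The numpy/scipy version below is non-differentiable and uses $1-|\rho|$ as an assumed dissimilarity; the paper's terms are differentiable analogues of this quantity:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_decorrelation_penalty(activations):
    """Sum of minimum-spanning-tree edge weights over the complete graph
    whose vertices are neurons and whose edge weights are correlation
    dissimilarities 1 - |rho|. Highly redundant (correlated) neurons
    yield a short tree, i.e. a small value."""
    corr = np.corrcoef(activations.T)        # (neurons, neurons)
    dissim = 1.0 - np.abs(corr)
    np.fill_diagonal(dissim, 0.0)
    return minimum_spanning_tree(dissim).sum()

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
redundant = np.hstack([base, 2.0 * base, -3.0 * base])  # perfectly correlated
independent = rng.normal(size=(200, 3))                 # decorrelated
```

On the redundant activations the tree has near-zero total weight, while independent neurons produce a tree of weight close to the number of edges, which is the contrast the regularization terms exploit.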
-
|
Scalar Invariant Networks with Zero Bias
(
Poster
)
>
link
Just like weights, bias terms are learnable parameters in many popular machine learning models, including neural networks. Biases are believed to enhance the representational power of neural networks, enabling them to tackle various tasks in computer vision. Nevertheless, we argue that biases can be disregarded for some image-related tasks such as image classification, by considering the intrinsic distribution of images in the input space and desired model properties from first principles. Our empirical results suggest that zero-bias neural networks can perform comparably to normal networks for practical image classification tasks. Furthermore, we demonstrate that zero-bias neural networks possess a valuable property known as scalar (multiplicative) invariance. This implies that the network's predictions remain unchanged even when the contrast of the input image is altered. We further extend the scalar invariance property to more general cases, thereby attaining robustness within specific convex regions of the input space. We believe that dropping bias terms can be considered a geometric prior when designing neural network architectures for image classification, which shares the spirit of adopting convolutions as a translational-invariance prior. |
Chuqin Geng · Xiaojie Xu · Haolin Ye · Xujie Si 🔗 |
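The scalar-invariance property follows from positive homogeneity: without biases, a ReLU network satisfies $f(cx) = c\,f(x)$ for any $c > 0$, so rescaling the input (e.g. changing image contrast) rescales all logits equally and leaves the argmax prediction untouched. A minimal two-layer demonstration (weights and sizes are arbitrary illustrations):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def zero_bias_net(x, W1, W2):
    """Two-layer ReLU classifier with all bias terms removed."""
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(4, 16))
x = rng.normal(size=8)

# Scaling the input scales every logit by the same positive factor,
# so the predicted class (argmax) is unchanged.
logits = zero_bias_net(x, W1, W2)
scaled_logits = zero_bias_net(5.0 * x, W1, W2)
```

The same argument applies layer by layer to deeper bias-free ReLU networks, since each layer commutes with positive scaling.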
-
|
Fast Temporal Wavelet Graph Neural Networks
(
Poster
)
>
link
Spatio-temporal signal forecasting plays an important role in numerous domains, especially in neuroscience and transportation. The task is challenging due to the highly intricate spatial structure, as well as the non-linear temporal dynamics of the network. To facilitate reliable and timely forecasts for the human brain and traffic networks, we propose Fast Temporal Wavelet Graph Neural Networks (FTWGNN), which are both time- and memory-efficient for learning tasks on time-series data with an underlying graph structure, thanks to multiresolution analysis and wavelet theory on discrete spaces. We employ Multiresolution Matrix Factorization (MMF) (Kondor et al., 2014) to factorize the highly dense graph structure and compute the corresponding sparse wavelet basis, which allows us to construct a fast wavelet convolution as the backbone of our novel architecture. Experimental results on the real-world PEMS-BAY and METR-LA traffic datasets and the AJILE12 ECoG dataset show that FTWGNN is competitive with the state of the art while maintaining a low computational footprint. Our PyTorch implementation is publicly available at https://github.com/HySonLab/TWGNN |
Duc Thien Nguyen · Tuan Nguyen · Truong Son Hy · Risi Kondor 🔗 |
-
|
Manifold-augmented Eikonal Equations: Geodesic Distances and Flows on Differentiable Manifolds
(
Poster
)
>
link
Manifolds discovered by machine learning models provide a compact representation of the underlying data. Geodesics on these manifolds define locally length-minimising curves and provide a notion of distance, which is key for reduced-order modelling, statistical inference, and interpolation. In this work, we propose a model-based parameterisation for distance fields and geodesic flows on manifolds, exploiting solutions of a manifold-augmented Eikonal equation. We demonstrate how the geometry of the manifold impacts the distance field, and exploit the geodesic flow to obtain globally length-minimising curves directly. This work opens opportunities for statistics and reduced-order modelling on differentiable manifolds. |
Daniel Kelshaw · Luca Magri 🔗 |
-
|
Pitfalls in Measuring Neural Transferability
(
Poster
)
>
link
Transferability scores quantify the aptness of the pre-trained models for a downstream task and help in selecting an optimal pre-trained model for transfer learning. This work aims to draw attention to the significant shortcomings of state-of-the-art transferability scores. To this aim, we propose neural collapse-based transferability scores that analyse intra-class variability collapse and inter-class discriminative ability of the penultimate embedding space of a pre-trained model. The experimentation across the image and audio domains demonstrates that such a simple variability analysis of the feature space is more than enough to satisfy the current definition of transferability scores, and there is a requirement for a new generic definition of transferability. Further, building on these results, we highlight new research directions and postulate characteristics of an ideal transferability measure that will be helpful in streamlining future studies targeting this problem. |
Suryaka Suresh · Vinayak Abrol · Anshul Thakur 🔗 |
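The intra-class collapse and inter-class separation the abstract analyses can be summarized by a single scatter ratio over penultimate embeddings. The function below is a generic Fisher-style sketch of such a variability score, not the paper's exact formula; the toy data, class centers, and noise scales are illustrative:

```python
import numpy as np

def variability_score(features, labels):
    """Between-class scatter divided by within-class scatter of embeddings.
    Under neural collapse, within-class variability shrinks, so a higher
    ratio suggests a more transfer-friendly representation."""
    mu = features.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mc = fc.mean(axis=0)
        within += ((fc - mc) ** 2).sum()
        between += len(fc) * ((mc - mu) ** 2).sum()
    return between / (within + 1e-12)

rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 100)
centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
tight = centers[labels] + 0.1 * rng.normal(size=(200, 2))  # collapsed classes
loose = centers[labels] + 5.0 * rng.normal(size=(200, 2))  # overlapping classes
```

A pre-trained model whose features look like `tight` would score far higher than one producing `loose`, which is the kind of simple variability analysis the abstract argues already satisfies the current definition of transferability scores.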
-
|
Random Field Augmentations for Self-Supervised Representation Learning
(
Poster
)
>
link
Self-supervised representation learning is heavily dependent on data augmentations to specify the invariances encoded in representations. Previous work has shown that applying diverse data augmentations is crucial to downstream performance, but augmentation techniques remain under-explored. In this work, we propose a new family of local transformations based on Gaussian random fields to generate image augmentations for self-supervised representation learning. These transformations generalize the well-established affine and color transformations (translation, rotation, color jitter, etc.) and greatly increase the space of augmentations by allowing transformation parameter values to vary from pixel to pixel. The parameters are treated as continuous functions of spatial coordinates, and modeled as independent Gaussian random fields. Empirical results show the effectiveness of the new transformations for self-supervised representation learning. Specifically, we achieve a 1.7% top-1 accuracy improvement over baseline on ImageNet downstream classification, and a 3.6% improvement on out-of-distribution iNaturalist downstream classification. However, due to the flexibility of the new transformations, learned representations are sensitive to hyperparameters. While mild transformations improve representations, we observe that strong transformations can degrade the structure of an image, indicating that balancing the diversity and strength of augmentations is important for improving generalization of learned representations. |
Philip Mansfield · Arash Afkanpour · Warren Morningstar · Karan Singhal 🔗 |
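The per-pixel idea can be sketched for the translation case: sample two independent Gaussian random fields (smoothed white noise) as row and column offsets and warp the image with them. This is a minimal assumption-laden sketch; the paper's family also varies rotation and color parameters per pixel, and the `strength`/`smoothness` knobs here are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_field_warp(img, strength=2.0, smoothness=8.0, seed=0):
    """Shift every pixel by a smooth, spatially varying 2D offset drawn
    from a Gaussian random field (smoothed white noise, rescaled so the
    offsets have standard deviation `strength` pixels)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    fields = []
    for _ in range(2):  # one field each for row and column offsets
        f = gaussian_filter(rng.normal(size=(h, w)), smoothness)
        fields.append(strength * f / (f.std() + 1e-12))
    dy, dx = fields
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(img, [ys + dy, xs + dx], order=1, mode="reflect")

img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
augmented = random_field_warp(img)
```

With a large `smoothness` the field degenerates toward a global translation, so the affine transformations mentioned in the abstract sit at one end of this family; small `smoothness` with large `strength` produces the strong, structure-degrading transformations the authors caution about.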
-
|
Changes in the geometry of hippocampal representations across brain states
(
Poster
)
>
link
The hippocampus (HPC) is a key structure underlying the brain's capacity to learn and generalize. One pervasive phenomenon in the brain, but missing in AI, is the presence of different gross brain states. It is known that these different brain states give rise to diverse modes of information processing that are imperative for the hippocampus to learn and function, but the mechanisms by which they do so remain unknown. To study this, we harnessed the power of recently developed dimensionality reduction techniques to shed light on how HPC representations change across brain states. We compared the geometry of HPC neuronal representations when rodents learn to generalize across different environments, and showed that HPC representations could support both pattern separation and generalization. Next, we compared HPC activity during different stages of sleep. Consistent with the literature, we found a robust recapitulation of previous awake experience during non-rapid eye movement (NREM) sleep. Interestingly, however, such geometric correspondence to previous awake experience was not observed during rapid eye movement (REM) sleep, suggesting a very different mode of information processing. This is the first known report of UMAP analysis on hippocampal neuronal data during REM sleep. We propose that characterizing and contrasting the geometry of hippocampal representations during different brain states can help us understand the brain's mechanisms for learning and, in the future, can even help design the next generation of AI that learns and generalizes better. |
Wannan Yang · Chen Sun · Gyorgy Buzsaki 🔗 |
-
|
Entropy-MCMC: Sampling from Flat Basins with Ease
(
Poster
)
>
link
Bayesian deep learning counts on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performances. Given a practical budget, sampling from the original posterior can lead to suboptimal performances, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias the sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration and out-of-distribution detection. |
Bolian Li · Ruqi Zhang 🔗 |
-
|
Roto-translation Equivariant YOLO for Aerial Images
(
Poster
)
>
link
This work introduces Eq-YOLO, an Equivariant One-Stage Object Detector based on YOLO-v8 that incorporates group convolutions to handle rotational transformations. We show the benefit of using equivariant transforms to improve detection performance on rotated data over the regular YOLO-v8 model, while dividing the number of trainable parameters by a factor greater than three. |
Benjamin Maurel · Samy Blusseau · Santiago Velasco-Forero · Teodora Petrisor 🔗 |
-
|
Full-dimensional Characterisation of Time-Warped Spike-Time Stimulus-Response Distribution Geometries
(
Poster
)
>
link
Characterising the representation of sensory stimuli in the brain is a fundamental scientific endeavor, which can illuminate principles of information coding. Most characterizations reduce the dimensionality of neural data by converting spike trains to firing rates or binned spike counts, applying explicitly named methods of "dimensionality reduction", or collapsing trial-to-trial variability. Characterisation of the full-dimensional geometry of timing-based representations may provide unexpected insights into how complex high-dimensional information is encoded. Recent research shows that the distribution of representations elicited over trials of a single stimulus can be geometrically characterized without the application of dimensionality reduction, maintaining the temporal spiking information of individual neurons in a cell assembly and illuminating rich geometric structure. We extend these results, showing that precise spike time patterns for larger cell assemblies are time-warped (i.e. stretched or compressed) on each trial. Moreover, by geometrically characterizing distributions of large spike time patterns, our analysis supports the hypothesis that the degree to which a spike time pattern is time-warped depends on the cortical area's background activity level on a single trial. Finally, we suggest that the proliferation of large electrophysiology datasets and the increasing concentration of "neural geometrists" create ideal conditions for the characterization of full-dimensional spike time representations, in complement to dimensionality reduction approaches. |
James Isbister 🔗 |
-
|
Emergence of Latent Binary Encoding in Deep Neural Network Classifiers
(
Poster
)
>
link
We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
|
Luigi Sbailò · Luca Ghiringhelli 🔗 |
-
|
Testing Assumptions Underlying a Unified Theory for the Origin of Grid Cells
(
Poster
)
>
link
Representing and reasoning about physical space is fundamental to animal survival, and the mammalian lineage expresses a wealth of specialized neural representations that encode space. Grid cells, whose discovery earned a Nobel prize, are a striking example: a grid cell is a neuron that fires if and only if the animal is spatially located at the vertices of a regular triangular lattice that tiles all explored two-dimensional environments. Significant theoretical work has gone into understanding why mammals have learned these particular representations, and recent work has proposed a "unified theory for the computational and mechanistic origin of grid cells," claiming to answer why the mammalian lineage has learned grid cells. However, the Unified Theory makes a series of highly specific assumptions about the target readouts of grid cells (putatively place cells). In this work, we explicitly identify what these mathematical assumptions are, then test two of the critical assumptions using biological place cell data. At both the population and single-cell levels, we find evidence suggesting that neither of the assumptions is likely true in biological neural representations. These results call the Unified Theory into question, suggesting that biological grid cells likely have a different origin than those obtained in trained artificial neural networks. |
Rylan Schaeffer · Mikail Khona · Adrian Bertagnoli · Sanmi Koyejo · Ila Fiete 🔗 |
-
|
SO(3)-Equivariant Representation Learning in 2D Images
(
Poster
)
>
link
Imaging physical objects that are free to rotate and translate in 3D is challenging. While an object's pose and location do not change its nature, varying them presents problems for current vision models. Equivariant models account for these nuisance transformations, but current architectures only model either 2D transformations of 2D signals or 3D transformations of 3D signals. Here, we propose a novel convolutional layer consisting of 2D projections of 3D filters that models 3D equivariances of 2D signals, which is critical for capturing the full space of spatial transformations of objects in imaging domains such as cryo-EM. We additionally present methods for aggregating our rotation-specific outputs. We demonstrate improvement on several tasks, including particle picking and pose estimation. |
Darnell Granberry · Alireza Nasiri · Jiayi Shou · Alex J. Noble · Tristan Bepler 🔗 |
-
|
Self-Supervised Latent Symmetry Discovery via Class-Pose Decomposition
(
Poster
)
>
link
In this paper, we explore the discovery of latent symmetries of data in a self-supervised manner. By considering sequences of observations undergoing uniform motion, we can extract a shared group transformation from the latent observations. In contrast to previous work, we utilize a latent space in which the group and orbit component are decomposed. We show that this construction facilitates more accurate identification of the properties of the underlying group, which consequently results in an improved performance on a set of sequential prediction tasks. |
Gustaf Tegnér · Hedvig Kjellstrom 🔗 |
-
|
Discovering Latent Causes and Memory Modification: A Computational Approach Using Symmetry and Geometry
(
Poster
)
>
link
We learn from our experiences, even though they are never exactly the same. This implies that we need to assess their similarity to apply what we have learned from one experience to another. It is proposed that we “cluster” our experiences based on hidden latent causes that we infer. It is also suggested that surprises, which occur when our predictions are incorrect, help us categorize our experiences into distinct groups. In this paper, we develop a computational theory that emulates these processes based on two basic concepts from intuitive physics and Gestalt psychology using symmetry and geometry. We apply our approach to simple tasks that involve inductive reasoning. Remarkably, the output of our computational approach aligns closely with human responses. |
Arif Dönmez 🔗 |
-
|
On the Information Geometry of Vision Transformers
(
Poster
)
>
link
Understanding the structure of high-dimensional representations learned by Vision Transformers (ViTs) provides a pathway toward developing a mechanistic understanding and further improving architecture design. In this work, we leverage tools from information geometry to characterize representation quality at a per-token (intra-token) level as well as across pairs of tokens (inter-token) in ViTs pretrained for object classification. In particular, we observe that these high-dimensional tokens exhibit a characteristic spectral decay in the feature covariance matrix. By measuring the rate of this decay (denoted by $\alpha$) for each token across transformer blocks, we discover an $\alpha$ signature, indicative of a transition from lower to higher effective dimensionality. We also demonstrate that tokens can be clustered based on their $\alpha$ signature, revealing that tokens corresponding to nearby spatial patches of the original image exhibit similar $\alpha$ trajectories. Furthermore, for measuring the complexity at the sequence level, we aggregate the correlation between pairs of tokens independently at each transformer block. A higher average correlation indicates a significant overlap between token representations and lower effective complexity. Notably, we observe a U-shaped trend across the model hierarchy, suggesting that token representations are more expressive in the intermediate blocks. Our findings provide a framework for understanding information processing in ViTs while providing tools to prune/merge tokens across blocks, thereby making the architectures more efficient.
|
Sonia Joseph · Kumar Krishna Agrawal · Arna Ghosh · Blake Richards 🔗 |
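A spectral-decay rate like the $\alpha$ above is commonly estimated by fitting a power law $\lambda_k \sim k^{-\alpha}$ to the covariance eigenspectrum in log-log space. The estimator below is a generic sketch of that idea (the paper's exact fitting procedure and per-token bookkeeping may differ), checked here on synthetic features with a known spectrum:

```python
import numpy as np

def alpha_signature(tokens):
    """Estimate alpha in lambda_k ~ k^(-alpha) from the eigenspectrum of
    the (centered) feature covariance, via least squares on log-log axes."""
    centered = tokens - tokens.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigs = np.linalg.eigvalsh(cov)[::-1]      # sort descending
    eigs = eigs[eigs > 1e-12]                 # drop numerically zero modes
    ranks = np.arange(1, len(eigs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigs), 1)
    return -slope

# Synthetic features whose covariance spectrum is lambda_k = k^(-1)
rng = np.random.default_rng(3)
d, n = 20, 50000
scales = np.arange(1, d + 1) ** -0.5
tokens = rng.normal(size=(n, d)) * scales
alpha = alpha_signature(tokens)  # should be close to 1
```

A small $\alpha$ means a flat spectrum (high effective dimensionality); a large $\alpha$ means variance concentrated in a few directions, which is the transition the $\alpha$ signature tracks across transformer blocks.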
-
|
The Variability of Representations in Mice and Humans Changes with Learning, Engagement, and Attention
(
Poster
)
>
link
In responding to a visual stimulus, cortical neurons exhibit a high degree of variability, and this variability can be correlated across neurons. In this study, we use recordings from both mice and humans to systematically characterize how the variability in the representation of visual stimuli changes with learning, engagement and attention. We observe that in mice, familiarization with a set of images over many weeks reduces the variability of responses, but does not change its shape. Further, switching from passive to active task engagement changes the overall shape by shrinking the neural variability only along the task-relevant direction, leading to a higher signal-to-noise ratio. In a selective attention task in humans wherein multiple distributions are compared, a higher signal-to-noise ratio is obtained via a different mechanism, by mainly increasing the signal of the attended category. These findings show that representation variability can be adjusted with task needs. A potential speculative role for variability, consistent with these findings, is that it helps generalization. |
Praveen Venkatesh · Corbett Bennett · Sam Gale · Juri Minxha · Hristos Courellis · Greggory Heller · Tamina Ramirez · Severine Durand · Ueli Rutishauser · Shawn Olsen · Stefan Mihalas
|
-
|
Explicit Neural Surfaces: Learning Continuous Geometry with Deformation Fields
(
Poster
)
>
link
We introduce Explicit Neural Surfaces (ENS), an efficient smooth surface representation that directly encodes topology with a deformation field from a known base domain. We apply this representation to reconstruct explicit surfaces from multiple views, where we use a series of neural deformation fields to progressively transform the base domain into a target shape. By using meshes as discrete surface proxies, we train the deformation fields through efficient differentiable rasterization. Using a fixed base domain allows us to have Laplace-Beltrami eigenfunctions as an intrinsic positional encoding alongside standard extrinsic Fourier features, with which our approach can capture fine surface details. Compared to implicit surfaces, ENS trains faster and has several orders of magnitude faster inference times. The explicit nature of our approach also allows higher-quality mesh extraction whilst maintaining competitive surface reconstruction performance and real-time capabilities. |
Thomas Walker · Octave Mariotti · Amir Vaxman · Hakan Bilen 🔗 |
-
|
Symmetric Models for Radar Response Modeling
(
Poster
)
>
link
Many radar applications require complex radar signature models that incorporate characteristics of an object's shape and dynamics as well as sensing effects. Even though high-fidelity, first-principles radar simulators are available, they tend to be resource-intensive and do not easily support the requirements of agile and large-scale AI development and evaluation frameworks. Deep learning represents an attractive alternative to these numerical methods, but can have large data requirements and limited generalization ability. In this work, we present the Radar Equivariant Model (REM), the first $SO(3)$-equivariant model for predicting radar responses from object meshes. By constraining our model to the symmetries inherent to radar sensing, REM achieves a high level of reconstruction fidelity for signals generated by a first-principles radar model and shows improved performance and sample efficiency over other encoder-decoder models.
|
Colin Kohler · Nathan Vaska · Ramya Muthukrishnan · Whangbong Choi · Jung Yeon Park · Justin Goodwin · Rajmonda Caceres · Robin Walters 🔗 |
-
|
The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry
(
Poster
)
>
link
Extensive work has demonstrated that equivariant neural networks can significantly improve sample efficiency and generalization by enforcing an inductive bias in the network architecture. These applications typically assume that the domain symmetry is fully described by explicit transformations of the model inputs and outputs. However, many real-life applications contain only latent or partial symmetries which cannot be easily described by simple transformations of the input. In these cases, it is necessary to \emph{learn} symmetry in the environment instead of imposing it mathematically on the network architecture. We discover, surprisingly, that imposing equivariance constraints that do not exactly match the domain symmetry is very helpful in learning the true symmetry in the environment. We differentiate between \emph{extrinsic} and \emph{incorrect} symmetry constraints and show that while imposing incorrect symmetry can impede the model's performance, imposing extrinsic symmetry can actually improve performance. We demonstrate that an equivariant model can significantly outperform non-equivariant methods on domains with latent symmetries. |
Dian Wang · Jung Yeon Park · Neel Sortur · Lawson Wong · Robin Walters · Robert Platt 🔗 |
-
|
Large language models partially converge toward human-like concept organization
(
Poster
)
>
link
Large language models show human-like performance in knowledge extraction, reasoning and dialogue, but it remains controversial whether this performance is best explained by memorization and pattern matching, or whether it reflects human-like inferential semantics and world knowledge. Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings. |
Jonathan Gabel Christiansen · Mathias Gammelgaard · Anders Søgaard 🔗 |
-
|
Cayley Graph Propagation
(
Poster
)
>
link
In spite of the plethora of success stories with graph neural networks (GNNs) on modelling graph-structured data, they are notoriously vulnerable to tasks which necessitate mixing of information between distant pairs of nodes, especially in the presence of bottlenecks in the graph. For this reason, a significant body of research has dedicated itself to discovering or pre-computing graph structures which ameliorate such bottlenecks. Bottleneck-free graphs are well-known in the mathematical community as *expander graphs*, with prior work—Expander Graph Propagation (EGP)—proposing the use of a well-known expander graph family—the Cayley graphs of the $\mathrm{SL}(2,\mathbb{Z}_n)$ special linear group—as a computational template for GNNs. However, despite its solid theoretical grounding, the actual computational graphs used by EGP are *truncated* Cayley graphs, which causes them to lose expansion properties. In this work, we propose to use the full Cayley graph within EGP, recovering significant improvements on datasets from the Open Graph Benchmark (OGB). Our empirical evidence suggests that the retention of the nodes in the expander graph can provide benefit for graph representation learning, which may provide valuable insight for future models.
|
Joseph Wilson · Petar Veličković 🔗 |
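The full Cayley graph the abstract advocates can be enumerated directly by breadth-first search from the identity, multiplying by the standard generators of $\mathrm{SL}(2,\mathbb{Z}_n)$ and their inverses. A small sketch (the generator choice follows the usual elementary matrices; EGP's exact construction details are not reproduced here):

```python
from collections import deque

def mul(a, b, n):
    """Multiply two 2x2 matrices over Z_n (matrices stored as 4-tuples)."""
    a11, a12, a21, a22 = a
    b11, b12, b21, b22 = b
    return ((a11 * b11 + a12 * b21) % n, (a11 * b12 + a12 * b22) % n,
            (a21 * b11 + a22 * b21) % n, (a21 * b12 + a22 * b22) % n)

def cayley_sl2(n):
    """Breadth-first enumeration of the full Cayley graph of SL(2, Z_n)
    with generators [[1,1],[0,1]], [[1,0],[1,1]] and their inverses.
    Returns an undirected adjacency dict over group elements."""
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]
    identity = (1, 0, 0, 1)
    adj = {identity: set()}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mul(g, s, n)
            if h not in adj:
                adj[h] = set()
                queue.append(h)
            adj[g].add(h)
            adj[h].add(g)
    return adj

# For prime p, |SL(2, Z_p)| = p(p^2 - 1); n = 3 gives 24 vertices of degree 4
adj = cayley_sl2(3)
```

Because the generator set is closed under inversion, the graph is 4-regular; truncating this vertex set (as in EGP) is what destroys the expansion properties that using the full graph recovers.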
-
|
Curvature Fields from Shading Fields
(
Poster
)
>
link
We re-examine the estimation of 3D shape from images that are caused by shading of diffuse Lambertian surfaces. We propose a neural model that is motivated by the well-documented perceptual effect in which shape is perceived from shading without a precise perception of lighting. Our model operates independently in each receptive field and produces a scalar statistic of surface curvature for that field. The model’s architecture builds on previous mathematical analyses of lighting-invariant shape constraints, and it leverages geometric structure to provide equivariance under image rotations and translations. Applying our model in parallel across a dense set of receptive fields produces a curvature field that we show is quite stable under changes to a surface’s albedo pattern (texture) and also to changes in lighting, even when lighting varies spatially across the surface. |
Xinran Han · Todd Zickler 🔗 |
-
|
A Comparison of Equivariant Vision Models with ImageNet Pre-training
(
Poster
)
>
link
Neural networks pre-trained on large datasets provide useful embeddings for downstream tasks and allow researchers to iterate with less compute. For computer vision tasks, ImageNet pre-trained models can be easily downloaded for fine-tuning. However, no such pre-trained models are available that are equivariant to image transformations. In this work, we implement several equivariant versions of the residual network architecture and publicly release the weights after training on ImageNet. Additionally, we perform a comparison of enforced vs. learned equivariance in the largest data regime to date. |
David Klee · Jung Yeon Park · Robert Platt · Robin Walters 🔗 |
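What "enforced" equivariance means can be illustrated with a minimal numpy sketch (this is a toy construction, not the released models): a circular convolution whose kernel stack contains all four 90°-rotated copies of a base kernel produces features that transform predictably when the input is rotated.

```python
import numpy as np

def rot90_torus(x):
    """90-degree rotation as a linear automorphism of the discrete torus:
    rot(x)[i, j] = x[j, -i mod n]. Commutes exactly with circular conv."""
    n = x.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return x[j, (-i) % n]

def circ_conv(x, k):
    """Circular 2-D convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def c4_layer(x, k):
    """One 'enforced-equivariance' layer: convolve the input with all
    four rotated copies of the base kernel k."""
    ks = [k]
    for _ in range(3):
        ks.append(rot90_torus(ks[-1]))
    return [circ_conv(x, kr) for kr in ks]
```

Rotating the input rotates each feature map and cyclically permutes the four output channels; that channel-permutation constraint is exactly what C4-equivariant architectures enforce at scale.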
-
|
Almost Equivariance via Lie Algebra Convolutions
(
Poster
)
>
link
Recently, the $\textit{equivariance}$ of models with respect to a group action has become an important topic of research in machine learning. Analysis of the built-in equivariance of existing neural network architectures, as well as the study of methods for building model architectures that explicitly ``bake in'' equivariance, have become significant research areas in their own right. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, such as those due to rotations or translations, real-world data does not always follow such strict equivariances, be it due to noise in the data or underlying physical laws that encode only approximate or partial symmetries. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform on real-world data. Therefore, in this work we study a closely related topic, that of $\textit{almost equivariance}$. We give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group and defining $\textit{Lie algebra convolutions}$. We demonstrate that Lie algebra convolutions offer several benefits over Lie group convolutions, including being computationally tractable and well-defined for non-compact groups. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.
|
Daniel McNeela 🔗 |
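The Lie algebra/Lie group relationship the abstract appeals to can be made concrete with a generic sketch (this is not the paper's Lie algebra convolution): the algebra $\mathfrak{so}(2)$ is spanned by a single skew-symmetric generator $J$, and exponentiating $\theta J$ recovers the rotation by angle $\theta$.

```python
import math
import numpy as np

# Generator of so(2): skew-symmetric J with J @ J = -I.
J = np.array([[0.0, -1.0], [1.0, 0.0]])

def expm_series(A, terms=30):
    """Matrix exponential via its power series (adequate for small matrices
    with modest norm; a sketch, not a production implementation)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

theta = 0.7
R = expm_series(theta * J)
# R is the usual rotation matrix [[cos t, -sin t], [sin t, cos t]].
```

Working in the algebra rather than the group is what makes the approach well-defined for non-compact groups, where integrating over the whole group (as ordinary group convolutions require) is not possible.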
-
|
Deep Ridgelet Transform: Voice with Koopman Operator Constructively Proves Universality of Formal Deep Networks
(
Poster
)
>
link
We identify hidden layers inside a deep neural network (DNN) with group actions on the data domain, and formulate a formal deep network as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on group-theoretic arguments, in particular Schur's lemma, we give a simple proof of the universality of DNNs. |
Sho Sonoda · Yuka Hashimoto · Isao Ishikawa · Masahiro Ikeda 🔗 |
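The key fact behind the Koopman-operator viewpoint, that composition with a (possibly nonlinear) map acts linearly on functions, can be checked in a few lines. A toy illustration, not drawn from the paper:

```python
import numpy as np

def koopman(g, f):
    """Koopman (composition) operator: (K_g f)(x) = f(g(x))."""
    return lambda x: f(g(x))

# A nonlinear map on R; K_g is nevertheless linear in f.
g = lambda x: x**3 + np.sin(x)
f1, f2 = np.cos, np.exp

x = np.linspace(-1.0, 1.0, 11)
lhs = koopman(g, lambda t: 2 * f1(t) + 3 * f2(t))(x)
rhs = 2 * koopman(g, f1)(x) + 3 * koopman(g, f2)(x)
# lhs and rhs agree: K_g(2 f1 + 3 f2) = 2 K_g f1 + 3 K_g f2.
```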
-
|
Learning Symmetrization for Equivariance with Orbit Distance Minimization
(
Poster
)
>
link
We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023) and Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method’s competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize. |
Dat Nguyen · Jinwoo Kim · Hongseok Yang · Seunghoon Hong 🔗 |
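The basic symmetrization idea that the cited proposals build on, group averaging, can be sketched for invariance under 90° image rotations. This minimal numpy version is illustrative only and omits the orbit-distance optimization that is the paper's contribution:

```python
import numpy as np

def symmetrize(f, x):
    """Average an arbitrary scalar-valued function f over the C4 orbit
    of x. The result is exactly invariant to 90-degree rotations of x,
    because rotating x only reorders the orbit being averaged."""
    return float(np.mean([f(np.rot90(x, k)) for k in range(4)]))

# Any nonlinear, non-invariant base function works:
def f(x):
    return float(np.tanh(x).sum() + (x[0] * x[-1]).sum())
```

Averaging over the group is the simplest symmetrization operator; the feasibility of replacing such constructions with a learned, orbit-distance-based one is what extends the idea to harder groups like O(1, 3).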
-
|
Algebraic Topological Networks via the Persistent Local Homology Sheaf
(
Poster
)
>
link
In this work, we introduce a novel approach based on algebraic topology to enhance graph convolution and attention modules by incorporating local topological properties of the data. To do so, we consider the framework of sheaf neural networks, which has been previously leveraged to incorporate additional structure into graph neural networks’ features and construct more expressive, non-isotropic messages. Specifically, given an input simplicial complex (e.g. generated by the cliques of a graph or the neighbors in a point cloud), we construct its local homology sheaf, which assigns to each node the vector space of its local homology. The intermediate features of our networks live in these vector spaces and we leverage the associated sheaf Laplacian to construct more complex linear messages between them. Moreover, we extend this approach by considering the persistent version of local homology associated with a weighted simplicial complex (e.g., built from pairwise distances of node embeddings). This i) solves the problem of the lack of a natural choice of basis for the local homology vector spaces and ii) makes the sheaf itself differentiable, which enables our models to directly optimize the topology of their intermediate features. |
Gabriele Cesa · Arash Behboodi 🔗 |
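The sheaf Laplacian used for message passing can be sketched generically (a toy construction with arbitrary restriction maps, not the local-homology sheaf itself): the coboundary operator compares the two restrictions of neighboring node stalks to each edge stalk, and $L = \delta^\top \delta$ is positive semi-definite by construction.

```python
import numpy as np

def sheaf_laplacian(edges, restrictions, dim, n_nodes):
    """Sheaf Laplacian L = d^T d for node stalks of equal dimension `dim`.

    `restrictions[e]` holds the pair (F_u, F_v) of dim x dim maps
    restricting the stalks of u and v to the stalk of edge e = (u, v).
    """
    n_edges = len(edges)
    delta = np.zeros((n_edges * dim, n_nodes * dim))
    for e, (u, v) in enumerate(edges):
        F_u, F_v = restrictions[e]
        delta[e * dim:(e + 1) * dim, u * dim:(u + 1) * dim] = F_u
        delta[e * dim:(e + 1) * dim, v * dim:(v + 1) * dim] = -F_v
    return delta.T @ delta
```

When every restriction map is the identity, this reduces to a block version of the ordinary graph Laplacian, whose kernel contains the constant sections.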
-
|
Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach
(
Poster
)
>
link
Lattice reduction is a combinatorial optimization problem aimed at finding the most orthogonal basis in a given lattice. In this work, we address lattice reduction via deep learning methods. We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We incorporate the symmetries of lattice reduction into the model by making it invariant and equivariant with respect to appropriate continuous and discrete groups. |
Giovanni Luca Marchetti · Gabriele Cesa · Kumar Pratik · Arash Behboodi 🔗 |
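The self-supervised objective described above can be made concrete: the orthogonality defect of a basis $B$ (product of basis-vector norms divided by $|\det B|$) equals 1 exactly when the basis is orthogonal, so penalizing it pushes a predicted unimodular transform toward a reduced basis. A minimal numpy sketch, illustrative only (the paper's exact loss and parametrization may differ):

```python
import numpy as np

def orthogonality_defect(B):
    """Orthogonality defect of a lattice basis B (rows = basis vectors).
    Equals 1 iff the rows are mutually orthogonal; larger means more skew."""
    row_norms = np.linalg.norm(B, axis=1)
    return np.prod(row_norms) / np.abs(np.linalg.det(B))

def apply_unimodular(B, U):
    """A unimodular U (integer entries, det = +-1) changes the basis
    without changing the lattice it generates."""
    assert abs(round(float(np.linalg.det(U)))) == 1
    return U @ B
```

For example, the skewed basis $[[1,0],[100,1]]$ generates the integer lattice $\mathbb{Z}^2$ but has a large defect; multiplying by the unimodular matrix $[[1,0],[-100,1]]$ recovers the orthogonal standard basis.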
-
|
Opening Remarks
(
Opening Remarks
)
>
|
🔗 |