Workshop
Symmetry and Geometry in Neural Representations (NeurReps)
Sophia Sanborn · Christian A Shewmake · Simone Azeglio · Arianna Di Bernardo · Nina Miolane
Room 283-285
Sat 3 Dec, 6:15 a.m. PST
In recent years, there has been a growing appreciation for the importance of modeling the geometric structure in data, a perspective that has developed in both the geometric deep learning and applied geometry communities. In parallel, an emerging set of findings in neuroscience suggests that group-equivariance and the preservation of geometry and topology may be fundamental principles of neural coding in biology.
This workshop will bring together researchers from geometric deep learning and geometric statistics with theoretical and empirical neuroscientists whose work reveals the elegant implementation of geometric structure in biological neural circuitry. Group theory and geometry were instrumental in unifying models of fundamental forces and elementary particles in 20th-century physics. Likewise, they have the potential to unify our understanding of how neural systems form useful representations of the world.
The goal of this workshop is to unify the emerging paradigm shifts towards structured representations in deep networks and the geometric modeling of neural data — while promoting a solid mathematical foundation in algebra, geometry, and topology.
Schedule
Sat 6:15 a.m. - 6:30 a.m.

Opening Remarks
(Opening remarks)
Sophia Sanborn
Sat 6:30 a.m. - 7:00 a.m.

In search of invariance in brains and machines
(Invited Talk)
Bruno Olshausen
Sat 7:00 a.m. - 7:30 a.m.

Symmetry-Based Representations for Artificial and Biological Intelligence
(Invited Talk)
Irina Higgins
Sat 7:30 a.m. - 8:00 a.m.

From Equivariance to Naturality
(Invited Talk)
Taco Cohen
Sat 8:00 a.m. - 8:30 a.m.

Coffee Break
(Break)
Sat 8:30 a.m. - 8:40 a.m.

Is the information geometry of probabilistic population codes learnable?
(Contributed Talk - Spotlight)
One reason learning the geometry of latent neural manifolds from neural activity data is difficult is that the ground truth is generally not known, which can make manifold learning methods hard to evaluate. Probabilistic population codes (PPCs), a class of biologically plausible and self-consistent models of neural populations that encode parametric probability distributions, may offer a theoretical setting where it is possible to rigorously study manifold learning. It is natural to define the neural manifold of a PPC as the statistical manifold of the encoded distribution, and we derive a mathematical result that the information geometry of the statistical manifold is directly related to measurable covariance matrices. This suggests a simple but rigorously justified decoding strategy based on principal component analysis, which we illustrate using an analytically tractable PPC.
John Vastola · Zach Cohen · Jan Drugowitsch
Sat 8:40 a.m. - 8:50 a.m.

Computing Representations for Lie Algebraic Networks
(Contributed Talk - Spotlight)
Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit Lie group representations to derive the equivariant kernels and nonlinearities. We present three contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations with a novel algorithm that finds representations of arbitrary Lie groups given only the structure constants of the associated Lie algebra. Second, we provide a self-contained method and software for building Lie group-equivariant neural networks using these representations. Third, we contribute a novel benchmark dataset for classifying objects from relativistic point clouds, and apply our methods to construct the first object-tracking model equivariant to the Poincaré group.
Noah Shutty · Casimir Wierzynski
Sat 8:50 a.m. - 9:00 a.m.

Kendall Shape-VAE: Learning Shapes in a Generative Framework
(Contributed Talk - Spotlight)
Learning an interpretable representation of data without supervision is an important precursor for the development of artificial intelligence. In this work, we introduce Kendall Shape-VAE, a novel Variational Autoencoder framework for learning shapes that disentangles the latent space by compressing information to simpler geometric symbols. In Kendall Shape-VAE, we modify the Hyperspherical Variational Autoencoder such that it results in an exactly rotationally equivariant network using the notion of landmarks in the Kendall shape space. We show the exact equivariance of the model through experiments on rotated MNIST.
Sharvaree Vadgama · Jakub Tomczak · Erik Bekkers
Sat 9:00 a.m. - 9:05 a.m.

Equivariance with Learned Canonical Mappings
(Contributed Talk - Lightning)
Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce a canonical representation of the data. These canonical mappings can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for many groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis is that learning a neural network to perform the canonicalization will perform better than doing it using predefined heuristics. Our results show that learning the canonical mappings indeed leads to better results and that the approach achieves great performance in practice.
Oumar Kaba · Arnab Mondal · Yan Zhang · Yoshua Bengio · Siamak Ravanbakhsh
Sat 9:05 a.m. - 9:10 a.m.

Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?
(Contributed Talk - Lightning)
Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies the number of linearly separable and group-invariant binary dichotomies that can be assigned to equivariant representations of objects. We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action. We show how this relation extends to operations such as convolutions, element-wise nonlinearities, and local pooling. While other operations do not change the fraction of separable dichotomies, local pooling decreases the fraction, despite being a highly nonlinear operation. Finally, we test our theory on intermediate representations of randomly initialized and fully trained convolutional neural networks and find perfect agreement.
Matthew Farrell · Blake Bordelon · Shubhendu Trivedi · Cengiz Pehlevan
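For orientation, the classical result that the abstract above generalizes can be stated as follows. This is the standard form of Cover's Function Counting Theorem for $P$ points in general position in $\mathbb{R}^N$ and separating hyperplanes through the origin; the symbols $P$, $N$, $C$, and $f$ are notation chosen here, not taken from the abstract. The group-invariant version and its dependence on the dimension of the fixed subspace are given in the paper itself and are not reproduced here.

```latex
% Number of linearly separable dichotomies of P points in general position in R^N
C(P, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k}

% Fraction of all 2^P dichotomies that are linearly separable
f(P, N) = \frac{C(P, N)}{2^{P}} = 2^{1-P} \sum_{k=0}^{N-1} \binom{P-1}{k}
```

In particular, $f(P, N) = 1$ whenever $P \le N$, and $f(2N, N) = 1/2$.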
Sat 9:10 a.m. - 9:15 a.m.

Do Neural Networks Trained with Topological Features Learn Different Internal Representations?
(Contributed Talk - Lightning)
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different than those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks, neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially compared to those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
Sarah McGuire · Shane Jackson · Tegan Emerson · Henry Kvinge
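As a rough illustration of one of the two similarity metrics named in the abstract above, the following is a minimal sketch of centered kernel alignment in its commonly used linear form. It is not the authors' code; the function name `linear_cka` and the assumption that each representation is an array of shape (samples, features) are choices made here for illustration.

```python
import numpy as np


def linear_cka(x, y):
    """Linear centered kernel alignment between two representation matrices.

    x and y are assumed to have shape (n_samples, n_features); the feature
    dimensions may differ, but the rows must correspond to the same inputs.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)
```

The score lies between 0 and 1 and is unchanged by orthogonal transformations and isotropic rescaling of either representation, which is why it is often used to compare hidden layers of different networks.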
Sat 9:15 a.m. - 9:20 a.m.

Expander Graph Propagation
(Contributed Talk - Lightning)
Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is challenging, often requiring node features that are mindful of both local interactions and the graph global context. GNN architectures need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We provide an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks.
Andreea Deac · Marc Lackenby · Petar Veličković
Sat 9:20 a.m. - 9:25 a.m.

Homomorphism AutoEncoder - Learning Group Structured Representations from Observed Transitions
(Contributed Talk - Lightning)
It is crucial for agents, both biological and artificial, to acquire world models that veridically represent the external world and how it is modified by the agent's own actions. We consider the case where such modifications can be modelled as transformations from a group of symmetries structuring the world state space. We use tools from representation learning and group theory to learn latent representations that account for both sensory information and the actions that alter it during interactions. We introduce the Homomorphism AutoEncoder (HAE), an autoencoder equipped with a learned group representation linearly acting on its latent space, trained on 2-step transitions to implicitly enforce the group homomorphism property on the action representation. Compared to existing work, our approach makes fewer assumptions on the group representation and on which transformations the agent can sample from. We motivate our method theoretically, and demonstrate empirically that it can learn the correct representation of the groups and the topology of the environment. We also compare its performance in trajectory prediction with previous methods.
Hamza Keurti · Hsiao-Ru Pan · Michel Besserve · Benjamin F. Grewe · Bernhard Schölkopf
Sat 9:25 a.m. - 9:30 a.m.

Sheaf Attention Networks
(Contributed Talk - Lightning)
Attention has become a central inductive bias for deep learning models irrespective of domain. However, increasing theoretical and empirical evidence suggests that Graph Attention Networks (GATs) suffer from the same pathological issues affecting many other Graph Neural Networks (GNNs). First, GAT's features tend to become progressively smoother as more layers are stacked, and second, the model performs poorly in heterophilic graphs. Sheaf Neural Networks (SNNs), a new class of models inspired by algebraic topology and geometry, have shown much promise in tackling these two issues. Building upon the recent success of SNNs and the wide adoption of attention-based architectures, we propose Sheaf Attention Networks (SheafANs). By making use of a novel and more expressive attention mechanism equipped with geometric inductive biases, we show that this type of construction generalizes popular attention-based GNN models to cellular sheaves. We demonstrate that these models help tackle the oversmoothing and heterophily problems and show that, in practice, SheafANs consistently outperform GAT on synthetic and real-world benchmarks.
Federico Barbero · Cristian Bodnar · Haitz Sáez de Ocáriz Borde · Pietro Liò
Sat 9:30 a.m. - 9:35 a.m.

On the Expressive Power of Geometric Graph Neural Networks
(Contributed Talk - Lightning)
We propose a geometric version of the Weisfeiler-Leman graph isomorphism test (GWL) for discriminating geometric graphs while respecting the underlying symmetries such as permutation, rotation, and translation. We use GWL to characterise the expressive power of Graph Neural Networks (GNNs) for geometric graphs and provide formal results for the following: (1) What geometric graphs can and cannot be distinguished by GNNs invariant or equivariant to spatial symmetries; (2) Equivariant GNNs are strictly more powerful than their invariant counterparts.
Cristian Bodnar · Chaitanya K. Joshi · Simon Mathis · Taco Cohen · Pietro Liò
Sat 9:35 a.m. - 10:05 a.m.

Panel Discussion I: Geometric and topological principles for representation learning in ML
(Discussion Panel)
Irina Higgins · Taco Cohen · Erik Bekkers · Nina Miolane · Rose Yu
Sat 10:05 a.m. - 11:30 a.m.

Lunch Break
(Break)
Sat 11:30 a.m. - 12:00 p.m.

Generative models of non-Euclidean neural population dynamics
(Invited Talk)
Kristopher Jensen
Sat 12:00 p.m. - 12:30 p.m.

Robustness of representations in artificial and biological neural networks
(Invited Talk)
Gabriel Kreiman
Sat 12:30 p.m. - 1:00 p.m.

Neural Ideograms and Equivariant Representation Learning
(Invited Talk)
Erik Bekkers
Sat 1:00 p.m. - 1:30 p.m.

Panel Discussion II: Geometric and topological principles for representations in the brain
(Discussion Panel)
Bruno Olshausen · Kristopher Jensen · Gabriel Kreiman · Manu Madhav · Christian A Shewmake
Sat 1:30 p.m. - 3:00 p.m.

Poster Session
(Poster Session)
Sat 2:55 p.m. - 3:00 p.m.

Closing remarks
(Closing remarks)


Exact Visualization of Deep Neural Network Geometry and Decision Boundary
(Poster)
Visualizing Deep Network (DN) geometry and decision boundaries remains a key challenge even today. In fact, despite the dire need for such methods, e.g., to assess the quality of a trained model, to compare models, or to interpret decisions, the community at large still relies on crude approximations and proxies. For example, computing the decision boundary of a model, say on a 2D slice of its input space, is done through gradient descent and sampling with dichotomy search. In this paper, we lean on the rich theory of Continuous Piece-Wise Linear (CPWL) DNs to provide, for the first time, a method that provably produces the exact geometry (CPWL partition) and decision boundary of any DN employing nonlinearities such as ReLU, LeakyReLU, and max-pooling. Using our method we are able to not only visualize the decision boundary but also obtain its spanning space, i.e., we can sample arbitrarily many inputs that provably lie on the model's decision boundary, up to numerical precision. We explore how such methods can be used to interpret architectural choices, e.g., using convolutional architectures instead of fully-connected neural networks.
Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk


Graph Neural Networks for Connectivity Inference in Spatially Patterned Neural Responses
(Poster)
A continuous attractor network is one of the most common theoretical frameworks for studying a wide range of neural computations in the brain. Many previous approaches have attempted to identify continuous attractor systems by investigating the state-space structure of population neural activity. However, establishing the patterns of connectivity for relating the structure of attractor networks to their function is still an open problem. In this work, we propose the use of graph neural networks combined with structure learning for inferring the recurrent connectivity of a ring attractor network and demonstrate that the developed model greatly improves the quality of circuit inference as well as the prediction of neural responses compared to baseline inference algorithms.
Taehoon Park · JuHyeon Kim · DongHee Kang · Kijung Yoon


Object-centric causal representation learning
(Poster)
There has been significant recent progress in causal representation learning that has shown a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are $d$-dimensional vectors, and (2) that the observations are the output of some injective observation function of these latent variables. While these assumptions appear benign (they amount to assuming that any changes in the latent space are reflected in the observation space, and that we can use standard encoders to infer the latent variables), we show that when the observations are of multiple objects, the observation function is no longer injective, and disentanglement fails in practice. We can address this failure by combining recent developments in object-centric learning and causal representation learning. By modifying the Slot Attention architecture (Locatello et al., 2020), we develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties. We argue that this approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space, and we show that this approach successfully disentangles the properties of a set of objects in a series of simple image-based disentanglement experiments.
Amin Mansouri · Jason Hartford · Kartik Ahuja · Yoshua Bengio


Equivariance with Learned Canonical Mappings
(Poster)
Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce a canonical representation of the data. These canonical mappings can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for many groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis is that learning a neural network to perform the canonicalization will perform better than doing it using predefined heuristics. Our results show that learning the canonical mappings indeed leads to better results and that the approach achieves great performance in practice.
Oumar Kaba · Arnab Mondal · Yan Zhang · Yoshua Bengio · Siamak Ravanbakhsh


Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset
(Poster)
6D object pose estimation is one of the fundamental problems in computer vision and robotics research. While a lot of recent efforts have been made on generalizing pose estimation to novel object instances within the same category, namely category-level 6D pose estimation, it is still restricted to constrained environments given the limited amount of annotated data. In this paper, we collect Wild6D, a new unlabeled RGB-D object video dataset with diverse instances and backgrounds. We utilize this data to generalize category-level 6D object pose estimation in the wild with semi-supervised learning. We propose a new model, called Rendering for Pose estimation network (RePoNet), that is jointly trained using the free ground-truths with the synthetic data, and a silhouette matching objective function on the real-world data. Without using any 3D annotations on real data, our method outperforms state-of-the-art methods on the previous dataset and our Wild6D test set (with manual annotations for evaluation) by a large margin. Our code and dataset will be made publicly available.
Yanjie Ze · Xiaolong Wang


Charting Flat Minima Using the Conserved Quantities of Gradient Flow
(Poster)
Empirical studies have revealed that many minima in the loss landscape of deep learning are connected and reside on a low-loss valley. Yet, little is known about the theoretical origin of these low-loss valleys. Ensemble models sampling different parts of a low-loss valley have reached state-of-the-art performance. However, we lack theoretical ways to measure what portions of low-loss valleys are being explored during training. We address these two aspects of low-loss valleys using symmetries and conserved quantities. We show that continuous symmetries in the parameter space of neural networks can give rise to low-loss valleys. We then show that conserved quantities associated with these symmetries can be used to define coordinates along low-loss valleys. These conserved quantities reveal that gradient flow only explores a small part of a low-loss valley. We use conserved quantities to explore other parts of the loss valley by proposing alternative initialization schemes.
Bo Zhao · Iordan Ganev · Robin Walters · Rose Yu · Nima Dehmamy


On the Ambiguity in Classification
(Poster)
We develop a theoretical framework for geometric deep learning that incorporates ambiguous data in learning tasks. This framework uncovers deep connections between noncommutative geometry and learning tasks. Namely, it turns out that learning tasks naturally arise from groupoids, and vice versa. We also find that learning tasks are closely linked to the geometry of their groupoid algebras. This point of view allows us to answer the question of what actually constitutes a classification problem and to link unsupervised learning tasks to random walks on the second groupoid cohomology of the associated groupoid.
Arif Dönmez


Learning Generative Models with Invariance to Symmetries
(Poster)
While imbuing a model with invariance to symmetries can improve data efficiency and predictive performance, most methods require specialised architectures and thus prior knowledge of the symmetries. Unfortunately, we don't always know what symmetries are present in the data. Recent work has solved this problem by jointly learning the invariance (or the degree of invariance) with the model from the data alone. However, this work has focused on discriminative models. We describe a method for learning invariant generative models. We demonstrate that our method can learn a generative model of handwritten digits that is invariant to rotation. We hope this line of work will enable more data-efficient deep generative models.
James Allingham · Javier Antorán · Shreyas Padhy · Eric Nalisnick · José Miguel Hernández-Lobato


Geometry of inter-areal interactions in mouse visual cortex
(Poster)
The response of a set of neurons in an area is the result of the sensory input, the interaction of the neurons within the area, and the long-range interactions between areas. We aimed to study the relation between interactions among multiple areas, and whether they are fixed or dynamic. The structural connectivity provides a substrate for these interactions, but anatomical connectivity is not known in sufficient detail and it only gives us a static picture. Using the Allen Brain Observatory Visual Coding Neuropixels dataset, which includes simultaneous recordings of spiking activity from up to 6 hierarchically organized mouse cortical visual areas, we estimate the functional connectivity between neurons using a linear model of responses to flashed static grating stimuli. We characterize functional connectivity between populations via interaction subspaces. We find that distinct subspaces of a source area mediate interactions with distinct target areas, supporting the notion that cortical areas use distinct channels to communicate. Most importantly, using a piecewise linear model for activity within each trial, we find that these interactions evolve dynamically over tens of milliseconds following a stimulus presentation. Inter-areal subspaces become more aligned with the intra-areal subspaces during epochs in which a feedforward wave of activity propagates through visual cortical areas. When the short-term dynamics are averaged over, we find that the interaction subspaces are stable over multiple stimulus blocks. These findings have important implications for understanding how information flows through biological neural networks composed of interconnected modules, each of which may have a distinct functional specialization.
Ramakrishnan Iyer · Joshua H Siegle · Gayathri Mahalingam · Shawn Olsen · Stefan Mihalas


Learning unfolded networks with a cyclic group structure
(Poster)
Deep neural networks lack straightforward ways to incorporate domain knowledge and are notoriously treated as black boxes. Prior works attempted to inject domain knowledge into architectures implicitly through data augmentation. Building on recent advances in equivariant neural networks, we propose networks that explicitly encode domain knowledge, specifically equivariance with respect to rotations. By using unfolded architectures, a rich framework that originated from sparse coding and has theoretical guarantees, we present interpretable networks with sparse activations. The equivariant unfolded networks compete favorably with baselines, with only a fraction of their parameters, as showcased on (rotated) MNIST and CIFAR-10.
Emmanouil Theodosis · Demba Ba


Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings
(Poster)
Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives and are gaining more attention in machine learning applications. Imposing such quasimetric structures in model representations has been shown to improve many tasks, including reinforcement learning (RL) and causal relation learning. In this work, we present four desirable properties in such quasimetric models, and show how prior works fail at them. We propose Interval Quasimetric Embedding (IQE), which is designed to satisfy all four criteria. On three quasimetric learning experiments, IQEs show strong approximation and generalization abilities, leading to better performance and improved efficiency over prior methods.
Tongzhou Wang · Phillip Isola


Learning and Shaping Manifold Attractors for Computation in Gated Neural ODEs
(Poster)
Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. A particularly fruitful paradigm is computation via dynamical attractors, which is especially relevant for computations requiring complex memory storage of continuous variables. We explore the interplay of attractor geometry and task structure in recurrent neural networks. Furthermore, we are interested in finding low-dimensional effective representations which enhance interpretability. To this end, we introduce gated neural ODEs (gnODEs) and probe their performance on a continuous memory task. The gnODEs combine the expressive power of neural ordinary differential equations (nODEs) with the trainability conferred by gating interactions. We also discover that an emergent property of the gating interaction is an inductive bias for learning (approximate) continuous (manifold) attractor solutions, necessary to solve the continuous memory task. Finally, we show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the manifold attractor geometry.
Timothy Kim · Tankut Can · Kamesh Krishnamurthy


See and Copy: Generation of complex compositional movements from modular and geometric RNN representations
(Poster)
A hallmark of biological intelligence and control is combinatorial generalization: animals are able to learn various things, then piece them together in new combinations to produce appropriate outputs for new tasks. Inspired by the ability of primates to readily imitate seen movement sequences, we present a model of motor control using a realistic model of arm dynamics, tasked with imitating a guide that makes arbitrary two-segment drawings. We hypothesize that modular organization is one of the keys to such flexible and generalizable control. We construct a modular control model consisting of separate encoding and motor RNNs and a scheduler, which we train end-to-end on the task. We show that the modular structure allows the model to generalize not only to unseen two-segment trajectories, but to new drawings consisting of many more segments than it was trained on, and also allows for rapid adaptation to perturbations. Finally, our model recapitulates experimental observations of the preparatory and execution-related processes unfolding during motor control, providing a normative explanation for functional segregation of preparatory and execution-related activity within the motor cortex.
Sunny Duan · Mikail Khona · Adrian Bertagnoli · Sarthak Chandra · Ila Fiete


Testing geometric representation hypotheses from simulated place cell recordings
(Poster)
Hippocampal place cells can encode spatial locations of an animal in physical or task-relevant spaces. We simulated place cell populations that encoded either Euclidean or graph-based positions of a rat in a maze apparatus, and used an Autoencoder (AE) to analyze these neural population activities. The structure of the latent space learned by the AE reflects the respective geometric representation, while PCA fails to do so. This suggests future applications of AE architectures to decipher the geometry of spatial encoding in the brain.
Thibault Niederhauser · Adam Lester · Nina Miolane · Khanh Dao Duc · Manu Madhav


Sheaf Attention Networks
(Poster)
Attention has become a central inductive bias for deep learning models irrespective of domain. However, increasing theoretical and empirical evidence suggests that Graph Attention Networks (GATs) suffer from the same pathological issues affecting many other Graph Neural Networks (GNNs). First, GAT's features tend to become progressively smoother as more layers are stacked, and second, the model performs poorly in heterophilic graphs. Sheaf Neural Networks (SNNs), a new class of models inspired by algebraic topology and geometry, have shown much promise in tackling these two issues. Building upon the recent success of SNNs and the wide adoption of attention-based architectures, we propose Sheaf Attention Networks (SheafANs). By making use of a novel and more expressive attention mechanism equipped with geometric inductive biases, we show that this type of construction generalizes popular attention-based GNN models to cellular sheaves. We demonstrate that these models help tackle the oversmoothing and heterophily problems and show that, in practice, SheafANs consistently outperform GAT on synthetic and real-world benchmarks.
Federico Barbero · Cristian Bodnar · Haitz Sáez de Ocáriz Borde · Pietro Liò


Learning Invariance Manifolds of Visual Sensory Neurons
(
Poster
)
>
link
Robust object recognition is thought to rely on neural mechanisms that are selective to complex stimulus features while being invariant to others (e.g., spatial location or orientation). To better understand biological vision, it is thus crucial to characterize which features neurons in different visual areas are selective or invariant to. In the past, invariances have commonly been identified by presenting carefully selected hypothesis-driven stimuli which rely on the intuition of the researcher. One example is the discovery of phase invariance in V1 complex cells. However, to identify novel invariances, a data-driven approach is more desirable. Here, we present a method that, combined with a predictive model of neural responses, learns a manifold in the stimulus space along which a target neuron's response is invariant. Our approach is fully data-driven, allowing the discovery of novel neural invariances, and enables scientists to generate and experiment with novel stimuli along the invariance manifold. We test our method on Gabor-based neuron models as well as on a neural network fitted on macaque V1 responses and show that 1) it successfully identifies neural invariances, and 2) disentangles invariant directions in the stimulus space. 
Luca Baroni · Mohammad Bashiri · Konstantin Willeke · Ján Antolík · Fabian Sinz 🔗 


Barron's Theorem for Equivariant Networks
(
Poster
)
>
link
The incorporation of known symmetries in a learning task provides a powerful inductive bias, reducing the sample complexity of learning equivariant functions in both theory and practice. Group-symmetric architectures for equivariant deep learning are now widespread, as are accompanying universality results that verify their representational power. However, these symmetric approximation theorems suffer from the same major drawback as their original nonsymmetric counterparts: namely, they may require impractically large networks. In this work, we demonstrate that for some commonly used groups, there exist smooth subclasses of functions, analogous to Barron classes of functions, which can be efficiently approximated using invariant architectures. In particular, for permutation subgroups, there exist invariant approximating architectures whose sizes, while dependent on the precise orbit structure of the function, are in many cases just as small as the noninvariant architectures given by Barron's Theorem. For the rotation group, we define an invariant approximating architecture with a new invariant nonlinearity, which may be of independent practical interest, that is similarly just as small as its noninvariant counterparts. Overall, we view this work as a first step towards capturing the smoothness of invariant functions in invariant universal approximation, thereby providing approximation results that are not only invariant, but efficient. 
Hannah Lawrence 🔗 


Fuzzy c-Means Clustering in Persistence Diagram Space for Deep Learning Model Selection
(
Poster
)
>
link
Persistence diagrams concisely capture the structure of data, an ability that is increasingly being used in the nascent field of topological machine learning. We extend the ubiquitous Fuzzy c-Means (FCM) clustering algorithm to the space of persistence diagrams, enabling unsupervised learning in a topological setting. We give theoretical convergence guarantees that correspond to the Euclidean case and empirically demonstrate the capability of the clustering to capture topological information via the fuzzy Rand index. We present an application of our algorithm to a scenario that utilises both the topological and fuzzy nature of our algorithm: pretrained model selection in deep learning. As pretrained models can perform well on multiple tasks, selecting the best model is a naturally fuzzy problem; we show that fuzzy clustering of persistence diagrams allows for unsupervised model selection using just the topology of their decision boundaries. 
Thomas Davies · Jack Aspinall · Bryan Wilder · Long Tran-Thanh 🔗 
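
As an illustration of the Euclidean starting point that the abstract above says is being extended, the following minimal sketch computes standard fuzzy c-means memberships for a single point. The function name `fcm_memberships` and the example data are hypothetical; the sketch uses plain Euclidean distance and does not implement any metric on persistence diagrams.

```python
import math

def fcm_memberships(point, centres, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), where d_i is the Euclidean
    distance from the point to centre i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centres]
    # A point lying exactly on one or more centres belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centres))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists)
            for d_i in dists]

# Distances 1 and 3 to the two centres, fuzzifier m = 2.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

The memberships always sum to one, and a point that is closer to a centre receives a larger share; this soft assignment is what makes the clustering "fuzzy".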


Moving Frame Net: SE(3)-Equivariant Network for Volumes
(
Poster
)
>
link
Equivariance of neural networks to transformations helps to improve their performance and reduce generalization error in computer vision tasks, as they apply to datasets presenting symmetries (e.g. scalings, rotations, translations). The method of moving frames is classical for deriving operators invariant to the action of a Lie group in a manifold. Recently, a rotation and translation equivariant neural network for image data was proposed based on the moving frames approach. In this paper we significantly improve that approach by reducing the computation of moving frames to only one, at the input stage, instead of repeated computations at each layer. The equivariance of the resulting architecture is proved theoretically and we build a rotation and translation equivariant neural network to process volumes, i.e. signals on the 3D space. Our trained model outperforms the benchmarks in medical volume classification on most of the tested datasets from MedMNIST3D. 
Mateus Sangalli · Samy Blusseau · Santiago Velasco-Forero · Jesus Angulo 🔗 


Periodic Signal Recovery with Regularized Sine Neural Networks
(
Poster
)
>
link
We consider the problem of learning a periodic one-dimensional signal with neural networks, and designing models that are able to extrapolate the signal well beyond the training window. First, we show that multilayer perceptrons with ReLU activations are provably unable to perform this task, and lead to poor performance in practice even close to the training window. Then, we propose a novel architecture using sine activation functions along with a well-chosen non-convex regularization, that is able to extrapolate the signal with low error well beyond the training window. Our architecture is several orders of magnitude better than its competitors for distant extrapolation (beyond 100 periods of the signal), while being able to accurately recover the frequency spectrum of the signal in a multi-tone setting. 
David A. R. Robin · Kevin Scaman · Marc Lelarge 🔗 
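
One standard way to see why ReLU networks cannot extrapolate a periodic signal is sketched below. This is a textbook-style argument under the stated assumptions, not necessarily the proof used by the authors; the symbols $f$, $g$, $T$, $a$, $b$ and $A$ are introduced here for illustration.

```latex
Let $f:\mathbb{R}\to\mathbb{R}$ be a ReLU multilayer perceptron. Then $f$ is
continuous and piecewise affine with finitely many pieces, so there exist
$T, a, b \in \mathbb{R}$ with
\[
  f(x) = a x + b \quad \text{for all } x \ge T .
\]
Let $g$ be continuous, periodic and non-constant, with
$A = \max g - \min g > 0$. If $a \neq 0$, then $|f(x) - g(x)| \to \infty$ as
$x \to \infty$. If $a = 0$, then $f \equiv b$ on $[T, \infty)$ and
\[
  \sup_{x \ge T} |f(x) - g(x)|
  \;=\; \max\bigl(|b - \max g|,\; |b - \min g|\bigr)
  \;\ge\; \frac{A}{2} .
\]
```

In both cases the error far from the training window is bounded below by a constant that depends only on the signal's amplitude, however large the network is.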


Topological Ensemble Detection with Differentiable Yoking
(
Poster
)
>
link
Modern neural recordings comprise thousands of neurons recorded at millisecond precision. An important step in analyzing these recordings is to identify neural ensembles – subsets of neurons that represent a subsystem of specific functionality. A famous example in the mammalian brain are grid cells, which are separated into ensembles of different spatial resolution. Recent work demonstrated that recordings from individual ensembles exhibit the clear topological signature of a torus, which, however, is obscured in combined recordings from multiple ensembles. Inspired by this observation, we introduce a topological ensemble detection algorithm that is capable of unsupervised identification of neural ensembles based on their topological signatures. This identification is achieved by optimizing a loss function that captures the assumed topological signature of the ensemble. To our knowledge, this is the first method that does not rely on external covariates and that leverages global features of the dataset to identify neural ensembles. This opens up exciting possibilities, e.g., searching for cell ensembles in prefrontal areas, which may represent cognitive maps on more conceptual spaces than grid cells. 
David Klindt · Sigurd Gaukstad · Erik Hermansen · Melvin Vaupel · Benjamin Dunn 🔗 


Kendall Shape-VAE: Learning Shapes in a Generative Framework
(
Poster
)
>
link
Learning an interpretable representation of data without supervision is an important precursor for the development of artificial intelligence. In this work, we introduce Kendall Shape-VAE, a novel Variational Autoencoder framework for learning shapes as it disentangles the latent space by compressing information to simpler geometric symbols. In Kendall Shape-VAE, we modify the Hyperspherical Variational Autoencoder such that it results in an exactly rotationally equivariant network using the notion of landmarks in the Kendall shape space. We show the exact equivariance of the model through experiments on rotated MNIST. 
Sharvaree Vadgama · Jakub Tomczak · Erik Bekkers 🔗 


Understanding Optimization Challenges when Encoding to Geometric Structures
(
Poster
)
>
link
Geometric inductive biases such as spatial curvature, factorizability, or equivariance have been shown to enable learning of latent spaces which better reflect the structure of data and perform better on downstream tasks. Training such models, however, can be a challenging task due to the topological constraints imposed by encoding to such structures. In this paper, we theoretically and empirically characterize obstructions to training autoencoders with geometric latent spaces. These include issues such as singularity (e.g. self-intersection), incorrect degree or winding number, and non-isometric homeomorphic embedding. We propose a method, isometric autoencoder, to improve the stability of training and convergence to an isometric mapping in geometric latent spaces. We perform an empirical evaluation of this method over 2 domains, which demonstrates that our approach can better circumvent the identified optimization problems. 
Babak Esmaeili · Robin Walters · Heiko Zimmermann · Jan-Willem van de Meent 🔗 


Surfing on the Neural Sheaf
(
Poster
)
>
link
The deep connections between Partial Differential Equations (PDEs) and Graph Neural Networks (GNNs) have recently generated a lot of interest in PDE-inspired architectures for learning on graphs. However, despite being more interpretable and better understood via well-established tools from PDE analysis, the dynamics these models use are often too simple for complicated node classification tasks. The recently proposed Neural Sheaf Diffusion (NSD) models address this by making use of an additional geometric structure over the graph, called a sheaf, that can support a provably powerful class of diffusion equations. In this work, we propose Neural Sheaf Propagation (NSP), a new PDE-based Sheaf Neural Network induced by the wave equation on sheaves. Unlike diffusion models that are characterised by a dissipation of energy, wave models conserve energy, which can be beneficial for node classification tasks on heterophilic graphs. In practice, we show that NSP obtains competitive results with NSD and outperforms many other existing models. 
Julian Suk · Lorenzo Giusti · Tamir Hemo · Miguel Lopez · Marco La Vecchia · Konstantinos Barmpas · Cristian Bodnar 🔗 


Unsupervised learning of geometrical features from images by explicit group actions enforcement
(
Poster
)
>
link
In this work we propose an autoencoder architecture capable of automatically learning meaningful geometric features of objects in images, achieving a disentangled representation of 2D objects. It is made of a standard dense autoencoder that captures the deep features identifying the shapes and an additional encoder that extracts geometric latent variables regressed in an unsupervised manner. These are then used to apply a transformation on the output of the deep features decoder. The promising results show that this approach performs better than a non-constrained model having more degrees of freedom. 
Francesco Calisto · Luca Bottero · Valerio Pagliarino 🔗 


Learning to Continually Learn with Topological Regularization
(
Poster
)
>
link
Continual learning in neural networks suffers from a phenomenon called catastrophic forgetting, in which a network quickly forgets what was learned in a previous task. The human brain, however, is able to continually learn new tasks and accumulate knowledge throughout life. Neuroscience findings suggest that continual learning success in the human brain is potentially associated with its modular structure and memory consolidation mechanisms. In this paper we propose a novel topological regularization that penalizes cycle structure in a neural network during training using principled theory from persistent homology and optimal transport. The penalty encourages the network to learn modular structure during training. The penalization is based on the closed-form expressions of the Wasserstein distance and barycenter for the topological features of a 1-skeleton representation for the network. Our topological continual learning method combines the proposed regularization with a tiny episodic memory to mitigate forgetting. We demonstrate that our method is effective in both shallow and deep network architectures for multiple image classification datasets. 
Tananun Songdechakraiwut · Xiaoshuang Yin · Barry Van Veen 🔗 


Optimal Latent Transport
(
Poster
)
>
link
It is common to assume that the latent space of a generative model is a lower-dimensional Euclidean space. We instead endow the latent space with a Riemannian structure. Previous work endows this Riemannian structure by pulling back the Euclidean metric of the observation space or the Fisher-Rao metric on the decoder distributions to the latent space. We instead investigate pulling back the Wasserstein metric tensor on the decoder distributions to the latent space. We develop an efficient realization of this metric, and, through proof-of-concept experiments, demonstrate that the approach is viable. 
Hrittik Roy · Søren Hauberg 🔗 
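
For readers unfamiliar with pullback metrics, the simplest case mentioned in the abstract above (pulling back the Euclidean metric of the observation space) can be written as follows. The notation $f$, $J_f$, $G$ and $\gamma$ is introduced here for illustration, assuming a smooth deterministic decoder $f$; the Wasserstein pullback studied by the authors is not reproduced.

```latex
Let $f:\mathbb{R}^d \to \mathbb{R}^D$ be a smooth decoder with Jacobian
$J_f(z) = \partial f / \partial z \in \mathbb{R}^{D \times d}$. The pullback of
the Euclidean metric to the latent space is
\[
  G(z) = J_f(z)^{\top} J_f(z),
\]
and the length of a latent curve $\gamma:[0,1]\to\mathbb{R}^d$ is
\[
  L(\gamma)
  = \int_0^1 \sqrt{\dot\gamma(t)^{\top}\, G(\gamma(t))\, \dot\gamma(t)}\; dt
  = \int_0^1 \Bigl\| \tfrac{d}{dt} f(\gamma(t)) \Bigr\|\, dt .
\]
```

The second equality shows that latent curve length under $G$ equals the Euclidean length of the decoded curve, which is what makes the latent geometry reflect the observation space.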


Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates
(
Poster
)
>
link
Adding momentum into Riemannian optimization is computationally challenging due to the intractable ODEs needed to define the exponential and parallel transport maps. We address these issues for Gaussian Fisher-Rao manifolds by proposing new local coordinates to exploit sparse structures and efficiently approximate the ODEs, which results in a numerically stable update scheme. Our approach extends the structured natural-gradient descent method of Lin et al. (2021a) by incorporating momentum into it and scaling the method for large-scale applications arising in numerical optimization and deep learning. 
Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Mohammad Emtiyaz Khan · Mark Schmidt 🔗 


Image to Icosahedral Projection for $\mathrm{SO}(3)$ Object Reasoning from Single-View Images
(
Poster
)
>
link
Reasoning about 3D objects based on 2D images is challenging due to variations in appearance caused by viewing the object from different orientations. Tasks such as object classification are invariant to 3D rotations, and others, such as pose estimation, are equivariant. However, imposing equivariance as a model constraint is typically not possible with 2D image input because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only $\mathrm{SO}(3)$-equivariant models that currently exist require point cloud or voxel input rather than 2D images. In this paper, we propose a novel architecture based on icosahedral group convolutions that reasons in $\mathrm{SO}(3)$ by learning a projection of the input image onto an icosahedron. The resulting model is approximately equivariant to rotation in $\mathrm{SO}(3)$. We apply this model to object pose estimation and shape classification tasks and find that it outperforms reasonable baselines.

David Klee · Ondrej Biza · Robert Platt · Robin Walters 🔗 


Conformal Isometry of Lie Group Representation in Recurrent Network of Grid Cells
(
Poster
)
>
link
The activity of the grid cell population in the medial entorhinal cortex (MEC) of the brain forms a vector representation of the self-position of the animal. Recurrent neural networks have been developed to explain the properties of the grid cells by transforming the vector based on the input velocity, so that the grid cells can perform path integration. In this paper, we investigate the algebraic, geometric, and topological properties of grid cells using recurrent network models. Algebraically, we study the Lie group and Lie algebra of the recurrent transformation as a representation of self-motion. Geometrically, we study the conformal isometry of the Lie group representation of the recurrent network where the local displacement of the vector in the neural space is proportional to the local displacement of the agent in the 2D physical space. We then focus on a simple non-linear recurrent model that underlies the continuous attractor neural networks of grid cells. Our numerical experiments show that conformal isometry leads to hexagon periodic patterns of the response maps of grid cells and our model is capable of accurate path integration. 
Dehong Xu · Ruiqi Gao · Wenhao Zhang · Xue-Xin Wei · Ying Nian Wu 🔗 
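
The conformal isometry condition described in words in the abstract above can be written out as follows. The symbols $v$, $s$ and $J_v$ are notation assumed here for illustration; whether the scale $s$ is taken to be constant across positions is not stated in the abstract.

```latex
Let $v(x) \in \mathbb{R}^n$ be the population activity vector at the 2D
position $x \in \mathbb{R}^2$. Conformal isometry asks that, for small
displacements $\delta x$,
\[
  \| v(x + \delta x) - v(x) \| = s\, \| \delta x \| + o(\| \delta x \|),
\]
with a scale $s > 0$ that does not depend on the direction of $\delta x$.
Equivalently, in terms of the Jacobian
$J_v(x) = \partial v / \partial x \in \mathbb{R}^{n \times 2}$,
\[
  J_v(x)^{\top} J_v(x) = s^2 I_2 .
\]
```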


Breaking the Symmetry: Resolving Symmetry Ambiguities in Equivariant Neural Networks
(
Poster
)
>
link
Equivariant networks have been adopted in many 3D learning areas. Here we identify a fundamental limitation of these networks: their ambiguity to symmetries. Equivariant networks cannot complete symmetry-dependent tasks like segmenting a left-right symmetric object into its left and right sides. We tackle this problem by adding components that resolve symmetry ambiguities while preserving rotational equivariance. We present OAVNN: Orientation Aware Vector Neuron Network, an extension of the Vector Neuron Network (Deng et al., 2021). OAVNN is a rotation equivariant network that is robust to planar symmetric inputs. Our network consists of three key components. 1) We introduce an algorithm to calculate symmetry-detecting features. 2) We create a symmetry-sensitive orientation aware linear layer. 3) We construct an attention mechanism that relates directional information across points. We evaluate the network using left-right segmentation and find that the network quickly obtains accurate segmentations. We hope this work motivates investigations on the expressivity of equivariant networks on symmetric objects. 
Sidhika Balachandar · Adrien Poulenard · Congyue Deng · Leonidas Guibas 🔗 


Spatial Symmetry in Slot Attention
(
Poster
)
>
link
Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Slot-based neural networks have recently shown promise at discovering and representing objects in visual scenes in a self-supervised fashion. While they make use of permutation symmetry of objects to drive learning of abstractions, they largely ignore other spatial symmetries present in the visual world. In this work, we introduce a simple, yet effective, method for incorporating spatial symmetries in attentional slot-based methods. We incorporate equivariance to translation and scale into the attention and generation mechanism of Slot Attention solely via translating and scaling positional encodings. Both changes result in little computational overhead, are easy to implement, and can result in large gains in data efficiency and scene decomposition performance. 
Ondrej Biza · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gamaleldin Elsayed · Aravindh Mahendran · Thomas Kipf 🔗 


Nonlinear and Commutative Editing in Pretrained GAN Latent Space
(
Poster
)
>
link
Semantic editing of images is a fundamental goal of computer vision. While generative adversarial networks (GANs) are gaining attention for their ability to produce high-quality images, they do not provide an inherent way to edit images semantically. Recent studies have investigated how to manipulate the latent variable to determine the images to be generated. However, methods that assume linear semantic arithmetic have limitations in the quality of image editing. Also, methods that discover nonlinear semantic pathways provide editing that is non-commutative, in other words, inconsistent when applied in different orders. This paper proposes a method for discovering semantic commutative vector fields. We theoretically demonstrate that, thanks to commutativity, multiple edits along the vector fields depend only on the quantities of editing, not on the order of the editing. We also experimentally demonstrate that the nonlinear and commutative nature of editing provides higher quality editing than previous methods. 
Takehiro Aoshima · Takashi Matsubara 🔗 
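
The order-independence claimed in the abstract above rests on a classical fact about vector fields, stated here in notation introduced for illustration ($X$, $Y$, $\Phi$); how the authors construct or learn such fields is not reproduced.

```latex
Let $X, Y$ be smooth vector fields on the latent space with flows
$\Phi^X_s$ and $\Phi^Y_t$. Their Lie bracket in coordinates is
\[
  [X, Y](z) = DY(z)\, X(z) - DX(z)\, Y(z),
\]
and the flows commute wherever they are defined,
\[
  \Phi^X_s \circ \Phi^Y_t = \Phi^Y_t \circ \Phi^X_s
  \quad \text{for all } s, t,
\]
if and only if $[X, Y] = 0$.
```

When the bracket vanishes, editing by amount $s$ along $X$ and by amount $t$ along $Y$ reaches the same latent point regardless of the order in which the two edits are applied.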


Representing Repeated Structure in Reinforcement Learning Using Symmetric Motifs
(
Poster
)
>
link
Transition structures in reinforcement learning can contain repeated motifs and redundancies. In this preliminary work, we suggest using the geometric decomposition of the adjacency matrix to form a mapping into an abstract state space. Using the Successor Representation (SR) framework, we decouple symmetries in the translation structure from the reward structure, and form a natural structural hierarchy by using separate SRs for the global and local structures of a given task. We demonstrate that there is low error when performing policy evaluation using this method and that the resulting representations can be significantly compressed. 
Matthew Sargent · Augustine Mavor-Parker · Peter J Bentley · Caswell Barry 🔗 
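
As background for the Successor Representation mentioned above, the standard definition and its use in policy evaluation are recalled below. The symbols $P$, $\gamma$, $r$, $M$ and $V$ are the usual textbook notation, assumed here; the decomposition into global and local SRs proposed by the authors is not reproduced.

```latex
Let $P$ be the state-transition matrix induced by a fixed policy,
$0 \le \gamma < 1$ the discount factor, and $r$ the vector of expected
rewards per state. The Successor Representation is
\[
  M = \sum_{t=0}^{\infty} \gamma^{t} P^{t} = (I - \gamma P)^{-1},
\]
and the value function of the policy is
\[
  V = M r .
\]
```

Because $M$ depends only on the transition structure and $r$ only on the rewards, the two can be handled separately, which is the decoupling the abstract refers to.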


Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
(
Poster
)
>
link
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of underlying entities that take the value of object states. Worse, these entities are often unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured inputs. By constructing a factorized transition graph over clusters of object representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on a set of simulated rearrangement and stacking tasks. 
Michael Chang · Alyssa L Dayan · Franziska Meier · Tom Griffiths · Sergey Levine · Amy Zhang 🔗 


Neuromorphic Visual Scene Understanding with Resonator Networks (in brief)
(
Poster
)
>
link
Inferring the position of objects and their rigid transformations is still an open problem in visual scene understanding. Here we propose a neuromorphic framework that poses scene understanding as a factorization problem and uses a resonator network to extract object identities and their transformations. The framework uses vector binding operations to produce generative image models in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which in turn can be efficiently factorized by a resonator network to infer objects and their poses. We also describe a hierarchical resonator network that enables the definition of a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition, and for rotation and scaling within the other partition. We demonstrate our approach using synthetic scenes composed of simple 2D shapes undergoing rigid geometric transformations and color changes. 
Alpha Renner · Giacomo Indiveri · Lazar Supic · Andreea Danielescu · Bruno Olshausen · Fritz Sommer · Yulia Sandamirskaya · Edward Frady 🔗 


SeLCA: Self-Supervised Learning of Canonical Axis
(
Poster
)
>
link
Robustness to rotation is critical for point cloud understanding tasks, as point cloud features can be affected dramatically with respect to prevalent rotation changes. In this work, we propose SeLCA, a novel self-supervised learning framework to learn the canonical axis of point clouds in a probabilistic manner. In essence, we propose to learn rotational equivariance by predicting the canonical axis of point clouds, and achieve rotational invariance by aligning the point clouds using their predicted canonical axis. When integrated into a rotation-sensitive pipeline, SeLCA achieves competitive performance on the ModelNet40 classification task under unseen rotations. Most interestingly, our proposed method also shows high robustness to various real-world point cloud corruptions presented by the ModelNet40-C dataset, compared to the state-of-the-art rotation-invariant method. 
Seungwook Kim · Yoonwoo Jeong · Chunghyun Park · Jaesik Park · Minsu Cho 🔗 


Neural Implicit Stylenet: synthesizing shapes in a preferred style exploiting self supervision
(
Poster
)
>
link
We introduce a novel approach to disentangle style from content in the 3D domain and perform unsupervised neural style transfer. Our approach is able to extract style information from 3D input in a self-supervised fashion, conditioning the definition of style on inductive biases enforced explicitly, in the form of specific augmentations applied to the input. This allows, at test time, to select specifically the features to be transferred between two arbitrary 3D shapes, being still able to capture complex changes (e.g. combinations of arbitrary geometrical and topological transformations) with the data prior. Coupled with the choice of representing 3D shapes as neural implicit fields, we are able to perform style transfer in a controllable way, handling a variety of transformations. We validate our approach qualitatively and quantitatively on a dataset with font style labels. 
Marco Fumero · Hooman Shayani · Aditya Sanghi · Emanuele Rodolà 🔗 


Mixed-Membership Community Detection via Line Graph Curvature
(
Poster
)
>
link
Community detection is a classical method for understanding the structure of relational data. In this paper, we study the problem of identifying mixed-membership community structure. We argue that it is beneficial to perform this task on the line graph, which can be constructed from an input graph by encoding the relationship between its edges. Here, we propose a curvature-based algorithm for mixed-membership community detection on the line graph. Our algorithm implements a discrete Ricci curvature flow under which the edge weights of a graph evolve to reveal its community structure. We demonstrate the performance of our approach in a series of benchmark experiments. 
Yu Tian · Zachary Lubberts · Melanie Weber 🔗 
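
The line graph construction mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration for undirected simple graphs; the function name `line_graph` and the example graph are hypothetical, and the curvature flow itself is not implemented.

```python
from itertools import combinations

def line_graph(edges):
    """Return the line graph of an undirected simple graph.

    Each node of the line graph is an edge of the input graph (stored as a
    sorted tuple); two such nodes are adjacent when the original edges
    share an endpoint.
    """
    nodes = sorted({tuple(sorted(e)) for e in edges})
    adjacency = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if set(e) & set(f):
            adjacency[e].add(f)
            adjacency[f].add(e)
    return adjacency

# A star with centre 1: all three edges meet at node 1.
star = [(0, 1), (1, 2), (1, 3)]
lg = line_graph(star)
```

In the star example every pair of edges shares node 1, so the line graph is a triangle. A node that belongs to several communities in the input graph corresponds to several line-graph nodes, which is one reason edge-based views suit mixed-membership problems.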


Scalable Vector Representation for Topological Data Analysis Based Classification
(
Poster
)
>
link
Classification of large and dense networks based on topology is very difficult due to the computational challenges of extracting meaningful topological features from real-world networks. In this paper we present a computationally tractable approach to topological classification of networks by using principled theory from persistent homology and optimal transport to define a novel vector representation for topological features. The proposed vector space is based on the Wasserstein distance between persistence barcodes. The 1-skeleton of the network graph is employed to obtain 1-dimensional persistence barcodes that represent connected components and cycles. These barcodes and the corresponding Wasserstein distance can be computed very efficiently. The effectiveness of the proposed vector space is demonstrated using support vector machines to classify brain networks. 
Tananun Songdechakraiwut · Bryan Krause · Matthew Banks · Kirill Nourski · Barry Van Veen 🔗 
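
One reason Wasserstein distances between one-dimensional summaries can be computed efficiently is that, on the real line, optimal transport between equally sized point sets reduces to sorting. The sketch below illustrates that general fact; the function name `wasserstein_1d`, the example values, and the assumption that both sets contain the same number of values are introduced here and are not taken from the paper.

```python
def wasserstein_1d(a, b, p=2):
    """p-Wasserstein distance between two equal-size sets of real values.

    In one dimension the optimal matching pairs the i-th smallest value of
    `a` with the i-th smallest value of `b`, so sorting is enough. Each
    value carries unit mass (no 1/n normalisation is applied).
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally many values in each set")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(a), sorted(b)))
    return total ** (1.0 / p)

# Hypothetical birth values from two networks of the same size.
births_g = [0.9, 0.4, 0.7]
births_h = [0.5, 0.8, 0.6]
d = wasserstein_1d(births_g, births_h)
```

The cost is dominated by the two sorts, so the distance takes O(n log n) time rather than requiring a general assignment solver.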


Geometry reveals an instructive role of retinal waves as biologically plausible pretraining signals
(
Poster
)
>
link
Prior to the onset of vision, neurons in the developing mammalian retina spontaneously fire in correlated activity patterns known as retinal waves. Experimental evidence suggests retinal waves strongly influence sensory representations before the visual experience. We aim to elucidate the computational role of retinal waves by using them as pretraining signals for neural networks. We consider simulated activity patterns generated by a model retina as well as real activity patterns observed experimentally in a developing mouse retina. We show that pretraining a classifier with a biologically plausible Hebbian learning rule on both simulated and real wave patterns improves the separability of the network’s internal representations. In particular, the pretrained networks achieve higher classification accuracy and exhibit internal representations with higher manifold capacity when compared to networks with randomly shuffled synaptic weights. 
Andrew Ligeralde · Miah Pitcher · Marla Feller · SueYeon Chung 🔗 


Sparse Convolutions on Lie Groups
(
Poster
)
>
link
Convolutional neural networks have proven very successful for a wide range of modelling tasks. Convolutional layers embed equivariance to discrete translations into the architectural structure of neural networks. Recent extensions generalize this notion to continuous Lie groups beyond translation, such as rotation, scale or more complex symmetries. Another recent generalization of the convolution has allowed for relaxed equivariance constraints, which can be used to model data that does not fully respect symmetries while still leveraging the useful inductive biases that equivariances provide. Unlike simple grids for regular convolution over the translational group, sampling convolutional filters on Lie groups requires filters that are continuously parameterised. To parameterise sufficiently flexible continuous filters, small MLP hypernetworks are often used in practice. Although this works, it introduces many additional model parameters. To be more parameter-efficient, we propose an alternative approach defining continuous filters on Lie groups with a small finite set of basis functions through pseudo-points. Regular convolutional layers appear as a special case, allowing for practical conversion between regular filters and our basis function filter formulation, at equal memory complexity. We demonstrate that basis function filters can be used to create efficient equivariant and relaxed-equivariant versions of commonly used neural network architectures, outperforming baselines on CIFAR-10 and CIFAR-100 vision classification tasks. 
Tycho van der Ouderaa · Mark van der Wilk 🔗 
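
The statement above that convolutional layers are equivariant to discrete translations can be checked numerically in the simplest setting. The sketch below uses a 1D circular cross-correlation; the helper names `circular_conv` and `shift` and the example values are hypothetical, and nothing here implements Lie group convolutions or the pseudo-point basis.

```python
def circular_conv(signal, kernel):
    """Circular cross-correlation of a 1D signal with a kernel."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
            for i in range(n)]

def shift(signal, k):
    """Cyclically shift a signal by k positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0]
w = [0.5, -1.0, 2.0]

# Equivariance: shifting then filtering equals filtering then shifting.
lhs = circular_conv(shift(x, 2), w)
rhs = shift(circular_conv(x, w), 2)
```

Here `lhs` and `rhs` coincide for every shift, which is the discrete-translation case of the equivariance property that group convolutions generalize.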


Equivariant Representations for Non-Free Group Actions
(
Poster
)
>
link
We introduce a method for learning representations that are equivariant with respect to general group actions over data. Differently from existing equivariant representation learners, our method is suitable for actions that are not free, i.e., that stabilize data via nontrivial symmetries. Our method is grounded in the orbit-stabilizer theorem from group theory, which guarantees that an ideal learner infers an isomorphic representation. Finally, we provide an empirical investigation on image datasets with rotational symmetries and show that taking stabilizers into account improves the quality of the representations. 
Luis Armando Pérez Rey · Giovanni Luca Marchetti · Danica Kragic · Dmitri Jarnikov · Mike Holenderski 🔗 
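
The orbit-stabilizer theorem cited above can be illustrated with a toy non-free action: the cyclic group of shifts acting on a string. The helper names `cyclic_shift` and `orbit_and_stabilizer` are hypothetical and the example is unrelated to the authors' image experiments.

```python
def cyclic_shift(s, k):
    """Act on string s by the element k of the cyclic group Z_n, n = len(s)."""
    k %= len(s)
    return s[k:] + s[:k]

def orbit_and_stabilizer(s):
    """Return the orbit of s and the shifts that leave s unchanged."""
    n = len(s)
    orbit = {cyclic_shift(s, k) for k in range(n)}
    stabilizer = [k for k in range(n) if cyclic_shift(s, k) == s]
    return orbit, stabilizer

# "abab" is fixed by a shift of 2, so the action of Z_4 on it is not free.
orbit, stabilizer = orbit_and_stabilizer("abab")
```

For `"abab"` the orbit has two elements and the stabilizer has two, and their product equals the group order 4, as the theorem states. A string with no repeated structure, such as `"abcd"`, has a trivial stabilizer and an orbit of full size.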


Capturing cross-session neural population variability through self-supervised identification of consistent neuron ensembles
(
Poster
)
>
link
Decoding stimuli or behaviour from recorded neural activity is a common approach to interrogate brain function in research, and an essential part of brain-computer and brain-machine interfaces. Reliable decoding even from small neural populations is possible because high dimensional neural population activity typically occupies low dimensional manifolds that are discoverable with suitable latent variable models. Over time, however, drifts in activity of individual neurons and instabilities in neural recording devices can be substantial, making stable decoding over days and weeks impractical. While this drift cannot be predicted on an individual neuron level, population level variations over consecutive recording sessions such as differing sets of neurons and varying permutations of consistent neurons in recorded data may be learnable when the underlying manifold is stable over time. Classification of consistent versus unfamiliar neurons across sessions and accounting for deviations in the order of consistent recording neurons across sessions of recordings may then maintain decoding performance and uncover a task-related neural manifold. Here we show that self-supervised training of a deep neural network can be used to compensate for this inter-session variability. As a result, a sequential autoencoding model can maintain state-of-the-art behaviour decoding performance for completely unseen recording sessions several days into the future. Our approach only requires a single recording session for training the model, and is a step towards reliable, recalibration-free brain-computer interfaces. 
Justin Jude · Matthew Perich · Lee Miller · Matthias Hennig 🔗 


Identifying latent distances with Finslerian geometry
(
Poster
)
Riemannian geometry has been shown to be useful for exploring the latent space of models of high-dimensional data. This latent space is learnt via a stochastic smooth mapping, and a deterministic approximation of the metric is required. Yet, this approximation is ad hoc and doesn't lead to interpretable quantities, such as the curve length. Here, we define a new metric as the expectation of the stochastic length induced by this smooth mapping. We show that this norm is a Finsler metric. We compare this Finsler metric with the previously studied expected Riemannian metric, and we show that in high dimensions, these metrics converge to each other.
Alison Pouplin · David Eklund · Carl Henrik Ek · Søren Hauberg 🔗 


Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?
(
Poster
)
Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies the number of linearly separable and group-invariant binary dichotomies that can be assigned to equivariant representations of objects. We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action. We show how this relation extends to operations such as convolutions, element-wise nonlinearities, and local pooling. While other operations do not change the fraction of separable dichotomies, local pooling decreases the fraction, despite being a highly nonlinear operation. Finally, we test our theory on intermediate representations of randomly initialized and fully trained convolutional neural networks and find perfect agreement.
Matthew Farrell · Blake Bordelon · Shubhendu Trivedi · Cengiz Pehlevan 🔗 
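For orientation, the classical Cover's Function Counting Theorem that the abstract above generalizes can be evaluated directly. This is a minimal sketch of the classical formula only, for points in general position and separating hyperplanes through the origin. Passing the dimension of the group-fixed subspace as `n` would be an assumption about how the generalization enters, not something the abstract spells out. The function names are hypothetical.

```python
from math import comb


def cover_count(p, n):
    """Number of linearly separable dichotomies of p points in general
    position in R^n, per the classical Cover counting formula."""
    return 2 * sum(comb(p - 1, k) for k in range(n))


def separable_fraction(p, n):
    """Fraction of the 2**p possible dichotomies that are linearly separable."""
    return cover_count(p, n) / 2 ** p


# With p = 2n points, exactly half of all dichotomies are separable.
assert separable_fraction(10, 5) == 0.5
```

When `p <= n` every dichotomy is separable, and the fraction falls off once `p` exceeds `2 * n`.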


Towards architectural optimization of equivariant neural networks over subgroups
(
Poster
)
Incorporating equivariance to symmetry groups in artificial neural networks (ANNs) can improve performance on tasks exhibiting those symmetries, but such symmetries are often only approximate and not explicitly known. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparameterizing a group equivariant layer to operate with equivariance constraints on a subgroup, and the $[G]$-mixed equivariant layer, which mixes operations constrained to equivariance to different groups to enable within-layer equivariance optimization. These two architectural tools can be used within neural architecture search (NAS) algorithms for equivariance-aware architectural optimization.
Kaitlin Maile · Dennis Wilson · Patrick Forré 🔗 


Lorentz Direct Concatenation for Stable Training in Hyperbolic Neural Networks
(
Poster
)
Hyperbolic neural networks have achieved considerable success in extracting representations from hierarchical or tree-like data. However, they are known to suffer from numerical instability, which makes it difficult to build hyperbolic neural networks with deep hyperbolic layers, no matter whether the Poincaré or Lorentz coordinate system is used. In this note, we study the crucial operation of concatenating hyperbolic representations. We propose the Lorentz direct concatenation and illustrate that it is much more stable than concatenating in the tangent space. We provide some insights and show the superiority of performing direct concatenation in real tasks.
Eric Qu · Dongmian Zou 🔗 
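One way to concatenate points of the Lorentz model without passing through the tangent space is to stack their spatial coordinates and recompute a single time coordinate from the hyperboloid constraint. The sketch below assumes curvature -1 and assumes that this construction corresponds to the "direct concatenation" of the abstract above; the function names are hypothetical.

```python
import math


def lorentz_point(space):
    """Lift spatial coordinates to the hyperboloid -t^2 + ||s||^2 = -1."""
    t = math.sqrt(1.0 + sum(s * s for s in space))
    return [t] + list(space)


def minkowski_norm_sq(x):
    """Lorentzian inner product <x, x> = -t^2 + ||s||^2."""
    return -x[0] ** 2 + sum(s * s for s in x[1:])


def direct_concat(points):
    """Stack the spatial parts and recompute one time coordinate so that
    the result satisfies the hyperboloid constraint again."""
    space = [s for p in points for s in p[1:]]
    # Each input has ||s_i||^2 = t_i^2 - 1, hence t^2 = sum(t_i^2) - (N - 1).
    t = math.sqrt(sum(p[0] ** 2 for p in points) - (len(points) - 1))
    return [t] + space


x = lorentz_point([0.3, -1.2])
y = lorentz_point([2.0])
z = direct_concat([x, y])
assert abs(minkowski_norm_sq(z) + 1.0) < 1e-9  # z lies on the hyperboloid
```

The spatial coordinates pass through unchanged, so no logarithmic or exponential map is evaluated.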


Generalized Laplacian Positional Encoding for Graph Representation Learning
(
Poster
)
Graph neural networks (GNNs) are the primary tool for processing graph-structured data. Unfortunately, the most commonly used GNNs, called Message Passing Neural Networks (MPNNs), suffer from several fundamental limitations. To overcome these limitations, recent works have adapted the idea of positional encodings to graph data. This paper draws inspiration from the recent success of Laplacian-based positional encoding and defines a novel family of positional encoding schemes for graphs. We accomplish this by generalizing the optimization problem that defines the Laplace embedding to more general dissimilarity functions rather than the 2-norm used in the original formulation. This family of positional encodings is then instantiated by considering p-norms. We discuss a method for calculating these positional encoding schemes, implement it in PyTorch and demonstrate how the resulting positional encoding captures different properties of the graph. Furthermore, we demonstrate that this novel family of positional encodings can improve the expressive power of MPNNs. Lastly, we present preliminary experimental results.
Sohir Maskey · Ali Parviz · Maximilian Thiessen · Hannes Stärk · Ylli Sadikaj · Haggai Maron 🔗 


On the Level Sets and Invariance of Neural Tuning Landscapes
(
Poster
)
Visual perception results from the activation of neuronal populations, a process mirrored by hidden units in artificial neural networks (ANNs). The activation of a neuron as a function over all image space has been described as a "tuning landscape". As a function over a high-dimensional space, what is the structure of this landscape? In this study, we characterize tuning landscapes through the lens of level sets and Morse theory. A recent study measured the in vivo two-dimensional tuning maps of neurons in different brain regions. Here, we developed a robust signature for these maps based on the change of topology in level sets. We found this topological signature changes progressively throughout the cortical hierarchy. Further, we analyzed the tuning landscapes of ANN units. By measuring the geometry of level sets, we advance the hypothesis that higher-order units can be locally regarded as isotropic radial basis functions (but not globally). This shows the power of level sets as a conceptual tool to understand neuronal activations over image space.
Binxu Wang · Carlos Ponce 🔗 


Local Geometry Constraints in V1 with Deep Recurrent Autoencoders
(
Poster
)
The classical sparse coding model represents visual stimuli as a convex combination of a handful of learned basis functions that are Gabor-like when trained on natural image data. However, the Gabor-like filters learned by classical sparse coding far overpredict well-tuned simple cell receptive field (SCRF) profiles. The autoencoder that we use to address this problem, which maintains a natural hierarchical structure when paired with a discriminative loss, is evaluated with a weighted-$\ell_1$ (WL) penalty that encourages self-similarity of basis function usage. The weighted-$\ell_1$ constraint matches the spatial phase symmetry of recent contrastive objectives while maintaining core ideas of the sparse coding framework, yet also offers a promising path to describe the differentiation of receptive fields in terms of this discriminative hierarchy in future work.
Jonathan Huml · Demba Ba 🔗 


Expander Graph Propagation
(
Poster
)
Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is challenging, often requiring node features that are mindful of both local interactions and the global context of the graph. GNN architectures need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We provide an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks.
Andreea Deac · Marc Lackenby · Petar Veličković 🔗 
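One well-known family of sparse, highly connected graphs is the Cayley graphs of the matrix group SL(2, Z_n) with two elementary generators and their inverses. The sketch below builds such a graph by breadth-first search; whether this matches the construction behind the abstract above is an assumption, and the function name `cayley_graph_sl2` is hypothetical.

```python
from collections import deque


def cayley_graph_sl2(n):
    """Adjacency lists of the Cayley graph of SL(2, Z_n) for the generators
    [[1,1],[0,1]], [[1,0],[1,1]] and their inverses.
    A matrix [[a, b], [c, d]] is stored as the tuple (a, b, c, d)."""
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]

    def mul(x, y):
        a, b, c, d = x
        e, f, g, h = y
        return ((a * e + b * g) % n, (a * f + b * h) % n,
                (c * e + d * g) % n, (c * f + d * h) % n)

    adjacency = {}
    queue = deque([(1, 0, 0, 1)])  # start from the identity matrix
    while queue:
        node = queue.popleft()
        if node in adjacency:
            continue
        adjacency[node] = [mul(node, g) for g in gens]
        queue.extend(adjacency[node])
    return adjacency


graph = cayley_graph_sl2(3)
assert len(graph) == 24  # order of SL(2, Z_3)
```

Every node has four neighbours regardless of `n`, so the number of edges grows linearly with the number of nodes.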


Connectedness of loss landscapes via the lens of Morse theory
(
Poster
)
Mode connectivity is a recently discovered property of neural networks saying that two weights of small loss can usually be connected by a path of small loss. This property is interesting practically, as it has applications to the design of optimizers with better generalization properties and to various other applied topics, as well as theoretically, as it suggests that loss landscapes of deep networks have very nice properties even though they are known to be highly nonconvex. The goal of this work is to study connectedness of loss landscapes via the lens of Morse theory. A brief introduction to Morse theory is provided.
Danil Akhtiamov · Matt Thomson 🔗 


The Union of Manifolds Hypothesis
(
Poster
)
The manifold hypothesis states that low-dimensional manifold structure exists in high-dimensional data, which is strongly supported by the success of deep learning in processing such data. However, we argue here that the manifold hypothesis is incomplete, as it does not allow any variation in the intrinsic dimensionality of different subregions of the data space. We thus posit the union of manifolds hypothesis, which states that high-dimensional data of interest comes from a union of disjoint manifolds; this allows intrinsic dimensionality to vary. We empirically verify this hypothesis on image datasets using a standard estimator of intrinsic dimensionality, and also demonstrate an improvement in classification performance derived from this hypothesis. We hope our work will encourage the community to further explore the benefits of considering the union of manifolds structure in data.
Bradley Brown · Anthony Caterini · Brendan Ross · Jesse Cresswell · Gabriel Loaiza-Ganem 🔗 


On the Expressive Power of Geometric Graph Neural Networks
(
Poster
)
We propose a geometric version of the Weisfeiler-Leman graph isomorphism test (GWL) for discriminating geometric graphs while respecting the underlying symmetries such as permutation, rotation, and translation. We use GWL to characterise the expressive power of Graph Neural Networks (GNNs) for geometric graphs and provide formal results for the following: (1) What geometric graphs can and cannot be distinguished by GNNs invariant or equivariant to spatial symmetries; (2) Equivariant GNNs are strictly more powerful than their invariant counterparts.
Cristian Bodnar · Chaitanya K. Joshi · Simon Mathis · Taco Cohen · Pietro Liò 🔗 


Group invariant machine learning by fundamental domain projections
(
Poster
)
We approach the well-studied problem of supervised group invariant and equivariant machine learning from the point of view of geometric topology. We propose a novel approach using a preprocessing step, which involves projecting the input data into a geometric space which parametrises the orbits of the symmetry group. This new data can then be the input for an arbitrary machine learning model (neural network, random forest, support-vector machine, etc.). We give an algorithm to compute the geometric projection, which is efficient to implement, and we illustrate our approach on some example machine learning problems (including the well-studied problem of predicting Hodge numbers of CICY matrices), in each case finding an improvement in accuracy versus others in the literature.
Benjamin Aslan · Daniel Platt · David Sheard 🔗 
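The preprocessing idea in the abstract above can be illustrated in the simplest case. For the symmetric group S_n permuting the coordinates of a vector, sorting sends every point of an orbit to the same representative, namely the one with non-decreasing coordinates. This is a minimal sketch for that one group only, not the authors' general algorithm; the function name is hypothetical.

```python
from itertools import permutations


def project_to_fundamental_domain(x):
    """Map x to the unique point of its S_n orbit with non-decreasing coordinates."""
    return tuple(sorted(x))


x = (3.0, 1.0, 2.0)
# Every permutation of x lands on the same representative, so any model
# trained on the projected inputs is invariant to coordinate permutations.
images = {project_to_fundamental_domain(p) for p in permutations(x)}
assert images == {(1.0, 2.0, 3.0)}
```

Because invariance is handled before the model sees the data, the downstream model can be of any kind.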


Hyperbolic and Mixed Geometry Graph Neural Networks
(
Poster
)
Hyperbolic Graph Neural Networks (GNNs) have shown great promise for modeling hierarchical and graph-structured data in the hyperbolic space, which reduces embedding distortion compared to Euclidean space. However, existing hyperbolic GNNs implement most operations through differential and exponential maps in the tangent space, which is a Euclidean subspace. To avoid such complex transformations between the hyperbolic and Euclidean spaces, recent advances in hyperbolic learning have formalized hyperbolic neural networks based on the Lorentz model that realize their operations entirely in the hyperbolic space via Lorentz transformations (Chen et al., 2022). Here, we adopt the hyperbolic framework of Chen et al. (2022) and propose a family of hyperbolic GNNs with greater modeling capabilities than existing hyperbolic GNNs. We also show that this framework allows us to have neural networks with both hyperbolic layers and Euclidean layers that can be trained jointly. Our experiments demonstrate that our fully hyperbolic GNNs lead to substantial improvement in comparison with their Euclidean counterparts.
Rishi Sonthalia · Xinyue Cui 🔗 


What shapes the loss landscape of self-supervised learning?
(
Poster
)
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL). However, questions remain in our theoretical understanding: Under what precise condition do these collapses occur? We provide theoretically grounded answers to this question by analyzing SSL loss landscapes for a linear model. We derive an analytically tractable theory of SSL landscape and show that it accurately captures an array of collapse phenomena and identifies their causes.
Liu Ziyin · Ekdeep S Lubana · Masahito Ueda · Hidenori Tanaka 🔗 


Discretization Invariant Learning on Neural Fields
(
Poster
)
With the ability to generate and store continuous data in the form of neural fields (NFs), there is a need for neural networks that can process such fields in a manner that is invariant to the discretization of the data domain. We introduce INRNet, a framework for learning discretization invariant maps on NFs of any type. Driven by numerical integration, INRNet can universally approximate a large class of maps between $L^2$ functions. We demonstrate our framework on NF classification, and examine the network's ability to generalize to different discretizations.

Clinton Wang · Polina Golland 🔗 


Training shapes the curvature of shallow neural network representations
(
Poster
)
We study how training shapes the Riemannian geometry induced by neural network feature maps. At infinite width, shallow neural networks induce highly symmetric metrics on input space. Feature learning in networks trained to perform simple classification tasks magnifies local areas and reduces curvature along decision boundaries. These changes are consistent with previously proposed geometric approaches for hand-tuning of kernel methods to improve generalization.
Jacob Zavatone-Veth · Julian Rubinfien · Cengiz Pehlevan 🔗 


Homomorphism AutoEncoder -- Learning Group Structured Representations from Observed Transitions
(
Poster
)
It is crucial for agents, both biological and artificial, to acquire world models that veridically represent the external world and how it is modified by the agent's own actions. We consider the case where such modifications can be modelled as transformations from a group of symmetries structuring the world state space. We use tools from representation learning and group theory to learn latent representations that account for both sensory information and the actions that alter it during interactions. We introduce the Homomorphism AutoEncoder (HAE), an autoencoder equipped with a learned group representation linearly acting on its latent space, trained on 2-step transitions to implicitly enforce the group homomorphism property on the action representation. Compared to existing work, our approach makes fewer assumptions on the group representation and on which transformations the agent can sample from. We motivate our method theoretically, and demonstrate empirically that it can learn the correct representation of the groups and the topology of the environment. We also compare its performance in trajectory prediction with previous methods.
Hamza Keurti · Hsiao-Ru Pan · Michel Besserve · Benjamin F. Grewe · Bernhard Schölkopf 🔗 


Computing Representations for Lie Algebraic Networks
(
Poster
)
Recent work has constructed neural networks that are equivariant to continuous symmetry groups such as 2D and 3D rotations. This is accomplished using explicit Lie group representations to derive the equivariant kernels and nonlinearities. We present three contributions motivated by frontier applications of equivariance beyond rotations and translations. First, we relax the requirement for explicit Lie group representations with a novel algorithm that finds representations of arbitrary Lie groups given only the structure constants of the associated Lie algebra. Second, we provide a self-contained method and software for building Lie group-equivariant neural networks using these representations. Third, we contribute a novel benchmark dataset for classifying objects from relativistic point clouds, and apply our methods to construct the first object-tracking model equivariant to the Poincaré group. Note to referees: This manuscript has been previously submitted to arXiv under a different title and has never been published in a conference or journal. This current submission includes several substantive revisions. The new title is intended to present a clearer description of the work.
Noah Shutty · Casimir Wierzynski 🔗 
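The simplest representation that can be read directly off the structure constants of a Lie algebra is the adjoint representation. The sketch below builds it for so(3) and checks the commutation relations; it illustrates what "a representation from structure constants" means and is not the algorithm of the abstract above, which targets general representations. The names `adjoint_matrices` and `is_representation` are hypothetical.

```python
def levi_civita(i, j, k):
    """Structure constants of so(3) for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def adjoint_matrices(f):
    """Adjoint representation: entry (k, j) of the i-th matrix is f[i][j][k],
    where [e_i, e_j] = sum_k f[i][j][k] * e_k."""
    dim = len(f)
    return [[[f[i][j][k] for j in range(dim)] for k in range(dim)]
            for i in range(dim)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(len(a))] for r in range(len(a))]


def is_representation(f, mats):
    """Check [M_i, M_j] == sum_k f[i][j][k] * M_k for all i, j."""
    dim, size = len(f), len(mats[0])
    for i in range(dim):
        for j in range(dim):
            expected = [[sum(f[i][j][k] * mats[k][r][c] for k in range(dim))
                         for c in range(size)] for r in range(size)]
            if commutator(mats[i], mats[j]) != expected:
                return False
    return True


f = [[[levi_civita(i, j, k) for k in range(3)] for j in range(3)] for i in range(3)]
assert is_representation(f, adjoint_matrices(f))
```

The same check applies to any candidate set of matrices, which makes `is_representation` usable as a test for representations found by other means.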


Data-driven emergence of convolutional structure in neural networks
(
Poster
)
Exploiting data invariances is crucial for efficient learning in both artificial and biological neural circuits, but can neural networks learn apposite representations from scratch? Convolutional neural networks, for example, were designed to exploit translation symmetry, yet learning convolutions directly from data has so far proven elusive. Here, we show how initially fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs, resulting in localised, space-tiling receptive fields that match the filters of a convolutional network trained on the same task. By carefully designing data models for the visual scene, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs, which has long been recognised as the hallmark of natural images. We provide an analytical and numerical characterisation of the pattern-formation mechanism responsible for this phenomenon in a simple model and find an unexpected link between receptive field formation and tensor decomposition of higher-order input correlations.
Alessandro Ingrosso · Sebastian Goldt 🔗 


Disentangling Images with Lie Group Transformations and Sparse Coding
(
Poster
)
Discrete spatial patterns and their continuous transformations are two important regularities in natural signals. Lie groups and representation theory are mathematical tools used in previous works to model continuous image transformations. On the other hand, sparse coding is an essential tool for learning dictionaries of discrete natural signal patterns. This paper combines these ideas in a Bayesian generative model that learns to disentangle spatial patterns and their continuous transformations in a completely unsupervised manner. Images are modeled as a sparse superposition of shape components followed by a transformation parameterized by n continuous variables. The shape components and transformations are not predefined but are instead adapted to learn the data’s symmetries. The constraint is that the transformations form a representation of an n-dimensional torus. Training the model on a dataset consisting of controlled geometric transformations of specific MNIST digits shows that it can recover these transformations along with the digits. Training on the full MNIST dataset shows that it can learn the basic digit shapes and the natural transformations such as shearing and stretching contained in this data. This work provides the simplest known Bayesian mathematical model for building unsupervised factorized representations.
Ho Yin Chau · Frank Qiu · Yubei Chen · Bruno Olshausen 🔗 


Does Geometric Structure in Convolutional Filter Space Provide Filter Redundancy Information?
(
Poster
)
This paper studies the geometrical structure present in a CNN filter space in order to investigate the redundancy or importance of individual filters. In particular, this paper analyses the convolutional layer filter space using simplicial geometry to establish a relation between filter relevance and their location on the simplex. Convex combinations of the extremal points of a simplex can span the entire volume of the simplex. As a result, these points are inherently the most relevant components. Based on this principle, we hypothesize that filters lying near these extremal points of a simplex modelling the filter space are the least redundant filters, and vice versa. We validate this positional relevance hypothesis by successfully employing it for data-independent filter ranking and artificial filter fabrication in trained convolutional neural networks. The empirical analysis on different CNN architectures such as ResNet-50 and VGG-16 provides strong evidence in favour of the postulated positional relevance hypothesis.
Anshul Thakur · Vinayak Abrol · Pulkit Sharma 🔗 


Is the information geometry of probabilistic population codes learnable?
(
Poster
)
One reason learning the geometry of latent neural manifolds from neural activity data is difficult is that the ground truth is generally not known, which can make manifold learning methods hard to evaluate. Probabilistic population codes (PPCs), a class of biologically plausible and self-consistent models of neural populations that encode parametric probability distributions, may offer a theoretical setting where it is possible to rigorously study manifold learning. It is natural to define the neural manifold of a PPC as the statistical manifold of the encoded distribution, and we derive a mathematical result that the information geometry of the statistical manifold is directly related to measurable covariance matrices. This suggests a simple but rigorously justified decoding strategy based on principal component analysis, which we illustrate using an analytically tractable PPC.
🔗 


Do Neural Networks Trained with Topological Features Learn Different Internal Representations?
(
Poster
)
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different from those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks: neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially from those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
🔗 
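Centered kernel alignment, one of the two similarity measures named in the abstract above, has a compact linear form. The sketch below implements linear CKA in plain Python for two activation matrices with one row per example; it is an illustration under that assumption rather than the authors' evaluation code, and the function names are hypothetical.

```python
import math


def center_columns(x):
    """Subtract the per-feature mean from every row."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def gram(x):
    """Matrix of inner products between examples (rows)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in x] for r in x]


def frobenius_inner(k, l):
    return sum(a * b for rk, rl in zip(k, l) for a, b in zip(rk, rl))


def linear_cka(x, y):
    """Linear CKA between two representations of the same examples."""
    k, l = gram(center_columns(x)), gram(center_columns(y))
    return frobenius_inner(k, l) / math.sqrt(
        frobenius_inner(k, k) * frobenius_inner(l, l))


x = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
# Rotating by 90 degrees and rescaling leaves linear CKA at 1.
y = [[-3.0 * b, 3.0 * a] for a, b in x]
assert abs(linear_cka(x, y) - 1.0) < 1e-9
```

The score is unchanged by orthogonal transformations and isotropic scaling of either representation, so it compares representations up to those symmetries.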