Representation Learning of Compositional Data
Marta Avalos · Richard Nock · Cheng Soon Ong · Julien Rouar · Ke Sun

Thu Dec 06 02:00 PM -- 04:00 PM (PST) @ Room 210 #73

We consider the problem of learning a low dimensional representation for compositional data. Compositional data consist of collections of nonnegative parts that sum to a constant value. Since the parts of a collection are statistically dependent, many standard tools cannot be directly applied. Instead, compositional data must first be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that allows low dimensional representation learning directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform and a remainder conformal divergence. Our proposed approach includes a convenient surrogate (upper bound) loss of the exponential family PCA, which has an easy-to-optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
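
For context, the conventional transform-then-analyze pipeline mentioned in the abstract can be sketched as a centered log-ratio (CLR) transform followed by ordinary PCA. This is only a minimal illustration of that baseline, not the method proposed in the paper; the function names `clr` and `pca`, the pseudo-count `eps`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def clr(x, eps=1e-9):
    """Centered log-ratio transform of compositions stored as rows."""
    # eps is a hypothetical pseudo-count that keeps log() finite for zero parts.
    x = np.asarray(x, dtype=float) + eps
    x = x / x.sum(axis=1, keepdims=True)        # re-close so each row sums to 1
    log_x = np.log(x)
    # Subtracting the row-wise mean of the logs centers each composition.
    return log_x - log_x.mean(axis=1, keepdims=True)

def pca(z, k):
    """Project the rows of z onto their top-k principal directions via SVD."""
    z_centered = z - z.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(z_centered, full_matrices=False)
    return z_centered @ vt[:k].T

# Synthetic compositions (illustrative only): 50 samples with 6 parts each.
rng = np.random.default_rng(0)
counts = rng.gamma(shape=2.0, size=(50, 6))
compositions = counts / counts.sum(axis=1, keepdims=True)

embedding = pca(clr(compositions), k=2)         # 2-D representation per sample
```

Each row of the CLR output sums to zero, which removes the sum-to-constant constraint before PCA is applied.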

Author Information

Marta Avalos (INRIA, INSERM U1219, University of Bordeaux)

Marta Avalos, Ph.D., has been an Associate Professor of Biostatistics at the Bordeaux School of Public Health, University of Bordeaux, France, since 2005. She has led the interdisciplinary first year of the Master of Public Health since 2011. She is a member of the research team "Statistics In Systems Biology and Translational Medicine" (SISTM) of the French national institutes for computer science and automation research (INRIA) and for health and medical research (INSERM). Marta received her Ph.D. in Information and Systems Technologies from the Technology University of Compiègne. She completed a Master's degree in Public Health at Paris-Sud University and a bachelor's degree in Mathematics at the University of Barcelona, Spain. Her work focuses on developing and integrating innovative statistical approaches, particularly Lasso-type regularization methods, to advance population health.

Richard Nock (Data61, the Australian National University and the University of Sydney)
Cheng Soon Ong (Data61 and ANU)

Cheng Soon Ong is a principal research scientist at the Machine Learning Research Group, Data61, CSIRO, and is the director of the machine learning and artificial intelligence future science platform at CSIRO. He is also an adjunct associate professor at the Australian National University. He is interested in enabling scientific discovery by extending statistical machine learning methods.

Julien Rouar (University of Bordeaux)
Ke Sun (Data61, CSIRO)

More from the Same Authors