Network and relational data structures play an increasingly important role in understanding complex biological, social and other relational systems. Statistical models of such systems can describe global relational features, characterize local network structure, and predict missing or future relational data.
Latent variable models are a popular tool for describing network and relational patterns. Many of these models are based on well-known matrix decomposition methods, and thus have a rich mathematical framework on which to build. Additionally, their parameters are easy to interpret: roughly speaking, a latent variable model posits that the relationship between two nodes is a function of observed and unobserved (latent) characteristics of the nodes, possibly in addition to contextual factors.
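To make "a function of observed and unobserved characteristics" concrete, here is a minimal NumPy sketch that simulates a directed binary network. It is an illustration under stated assumptions, not a model taken from the tutorial: the probit link, the single observed dyadic covariate `x` with coefficient `beta`, the standard-normal latent factors, and the function name `simulate_latent_factor_network` are all hypothetical choices.

```python
import numpy as np


def simulate_latent_factor_network(n, k, beta, x, rng):
    """Simulate a directed binary network from a probit latent factor model.

    Assumed (illustrative) form:
        theta_ij = beta * x_ij + u_i . v_j
        y_ij     = 1[theta_ij + eps_ij > 0],  eps_ij ~ N(0, 1)
    where x is an observed n-by-n dyadic covariate and u_i, v_j are
    unobserved k-dimensional "sender" and "receiver" characteristics.
    """
    u = rng.normal(size=(n, k))  # latent sender factors, one row per node
    v = rng.normal(size=(n, k))  # latent receiver factors, one row per node
    theta = beta * x + u @ v.T   # observed effect plus latent interaction
    z = theta + rng.normal(size=(n, n))  # add independent dyad-level noise
    y = (z > 0).astype(int)      # threshold the latent propensity into a tie
    np.fill_diagonal(y, 0)       # self-ties are undefined here; set them to 0
    return y, theta


# Usage: 30 nodes, 2 latent dimensions, one covariate with coefficient 0.5
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 30))
y, theta = simulate_latent_factor_network(30, 2, 0.5, x, rng)
print(y.sum(), "ties observed")
```

The latent term `u @ v.T` is where the matrix-decomposition connection shows up: it is a rank-`k` matrix, so fitting such a model amounts to estimating a low-rank structure underneath the observed ties.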
In this tutorial I give an introduction to latent variable models for relational and network data. I first provide a mathematical justification for a general latent factor model based on exchangeability considerations. I then describe and illustrate several latent variable models through the statistical analysis of a number of network datasets, and compare these models in terms of which network features they can, and cannot, represent.
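Exchangeability arguments of this kind usually rest on the Aldous-Hoover representation theorem. One common statement, sketched here for a relational array whose distribution is invariant under simultaneous permutation of rows and columns, is given below; the low-rank approximation in the second line is an illustrative assumption, not part of the theorem, and the symbols $\theta$, $z_i$, $\Lambda$, $e_{ij}$ are introduced only for this sketch.

```latex
% Joint exchangeability: (y_{ij}) \overset{d}{=} (y_{\pi(i)\pi(j)}) for every permutation \pi.
(y_{ij}) \overset{d}{=} \big(h(\mu, u_i, u_j, \epsilon_{ij})\big),
\qquad \mu,\ \{u_i\},\ \{\epsilon_{ij}\} \overset{\text{iid}}{\sim} \mathrm{Uniform}(0,1),
\qquad \epsilon_{ij} = \epsilon_{ji}

% Assumed low-rank (eigendecomposition-style) approximation of h:
y_{ij} \approx \theta + z_i^{\top} \Lambda\, z_j + e_{ij},
\qquad z_i \in \mathbb{R}^{K},\ \Lambda \text{ a } K \times K \text{ diagonal matrix}
```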
A particularly flexible class of models is the class of "latent factor" models, based on singular value and eigendecompositions of a relational matrix. These models generalize naturally to more complicated relational data described by multiway arrays, such as a network measured over time or several relational variables measured on a common node set. I will close the tutorial by showing how tools from multiway data analysis (such as the higher-order SVD and the PARAFAC decomposition) can be used to build statistical models of multiway network and relational data.
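The decompositions named above can be illustrated with a short NumPy sketch. It is a generic implementation of the truncated SVD of a relational matrix and the truncated higher-order SVD (HOSVD) of a three-way array, not the tutorial's own statistical models: the function names `low_rank_factors`, `unfold`, `hosvd` and `reconstruct` are hypothetical, no noise model or estimation procedure is included, and PARAFAC is not shown.

```python
import numpy as np


def low_rank_factors(y, k):
    """Rank-k approximation of a relational matrix via the SVD: y ~ U_k diag(d_k) V_k^T.

    Rows of U_k and V_k can be read as k-dimensional sender and receiver factors.
    """
    u, d, vt = np.linalg.svd(y, full_matrices=False)
    return u[:, :k], d[:k], vt[:k, :].T


def unfold(t, mode):
    """Mode-m matricization: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def hosvd(t, ranks):
    """Truncated HOSVD: one orthonormal factor matrix per mode plus a core array."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding give that mode's factors.
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = t
    for mode, u in enumerate(factors):
        # Project along `mode` onto the factor basis (multiply by u^T in that mode).
        core = np.moveaxis(np.tensordot(u.T, core, axes=([1], [mode])), 0, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space by multiplying by each factor matrix."""
    t = core
    for mode, u in enumerate(factors):
        t = np.moveaxis(np.tensordot(u, t, axes=([1], [mode])), 0, mode)
    return t


# Usage: a hypothetical network on 10 nodes observed at 4 time points, stored as
# a (sender, receiver, time) array and summarized with ranks (3, 3, 2).
rng = np.random.default_rng(2)
net = rng.normal(size=(10, 10, 4))
core, factors = hosvd(net, (3, 3, 2))
approx = reconstruct(core, factors)
```

With full ranks the HOSVD reconstruction is exact; truncating the ranks gives a compact summary in which the node-mode factors play the same role as the sender and receiver factors of the matrix SVD.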
Peter Hoff (University of Washington)
Peter Hoff is an Associate Professor of Statistics and Biostatistics at the University of Washington. He has developed a variety of Bayesian methods for multivariate data, including covariance and copula estimation, cluster analysis, mixture modeling and social network analysis. He serves on the editorial boards of the Annals of Applied Statistics, JRSSB and SIAM Classics.
More from the Same Authors
2007 Poster: Modeling homophily and stochastic equivalence in symmetric relational data