

Poster

Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development

Diane Hu · Laurens van der Maaten · Youngmin Cho · Lawrence Saul · Sorin Lerner


Abstract:

When software developers modify one or more files in a large code base, they must also identify and update other related files. Many file dependencies can be detected by mining the development history of the code base: in essence, groups of related files are revealed by the logs of previous workflows. From data of this form, we show how to detect dependent files by solving a problem in binary matrix completion. We explore different latent variable models (LVMs) for this problem, including Bernoulli mixture models, exponential family PCA, restricted Boltzmann machines, and fully Bayesian approaches. We evaluate these models on the development histories of three large, open-source software systems: Mozilla Firefox, Eclipse Subversive, and Gimp. In all of these applications, we find that LVMs improve the performance of related file prediction over current leading methods.
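To make the matrix-completion framing concrete, here is a minimal sketch of one of the model families named above, a Bernoulli mixture model, fit with EM on a toy history. It is not the authors' implementation. The file names, the six-commit history, the smoothing constant `alpha`, the deterministic initialization, and the helper names `fit_bernoulli_mixture` and `predict_related` are all assumptions made for illustration. Each row of the binary matrix is taken to be one past commit and each column one file; predicting related files then amounts to filling in the unobserved entries of a new row in which only the already-modified files are known.

```python
import math

# Toy development history (hypothetical): each entry is the set of files
# touched together in one past commit.
history = [
    {"parser.c", "parser.h"},
    {"parser.c", "parser.h", "lexer.c"},
    {"parser.c", "parser.h"},
    {"ui.js", "ui.css"},
    {"ui.js", "ui.css", "index.html"},
    {"ui.js", "ui.css"},
]
files = sorted(set().union(*history))
# Binary matrix: X[n][d] = 1 if commit n touched file d, else 0.
X = [[1 if f in commit else 0 for f in files] for commit in history]


def fit_bernoulli_mixture(X, k=2, iters=50, alpha=0.1):
    """EM for a mixture of k multivariate Bernoullis with a smoothed M-step."""
    n, d = len(X), len(X[0])
    # Deterministic init (an assumption, chosen for reproducibility): seed
    # each component from an evenly spaced row, softened towards 0.5.
    mu = [[0.25 + 0.5 * x for x in X[(j * n) // k]] for j in range(k)]
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each commit.
        resp = []
        for row in X:
            logp = [
                math.log(pi[j])
                + sum(math.log(mu[j][c] if row[c] else 1.0 - mu[j][c])
                      for c in range(d))
                for j in range(k)
            ]
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate mixing weights and per-file probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pi[j] = (nj + alpha) / (n + k * alpha)
            mu[j] = [
                (sum(resp[i][j] * X[i][c] for i in range(n)) + alpha)
                / (nj + 2 * alpha)
                for c in range(d)
            ]
    return pi, mu


def predict_related(changed, pi, mu, files):
    """Score the unobserved files given the files already known to change."""
    idx = [files.index(f) for f in changed]
    # The posterior over components uses only the observed (changed) entries.
    w = [pi[j] * math.prod(mu[j][c] for c in idx) for j in range(len(pi))]
    total = sum(w)
    post = [v / total for v in w]
    scores = {
        f: sum(post[j] * mu[j][c] for j in range(len(pi)))
        for c, f in enumerate(files)
        if f not in changed
    }
    return sorted(scores, key=scores.get, reverse=True), scores


pi, mu = fit_bernoulli_mixture(X)
ranking, scores = predict_related({"parser.c"}, pi, mu, files)
print(ranking)
```

In this sketch, a developer who has modified only `parser.c` gets the remaining files ranked by their predicted probability of also needing a change. The other model families listed in the abstract would replace the mixture model while keeping the same matrix and the same query.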
