Genetic variation in genome sequences within a species such as humans underpins our biological diversity, is the basis for the genetic contribution to disease, provides information about our ancestry, and is the substrate for evolution.
Genetic variation has a complex structure of shared inheritance from a common ancestor at each position in the genome, with the pattern of sharing changing along the genome as a consequence of genetic recombination.
The scale of data sets that can be obtained from modern sequencing and genotyping methods, currently of the order of hundreds of terabytes, makes analysis computationally challenging. During the last few years, a number of tools such as BWA and Bowtie have been developed for sequence matching based on data structures derived from suffix arrays, in particular the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index. These have the attractive property that they not only give asymptotically optimal search but are also highly compressed (the BWT underlies the bzip2 compression algorithm). I will discuss a number of approaches based on these data structures for primary data processing, sequence assembly, variation detection and large scale genetic analysis, with applications to very large scale human genetic variation data sets.
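As a rough illustration of the data structures mentioned above, the following Python sketch builds a BWT from a naively sorted suffix array and counts exact pattern occurrences by FM-index backward search. It is a toy under stated assumptions, not how BWA or Bowtie are implemented: the function names (bwt_via_suffix_array, build_fm_index, count_matches) are hypothetical, the suffix sort is quadratic or worse, and the occurrence table is stored uncompressed, whereas practical tools sample and compress it.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (suffix_array, bwt) for text + '$' using a naive suffix sort."""
    s = text + "$"  # '$' is a sentinel that sorts before letters
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT is the character preceding each sorted suffix (cyclically).
    bwt = "".join(s[i - 1] for i in sa)
    return sa, bwt


def build_fm_index(bwt):
    """Return (first, occ) tables used by backward search.

    first[c]  = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    counts = Counter(bwt)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def count_matches(pattern, first, occ, n):
    """Count exact occurrences of pattern using one step per pattern character."""
    lo, hi = 0, n  # half-open range of suffix-array rows still matching
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    sa, bwt = bwt_via_suffix_array("GATTACAGATTACA")
    first, occ = build_fm_index(bwt)
    print(bwt, count_matches("GATTACA", first, occ, len(bwt)))
```

The search loop runs once per pattern character regardless of the length of the indexed text, which is the sense in which search time depends on the query rather than on the size of the data set.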
Richard Durbin (Wellcome Trust Sanger Institute)
Richard Durbin is a Senior Group Leader and joint Head of Human Genetics at The Wellcome Trust Sanger Institute. He is currently co-leading the 1000 Genomes Project to produce a deep catalogue of human genetic variation by large scale sequencing, and the UK10K collaboration to extend sequence based genetics to samples with clinically relevant phenotypes. Previously Richard contributed to the human genome project, and to the development of the Pfam database of protein families and the Ensembl genome data resource. He has also made theoretical and algorithmic contributions to biological sequence analysis. Richard has a BA in Mathematics and a PhD in Biology from Cambridge University, where he was also a Research Fellow at King's College from 1986 to 1988. He was a Fulbright Visiting Scholar in Biophysics at Harvard University from 1982 to 1983 and a Lucille P Markey Visiting Fellow in the Department of Psychology, Stanford University, from 1988 to 1990. He was a staff scientist at the MRC Laboratory of Molecular Biology from 1990 to 1996, and was Head of Informatics at the Sanger Institute from 1992 to 2006 and Deputy Director from 1997 to 2006. He was elected a Fellow of the Royal Society in 2004. Richard's home page can be found at http://www.sanger.ac.uk/research/faculty/rdurbin/