Invited Talk

New Methods for the Analysis of Genome Variation Data

Richard Durbin

2013 Invited Talk

Abstract

Genetic variation in genome sequences within a species such as humans underpins our biological diversity, is the basis for the genetic contribution to disease, provides information about our ancestry, and is the substrate for evolution.
Genetic variation has a complex structure of shared inheritance from a common ancestor at each position in the genome, with the pattern of sharing changing along the genome as a consequence of genetic recombination.
The scale of data sets that can be obtained from modern sequencing and genotyping methods, currently of the order of hundreds of terabytes, makes analysis computationally challenging. During the last few years, a number of tools such as BWA, Bowtie have been developed for sequence matching based on suffix array derived data structures, in particular the Burrows-Wheeler tranform (BWT) and Ferragina-Manzini (FM) index, which have the nice property that they not only give asymptotically optimal search, but also are highly compressed data structures (they underlie the bzip compression algorithms). I will discuss a number of approaches based on these data structures for primary data processing, sequence assembly, variation detection and large scale genetic analysis, with applications to very large scale human genetic variation data sets.

Chat is not available.