University of Colorado at Boulder
Making Very Large-Scale Linear Algebraic Computations Possible Via Randomization
9:30 – 11:30am Monday, December 07, 2009
The demands on software for analyzing data are rapidly increasing as ever larger data sets are generated in medical imaging, in analyzing large networks such as the World Wide Web, in image and video processing, and in a range of other applications. To handle this avalanche of data, any software used must be able to fully exploit modern hardware characterized by multiple processors and capacious but slow memory. The evolution of computer architecture is currently forcing a shift in algorithm design away from the classical algorithms that were designed for single-processor computers with all the data available in Random Access Memory (RAM), towards algorithms that aggressively minimize communication costs. This tutorial will describe a set of recently developed techniques for standard linear algebraic computations (such as computing a partial singular value decomposition of a matrix) that are very well suited for implementation on multi-core or other parallel architectures, and for processing data stored on disk, or streamed. These techniques are based on the use of randomized sampling to reduce the effective dimensionality of the data. Remarkably, randomized sampling does not only loosen communication constraints, but does so while maintaining, or even improving, the accuracy and robustness of existing deterministic techniques.
Gunnar Martinsson is an assistant professor of applied mathematics at the University of Colorado at Boulder. He graduated from Chalmers University of Technology in Gothenburg, Sweden, in 1998, and earned his PhD in 2002 from the Computational and Applied Mathematics program at the University of Texas at Austin. His research interests include numerical linear algebra, fast solvers for linear PDEs, and applied harmonic analysis. A particular focus of his research is the development of algorithms specifically designed for very large scale problems. Such problems require algorithms whose complexity scales linearly with problem size, that can efficiently exploit parallel processors in a variety of configurations, that can run with the data stored out-of-core or streamed, and that maintain optimal accuracy even for very large problems.