Spring 2014

Evolutionary Biology Seminar

Feb. 25, 2014 10:30 am – 11:30 am

Calvin Lab 116

Information Theory for High Throughput Sequencing
Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many high-throughput sequencing-based assays have been designed to make various biological measurements of interest. A key computational problem is that of assembly: how to reconstruct, from many millions of short reads, the underlying biological sequence of interest, be it a DNA sequence or a set of RNA transcripts? Traditionally, assembler design is viewed mainly as a software engineering project, where time and memory requirements are the primary concerns, while the assembly algorithms themselves are designed based on heuristic considerations with no optimality guarantee. In this talk, we outline an alternative approach to assembly design based on information-theoretic principles. Starting with the question of when there is enough information in the reads to reconstruct the sequence, we design near-optimal assembly algorithms that can reconstruct with a minimal amount of read information. We illustrate our approach in two settings: DNA sequencing and RNA sequencing. We report preliminary results from ShannonDNA, a DNA assembler, and ShannonRNA, an RNA assembler, and compare their performance both with the fundamental limits and with state-of-the-art software in the field.