Evolutionary Biology Seminar

Parent Program

Evolutionary Biology and the Theory of Computing

Location

Calvin Lab 116

Speaker(s)

David Tse (Stanford University)

Date

Tuesday, Feb. 25, 2014

Time

10:30 – 11:30 a.m. PT

Description

Information Theory for High Throughput Sequencing

Extraordinary advances in sequencing technology over the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. A key computational problem is that of assembly: how to reconstruct, from many millions of short reads, the underlying biological sequence of interest, be it a DNA sequence or a set of RNA transcripts? Traditionally, assembler design has been viewed mainly as a software engineering project, where time and memory requirements are the primary concerns, while the assembly algorithms themselves are designed based on heuristic considerations with no optimality guarantees. In this talk, we outline an alternative approach to assembler design based on information-theoretic principles. Starting with the question of when there is enough information in the reads to reconstruct the sequence, we design near-optimal assembly algorithms that can reconstruct with a minimal amount of read information. We illustrate our approach in two settings: DNA sequencing and RNA sequencing. We report preliminary results from ShannonDNA, a DNA assembler, and ShannonRNA, an RNA assembler, and compare their performance both with the fundamental limits and with state-of-the-art software in the field.
