By Luca Trevisan
In the coming Spring semester, the Simons Institute will host a program on Algorithmic Challenges in Genomics, and a program on Counting Complexity and Phase Transitions.
Computational biology has its roots in the theoretical computer science community, and it has grown into a broad area involving mathematicians, physicists, biologists and computer scientists. The program on Algorithmic Challenges in Genomics, organized by Ming Li, Lior Pachter, Pavel Pevzner and Ron Shamir, will bring a multidisciplinary group of researchers to the Simons Institute to work on theoretical challenges in computational biology, including both modeling issues and algorithmic challenges.
The availability of cheap genome sequencing has generated a large amount of genomic medical data, for which new analysis techniques need to be developed. One focus area will be cancer biology: the practice of sequencing tumor cells, and the need for new approaches to improve diagnosis and to develop personalized therapies make cancer biology an area that is rich both in data and in computational challenges with potentially significant applications.
One goal, on the modeling side, will be to understand how genes are expressed, and to develop models of how DNA sequences are read and "executed" under different environmental conditions by cells. Another research area covered by the program will be modeling the interactions between cells, taking advantage of the combined presence at the program of wet-lab biologists with deep domain knowledge, and of graph theorists and machine learning experts.
The program on counting complexity, organized by Andrei Bulatov, Jin-Yi Cai, Xi Chen, Martin Dyer and Allan Sly, deals with combinatorial problems in which one is interested in counting the number of solutions, rather than finding a solution or optimizing a cost function. For example, given a graph, how many proper 3-colorings does it have, or how many independent sets, or how many perfect matchings?
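For small graphs, counting problems of this kind can be solved by brute-force enumeration, which makes the definitions concrete even though it takes exponential time in general. The sketch below (a toy illustration, not part of the program) counts proper 3-colorings and independent sets of the 4-cycle:

```python
from itertools import product

# Toy instance: the 4-cycle C4, with vertices 0,1,2,3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

def count_colorings(n, edges, k=3):
    """Count proper k-colorings by checking all k^n assignments."""
    return sum(
        all(c[u] != c[v] for u, v in edges)
        for c in product(range(k), repeat=n)
    )

def count_independent_sets(n, edges):
    """Count independent sets by checking all 2^n vertex subsets."""
    return sum(
        all(not (s[u] and s[v]) for u, v in edges)
        for s in product((0, 1), repeat=n)
    )

print(count_colorings(n, edges))         # 18 proper 3-colorings of C4
print(count_independent_sets(n, edges))  # 7 independent sets (incl. the empty set)
```

The 18 agrees with the chromatic polynomial of the cycle, (k-1)^n + (-1)^n (k-1) = 2^4 + 2; the exponential blow-up of this enumeration is exactly what the complexity theory of counting is about.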
Some counting problems are solvable in polynomial time: for example, one can count the number of perfect matchings in a planar graph using linear-algebraic methods, and one can reduce other counting problems to the problem of counting perfect matchings in planar graphs via local reductions. Algorithms of the latter type are called holographic algorithms. Most counting problems, however, are #P-complete, meaning that they cannot be solved in polynomial time unless all combinatorial counting problems are solvable in polynomial time and P = #P (an even more unlikely complexity collapse than P = NP).

Among the counting problems that are #P-complete, some admit good polynomial-time approximation algorithms, while others are #P-hard even to approximate. When they exist, approximation algorithms are often based on an equivalence between the task of generating random solutions almost uniformly and the task of approximately counting the solutions; in turn, near-uniform random solutions can be generated by randomized algorithms that perform a "random walk" on the space of solutions, where each step of the walk makes a small local change to the current solution. Ideas from statistical physics are often useful in analyzing the mixing time of such random walks, and interactions between computer scientists interested in approximate counting algorithms and statistical physicists have been very successful over the past two decades. The analysis of the probabilistic processes involved in such approximation algorithms is also closely related to threshold phenomena in random structures and random processes.
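One standard such random walk is Glauber dynamics: repeatedly pick a random vertex and toggle its membership in the current solution when the move is legal. The sketch below (a minimal illustration on a toy instance, the 4-cycle, with my own parameter choices) runs this walk on independent sets; its stationary distribution is uniform over all independent sets of the graph, so a well-mixed chain yields the near-uniform samples that approximate counting needs:

```python
import random

# Adjacency lists of the 4-cycle C4 (toy instance).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

def glauber_step(state, rng):
    """One step of Glauber dynamics on independent sets.

    Pick a uniformly random vertex v; if v is occupied, vacate it with
    probability 1/2; if v is unoccupied and has no occupied neighbor,
    occupy it with probability 1/2; otherwise do nothing.
    """
    v = rng.randrange(len(adj))
    if v in state:
        if rng.random() < 0.5:
            return state - {v}
    elif all(u not in state for u in adj[v]) and rng.random() < 0.5:
        return state | {v}
    return state

rng = random.Random(0)        # fixed seed for reproducibility
state = frozenset()           # start from the empty independent set
visited = set()
for _ in range(10_000):
    state = glauber_step(state, rng)
    visited.add(state)

print(len(visited))           # the walk visits all 7 independent sets of C4
```

On this tiny state space the chain mixes almost instantly; the interesting questions, where the statistical-physics connection enters, concern how the mixing time scales with the size of the graph.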
The program will involve both researchers who specialize in exact counting and #P-completeness via local reductions, and researchers interested in approximate counting and threshold phenomena: the two communities study closely related problems, but they rarely have the opportunity to collaborate, an opportunity that the program will offer.