Calvin Lab 116
Multi-State Perfect Phylogeny: Modeling Infinite Alleles
A binary perfect phylogeny is the graph-theoretic analog of the basic coalescent under the infinite sites model of mutation. It generates binary sequences, obeying the infinite sites assumption, but does not involve any statistical model.
A multi-state perfect phylogeny is the graph-theoretic analog of the basic coalescent under the infinite alleles model of mutation. The infinite alleles modeled population genetic data before the SNP-era when the infinite sites model became dominant. However, the infinite alleles model has again become appropriate with the growth of sequence data (four states), and other types of complex locus data, with potentially a huge number of states.
The multi-state perfect phylogeny has been studied in the computer science literature, but is almost unknown in the population genetics community. In comparison to the binary perfect phylogeny problem, the multi-state perfect phylogeny problem is algorithmically much more complex, and yet connects to a more elegant, deeper body of mathematics.
In this talk, I will introduce the multi-state perfect phylogeny and discuss how it connects to chordal graph theory, minimal triangulation, 2-SAT, and integer linear programming. I will also discuss generalizations of the classic four-gametes condition, which characterizes the basic coalescent without recombination (equivalently a binary perfect phylogeny), to conditions that characterize multi-state perfect phylogenies, and I will identify open questions in this field.
The basic papers are:
D. Gusfield, "The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer Linear Programming and Chordal Graph Theory", J. of Computational Biology, Vol. 17, no. 3, 2010, p. 383-399
F. Lam, D. Gusfield, S. Sridhar, "Generalizing the Splits Equivalence Theorem and Four Gamete Condition: Perfect Phylogeny on Three State Characters", SIAM J. on Discrete Math. Vol. 25, No. 3, pp. 1144-1175, 2011