Abstract
The reconstruction of phylogenetic trees from mixed populations has become important in the study of cancer evolution, because sequencing is often performed on bulk tumor tissue that contains a mixture of distinct cell populations. Recent work has shown how to reconstruct a perfect phylogeny tree from samples that contain mixtures of two-state characters, where each locus/character is either mutated or not. However, most cancers contain more complex mutations, such as copy-number aberrations, that exhibit more than two states.
We formulate the Multi-State Perfect Phylogeny Mixture Deconvolution Problem, which seeks to reconstruct a multi-state perfect phylogeny tree given mixtures of the leaves of the tree. We characterize the solutions of this problem as a restricted class of spanning trees in a multi-graph constructed from the input data, show that the problem is NP-hard, and derive an algorithm to enumerate such trees in the important special case of cladistic characters. We illustrate applications of our algorithm to simulated and real cancer data.