Abstract
We are developing spectral learning algorithms and applying them to epigenetic data. International consortia such as ENCODE and Roadmap Epigenomics have recently released massive epigenetic data sets from hundreds of human cell types, with the aim of interpreting the results of genome-wide association studies for many human diseases. To analyze these data, we implemented and extensively tested spectral algorithms for hidden Markov models (HMMs) in our Spectacle software and found that they significantly improve run time and biological interpretability compared to the expectation-maximization (EM) algorithm. This advantage is particularly important when the underlying classes are highly imbalanced, a pervasive issue in biology. To model multiple cell types, we developed novel spectral algorithms for tree-structured HMMs and show that the tree model further improves our prediction of functional elements in the genome.