Abstract
We are faced with a flood of molecular and clinical data. Within a cell, many biomolecules interact to perform biological functions, forming large, complex systems. Large numbers of patient-specific datasets are also available, providing complementary information on the same disease type. The challenge is how to mine these complex data systems to answer fundamental questions, gain new insight into diseases and improve therapeutics. Just as computational approaches for analyzing genetic sequence data have revolutionized biological understanding, analyses of networked “omics” and clinical data are expected to have similarly ground-breaking impacts. However, dealing with these data is nontrivial: many of the questions we ask about them are computationally intractable, which necessitates the development of heuristic methods for finding approximate solutions.
We develop methods for extracting new biomedical knowledge from the wiring patterns of large biomedical networks, linking these patterns to biological function and translating the information hidden in them into everyday language. We introduce a versatile data fusion (integration) framework that effectively integrates somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratifying patients into groups with different clinical outcomes, predicting driver genes whose mutations trigger the onset and development of cancers, and repurposing drugs for treating particular cancer patient groups. Our new methods combine network science approaches with graph-regularized non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets.
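To make the named technique more concrete, the following is a minimal sketch of a generic graph-regularized non-negative matrix tri-factorization (NMTF) in Python with NumPy. It is not the authors' implementation. It approximates a non-negative data matrix R as G1 S G2ᵀ and minimizes ‖R − G1 S G2ᵀ‖²_F + λ[tr(G1ᵀ L1 G1) + tr(G2ᵀ L2 G2)], where L = D − A is the Laplacian of a network on the rows (A1) or the columns (A2). The Laplacian terms penalize linked nodes that receive dissimilar cluster memberships. The function and helper names, the weight `lam`, the choice of standard multiplicative update rules, and the toy data (a hypothetical patients × genes matrix with made-up networks) are all illustrative assumptions.

```python
import numpy as np


def graph_regularized_nmtf(R, A1, A2, k1, k2, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical sketch: factorize R (n x m) ~ G1 @ S @ G2.T, all factors >= 0.

    A1 (n x n) and A2 (m x m) are symmetric non-negative adjacency matrices;
    their Laplacians L = D - A regularize G1 and G2 so that linked nodes are
    pushed toward similar cluster memberships.
    Returns G1 (n x k1), S (k1 x k2), G2 (m x k2) and the objective history.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G1 = rng.random((n, k1))
    S = rng.random((k1, k2))
    G2 = rng.random((m, k2))
    D1 = np.diag(A1.sum(axis=1))  # degree matrix of the row network
    D2 = np.diag(A2.sum(axis=1))  # degree matrix of the column network

    def objective():
        resid = R - G1 @ S @ G2.T
        reg1 = np.trace(G1.T @ (D1 - A1) @ G1)
        reg2 = np.trace(G2.T @ (D2 - A2) @ G2)
        return np.sum(resid ** 2) + lam * (reg1 + reg2)

    history = [objective()]
    for _ in range(n_iter):
        # Multiplicative updates: the numerator holds the negative part of the
        # gradient and the denominator the positive part, so factors stay >= 0.
        G1 *= (R @ G2 @ S.T + lam * A1 @ G1) / (
            G1 @ S @ G2.T @ G2 @ S.T + lam * D1 @ G1 + eps)
        G2 *= (R.T @ G1 @ S + lam * A2 @ G2) / (
            G2 @ S.T @ G1.T @ G1 @ S + lam * D2 @ G2 + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
        history.append(objective())
    return G1, S, G2, history


def random_graph(n, p, rng):
    """Hypothetical helper: symmetric 0/1 adjacency matrix without self-loops."""
    upper = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return upper + upper.T


# Toy example with made-up data (not from the source): 20 "patients" x 15 "genes".
rng = np.random.default_rng(1)
R = rng.random((20, 15))
A1 = random_graph(20, 0.2, rng)  # stand-in for a patient-similarity network
A2 = random_graph(15, 0.2, rng)  # stand-in for a gene-interaction network
G1, S, G2, history = graph_regularized_nmtf(R, A1, A2, k1=3, k2=4)
patient_clusters = G1.argmax(axis=1)  # hard cluster label for each row of R
gene_clusters = G2.argmax(axis=1)     # hard cluster label for each column of R
```

Rows of G1 and G2 act as soft cluster memberships, so a row-wise argmax yields hard co-clusters of rows and columns. The multiplicative form is one common choice because it keeps every factor non-negative without an explicit projection step.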