Abstract
Large amounts of diverse biomedical data, in particular cancer data, are available, but most established analysis methods are tailored to a single dataset, which limits their power to detect disease-specific events. Here we take an integrative approach that exploits data from multiple studies and across many diseases while correcting for biases arising from the complexity of the data (e.g., different technologies, cancer subtypes, and other diseases). For example, we extracted clinically meaningful and reliable disease biomarkers by analyzing more than 14,500 gene expression profiles from more than 180 studies. We detected cancer subtype-specific differentially expressed genes by correcting for both biological and disease-ontology-related biases. The detected gene sets are highly informative, and integrating them with non-expression data (e.g., somatic mutations and biological networks) reveals therapeutic potential.