# Abstracts

### Monday, February 1st, 2016

9:15 am - 9:45 am

In the first part of my talk I will present my view on the state of somatic mutation calling from short reads. I will present results from our effort to experimentally validate somatic mutation calls from 10 different software pipelines on 50 WGS tumour/normal pairs as part of the ICGC/TCGA "Pan-Cancer" project. I will use these results to highlight the strengths of current approaches as well as areas that need further improvement.

In the second part of my talk I will discuss the prospects for using very long sequence reads from nanopore sequencers in cancer genomics. In addition to describing the unique opportunities that nanopore sequencing provides, I will highlight algorithmic and scaling problems that will need to be addressed for this technology to become widely used.

9:45 am - 10:15 am

Cancer progression is an evolutionary process, characterized by the accumulation of mutations, that drives tumor growth, clinical progression, and the development of drug resistance. Evolutionary theory can be used to describe the dynamics of tumor cell populations and to make inferences about the evolutionary history of a tumor from molecular profiling data. We present recent approaches to modeling the evolution of cancer, including population genetics models of tumorigenesis, phylogenetic methods for intra-tumor subclonal diversity, and probabilistic graphical models of tumor progression.

11:00 am - 11:30 am

No abstract available.

11:30 am - 12:00 pm

Huge amounts of diverse biomedical data, in particular cancer data, are available, but most established analysis methods were tailored to a single dataset, which limits their power to detect disease-specific events. Here we take an integrative approach to exploit data from multiple studies and across many diseases while correcting for biases arising from the complexity of the data (e.g., different technologies, cancer subtypes, or other diseases). For example, we extracted clinically meaningful and reliable disease biomarkers by analyzing more than 14,500 gene expression profiles from more than 180 studies. We detected cancer subtype-specific differentially expressed genes by correcting for both biological and disease-ontology-related biases. The detected gene sets are highly informative, and integrating them with non-expression data (e.g., somatic mutations and biological networks) reveals therapeutic potential.

2:00 pm - 2:30 pm

Recent whole-exome sequencing studies have identified recurrent somatic mutations in splicing factor genes across multiple cancer types, supporting the need to globally characterize splicing alterations across human cancers. Through the integration of mRNA, whole-exome sequencing, and whole-genome sequencing data from The Cancer Genome Atlas and International Cancer Genome Consortium, we are identifying RNA splicing alterations across ~10,000 cancer transcriptomes and investigating the underlying somatic mutations that cause these splicing alterations. We have further developed a computational pipeline called JuncBASE to identify and quantify alternative splicing in RNA-Seq data, which incorporates unannotated splicing events in the analysis.

In initial studies, we have identified altered splicing events significantly associated with mutations in the splicing factors U2AF1 and SF3B1 and found that these mutations cause altered recognition of 3’ splice site sequences. Current work aims to associate transcriptome changes with somatic mutations at splice sites and proximal intronic regions. As a result, we have identified somatic mutations associated with expression of oncogenic isoforms of MET and ERBB2. Our work highlights the importance of including novel splicing events in cancer transcriptome analysis as aberrant transcripts can be expressed due to somatic mutations.

2:30 pm - 3:00 pm

No abstract available.

3:30 pm - 4:00 pm

In this short talk, we will present algorithmic approaches to identifying breakage-fusion-bridge cycles, chromothripsis, episome formation, and other mechanisms that explain amplification in the tumor genome.

### Tuesday, February 2nd, 2016

9:00 am - 9:30 am

Cancer development is a multi-step process that leads to uncontrolled tumor cell growth. Multiple signaling cascades are involved: some are activated, while other pathways are suppressed. To understand these processes, biomedical researchers use models of biological systems that integrate diverse types of information, ranging from multiple high-throughput datasets and functional annotations to expert knowledge about biochemical reactions and biological pathways. Such integrative systems are used to develop new hypotheses and to answer complex questions in precision medicine, such as which factors cause disease, which patients are at high risk, whether patients will respond to a given treatment, and how to rationally select a combination therapy for an individual patient.

Precision medicine needs to be data-driven, and the corresponding analyses comprehensive and systematic. We will not find new treatments if we only test known targets and study characterized pathways. Thousands of potentially important proteins remain poorly characterized. Computational biology methods can help fill this gap with accurate predictions, making disease modeling more comprehensive. Intertwining computational prediction and modeling with biological experiments will lead to more useful findings, faster and more economically.

These computational predictions have already significantly improved human interactome coverage relevant to both basic and cancer biology and, importantly, have helped us to identify, validate and characterize prognostic signatures and to identify potential novel treatments. Combined, these results may lead to unraveling mechanisms of action for therapeutics, repositioning existing drugs for novel uses, prioritizing multiple candidates based on predicted toxicity, and identifying groups of patients that may benefit from a treatment and those for whom a given drug would be ineffective.

9:30 am - 9:45 am

This talk will focus on the combinatorial method LICHeE, designed to efficiently reconstruct multi-sample cancer cell lineage trees and infer the subclonal composition of tumor samples using somatic single nucleotide variants (SSNVs). Given a set of validated, deeply sequenced SSNVs from multiple normal and tumor samples of individual cancer patients, LICHeE uses the presence patterns and variant allele frequencies (VAFs) of SSNVs across the samples as lineage markers by relying on the perfect phylogeny model (PPM), which assumes that mutations do not recur independently in different cells. This assumption allows us to formulate a set of SSNV ordering constraints, which LICHeE leverages to limit the search space of possible underlying lineage trees and to evaluate the validity of the resulting topologies. In particular, LICHeE’s key strategy is to encode all possible precedence relationships among clusters of SSNVs into an evolutionary constraint network, which embeds all possible valid lineage trees and allows us to formulate the task of inferring such trees as a search for all spanning trees satisfying the derived PPM-based constraints. Due to this substantial reduction in search space, LICHeE can process large SSNV datasets in seconds. As a result, LICHeE reports a set of lineage trees that are fully consistent with the SSNV presence patterns and VAFs within each sample under the PPM. Given each such tree, LICHeE also estimates the subclonal mixtures of the samples, inferring sample heterogeneity simultaneously with phylogenetic cell lineage tree reconstruction. LICHeE’s effectiveness has been demonstrated on several large, recently published ultra-deep-sequencing multi-sample datasets, as well as on simulated datasets. For more information, please see the following publication: Popic, V., Salari, R., Hajirasouliha, I., Kashef-Haghighi, D., West, R.B. and Batzoglou, S., 2015. Fast and scalable inference of multi-sample cancer lineages. Genome Biology, 16(1), p.91.
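
To make the ordering constraints concrete, here is a minimal Python sketch of how an edge of a PPM-style constraint network might be tested. It assumes a simplified rule: a cluster can precede another only if it is present wherever the other is present and its VAF is at least as high in every sample, up to a tolerance `tol`. The function names and the tolerance are hypothetical; this is not LICHeE's implementation, which also handles SSNV clustering, noise, and the spanning-tree search itself.

```python
from itertools import permutations

def can_precede(parent, child, tol=0.0):
    """Simplified PPM check: may cluster `parent` be an ancestor of cluster `child`?

    Both arguments are per-sample VAF lists for one SSNV cluster (0.0 = absent).
    """
    for p, c in zip(parent, child):
        if c > 0 and p == 0:
            return False  # child present in a sample where the ancestor is absent
        if c > p + tol:
            return False  # a descendant cannot exceed its ancestor's VAF
    return True

def constraint_network(clusters, tol=0.0):
    """Return directed edges (a, b) meaning cluster a may precede cluster b."""
    return [(a, b) for a, b in permutations(clusters, 2)
            if can_precede(clusters[a], clusters[b], tol)]

clusters = {"A": [0.5, 0.4], "B": [0.3, 0.0], "C": [0.0, 0.2]}
print(constraint_network(clusters))  # [('A', 'B'), ('A', 'C')]
```

Here B and C are present in disjoint samples, so neither can precede the other, and A is the only candidate parent for both; a spanning tree over these edges, rooted at A, is then the only consistent topology.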

9:45 am - 10:15 am

No abstract available.

11:00 am - 11:40 am

No abstract available.

11:40 am - 11:55 am

Cancer is a disease driven in part by alterations to key signaling pathways. However, our knowledge of these pathways remains incomplete.  We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify multiple combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, a pattern often observed for alterations in the same pathway. CoMEt includes two key innovations. First is an exact statistical test for mutual exclusivity with a novel enumeration procedure for computing the corresponding tail probability. Second is a stochastic algorithm to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five different cancer types, identifying both known cancer genes and pathways, and novel putative cancer genes.
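
For two genes, an exact test of mutual exclusivity can be written as a one-sided Fisher's exact test on the 2×2 table of alteration counts; as we understand it, CoMEt's test generalizes this to k genes by enumerating contingency tables. The Python sketch below covers only the two-gene case, and `exclusivity_pvalue` is a hypothetical name, not part of CoMEt.

```python
from math import comb

def exclusivity_pvalue(n, a, b, both):
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    n: number of patients; a, b: patients with gene A / gene B altered;
    both: patients with both altered. Returns P(X <= both), where X is the
    co-occurrence count under the hypergeometric null with fixed margins.
    """
    return sum(comb(a, x) * comb(n - a, b - x) for x in range(both + 1)) / comb(n, b)

print(exclusivity_pvalue(10, 5, 5, 0))  # 1/252, about 0.00397
```

Small values mean the two genes are altered together less often than expected by chance, which is the exclusivity pattern the abstract describes.
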

2:00 pm - 2:30 pm

Next-generation sequencing technologies allow the measurement of somatic mutations in a large number of patients with the same cancer type. One of the main goals in the analysis of these mutations is the identification of mutations associated with clinical parameters, for example survival time. This goal is complicated by the extensive genetic heterogeneity of mutations in cancer: genes and mutations act in the context of subnetworks or pathways, so different patients can carry different mutations in the same pathway.

We propose a novel algorithm, NoMAS, that directly finds subnetworks of a large gene-gene interaction network with mutations associated with survival. NoMAS employs a score for subnetworks based on the test statistic of the log-rank test, a widely used statistical test for survival analysis. We tested NoMAS on simulated and cancer data, comparing it to approaches based on single gene scores and to various greedy approaches. Our results show that NoMAS performs better than other approaches and identifies subnetworks with significant association to survival while none of the genes has significant association with survival when considered in isolation.
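
NoMAS scores subnetworks with a statistic derived from the log-rank test. The Python sketch below computes the standard standardized log-rank statistic, (O - E) / sqrt(V), comparing patients with at least one mutation in a candidate subnetwork against all other patients. How NoMAS normalizes or combines this value is not stated here, so treat this as an illustration of the underlying test only; the function name and input layout are assumptions.

```python
import math

def logrank_statistic(times, events, in_group):
    """Standardized log-rank statistic (O - E) / sqrt(V) for patients with in_group True.

    times: survival or censoring times; events: 1 if death observed, 0 if censored.
    Positive values mean more deaths in the group than expected under the null.
    """
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):  # distinct event times
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this time point
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) / math.sqrt(variance) if variance > 0 else 0.0

z = logrank_statistic([1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False])
print(round(z, 4))  # 1.6977
```
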

2:30 pm - 3:00 pm

Studying different aspects of cancer pathogenesis and development benefits from measuring related samples at different molecular levels. I will describe studies that involve the measurement of serum glycomics, miRNA and transcriptomics. Synthetic oligonucleotide libraries have been used in molecular measurement for several years to enrich samples for sequencing. Further applications that support the understanding of cellular regulation in different contexts are now emerging and will also be described. In the examples, I will emphasize the use of statistics for ranked lists to enhance data interpretation and to improve knowledge extraction.
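
The abstract does not name a specific ranked-list statistic. One widely used example is the minimum-hypergeometric (mHG) score, which takes the best hypergeometric tail probability over all cutoffs of a ranked 0/1 list; assuming that is the kind of statistic meant, the Python sketch below (with hypothetical function names) computes the score only. It is not a calibrated p-value, because taking the minimum over cutoffs requires a separate correction.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(population N, B marked items, n draws)."""
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1)) / comb(N, n)

def mhg_score(labels):
    """Minimum-hypergeometric score of a ranked 0/1 list (best-ranked item first)."""
    N, B = len(labels), sum(labels)
    best, hits = 1.0, 0
    for n, label in enumerate(labels, start=1):  # cutoff after the top n items
        hits += label
        best = min(best, hypergeom_tail(N, B, n, hits))
    return best

print(mhg_score([1, 1, 0, 0]))  # 1/6: both marked items at the top
```
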

3:30 pm - 3:45 pm

Problems of genome rearrangement are central in both evolution and cancer. Most evolutionary scenarios have been studied under the assumption that the genome contains a single copy of each gene. In contrast, tumor genomes undergo deletions and duplications, and thus the number of copies of genes varies. The number of copies of each gene along a chromosome is called its copy number profile. Understanding copy number profile changes can assist in predicting disease progression and treatment. To date, questions related to distances between copy number profiles have received little scientific attention. We focus on the following problem, introduced by Schwarz et al. (PLOS Comp. Biol., 2014): given two copy number profiles, u and v, compute the edit distance from u to v, where the edit operations are segmental deletions and amplifications. We establish the computational complexity of this problem, showing that it is solvable in linear time and constant space.
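
For intuition about why a linear-time, constant-space answer is plausible, here is a Python sketch of a deliberately simplified variant, not the algorithm from the talk. Assumptions: each event changes the copy number of a contiguous segment by exactly one, and every entry of u and v is at least 1, so the special behavior of fully deleted (zero-copy) segments never arises. Pad the difference profile t = v - u with 0 at both ends; each event changes it at exactly two boundaries, one step up and one step down, so the total upward variation is a lower bound, and pairing up-steps with down-steps attains it. The function name is hypothetical.

```python
def simplified_cn_distance(u, v):
    """Minimum number of +1/-1 segmental events turning profile u into v.

    Assumes all entries of u and v are >= 1 (zero-copy states never occur).
    Single pass over the profiles with O(1) extra space.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    events, prev = 0, 0  # prev: previous entry of the padded difference profile
    for a, b in zip(u, v):
        diff = b - a
        events += max(0, diff - prev)  # count each upward step of t = v - u
        prev = diff
    return events + max(0, 0 - prev)  # final step back to the zero padding

print(simplified_cn_distance([2, 1, 2], [1, 2, 1]))  # 3
```

In the example, one deletion over the whole profile plus two single-segment amplifications of the middle position are needed, and no two-event scenario exists.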

3:45 pm - 4:00 pm

Cancer is a disease of evolution, characterized by the accumulation of somatic alterations to the genome that selectively make a cancer cell fitter to survive. Progression models for cancer, i.e., the sequences of mutations that lead to the emergence of the disease, remain poorly understood. The problem of reconstructing such progression models is not new; in fact, several methods to extract progression models from cross-sectional samples have been developed since the late 1990s.

In the past two and a half years, we have proposed two novel algorithms, CAPRESE (CAncer PRogression Extraction with Single Edges) and CAPRI (CAncer PRogression Inference), to reconstruct models of the order in which mutations accumulate during cancer evolution. To the best of our knowledge, existing techniques are based either on correlation or on maximum likelihood. In contrast, we perform the reconstruction by exploiting the notion of probabilistic causation in the spirit of Suppes’ theory of causality. We note that in the context of biological systems and cancer progression, causality can be interpreted as the "selective advantage" conferred by the occurrence of a mutation.

In this setting, we prove the correctness of our algorithms and characterize their performance. Finally, we discuss how our R/Bioconductor package TRanslational ONCOlogy (TRONCO) is being used on real cancer datasets, e.g., atypical chronic myeloid leukemia (aCML) and colorectal cancer (CRC), and how it highlights potentially biologically significant patterns in the inferred progressions.
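
Suppes' notion of a prima facie cause requires temporal priority and probability raising. With cross-sectional data, a common proxy for temporal priority is that the earlier alteration is more frequent, P(a) > P(b), and probability raising is P(b | a) > P(b | not a). The Python sketch below checks only these two conditions on a 0/1 alteration table; it omits everything else CAPRESE and CAPRI do (for example, removing spurious edges and fitting the final model), and the function name is hypothetical.

```python
def prima_facie_causes(samples, genes):
    """Ordered pairs (a, b) meeting Suppes' two conditions on cross-sectional data.

    samples: list of dicts mapping gene -> 0/1 (1 = altered in that sample).
    """
    m = len(samples)

    def p(*conds):
        # empirical probability that every (gene, value) condition holds jointly
        return sum(all(s[g] == v for g, v in conds) for s in samples) / m

    pairs = []
    for a in genes:
        for b in genes:
            if a == b:
                continue
            pa, pb = p((a, 1)), p((b, 1))
            if pa in (0.0, 1.0):
                continue  # P(b | a) or P(b | not a) would be undefined
            b_given_a = p((a, 1), (b, 1)) / pa
            b_given_not_a = p((a, 0), (b, 1)) / (1 - pa)
            if pa > pb and b_given_a > b_given_not_a:  # priority and probability raising
                pairs.append((a, b))
    return pairs

data = [{"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}, {"A": 0, "B": 0}]
print(prima_facie_causes(data, ["A", "B"]))  # [('A', 'B')]
```
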

### Wednesday, February 3rd, 2016

9:00 am - 9:30 am

High-throughput data being collected on tumor samples create a massive opportunity to uncover the inner workings of tumor cells, how they interact with tissue microenvironments, and why they respond to or resist treatments. While we have gleaned a great deal of insight from such troves of data and have made important inroads, several challenges remain. First, sequencing projects continue to produce huge lists of events (mutations, deletions, fusions, etc.) of unknown significance. Methods for integrating and interpreting the findings using genetic pathway information will be discussed.

Second, the collection of integrated data and the formation of related patient subtypes from these data has uncovered new connections both within and across tumor subtypes. The tissue-of-origin of a tumor has been shown to be the strongest signal in nearly all platforms of data. However, several cross-tumor connections have already been identified with important implications for patient outcomes that I will present. New methods are needed to look beyond the tissue to probe other important aspects of tumor biology including the initiating and driving oncogenic processes, the original cell type that was transformed by the genetic assaults, and the impacted genetic pathways that might provide clues about treatment. I’ll discuss an approach to identify differentiation and de-differentiation status in minor but important fractions of tumor samples that may provide key information for understanding response. Rather than use out-of-the-box machine-learning methods for training signatures, I’ll discuss the development of biologically-motivated methods that seek solutions that match our current understanding of genetic circuitry.

Finally, the wealth of data should inform the biology of each patient individually; however, very few tools exist for leveraging large compendia for an individual patient’s benefit. Each patient can have a unique combination of genetic alterations, gene expression changes, hypo- or hyper-methylated enhancers, and so on. At the same time, we can often identify a set of samples that share common properties (e.g. mRNA or miRNA transcriptional profiles) with a particular patient of interest. The question remains how best to use a comprehensive collection of data to maximally enlighten our understanding of even one tumor sample. Clearly, using the statistical power of the large collection is necessary to bring robustness and confidence to the investigation. On the other hand, ignoring patient- and specimen-specific properties (e.g. newly acquired mutations after treatment) could over-generalize and miss important differences in a particular patient, not shared by the many, that could uncover druggable opportunities not otherwise apparent. Thus, new computational methods are needed that strike the right balance for interpreting each “N-of-1” case. I’ll discuss approaches we are developing for visualizing and inferring patient-specific networks.

9:30 am - 9:45 am

We present a novel method for detecting and genotyping somatic structural variations (SVs) in multiple whole-genome sequencing (WGS) tumor samples taken from a cancer patient. In contrast to standard SV discovery approaches in cancer genomes, which do not leverage phylogenetic information, we make use of the multi-sample lineage tree structure reconstructed from ultra-deep sequencing somatic SNV datasets. We demonstrate that leveraging lineage trees boosts sensitivity in detecting and genotyping SVs. Our method effectively pools samples that share a common ancestor in the tree and finds clusters of discordant paired-end reads that suggest the same SV breakpoint across these samples. Placement of SVs onto specific branches of the lineage tree results in a more comprehensive roadmap of the tumor's genome evolution that begins at the zygote.

9:45 am - 10:15 am

In this talk I will discuss the major determinants of inferring fitness landscapes of cancers. These include genotype-to-phenotype inference, clonal population dynamics and the impact of tumor microenvironments. I will discuss computational methods and examples on patient-derived datasets pertaining to each of these concepts. Specific methods using hierarchical Bayesian probabilistic models to infer these properties of cancer will be presented (including published and unpublished work). I will synthesize these methods into a view that treats a cancer as an evolving and dynamic system whose properties can be variously measured and modeled with advances in high-dimensional technologies. Example studies on breast cancer patient-derived xenografts, ovarian cancer multi-site analysis and evolutionary dynamics of follicular lymphoma will be discussed.

11:00 am - 11:40 am

Classically, tumor evolution is considered to be driven by the serial acquisition of mutations that increase the fitness of the tumor and evolve its phenotypes until a malignant lesion develops. I will discuss data and analysis from kidney (and other) cancers that challenge this canonical view. Specifically, genomic data from many sequencing projects are challenging this paradigm, identifying cancers that do not appear to have driving mutations (at least as they are classically understood), clear examples of branched evolution, and sub-clones that appear to have synergistic fitness.

11:40 am - 11:55 am

We present a computational strategy to simulate drug treatment in a personalized setting. The method is based on integrating patient mutation and differential expression data with a protein-protein interaction network. We test the impact of in-silico deletions of different proteins on the flow of information in the network and use the results to infer potential drug targets. We apply our method to AML data from TCGA and validate the predicted drug targets using known targets. To benchmark our patient-specific approach, we compare the personalized predictions to those of the conventional, non-personalized setting. Our predicted drug targets are highly enriched with known targets from DrugBank and COSMIC, outperforming the non-personalized predictions. Finally, we focus on the largest AML patient subgroup (~30%), which is characterized by an FLT3 mutation, and use our prediction score to rank patient sensitivity to inhibition of each predicted target, reproducing previous findings of in-vitro experiments.

2:00 pm - 2:30 pm

Genetic diversification and clonal selection are thought to underlie tumor progression and resistance, but their dynamics are poorly characterized. Although direct observations of human tumor growth are impractical, tumors encode their ancestries in the form of somatic alterations acquired during cell division, which can be exploited through multi-region sampling to infer their evolutionary trajectories. Recently, we described a novel ‘Big Bang’ model of primary human colorectal tumor growth, whereby after transformation, the neoplasm grows predominantly as a single expansion producing numerous intermixed sub-clones and where the timing of a mutation is the fundamental determinant of its frequency in the final tumor. This new model is compatible with effectively neutral evolution and explains the origins of intra-tumor heterogeneity and the dynamics of colorectal tumor growth with implications for earlier detection, treatment resistance and metastasis.  I will describe extensions of this work and our spatial computational model of tumor growth and statistical inference framework to delineate mechanisms of tumor progression and the importance of accounting for tumor growth dynamics when inferring tumor subclonal architecture.
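
To illustrate why mutation timing can determine frequency in such a model, consider an idealized expansion under our own simplifying assumptions: synchronous binary divisions from one transformed cell, no cell death, no selection, and a heterozygous mutation in a diploid genome. A mutation acquired by one daughter cell at the k-th division is carried by one of 2^k cells, so its expected variant allele frequency is purity × 2^-k / 2. This toy Python function is not the authors' spatial model, and its name is hypothetical.

```python
def expected_vaf(k, purity=1.0):
    """Expected VAF of a heterozygous mutation acquired at the k-th synchronous division.

    Idealized neutral growth: after k divisions there are 2**k cells and the
    mutation is in one of them, so a fraction 0.5**k of all tumor cells carry it.
    """
    cell_fraction = 0.5 ** k
    return purity * cell_fraction / 2  # one mutated allele out of two per cell

print(expected_vaf(1), expected_vaf(3))  # 0.25 0.0625
```

Even this crude model shows that earlier mutations end up at higher frequency, without any fitness advantage.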

2:30 pm - 3:00 pm

Here we will consider some complexity issues related to rearrangements and show that, for a wide range of rearrangement types, the size of the space of possible structures scales approximately as (2x)^(n(n-1)/2), where x is the number of breakpoints formed by each rearrangement and n is the number of rearrangements that take place. We also show that many clusters of rearrangements cannot be explained by standard rearrangements, but may be explained by break-induced replication.
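
For a quick sense of scale, the approximate count can be evaluated directly. Here `x` and `n` follow the definitions above and the function name is hypothetical; since n(n - 1)/2 is always an integer, exact integer arithmetic can be used.

```python
def rearrangement_space_size(x, n):
    """Approximate number of possible structures, (2x) ** (n(n-1)/2)."""
    return (2 * x) ** (n * (n - 1) // 2)

print(rearrangement_space_size(2, 3), rearrangement_space_size(2, 5))  # 64 1048576
```

Going from three to five rearrangements with two breakpoints each already multiplies the space by more than 16,000.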

3:30 pm - 4:00 pm

This talk will examine the problem of inferring phylogenetic trees to model tumor evolution at the single-cell level via copy number variations (CNVs).  We will focus primarily on inference from fluorescence in situ hybridization (FISH) data, which provides copy number counts for small numbers of probes in potentially large populations of cells extracted from single tumors.  We will consider a series of models and algorithmic developments, working from simple to increasingly realistic representations of the mechanisms by which tumors evolve via copy number variation.  We will simultaneously explore applications of these methods to a variety of tumor types, including breast, cervical, prostate, and head-and-neck.  In the process, we will see how successive algorithmic developments lead to increasingly accurate models of tumor evolution and, in turn, to ever greater power to classify tumors and predict their future progression.  We will, finally, consider emerging directions in extending methods for phylogenetics of CNVs and other structural variations to increasingly complex evolutionary models and data sources.

### Thursday, February 4th, 2016

9:00 am - 9:30 am

One of the most striking outcomes of the cancer genome projects is that tumors of similar types and clinical responses can have patterns of mutations that are strikingly different. Despite these differences, it is becoming very clear that tumor alterations hijack the same hallmark molecular pathways and networks. Thus, a complete vision for cancer precision medicine requires not only genome sequencing, but the ability to interpret these genomes against knowledge of cancer molecular networks. We are developing a range of approaches that use molecular networks to analyze tumors, with the goal of classifying tumors into biologically meaningful subtypes and identifying subnetworks underlying these subtypes. We are also exploring means by which network data can be transformed from flat graphical representations of gene-gene interaction to infer a gene ontology, i.e. the multi-scale  hierarchy of components and processes comprising the cell. Recently, we and others have found that a large hierarchy  of cellular components can be assembled directly from analysis of molecular networks. These ‘network-extracted’ ontologies, which we call NeXO, closely resemble, and in some cases greatly revise and expand, the literature-curated Gene Ontology. The ability to create data-driven gene ontologies such as NeXO opens the possibility of creating ontology models of the cell that are specific to different cell types and diseases such as cancer.

9:30 am - 9:45 am

The reconstruction of phylogenetic trees from mixed populations has become important in the study of cancer evolution, as sequencing is often performed on bulk tumor tissue containing mixed populations of cells. Recent work has shown how to reconstruct a perfect phylogeny tree from samples that contain mixtures of two-state characters, where each locus/character is either mutated or not. However, most cancers contain more complex mutations, such as copy-number aberrations, that exhibit more than two states.

We formulate the Multi-State Perfect Phylogeny Mixture Deconvolution Problem, which asks for a multi-state perfect phylogeny tree given mixtures of the leaves of the tree. We characterize the solutions of this problem as a restricted class of spanning trees in a multi-graph constructed from the input data, show NP-hardness, and derive an algorithm to enumerate such trees in the important special case of cladistic characters. We illustrate applications of our algorithm to simulated and real cancer data.
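
In the two-state case referred to above, a candidate tree is usually checked with a sum condition: in every sample, a mutation's cell fraction must be at least the sum of its children's cell fractions. The Python sketch below checks only that condition for a given tree, assuming cell fractions (not raw VAFs) as input; the multi-state problem in this talk needs additional structure that is not shown, and the function name is hypothetical.

```python
def satisfies_sum_condition(children, freqs, tol=1e-9):
    """Check the two-state sum condition for a candidate tree.

    children: dict mapping each mutation to a list of its child mutations.
    freqs: list of per-sample dicts mapping mutation -> cell fraction.
    """
    for sample in freqs:
        for parent, kids in children.items():
            # children partition part of their parent's cells, so they cannot exceed it
            if sum(sample[k] for k in kids) > sample[parent] + tol:
                return False
    return True

tree = {"a": ["b", "c"], "b": [], "c": []}
print(satisfies_sum_condition(tree, [{"a": 0.9, "b": 0.5, "c": 0.3}]))  # True
```
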

9:45 am - 10:15 am

To improve the effectiveness of cancer drug combination therapy, accounting for intratumor heterogeneity is critical, particularly as the tumor evolves under exposure to different drug regimens. We aim to identify optimal cancer drug combinations based on the single-cell characterization of an individual tumor's response to a panel of monotherapies. Using state-of-the-art single-cell technology, namely mass cytometry, we quantify changes in the expression of approximately 30 intracellular markers per cell before and after treatment with each drug. We discuss methods to integrate these data in order to identify and compare drug combination strategies that are optimized under a variety of objective functions. This strategy for optimizing drug combinations can complement genomics-based strategies for a more comprehensive approach to personalized treatment.

11:00 am - 11:40 am

The emergence of resistance to cancer therapy remains a pressing challenge and has led to several major experimental efforts aiming to identify individual molecular signatures of resistance to specific cancer drugs. Here we describe a comprehensive computational framework for identifying the molecular pathways underlying cancer resistance, accounting for many of these results. Our approach, termed INCISOR, is applied to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patients’ data, to identify a class of genetic interactions termed synthetic rescues (SR). An SR denotes a functional interaction between two genes where a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but the subsequent altered activity of its partner rescuer gene restores cell viability. Applying INCISOR to the TCGA identifies the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes. We show that SRs can be used to successfully predict patients’ survival and response to the majority of current cancer drugs and, importantly, to predict the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance.

11:40 am - 11:55 am

Within a given form of cancer, two patients usually do not share the same set of mutations and can even have none in common. This lack of similarity makes it difficult to compare tumour mutation profiles. To overcome this problem, methods have been proposed to identify shared disrupted pathways. Notably, diffusion processes over gene-gene interaction networks have been used to identify shared mutated subnetworks. We will focus on the prognostic power of mutation profiles and show that diffusion can greatly enhance survival predictions compared to the raw tumour sequences. We will also present a simpler method that restricts the diffusion to the neighbours of mutated genes, and show that it performs at least as well as fully diffused mutation profiles. Finally, for this new method as well as for the existing ones, we will highlight the importance of a normalisation step that virtually reduces all patients to have the same number of mutations.
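
The abstract does not spell out the restricted diffusion, so the Python sketch below is a hypothetical variant: each mutated gene keeps weight 1 and spreads one additional unit evenly among its network neighbours, after which the profile is normalised to sum to 1 so that every patient carries the same total mass, in the spirit of the normalisation step described above. The function name and input layout are assumptions.

```python
def neighbour_smoothed_profile(mutated, neighbours, genes):
    """Hypothetical one-step neighbour diffusion of a binary mutation profile.

    mutated: set of mutated genes for one patient; neighbours: dict gene -> set of
    interacting genes; genes: list of all genes in the profile.
    """
    score = {g: 0.0 for g in genes}
    for g in mutated:
        if g not in score:
            continue  # ignore genes outside the profile
        score[g] += 1.0
        nbrs = [h for h in neighbours.get(g, ()) if h in score]
        for h in nbrs:
            score[h] += 1.0 / len(nbrs)  # split one extra unit evenly among neighbours
    total = sum(score.values())
    # normalisation: every patient's profile sums to 1, whatever its mutation count
    return {g: s / total for g, s in score.items()} if total else score

print(neighbour_smoothed_profile({"A"}, {"A": {"B", "C"}}, ["A", "B", "C"]))
# {'A': 0.5, 'B': 0.25, 'C': 0.25}
```
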

2:00 pm - 2:30 pm

Tumor DNA carries numerous alterations, including somatic point mutations, amplifications, and deletions. It is challenging to identify the disease-causing alterations from the plethora of random ones, and to delineate their functional relations and involvement in common pathways.

One solution for this task involves the analysis of mutual exclusivity patterns of gene alterations in tumor data. This approach is inspired by the observation that genes from the same cancer pathway tend not to be altered together in the same patient, and thus form patterns of mutually exclusive alterations across patients. Mutual exclusivity may arise because alteration of only one pathway component is sufficient to deregulate the entire process. Detecting and evaluating the statistical significance of such patterns is an important step in the de novo identification of cancerous pathways and potential treatment targets.

Another reason for mutual exclusivity of two gene mutations is that the genes are in a synthetic lethal interaction. Synthetic lethality occurs when the co-inactivation of two genes results in cell death, while inactivation of each individual gene is viable. Synthetic lethality can be exploited in cancer therapy: in cancer patients, one inactivation already occurs via an endogenous mutation of a specific gene in the tumor cells, but not in normal cells. Thus, applying a drug that targets the synthetic lethal partner of that gene should selectively kill cancer cells while leaving normal cells viable.

In this talk, I will first present our probabilistic, generative model of mutually exclusive patterns accounting for observation errors, with efficient algorithms for parameter estimation and pattern ranking, together with a statistical test for mutual exclusivity. Second, I will present our previous and current methods for detecting synthetic lethality from tumor genomic data.

2:30 pm - 3:00 pm

I will discuss mechanisms of therapeutic resistance in advanced castration-resistant prostate cancer and, in this context, the challenges and promise of precision oncology. I will present data from cell-free DNA (liquid biopsy) studies that reveal dynamic genome landscapes emerging in response to therapy, and discuss how this information can be used to understand tumor heterogeneity and evolution and to improve patient management. These dynamic landscapes are defined by amplification or mutation of the androgen receptor and other genes. In addition, I will discuss neuroendocrine transdifferentiation (state change) as a mechanism of therapeutic resistance in prostate cancer. Finally, I will discuss the value of patient-derived cancer xenografts in “evidence-based” precision oncology.

3:30 pm - 4:00 pm

Pathway-centric approaches have emerged as methods that can empower studies of inter-tumor heterogeneity. In addition, the data gathered by the Pan-Cancer initiative has created an unprecedented opportunity for illuminating common features across different cancer types. We combined across-cancer mutual exclusivity with interaction data to uncover pan-cancer dysregulated pathways. Our new method, Mutual Exclusivity Module Cover (MEMCover), not only identified previously known pan-cancer dysregulated subnetworks but also novel subnetworks whose across-cancer role had not been well appreciated before. In addition, we demonstrate the existence of mutual exclusivity hubs, putatively corresponding to cancer drivers with strong growth advantages. Finally, we show that while mutually exclusive pairs within or across cancer types are predominantly functionally interacting, pairs in the between-cancer mutual exclusivity class are more often disconnected in functional networks.

### Friday, February 5th, 2016

9:00 am - 9:30 am
Rapid advancement in high-throughput sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of genomic, transcriptomic and proteomic data from the same tissue sample. We recently developed a computational framework which, for the first time, can integratively analyze all three types of omics data to obtain a complete molecular profile of the same tissue in normal vs. disease conditions. Our framework includes a computational method to identify micro structural variants (microSVs) by jointly analyzing matching whole genome sequencing (WGS) and RNA-Seq data. Coupled with deFuse, our gene fusion detection method, the framework can provide an accurate profile of the structurally aberrant transcripts commonly observed in tumor samples. Given the genomic breakpoints, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures in the respective proteomics data sets.
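
The junction-peptide step can be sketched as follows. The sketch assumes the fused protein sequence is already known and in frame, split into the part encoded upstream and downstream of the breakpoint, and it uses an in silico trypsin digest (cleave after K or R unless the next residue is P), a common choice for bottom-up proteomics but an assumption here. Function names, the minimum peptide length and the example sequences are hypothetical; this is not the framework's or deFuse's code.

```python
def tryptic_peptides(protein):
    """Yield (start, end, peptide) for a full tryptic digest of `protein`.

    Rule used here: cleave after K or R unless the next residue is P.
    Missed cleavages are not enumerated.
    """
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            yield start, i + 1, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        # Trailing C-terminal peptide without a final cleavage site.
        yield start, len(protein), protein[start:]


def junction_peptides(upstream, downstream, min_len=6):
    """Return tryptic peptides of the fused protein that cross the junction.

    A peptide covering [start, end) crosses the junction if it contains at
    least one residue from each side, i.e. start < junction < end.
    """
    junction = len(upstream)
    fused = upstream + downstream
    return [
        peptide
        for start, end, peptide in tryptic_peptides(fused)
        if start < junction < end and len(peptide) >= min_len
    ]


print(junction_peptides("MAGKLLEST", "VVNRPDAK"))
```

In this toy example the R followed by P is not cleaved, so the single junction-spanning peptide runs from the first cleavage site to the C-terminal K.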

When used together with CITUP and CTP-single, our WGS-based clonal composition inference methods, our computational framework can help identify clone-specific expressed structural alterations in a given tumor sample.
We further analyze such expressed variants systematically with HIT'nDRIVE, our combinatorial method for identifying (structurally) aberrant genes that can collectively influence possibly distant “outlier” genes, based on what we call the “random-walk facility location” (RWFL) problem on a protein or gene interaction network. These influential structurally altered genes have been shown to play prominent roles in tumor evolution as potential drivers of the observed cancer phenotype.
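
HIT'nDRIVE itself solves a combinatorial optimization problem; the sketch below only illustrates two ingredients under simplifying assumptions: influence measured by a random walk with restart from each aberrant gene, and a greedy heuristic (not the published formulation) that picks aberrant genes until every outlier gene receives at least a threshold level of influence from some chosen gene. The graph, restart probability, threshold and function names are hypothetical.

```python
def rwr_influence(adj, source, restart=0.3, iters=200):
    """Random walk with restart from `source` on an undirected graph.

    `adj` maps each node to a list of neighbours. Returns the approximately
    stationary visiting probability of every node, via power iteration.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, neighbours in adj.items():
            walk_mass = (1 - restart) * p[u]
            if neighbours:
                for v in neighbours:
                    nxt[v] += walk_mass / len(neighbours)
            else:
                # Dangling node: send its walking mass back to the source.
                nxt[source] += walk_mass
        nxt[source] += restart
        p = nxt
    return p


def greedy_influence_cover(adj, candidates, outliers, threshold, restart=0.3):
    """Greedily choose aberrant genes until each outlier is influenced enough.

    Outlier t counts as covered by candidate c if the random walk from c
    assigns t a probability >= threshold. Returns (chosen, uncovered).
    """
    influence = {c: rwr_influence(adj, c, restart) for c in candidates}
    uncovered = set(outliers)
    chosen = []

    def covered_by(c):
        return {t for t in uncovered if influence[c][t] >= threshold}

    while uncovered:
        best = max(candidates, key=lambda c: len(covered_by(c)))
        gain = covered_by(best)
        if not gain:
            break  # remaining outliers are out of reach of every candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


network = {"A": ["X"], "X": ["A"], "D": ["Y"], "Y": ["D"]}
print(greedy_influence_cover(network, ["A", "D"], ["X", "Y"], threshold=0.1))
```

A greedy cover is used here only because it is short and easy to follow; an exact solver would be needed to reproduce an optimal selection.
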
9:30 am - 9:45 am

Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.

9:45 am - 10:15 am

Alternative isoform usage is known to have an important impact on some genes related to cancer progression. Mutations of genes in the spliceosome have also been reported in some cancers, which could result in widespread alteration of splicing patterns in tumors. In my talk I will discuss methods we have developed for detecting patterns based on alternative splicing in cancer. The first approach addresses finding differences in splicing patterns between known groups. The second uses clustering methods to find clusters characterized by differences in alternative splicing.
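
A common way to quantify splicing differences between known groups is the percent spliced in (PSI) of a cassette exon, compared across groups with a permutation test. The sketch below assumes junction read counts are already available, normalizes inclusion reads by the two inclusion junctions, and enumerates all relabelings exactly, which is feasible only for small groups. It is a generic illustration, not the speaker's method, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in for a cassette exon from junction read counts.

    The inclusion isoform has two splice junctions and the exclusion isoform
    one, so inclusion reads are divided by 2 before taking the ratio.
    """
    inclusion = inclusion_reads / 2
    total = inclusion + exclusion_reads
    return inclusion / total if total > 0 else float("nan")


def delta_psi_test(group_a, group_b):
    """Difference in mean PSI and its exact two-sided permutation p-value."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so relabelings with the same |difference| count as extreme.
        if abs(mean(a) - mean(b)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


print(delta_psi_test([0.9, 0.8, 0.85], [0.2, 0.3, 0.25]))
```

With three samples per group, even complete separation gives delta PSI of about 0.6 with an exact p-value of 2/20 = 0.1, so this test cannot reach p < 0.05 at that sample size.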

11:00 am - 11:30 am
Pan-cancer analyses of somatic mutations and copy number aberrations have confirmed that the same genes or pathways are often altered across multiple tumor types. There is great interest in deploying targeted therapies in a pan-cancer manner, matching pathway-targeted drugs to the mutational profile of the tumor regardless of cancer type. However, ‘actionable mutations’ in different tumor types interact with distinct cancer-specific gene regulatory programs and signaling networks and occur against different genetic backgrounds of co-occurring alterations. To better model the context-dependent role of somatic alterations, we applied a novel computational strategy for integrating parallel phosphoproteomic and mRNA sequencing data across 12 TCGA tumor data sets, linking dysregulation of upstream signaling pathways with altered transcriptional response. We then developed a statistical approach to interpret the impact of mutations and copy number events in terms of functional outcomes such as altered signaling and transcription factor (TF) activity. Our analysis revealed both known and novel transcriptional regulators downstream of oncogenic pathways and identified potential synergies between co-occurring mutations. These results have implications for applying targeted drugs across cancer contexts and potentially for the design of combination therapies.

11:30 am - 12:00 pm

Data-driven molecular classification of cancer patients for diagnosis, prognosis or drug response prediction is often challenging due to the high dimensionality of omics data, which results in suboptimal predictive performance and makes robust biomarkers difficult to identify. A possible strategy to overcome this issue is to replace the input omics data by simpler representations more amenable to statistical learning. In this talk I will discuss two recent attempts to represent high-dimensional omics profiles by simpler, rank-based representations: one based on full-quantile normalization, where the target distribution is optimized to solve the learning problem, and one based on all pairwise comparisons, which leads to efficient learning with kernel methods. This is joint work with Marina Le Morvan and Yunlong Jiao.
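
Both rank-based representations can be illustrated with a small sketch that assumes profiles are plain lists of floats. `quantile_normalize` maps a profile onto a fixed target distribution; in the talk the target is optimized for the learning problem, which this sketch does not attempt. `kendall_kernel` is one natural kernel on the signs of all pairwise comparisons (a Kendall-tau-type kernel, assumed here rather than taken from the talk). All names are hypothetical, and the naive feature map has p(p-1)/2 entries for p genes, so it is practical only for small p as written.

```python
from itertools import combinations


def quantile_normalize(profile, target):
    """Full-quantile normalization of one profile onto a target distribution.

    Each entry is replaced by the target value of the same rank (ties are
    broken by position), so only the ranks of `profile` are retained.
    """
    if len(target) != len(profile):
        raise ValueError("target must have one value per feature")
    order = sorted(range(len(profile)), key=lambda i: profile[i])
    reference = sorted(target)
    normalized = [0.0] * len(profile)
    for rank, i in enumerate(order):
        normalized[i] = reference[rank]
    return normalized


def pairwise_signs(profile):
    """Feature map over all pairs i < j: +1 if x_i > x_j, -1 if x_i < x_j, 0 on ties."""
    return [(profile[i] > profile[j]) - (profile[i] < profile[j])
            for i, j in combinations(range(len(profile)), 2)]


def kendall_kernel(x, y):
    """Inner product of pairwise-sign features, scaled by the number of pairs.

    Without ties this equals Kendall's tau between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n_pairs = len(x) * (len(x) - 1) // 2
    dot = sum(a * b for a, b in zip(pairwise_signs(x), pairwise_signs(y)))
    return dot / n_pairs


profile = [5.0, 1.0, 3.0]
print(quantile_normalize(profile, [10.0, 20.0, 30.0]))
print(kendall_kernel(profile, quantile_normalize(profile, [10.0, 20.0, 30.0])))
```

Because both representations discard everything except ranks, the kernel between a tie-free profile and its own quantile-normalized version is 1.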