Playlist: 22 videos

From Algorithms to Discovery in Genome-Scale Biology and Medicine

0:32:45
Andreas Beyer (CECAD, University of Cologne, Germany)
https://simons.berkeley.edu/talks/long-range-propagation-genetic-effects-molecular-networks

Complex traits are established through the joint influences of multiple genetic and environmental perturbations. There is a shortage of generalizable principles explaining how molecular networks integrate genetic and environmental effects, ultimately leading to complex cellular and organismal traits. In particular, it is poorly understood when and how genetic perturbations lead to molecular changes that are confined to small parts of a network versus when they lead to large-scale adaptations of global network states. Here, we present a concept classifying genetic effects as local, regional or global depending on what fraction of a molecular network they affect. We exemplify this notion using transcriptome, proteome and phospho-proteome profiling of genetically heterogeneous populations of yeast strains, which we integrate with an array of cellular traits. Our analysis identified a central gauge of the yeast molecular network that is related to PKA and TOR (PT) signaling. The resulting ‘PT state’ could be summarized in a single value that explained large parts of the molecular configuration of the strains. This PT state was associated with a specific balance between cellular processes spanning energy and amino acid metabolism, transcription, translation, cell cycle control and cellular stress response. Carbon source quality, oxidative stress, and gene-environment interactions caused monotonic shifts of the molecular network state along the same axis. We further show that complex traits like heat stress resistance and longevity (stationary phase viability) result from the synthesis of genetic effects modulating this PT state with global network effects, plus much more trait-specific effects modulating only small parts of the network. Our work provides a rationale for the conditions under which genetic effects propagate through molecular networks with pleiotropic consequences.
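As a rough illustration of the local/regional/global idea in this abstract, the sketch below labels a genetic effect by the fraction of network nodes it perturbs. The function name and both cutoffs (1% and 10%) are hypothetical choices made for this example; the abstract does not state how the fractions are thresholded.

```python
def classify_effect(n_affected: int, n_total: int,
                    local_max: float = 0.01,
                    regional_max: float = 0.10) -> str:
    """Label a genetic effect by the fraction of network nodes it affects.

    local_max and regional_max are illustrative cutoffs (assumptions),
    not values taken from the talk.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_affected <= n_total:
        raise ValueError("n_affected must lie between 0 and n_total")

    fraction = n_affected / n_total
    if fraction <= local_max:
        return "local"
    if fraction <= regional_max:
        return "regional"
    return "global"


if __name__ == "__main__":
    # Hypothetical counts of affected transcripts out of 5000 measured.
    for affected in (5, 300, 2000):
        print(affected, classify_effect(affected, 5000))
```

In practice the cutoffs would need to be calibrated against the data, for example against the distribution of affected fractions across all mapped loci.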
0:36:55
Erich Wanker (Max Delbrueck Center for Molecular Medicine)
https://simons.berkeley.edu/talks/binary-quantitative-interaction-mapping-approach-elucidating-multiprotein-complexes-health-and

Complementary methods are required to fully characterize multiprotein complexes in vitro and in vivo. Affinity purification coupled to mass spectrometry (MS) can identify the composition of protein complexes at scale. However, information on direct contacts between subunits is often lacking. In contrast, solving the 3D structure of protein complexes by X-ray diffraction or cryo-electron microscopy can provide this information, but is not yet scalable for proteome-wide efforts. We have developed quantitative bioluminescence-based methods that facilitate binary interaction mapping in mammalian cells with sensitivity and specificity. We have applied these technologies to study the associations of huntingtin (HTT), a protein of unknown function at the root of Huntington’s disease. We found that HTT controls the abundance of its partner HAP40 in mammalian cells, suggesting that it functions as a scaffold that prevents the degradation of partner proteins. In another systematic screen, we identified high-confidence binary interactions for proteins of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which were subsequently used as input for an in silico compound screen. We discovered a new chemical compound that directly targets the interaction between NSP10 and NSP16, which is critical for virus replication. Finally, we defined partners for the AAA ATPase p97, which interacts with many proteins and plays a functional role in various subcellular processes. We found that p97 associates with splicing regulators in an ASPL-dependent manner, suggesting a functional link between the p97:ASPL complex and mRNA processing. Overall, systematic mapping of direct interactions between proteins in higher-order protein assemblies facilitates a better understanding of cellular and disease processes. In addition, high-confidence binary interactions are important drug targets with a high potential for innovation in therapy development.
0:32:40
Sourav Bandyopadhyay (UCSF)
https://simons.berkeley.edu/talks/tbd-448
0:38:46
Teresa Przytycka (National Center for Biotechnology Information)
https://simons.berkeley.edu/talks/tbd-449

Cancer genomes accumulate many somatic mutations resulting from imperfections in DNA processing during the normal cell cycle, as well as from carcinogenic exposures or cancer-related aberrations of the DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Considering these signatures as quantitative traits, we can leverage them for studies of the interactions between mutagenic processes, other cellular processes, and the environment. Untangling these interactions is critical for understanding the processes underlying mutational signatures and their impact on the organism. I will discuss several computational approaches, including a method for the deconvolution of the contributions of DNA damage and repair to the mutational landscape of cancer.
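To make the notion of a mutational pattern concrete, the sketch below tallies single-base substitutions into the widely used 96-channel trinucleotide representation (substitution type plus 5' and 3' neighbors, reported on the pyrimidine strand). This is a generic preprocessing step assumed here for illustration; it is not the deconvolution method discussed in the talk, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution to its channel label, e.g. 'A[C>T]G'.

    context: reference trinucleotide (5' neighbor, mutated base, 3' neighbor).
    alt: the alternate base at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt allele")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G and T are supported")
    if context[1] == alt:
        raise ValueError("alt allele equals the reference base")

    # Purine reference bases are reported on the opposite strand so that
    # every channel has C or T as its reference base.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_catalog(mutations) -> Counter:
    """Count channels over an iterable of (context, alt) pairs."""
    return Counter(sbs96_channel(context, alt) for context, alt in mutations)


if __name__ == "__main__":
    toy = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]  # made-up mutations
    print(mutation_catalog(toy))
```

A per-sample catalog of this kind is the usual input to signature extraction or fitting; the resulting per-signature exposures are what can then be treated as quantitative traits.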
0:33:40
Brenda Andrews (University of Toronto)
https://simons.berkeley.edu/talks/tbd-450

A powerful method to study the genotype-to-phenotype relationship is the systematic assessment of mutant phenotypes using genetically accessible model systems. We have developed and applied methods for quantitative analysis of genetic interactions in double mutants using yeast colony size as a proxy for cell fitness. Our global digenic interaction network reveals a hierarchy of functional modules, including pathways and complexes, bioprocesses and cell compartments. We have also expanded our systematic genetics pipeline to include single-cell image-based readouts and arrays of yeast strains expressing GFP-tagged proteins for exploration of proteome dynamics and the effects of genetic perturbations on subcellular compartment morphology. Recently, we have leveraged the principles about genetic networks that we discovered in yeast to map genetic interactions in human HAP1 cells using genome-wide CRISPR/Cas9 screens. Our yeast work guided our selection of query genes to screen and provided a roadmap for extraction of functional information from the resulting data. The interactions screened to date include more than 85% of the genes in the human genome that are expressed in HAP1 cells, and, as was observed in yeast, interaction profile similarity is highly predictive of gene function. I will describe our results in the context of our ongoing efforts to discover the principles of genetic networks in yeast and apply what we learn to understand the functional organization of the human genome.
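The two quantities mentioned in this abstract, a genetic interaction score and interaction profile similarity, can be sketched as follows. The example assumes the common multiplicative null model (expected double-mutant fitness is the product of the single-mutant fitnesses) and Pearson correlation as the similarity measure; the abstract does not specify either, and the fitness values are made up.

```python
from math import sqrt


def interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    Negative scores suggest the double mutant is sicker than expected,
    positive scores that it is healthier than expected.
    """
    return f_ab - f_a * f_b


def pearson(x, y) -> float:
    """Pearson correlation between two equal-length interaction profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical fitness values relative to wild type (wild type = 1.0).
    print(interaction_score(0.9, 0.9, 0.2))   # strongly negative interaction

    # Hypothetical interaction profiles of two query genes over four partners.
    gene_x = [-0.5, 0.1, 0.0, -0.3]
    gene_y = [-0.4, 0.2, 0.1, -0.2]
    print(pearson(gene_x, gene_y))
```

Genes whose profiles correlate strongly across many partners are candidates for sharing a pathway or complex, which is the sense in which profile similarity predicts function.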
0:42:30
Tony Capra (UCSF)
https://simons.berkeley.edu/talks/algorithms-inferring-phenotypes-ancient-dna

Changes in gene regulation were a major driver of the divergence of archaic hominins (AHs; Neanderthals and Denisovans) and modern humans (MHs). The three-dimensional (3D) folding of the genome is critical for regulating gene expression; however, its role in recent human evolution has not been explored because the degradation of ancient samples does not permit experimental determination of AH 3D genome folding. To fill this gap, we apply deep learning methods for inferring 3D genome organization from DNA sequence to Neanderthal, Denisovan, and diverse MH genomes. Using the resulting 3D contact maps across the genome, we identify 167 distinct regions with diverged 3D genome organization between AHs and MHs. We show that these 3D-diverged loci are enriched for genes related to the function and morphology of the eye, supra-orbital ridges, hair, lungs, immune response, and cognition. Despite these specific diverged loci, the 3D genomes of AHs and MHs are more similar than expected based on sequence divergence, suggesting that the pressure to maintain 3D genome organization constrained hominin sequence evolution. We also find that 3D genome organization constrained the landscape of AH ancestry in MHs today: regions more tolerant of 3D variation are enriched for introgression in modern Eurasians. Finally, we identify loci where modern Eurasians have inherited novel 3D genome folding from AH ancestors, which provides a putative molecular mechanism for phenotypes associated with these introgressed haplotypes. In summary, our application of deep learning to predict archaic 3D genome organization illustrates the potential of inferring molecular phenotypes from ancient DNA to reveal previously unobservable biological differences.
0:26:55
Martin Kircher (BIH @ Charité / University of Luebeck)
https://simons.berkeley.edu/talks/predicting-deleteriousness-genomic-variants-big-and-small

Approaches for the identification of disease-causal mutations are widely applied in research and clinical settings, but interpretation and ranking of the resulting variants remain challenging. Combined Annotation Dependent Depletion (CADD, https://cadd-sv.bihealth.org/) integrates annotations by contrasting variants that survived purifying selection along the human lineage with simulated mutations to score short sequence variants (SNVs, InDels, multi-allelic substitutions). Following its publication (Kircher, Witten et al. Nat Genet. 2014), CADD was widely adopted by the community, and minor adjustments and fixes have since been released, including native support for both the GRCh37 and GRCh38 assemblies (Rentzsch et al. NAR 2019). Recently, we assessed existing deep neural network (DNN) models for splice effects with the Multiplexed Functional Assay of Splicing using Sort-seq dataset (MFASS, Cheung et al. Mol Cell. 2019). We selected the two best-performing DNN models based only on genomic sequence, MMSplice and SpliceAI, for integration into CADD (Rentzsch et al. Genome Med. 2021). The DNN scores boosted CADD's predictions for splice effects. We also noted that while the DNN scores have superior performance on splice variants, they fail to account for nonsense and missense effects of the same variants. This suggests that variant prioritization will improve with more domain-specific information and underlines the importance of identifying additional such features, e.g. for regulatory sequences. With rapid advances in the identification of structural variants (SVs), we decided to apply the general concept of CADD to score them (CADD-SV, https://cadd-sv.bihealth.org/). While methods utilizing individual mechanistic principles like the deletion of coding sequence or 3D architecture disruptions were available, a comprehensive tool that uses the broad spectrum of available SV annotations was missing. We show that CADD-SV scores are predictive of pathogenicity and population frequency and that CADD-SV's ability to prioritize pathogenic variants exceeds that of existing methods like SVScore and AnnotSV (Kleinert & Kircher, Genome Res. 2022). Our results highlight advantages of the CADD approach, like profiting from a large training data set covering diverse and rare feature annotations without major ascertainment effects from historic and ongoing variant collections.
0:36:00
Gil Ast (Tel Aviv University)
https://simons.berkeley.edu/talks/how-genome-3d-organization-regulates-alternative-splicing

How the splicing machinery defines exons or introns as the spliced unit has remained a puzzle for 30 years. Here, we demonstrate that peripheral and central regions of the nucleus harbor genes with two distinct exon-intron GC content architectures that differ in the splicing outcome. Genes with low GC content exons, flanked by long introns with lower GC content, are localized in the periphery, and the exons are defined as the spliced unit. Alternative splicing of these genes results in exon skipping. In contrast, the nuclear center contains genes with a high GC content in the exons and short flanking introns. Most splicing of these genes occurs via intron definition, and aberrant splicing leads to intron retention. We demonstrate that the nuclear periphery and center generate different environments for the regulation of alternative splicing and that two sets of splicing factors form discrete regulatory subnetworks for the two gene architectures. Our study connects 3D genome organization and splicing, thus demonstrating that exon and intron definition modes of splicing occur in different nuclear regions.
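The exon-intron GC content architecture described here can be quantified with a simple differential: the GC fraction of an exon minus that of its flanking introns. The sketch below is an assumed, minimal formulation with hypothetical function names and toy sequences; the abstract does not state how the study computes GC architecture.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(base in "GC" for base in bases) / len(bases)


def gc_differential(upstream_intron: str, exon: str,
                    downstream_intron: str) -> float:
    """Exon GC content minus the GC content of the pooled flanking introns."""
    flank_gc = gc_content(upstream_intron + downstream_intron)
    return gc_content(exon) - flank_gc


if __name__ == "__main__":
    # Toy sequences, far shorter than real exons and introns.
    print(gc_content("ATGCGC"))
    print(gc_differential("ATATTA", "GCGCAT", "TTAATA"))
```

Combining this differential with absolute exon GC content and intron length would be one way to separate the two gene architectures the abstract contrasts.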
0:24:45
David Knowles (Columbia University/New York Genome Center)
https://simons.berkeley.edu/talks/determining-molecular-intermediates-between-genotype-and-phenotype

I will describe two projects that aim to better dissect the causal chain from a functional genetic variant, through molecular intermediates, to organismal trait or disease risk. In the first, we are using pooled profiling of RNA-binding protein (RBP, splice factor) binding across individuals to measure and then computationally model genetic effects on both binding and RNA splicing. In the second, we have developed a causal network inference method that scales to hundreds of nodes by leveraging convex optimization.