The rapid progress in technologies to automatically collect genetic and phenotypic information on living systems at all scales (from molecules to cells, to organisms, to ecosystems) offers a great opportunity to understand life at an unprecedented level of detail. Extracting meaningful and reliable biological information from the resulting datasets, which keep growing in both size and complexity (e.g., dependence structure, technical noise, sparsity), poses great computational and statistical challenges. Some of these challenges arise from:
- The increasing capacity, throughput, and read length of deep sequencing technologies (e.g., Illumina, Nanopore, 10x Genomics, Pacific Biosciences) for bulk and single-cell DNA and RNA.
- The launching of very large-scale projects to describe the many dimensions of biological diversity at the molecular level. These include, among others:
  - The Human Cell Atlas (HCA), aiming to monitor the RNA content of all cells in the human body (estimated at about 10^13) and to identify all distinct cell types.
  - The Earth BioGenome Project (EBP), aiming to sequence the genomes of all eukaryotic species living on Earth (estimated at about 1.5 million).
  - Large-scale metagenomics projects, monitoring microbial diversity in natural (Tara Oceans) and urban (MetaSUB) environments, as well as in the human body (Human Microbiome Project, HMP).
- Functional genomics, which uses deep sequencing to annotate genomic regions with their biological function (methylation, histone modifications, transcription factor binding, etc.).
- Other -omics technologies, such as proteomics, metabolomics, lipidomics, etc.
- Genome editing technologies (CRISPR screens), which allow for large-scale genomic perturbations, followed by phenotypic or molecular assays of the cellular response.
- Advanced genomic imaging, allowing in vivo monitoring of genome activity (transcription, translation, etc.) within the cells, as well as the 3D organization of individual cells in organs (spatial transcriptomics).
- Cohort-based studies, which aim to analyze -omics data together with phenotypic information, either from medical records or (dynamically and continuously) collected through electronic recording devices, for thousands to millions of individuals (from GTEx and UK Biobank to national precision medicine projects). Data may include, among others, medical annotations, high-resolution imaging (neuroimaging, X-ray imaging), histopathologies, and longitudinal measurements of physiological variables (heart rate, body temperature, physical activity), many of which may be collected autonomously.
Data produced by these projects call for methods that have been studied extensively in the bioinformatics literature, such as (multiple) sequence alignment, motif (gene) finding, (meta)genome assembly, and phylogenetic reconstruction. However, existing methods are unlikely to scale properly, and fundamental computational problems persist in adapting the methods and algorithms to unprecedented data volumes. Other new data types (imaging, longitudinal recordings, etc.) and subject-matter questions will require the development of novel methods and algorithms. Many of these will build on sequence analysis, classical statistics, and, in particular, the recent success of machine learning and deep learning methods.
At the workshop, we will discuss these algorithms and methods, as well as new ways to work with the data, and applications to specific domains. Finally, we will deliberate on the ethical issues involved in generating and working with such data; in particular, how these data can be used in a nondiscriminatory fashion and for the benefit of all.
Everyone is welcome to attend this conference. Registration is required. Space may be limited, and you are advised to register early. The link to the registration form will appear on this page approximately 10 weeks before the conference. To submit your name for consideration, please register and await confirmation of your acceptance before booking your travel.
To apply to present a poster in the poster session, you must first be registered for the workshop as described above. The Simons Institute will provide a 30 inch x 40 inch board, an easel, and clips to attach the poster to the board; presenters must print and bring their own posters. To submit your application, please fill out this form.
Further details about this conference will be posted in due course. To contact the organizers about this conference, please complete this form. (Please note the form is not for registration.)
Please note: the Simons Institute regularly captures photos and video of activity around the Institute for use in videos, publications, and promotional materials.
Steven Brenner (UC Berkeley), Soren Brunak (University of Copenhagen), Rayan Chikhi (Institut Pasteur), Ana Conesa (University of Florida), Sandrine Dudoit (UC Berkeley), Jasmin Fisher (UCL), Roderic Guigo Serra (CRG - Center for Genomic Regulation), Eran Halperin (University of California, Los Angeles), Ian Holmes (UC Berkeley), Rachel Karchin (Johns Hopkins), Sunduz Keles (University of Wisconsin Madison), Harris Lewin (UC Davis), Alejandra Medina-Rivera (International Laboratory for Human Genome Research), Priya Moorjani (UC Berkeley), Ali Mortazavi (University of Victoria), Katherine Pollard (UCSF), Elizabeth Purdom (UC Berkeley), Tim Reddy (Duke), Ron Shamir (Tel Aviv University), Meromit Singer (Harvard University), Oliver Stegle (EMBL Heidelberg), Tandy Warnow (University of Illinois at Urbana–Champaign)