![Computational Innovation and Data Driven Biology_hi-res logo](/sites/default/files/styles/workshop_banner_sm_1x/public/2023-03/Computational%20Innovation%20and%20Data%20Driven%20Biology_hi-res.jpg?h=ccd3d9f7&itok=NsmWm_q5)
Abstract
Petabytes of valuable sequencing data reside in public repositories, and the volume doubles every two years. These data contain a wealth of genetic information about viruses that could help us monitor spillovers and anticipate future pandemics. We recently developed a cloud infrastructure for bioinformatics, named Serratus, to perform petabase-scale sequence alignment. With it, we analyzed all available RNA-seq samples (5.7 million samples, 10 petabytes) and discovered 10 times more RNA viruses than were previously known, including a new family of coronaviruses (Edgar et al., Nature, 2022). In this talk, I will present the computational infrastructure and some of the biological analyses.