Abstract

A significant proportion of mammalian genes encode multiple transcript isoforms that result from differential promoter usage, changes in internal splicing, and 3’ end choice. The comprehensive characterization of transcript diversity across tissues, cell types, and species has been challenging because transcripts are much longer than the reads normally used for RNA-seq. Long-read RNA-seq (lrRNA-seq) allows identification of the complete structure of each transcript. As part of the final phase of the ENCODE Consortium, we sequenced 216 lrRNA-seq libraries totaling 1 billion circular consensus sequencing (CCS) reads for 60 unique human and mouse samples. We detected and quantified 94.4% of GENCODE protein-coding genes as well as 42.6% of known protein-coding transcripts. Overall, we detected over 100,000 full-length transcripts, one-third of which are novel. We then define a new reference set of transcription start sites (TSSs), transcription end sites (TESs), and intron chains that are used for each gene across diverse tissues and cell types. Finally, we develop new metrics to characterize the transcriptional diversity of each gene in terms of alternative TSS choice, TES choice, and internal splicing, and demonstrate that this diversity varies on a per-gene basis across tissues, cell lines, and species. Our results represent the first comprehensive survey of human and mouse transcriptomes using full-length long reads and will serve as a foundation for further transcript-centric analyses.

Genomic regulation after birth contributes significantly to tissue and organ maturation, but it remains under-studied compared with prenatal development in mouse, which is covered by existing genomic catalogues. As part of ENCODE4, we generated the first comprehensive bulk and single-cell atlas of postnatal regulatory events across a diverse set of mouse tissues. The collection encompassed seven postnatal time points spanning the human equivalent of childhood through adolescence and adulthood, and focused on adrenal glands, gastrocnemius muscle, heart, hippocampus, and cortex. To allow for allele-specific analyses, we used C57BL6J/Castaneus F1 hybrid mice. Our analysis revealed novel dynamics of cell type composition, including new sex-specific cell populations and new commonalities in cell types shared among tissues. We also identify genomic regulatory signatures associated with dynamics of cell type composition, specialization of sub-cell types, and switching between cell states during postnatal development across 21 different cell types broken down into 68 sub-cell types. We provide an organizational framework to describe transcription factors (TFs) that are re-purposed in regulatory signatures of cell type identity in different tissues. Together, these analyses provide a foundation for understanding the postnatal development of diverse tissues.

Video Recording