Abstract
Post-transcriptional mechanisms such as Alternative Splicing (AS) and Alternative PolyAdenylation (APA) regulate the maturation of pre-mRNAs and can give rise to different transcripts from the same gene, increasing the diversity and regulatory capacity of transcriptomes and proteomes. AS and APA have been extensively characterized at the mechanistic level, but much less so in terms of their functional impact. While functional profiling is widely used to characterize the functional relevance of gene expression at the genome-wide level, comparable tools at isoform resolution are missing. In contrast to short-read sequencing, single-molecule sequencing technologies allow direct sequencing of full-length transcripts, and novel tools are needed to exploit the information potential of these platforms for studying the functional consequences of alternative transcript processing. In particular, RNA sequencing with long-read technologies yields a vast number of novel transcripts that are a mixture of true molecules and technical artifacts. Additionally, functional annotation at isoform resolution is still lacking. Here we present a novel computational framework for Functional Iso-Transcriptomics analysis (FIT), specifically designed to study isoform (differential) expression from a functional perspective. The framework consists of three bioinformatics tools. SQANTI is used to define and curate expressed transcriptomes obtained with long-read technologies: it categorizes full-length reads, evaluates their potential biases, and removes low-quality instances. The IsoAnnot pipeline combines multiple databases and function prediction algorithms to return a rich isoform-level annotation file of functional domains, motifs, and sites, both coding and non-coding. Finally, the tappAS software introduces novel analysis methods to interrogate the functional relevance of isoform complexity.
I will show the application of the FIT framework to the analysis of differentiating mouse neural cells.
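As a rough illustration of the kind of structural categorization SQANTI performs on full-length reads, the sketch below classifies a multi-exon transcript by comparing its intron chain with those of reference isoforms, using the category names SQANTI reports: full-splice match (FSM), incomplete-splice match (ISM), novel in catalog (NIC) and novel not in catalog (NNC). This is a simplified, hypothetical reimplementation, not SQANTI's code. The function `classify`, the `(donor, acceptor)` junction representation and the assumption that query and references lie on the same gene and strand are illustrative choices. Real SQANTI also handles mono-exon transcripts and further categories such as antisense, fusion and intergenic.

```python
from typing import List, Tuple

# Hypothetical representation: one intron as (donor, acceptor) genomic coordinates.
Junction = Tuple[int, int]


def classify(query: List[Junction], references: List[List[Junction]]) -> str:
    """Simplified structural category of a multi-exon transcript.

    Assumes query and references come from the same gene and strand.
    """
    if not query:
        raise ValueError("this sketch only handles multi-exon transcripts")

    # FSM: the intron chain is identical to that of a reference isoform.
    if any(query == ref for ref in references):
        return "FSM"

    # ISM: the intron chain is a contiguous sub-chain of a reference chain.
    n = len(query)
    for ref in references:
        for start in range(len(ref) - n + 1):
            if ref[start:start + n] == query:
                return "ISM"

    # NIC: every donor and acceptor site is annotated, but in a new combination.
    known_donors = {d for ref in references for d, _ in ref}
    known_acceptors = {a for ref in references for _, a in ref}
    if all(d in known_donors and a in known_acceptors for d, a in query):
        return "NIC"

    # NNC: at least one unannotated donor or acceptor site.
    return "NNC"


refs = [[(100, 200), (300, 400), (500, 600)], [(100, 200), (500, 600)]]
print(classify([(300, 400), (500, 600)], refs))  # truncated chain
print(classify([(100, 400), (500, 600)], refs))  # known sites, new junction
```

Comparing intron chains rather than exon coordinates keeps the sketch insensitive to the variable 5' and 3' ends typical of long reads. This is one reason chain-based matching is a natural basis for this kind of categorization.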