Abstract
Single-cell RNA sequencing reveals differences in gene and exon expression levels across individual cells. In particular, recent studies have shown considerable differences in the distribution of reads across cells for the same gene. This variation in isoform usage across single cells was not observed in bulk RNA-seq data. We seek to quantify this variation, understand its sources, and identify the patterns of differences in isoform usage. To quantify the variation, we developed a profile-variation (PV) score for each gene that accounts for various confounding factors in the data; this score allows us to extract genes with highly variable read density profiles across cells.
Based on the PV score, we can study the sources of the transcript variation. Gene Ontology analysis of genes with high PV scores reveals two levels of isoform variation in terms of gene function. Analyzing data sets from different cell types, we found that the first level of functions is common to all cell types, whereas the second level is cell-type specific, for example, immunology-related functions in activated T helper cells. We further studied the patterns of isoform usage across cells. Although we found genes that switch isoforms between cell types, they do not switch in a correlated manner, indicating high stochasticity in isoform generation in single cells. Finally, we show that applying our PV score to single-cell RNA-seq data identifies genes that traditional methods do not detect as differentially spliced in bulk RNA-seq data, and these genes potentially represent the gradual change from one cell type to another.