Abstract

Despite the abundance of spatial gene expression data, extracting meaningful information that reveals how genes interact remains a challenge. We developed staNMF, a method that combines a powerful unsupervised learning algorithm, nonnegative matrix factorization (NMF), with a new stability criterion that selects the size of the dictionary, that is, of the set of principal patterns (PPs). We demonstrate that the PPs give rise to a novel and concise representation of Drosophila embryonic spatial expression patterns and that they correspond to biologically meaningful regions of the Drosophila embryo. Furthermore, we used this new representation to automatically predict manual annotations, categorize gene expression patterns, and reconstruct the local gap gene network with high accuracy. Finally, we will present theoretical results on local identifiability in dictionary learning that shed some light on the conditions under which the PPs can be trusted.
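
To make the idea of stability-based selection of the dictionary size concrete, the following is a minimal sketch and not the staNMF implementation itself. It assumes that instability at a given size K is measured by refitting NMF from several random initializations and averaging an Amari-type dissimilarity between the resulting dictionaries; the function names (`nmf`, `dissimilarity`, `instability`), the multiplicative-update solver, the number of restarts, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def nmf(X, k, rng, n_iter=200, eps=1e-9):
    """Approximate X (pixels x genes) by D @ A with nonnegative factors,
    using Lee-Seung multiplicative updates (an assumed, simple solver)."""
    n, m = X.shape
    D = rng.random((n, k)) + eps   # dictionary: one column per pattern
    A = rng.random((k, m)) + eps   # coefficients of each gene on the patterns
    for _ in range(n_iter):
        A *= (D.T @ X) / (D.T @ D @ A + eps)
        D *= (X @ A.T) / (D @ A @ A.T + eps)
    return D, A

def dissimilarity(D1, D2):
    """Amari-type distance between two dictionaries of equal size.
    Close to 0 when the columns agree up to permutation and scaling."""
    U = D1 / (np.linalg.norm(D1, axis=0) + 1e-12)
    V = D2 / (np.linalg.norm(D2, axis=0) + 1e-12)
    C = np.abs(U.T @ V)            # cosine similarity between columns
    k = C.shape[0]
    return (2 * k - C.max(axis=0).sum() - C.max(axis=1).sum()) / (2 * k)

def instability(X, k, n_runs=5, seed=0):
    """Average pairwise dissimilarity over dictionaries from random restarts."""
    rng = np.random.default_rng(seed)
    dicts = [nmf(X, k, rng)[0] for _ in range(n_runs)]
    pairs = [(i, j) for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean([dissimilarity(dicts[i], dicts[j]) for i, j in pairs]))

# Synthetic, hypothetical data: 40 "pixels", 60 "genes", 3 underlying patterns.
data_rng = np.random.default_rng(1)
X = data_rng.random((40, 3)) @ data_rng.random((3, 60))

# Lower instability suggests a more reproducible dictionary size.
scores = {k: instability(X, k) for k in (2, 3, 4, 5)}
best_k = min(scores, key=scores.get)
```

In this sketch, the candidate size with the lowest average dissimilarity across restarts is taken as the selected dictionary size; other resampling schemes or dissimilarity measures could be substituted under the same outline.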