Abstract

RNA-binding proteins play vital roles in many cellular processes, yet little is known about their structural binding preferences. A comprehensive study measured the binding of more than 200 proteins to around 240K RNA probes each (Ray et al. 2013), but focused only on their sequence specificities, since the RNA probes were designed to be unstructured. Recently, we made an algorithmic breakthrough in modeling and learning structural preferences from these unstructured data (Orenstein, Wang and Berger 2016). As a result, a large-scale analysis of RNA-binding structural preferences became possible for the first time.

Here, we analyze the structural binding preferences of the largest compendium of RNA-binding proteins (RBPs) to date. First, we show that RNA structural variability exists in the unstructured data, and that it is correlated with RNA-binding preferences. Second, we find that RBPs overall prefer to bind unpaired regions; many can bind both loop and external regions, and more show a preference for loop regions than for external regions. Third, we gauge the improvement in protein-binding prediction that is achieved by using RNA structure, both in vitro and in vivo. We find that structure preferences as measured in vitro correlate with structure preferences identified in vivo.

This is the first analysis of RNA structural preferences on such a large scale. We hope that our analysis will facilitate better understanding of protein-RNA binding and its roles in gene regulation and disease.