Computational Modeling of Long Noncoding RNA Transcriptional Regulation
Over the past decade, the rapid development of high-throughput transcriptome sequencing techniques has broadened our knowledge of noncoding RNAs. The human genome is pervasively transcribed, yet a significant fraction of the resulting transcripts does not code for proteins. Among noncoding RNAs, long noncoding RNAs (lncRNAs) play a variety of roles, ranging from regulating transcription and controlling chromatin epigenetic state to participating in alternative splicing and subnuclear compartment formation. Most lncRNAs exert their functions through interactions with protein partners, so elucidating RNA-protein interactions is essential for understanding many critical biological processes. In particular, lncRNAs interact with ribonucleoprotein complexes and numerous chromatin regulators to target appropriate locations in the genome. We present LncLink, a computational method based on support vector machines that infers the set of proteins most likely to be involved in lncRNA interactions and the genomic regions that they control. We built the LncLink model using data obtained by capture hybridization analysis of RNA targets (CHART-seq). Its inferences are obtained by integrating transcription factor binding sites and genome-wide chromatin interactions as predictive features. To validate the method, we applied LncLink to CHART-seq data obtained from MCF-7 cells to identify putative protein binding partners of the lncRNA NEAT1. The method generated a signature of 27 proteins strongly predicted to interact with NEAT1 and to mediate its chromatin targeting. Several of these proteins have been implicated in NEAT1 binding and NEAT1-mediated transcription in the literature, supporting the reliability of our results. These findings provide novel insights into the key players targeted by lncRNAs.
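To make the modeling idea concrete, the following is a minimal sketch of a linear support vector machine that scores genomic regions from transcription factor binding indicators and a chromatin-contact score, then ranks features by weight. It is not the LncLink implementation. The feature names (TF_A, TF_B, TF_C, chromatin_contact), the synthetic regions, the labeling rule, and the choice of plain subgradient descent on the hinge loss are all assumptions made for illustration; the abstract does not specify the kernel, solver, or feature encoding.

```python
from itertools import product

# Hypothetical feature names: three TF-binding indicators and one
# chromatin-contact score per genomic region (not taken from the paper).
FEATURES = ["TF_A", "TF_B", "TF_C", "chromatin_contact"]


def make_toy_regions():
    """Build a small synthetic dataset; labels mimic lncRNA-bound (+1) vs. unbound (-1)."""
    X, y = [], []
    for a, b, c in product([0, 1], repeat=3):
        for contact in (0.1, 0.5, 0.9):
            X.append([float(a), float(b), float(c), contact])
            # Toy assumption: only TF_A co-occurs with lncRNA occupancy.
            y.append(1 if a == 1 else -1)
    return X, y


def decision(w, b, x):
    """Signed score of region x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lr=0.05, lam=0.01, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (linear SVM)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * decision(w, b, x) < 1
            # Shrink weights (regularizer); add the hinge term if the margin is violated.
            w = [wj * (1 - lr * lam) + (lr * label * xj if violated else 0.0)
                 for wj, xj in zip(w, x)]
            if violated:
                b += lr * label
    return w, b


X, y = make_toy_regions()
weights, bias = train_linear_svm(X, y)

# Rank features by absolute weight as a crude proxy for a protein signature.
ranking = sorted(FEATURES, key=lambda f: abs(weights[FEATURES.index(f)]), reverse=True)
accuracy = sum((decision(weights, bias, x) > 0) == (label > 0)
               for x, label in zip(X, y)) / len(X)
print(ranking, round(accuracy, 2))
```

In this toy setting the binding indicator that drives the labels receives the largest weight, which illustrates how a protein signature might be read off a trained linear model. With real CHART-seq peaks and binding-site data, the feature construction, class balance, and model selection would need far more care than this sketch suggests.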