Browsing by Author "Zhang, Michael Q."
FastDMA: An Infinium HumanMethylation450 BeadChip Analyzer (2013-09-05)
Wu, D.; Gu, J.; Zhang, Michael Q.; 0000 0001 1707 1372 (Zhang, MQ); 99086074 (Zhang, MQ)
DNA methylation is vital for many essential biological processes and is implicated in many human diseases. The Illumina Infinium HumanMethylation450 BeadChip is a recently developed platform for studying genome-wide DNA methylation at more than 480,000 CpG sites and a few CHG sites with high data quality. To analyze data from this promising platform, we developed FastDMA, which identifies significantly differentially methylated probes. Besides single-probe analysis, FastDMA can also perform region-based analysis to identify differentially methylated regions (DMRs). A unified statistical model, analysis of covariance (ANCOVA), is used for all analyses in FastDMA. We applied FastDMA to three large-scale DNA methylation datasets from The Cancer Genome Atlas (TCGA) and found many differentially methylated genomic sites in different types of cancer. On the test datasets, FastDMA showed much higher computational efficiency than current tools. With an integrative pipeline and high computational efficiency, FastDMA can benefit the analysis of large-scale DNA methylation studies. The software is freely available via http://bioinfo.au.tsinghua.edu.cn/software/fastdma/.

Gene Module Based Regulator Inference Identifying miR-139 as a Tumor Suppressor in Colorectal Cancer (Royal Society of Chemistry, 2014-09-30)
Gu, J.; Chen, Y.; Huang, H.; Yin, L.; Xie, Z.; Zhang, Michael Q.
Colorectal cancer is one of the most commonly diagnosed cancer types worldwide. Identifying the key regulators of the altered biological networks is crucial for understanding the complex molecular mechanisms of colorectal cancer.
We proposed a gene module based approach to infer key miRNAs regulating the major gene network alterations in cancer tissues. By integrating gene differential expression and co-expression information with a protein-protein interaction network, we identified differential gene expression modules that capture the major gene network changes in colorectal cancer. Several key miRNAs that extensively regulate these modules were then inferred by analyzing the enrichment of their target genes in the modules. Among the inferred candidates, three miRNAs (miR-101, miR-124 and miR-139) are frequently down-regulated in colorectal cancer. Subsequent computational and experimental analyses demonstrated that miR-139 can inhibit cell proliferation and the G1/S transition of the cell cycle. ETS1, a known oncogene and a key transcription factor in the gene module, was experimentally verified as a novel target of miR-139. miR-139 was significantly down-regulated in early pathological cancer stages, and its expression remained at very low levels in advanced stages. These results indicate that miR-139, inferred by the gene module based approach, is likely to be a key tumor suppressor in early cancer development.

A Highly Efficient and Effective Motif Discovery Method for ChIP-Seq/ChIP-Chip Data Using Positional Information (2012-01-06)
Ma, Xiaotu; Kulkarni, Ashwinikumar; Zhang, Zhihua; Xuan, Zhenyu; Serfling, Robert J. (Robert Joseph); Zhang, Michael Q.
Identification of DNA motifs from chromatin immunoprecipitation (ChIP) data, such as ChIP-seq and ChIP-chip, is a powerful way to understand the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model reflecting the fact that functional DNA k-mers often cluster around ChIP peak summits.
With this model, we introduced a new measure to discover functional k-mers. Using simulation, we demonstrated that our method is more robust to noise in ChIP data than available methods. A novel word-clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). We applied our method to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.

HITS-CLIP and Integrative Modeling Define the Rbfox Splicing-Regulatory Network Linked to Brain Development and Autism (Cell Press, 2014-03)
Weyn-Vanhentenryck, Sebastien; Mele, Aldo; Yan, Qinghong; Sun, Shuying; Farny, Natalie; Zhang, Zuo; Xue, Chenghai; Herre, Margaret; Silver, Pamela A.; Zhang, Michael Q.; Krainer, Adrian R.; Darnell, Robert B.; Zhang, Chaolin
The RNA-binding proteins Rbfox1/2/3 regulate alternative splicing in the nervous system, and disruption of Rbfox1 has been implicated in autism. However, comprehensive identification of functional Rbfox targets has been challenging. Here, we perform HITS-CLIP for all three Rbfox family members to globally map, at single-nucleotide resolution, their in vivo RNA interaction sites in the mouse brain. We find that the two guanines in the Rbfox binding motif UGCAUG are critical for protein-RNA interactions and crosslinking. Using integrative modeling, we combine these interaction sites with additional datasets to define 1,059 direct Rbfox target alternative splicing events. Over half of the quantifiable targets show dynamic changes during brain development. Of particular interest are 111 events from 48 candidate autism-susceptibility genes, including the syndromic autism genes Shank3, Cacna1c, and Tsc2.
Alteration of Rbfox targets in some autistic brains is correlated with downregulation of all three Rbfox proteins, supporting the potential clinical relevance of the splicing-regulatory network.

Integrated Omics Study Delineates the Dynamics of Lipid Droplets in Rhodococcus opacus PD630 (Oxford University Press, 2013-10-22)
Chen, Yong; Ding, Yunfeng; Yang, Li; Yu, Jinhai; Liu, Guiming; Wang, Xumin; Zhang, Shuyan; Zhang, Michael Q.; Li, Yanda
Rhodococcus opacus strain PD630 (R. opacus PD630) is an oleaginous bacterium and one of the few prokaryotic organisms that contain lipid droplets (LDs). The LD is an important organelle for lipid storage and for intercellular communication regarding energy metabolism, yet it remains poorly understood. To understand LD dynamics in a simple model organism, we conducted a series of comprehensive omics studies of R. opacus PD630, including complete genome, transcriptome and proteome analyses. The genome of R. opacus PD630 encodes 8,947 genes that are significantly enriched in lipid transport, synthesis and metabolism, indicating a strong capacity for carbon-source biosynthesis and catabolism. Comparative transcriptome analysis across three culture conditions revealed the landscape of altered gene expression responsible for lipid accumulation. The LD proteomes further identified the proteins that mediate lipid synthesis, storage and other biological functions. Integrating these three omics datasets uncovered 177 proteins that may be involved in lipid metabolism and LD dynamics. An LD structure-like protein, LPD06283, was further verified to affect LD morphology.
Our work provides not only the first integrated omics study of the prokaryotic LD organelle but also a systematic platform to facilitate further prokaryotic LD research and biofuel development.

MIROR: A Method for Cell-Type Specific MicroRNA Occupancy Rate Prediction (Royal Society of Chemistry, 2014-03-13)
Xie, Peng; Liu, Yu; Li, Yanda; Zhang, Michael Q.; Wang, Xiaowo
MicroRNA (miRNA) regulation is highly cell-type specific. It is sensitive to both the relative abundance of miRNAs and mRNAs and the competitive endogenous RNA (ceRNA) effect. However, almost all existing miRNA target prediction methods neglect the influence of the cellular environment when analyzing miRNA regulatory effects. In this study, we propose MIROR (miRNA Occupancy Rate predictor), a method to predict miRNA regulation intensity in a given cell type. Its major considerations are the relative abundance of miRNAs and mRNAs and the endogenous competition between different mRNA species. MIROR outputs the predicted miRNA occupancy rate of each target site. The predictions correlated significantly with Ago HITS-CLIP data, which indicate miRNA binding intensities. When applied to a breast invasive carcinoma dataset, MIROR identified a number of differentially regulated miRNA-mRNA pairs with significant changes in miRNA occupancy rate between tumor and normal tissues. Many of these predictions were supported by previous studies, including ones without a significant change in mRNA expression level.
These results indicate that MIROR provides a novel strategy for studying differential miRNA regulation across cell types.

ModuleRole: A Tool for Modulization, Role Determination and Visualization in Protein-Protein Interaction Networks (Public Library of Science, 2014-05-01)
Li, GuiPeng; Li, Ming; Zhang, YiWei; Wang, Dong; Li, Rong; Guimera, Roger; Gao, Juntao Tony; Zhang, Michael Q.
Rapidly increasing amounts of (physical and genetic) protein-protein interaction (PPI) data are produced by various high-throughput techniques, and interpreting these data remains a major challenge. To gain insight into the organization and structure of the resulting large, complex networks of interacting molecules, we developed ModuleRole, a user-friendly web server that uses simulated annealing, a method based on node connectivity, to find modules in a PPI network, defines the role of every node, and produces files for visualization in Cytoscape and Pajek. For given proteins, it analyzes the PPI network from the BioGRID database, finds and visualizes the modules these proteins form, and then defines the role each node plays in the network based on two topological parameters, the participation coefficient and the Z-score. It is the first program to provide an interactive, user-friendly interface for biologists to find and visualize modules and protein roles in PPI networks. It can be tested online at http://www.bioinfo.org/modulerole/index.php, which is free and open to all users with no login required; demo data are provided through the "User Guide" in the Help menu. A non-server version of the program should be considered for high-throughput data with more than 200 nodes or for users' own interaction datasets. Users can bookmark the link to the result page and access it later.
As an interactive and highly customizable application, ModuleRole requires no expertise in graph theory on the user's side and runs on both Linux and Windows, making it a useful tool for biologists to analyze and visualize PPI networks from databases such as BioGRID. Availability: ModuleRole is implemented in Java and C and is freely available at http://www.bioinfo.org/modulerole/index.php. Supplementary information (user guide, demo data) is also available at this website. The ModuleRole API used by this program can be obtained upon request.

New Fusion Transcripts Identified in Normal Karyotype Acute Myeloid Leukemia (2012-12-12)
Wen, H.; Li, Yongjin; Malek, S. N.; Kim, Y. C.; Xu, J.; Chen, P.; Xiao, F.; Huang, X.; Xuan, Zhenyu; Mankala, Shiva; Zhang, Michael Q.
Genetic aberrations contribute to acute myeloid leukemia (AML). However, half of AML cases do not carry the well-known aberrations, which are mostly detected by cytogenetic analysis; these cases are classified as normal karyotype AML. The varied outcomes of normal karyotype AML suggest that this subgroup could be genetically heterogeneous, but the lack of genetic markers makes it difficult to study further. Using paired-end RNA-seq, we performed a transcriptome analysis of 45 AML cases, including 29 normal karyotype AML, 8 abnormal karyotype AML and 8 AML without karyotype information. We identified 134 fusion transcripts, all formed between adjacent partner genes on the same chromosome and distributed at different frequencies across the AML cases. Seven fusions are present exclusively in normal karyotype AML, and the remaining fusions are shared between normal karyotype and abnormal karyotype AML.
CIITA, a master regulator of MHC class II gene expression that is truncated in B-cell lymphoma and Hodgkin disease, was found fused with DEXI in 48% of normal karyotype AML cases. These fusion transcripts between adjacent genes raise the possibility that some of them are involved in the oncogenic process in AML, and they provide a new source of genetic markers for normal karyotype AML.

Nucleosome Eviction and Multiple Co-Factor Binding Predict Estrogen-Receptor-Alpha-Associated Long-Range Interactions (Oxford University Press, 2014-04-29)
He, C.; Wang, X.; Zhang, Michael Q.
Many enhancers regulate their target genes via long-distance interactions. High-throughput experiments such as ChIA-PET have been developed to map these largely cell-type-specific interactions between cis-regulatory elements genome-wide. In this study, we integrated multiple types of data to reveal general patterns hidden in ChIA-PET data. We found characteristic distance features for promoter-promoter, enhancer-enhancer and insulator-insulator interactions. Although a protein may have many binding sites across the genome, we hypothesized that sites sharing a certain open chromatin structure can accommodate relatively large protein complexes consisting of specific regulatory and 'bridging' factors, and may be more likely to form robust long-range DNA loops. This hypothesis was validated with the estrogen receptor alpha (ERa) ChIA-PET data. An efficient classifier was built to predict ERa-associated long-range interactions solely from the related ChIP-seq data, thereby linking distal ERa-dependent enhancers to their target genes. We further applied the classifier to predict additional novel interactions that were not detected in the original ChIA-PET study but were validated by other independent experiments.
Our work provides new insight into long-range chromatin interactions through deeper, integrative analysis of ChIA-PET data and demonstrates that DNA looping can be predicted from ordinary ChIP-seq data.

OLego: Fast and Sensitive Mapping of Spliced mRNA-Seq Reads Using Small Seeds (Oxford University Press, 2013-04)
Wu, Jie; Anczuków, Olga; Krainer, Adrian R.; Zhang, Michael Q.; Zhang, Chaolin
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme and does not rely on a separate external aligner. It achieves high sensitivity of junction detection through strategic searches with small seeds (∼14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model that scores exon junctions by splice-site strength and intron size. The Burrows-Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, allowing fast processing of large-scale data. We systematically evaluated the performance of OLego against published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy, and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
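
The OLego entry states that the Burrows-Wheeler transform (BWT) is used to map seeds. As an illustration of that general technique only, and not of OLego's actual C++ code, here is a minimal sketch of exact seed lookup by FM-index backward search over a BWT. The function names and the toy sequence are hypothetical, and the naive suffix sort is an assumption made for brevity that would not scale to a mammalian genome.

```python
from __future__ import annotations


def build_bwt(text: str) -> tuple[str, list[int]]:
    """Return (BWT, suffix array) of text; text must end with a unique '$'."""
    # Naive suffix sort: fine for toy inputs, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[j] is the character preceding suffix sa[j] (wraps to '$' when sa[j] == 0).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt: str) -> tuple[dict[str, int], dict[str, list[int]]]:
    """Build the C table and a dense Occ table for backward search."""
    alphabet = sorted(set(bwt))
    # c_table[c]: number of characters in the text that sort strictly before c.
    c_table: dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + int(b == ch)
    return c_table, occ


def backward_search(seed: str, c_table: dict[str, int],
                    occ: dict[str, list[int]], n: int) -> tuple[int, int]:
    """Return the half-open suffix-array interval [lo, hi) of suffixes prefixed by seed."""
    lo, hi = 0, n
    # Extend the match one character at a time, from the seed's last character to its first.
    for ch in reversed(seed):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate_seed(genome: str, seed: str) -> list[int]:
    """Return sorted 0-based positions where seed occurs exactly in genome."""
    text = genome + "$"
    bwt, sa = build_bwt(text)
    c_table, occ = build_fm_index(bwt)
    lo, hi = backward_search(seed, c_table, occ, len(bwt))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"  # hypothetical toy sequence
    print(locate_seed(genome, "ACG"))  # [0, 4, 9]
    print(locate_seed(genome, "TT"))   # [7]
```

In practice the index would be built once per genome and reused for every seed. Each backward-search step costs a constant number of table lookups, so lookup time grows with seed length rather than genome size. Production FM-indexes also build the suffix array with faster algorithms and store sampled Occ checkpoints instead of the dense table used here.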