Estimating Distributions of Deletion Mutations Among Bacterial Populations and Graphical Modeling of Multiple Biological Pathways in Genomic Studies







Genomics, the study of an organism’s complete set of genes (the genome), including their structure and function and the techniques used to analyze them, plays a significant role in understanding human health and disease. Using the enormous volumes of genome sequences and related data collected by high-throughput techniques, researchers develop computational and statistical methods to investigate the genetic components of diseases. Genome-based research enables us to develop prevention strategies and improve the treatment of complex diseases, which carry a massive public health burden and may have a grave economic and social impact. Drawing on this abundance of genomic data, my research proposes and evaluates statistical methods for two critical human health problems: estimating the distribution of deletion mutations among bacterial populations under antibiotic selection, and graphical modeling of multiple biological pathways in genomic studies. In my first work, I proposed two modified Expectation-Maximization (EM) methods to estimate the proportions of bacterial populations carrying deletion mutations, based on mapping results of next-generation sequencing (NGS) paired-end data. I compare the two methods, one using only split reads and the other using both split and non-split reads. Two simulation studies and one real case study, the CRISPR-Cas system of E. faecalis, were used to evaluate the performance of the proposed methods. In my second work, I examined the Markov Random Field (MRF) model of multiple biological pathways proposed by Cao (2016) using simulated and real genomic data.



Expectation-maximization algorithms, CRISPR (Genetics), Markov random fields, Biology