Tracking Dissemination of Plasmids in the Murine Gut Using Hi-C Sequencing and Bayesian Landmark-based Shape Analysis of Tumor Pathology Images







From experimental design and simple group comparisons using Student's t-test to the analysis of the explosively growing digital information of the big data era, a wide range of statistical approaches has been developed and incorporated into biological studies to understand the mechanisms of life processes and disease. The term “omics” refers to the disciplines in biology, such as genomics, transcriptomics, proteomics, and metagenomics, that perform a comprehensive, or global, assessment of a set of biological features in a high-throughput way. Analyzing such large amounts of data requires a framework built on thorough knowledge of both the associated biology and the statistical models. In this work, I focus on two critical health-related problems: the spread of antibiotic resistance in microbial communities by conjugative plasmids, and the association between tumor shape and prognosis in pathology images. In my first work, metagenomics and Hi-C sequencing were employed to analyze plasmid dissemination from Enterococcus faecalis donor strains in the murine intestine. I clustered assembled contigs into metagenome-assembled genomes (MAGs) and showed that combining these two sequencing techniques improved the quality of the resulting MAGs. I then demonstrated that Hi-C can detect the in situ hosts of native resistance genes in the murine gut microbiota. We also confirmed the association between the introduced E. faecalis plasmids and their donor strains and found a potential new Gram-positive host for the pAM830 resistance plasmid. In my second work, we developed a framework called Bayesian LAndmark-based Shape Analysis (BayesLASA), which includes a novel model for automatically detecting landmarks on tumor shape boundaries in pathology images. We proposed two types of landmark-based boundary roughness features and demonstrated their predictive value in a large cohort of lung cancer patients.



Statistics, Biology, Bioinformatics