Bivariate Logistic Bayesian LASSO for Detecting Rare Haplotype Association with Two Correlated Phenotypes
Multiple correlated traits or phenotypes are often collected in genetic association studies, and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including a gain in statistical power and a better understanding of genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotype variants using case-control data, i.e., a single binary phenotype. Currently, there is no rare haplotype association method that can handle multiple phenotypes. This dissertation aims to fill this gap by extending LBL to jointly model (i) two binary phenotypes and (ii) one binary and one continuous phenotype. First, we develop a bivariate LBL model for two binary phenotypes by specifying a logistic regression model for each phenotype under the retrospective likelihood framework of LBL. The two models share a common latent variable to induce correlation between the phenotypes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the original (univariate) LBL. The bivariate LBL performs better than or similarly to the univariate LBL in most settings. Its gain in power is highest when a haplotype is associated with both traits and affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two datasets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association dataset on lung cancer and smoking, and detect several associated rare haplotypes. Next, we extend bivariate LBL to jointly model one binary and one continuous phenotype. This is more challenging from a statistical point of view because the two phenotypes are on different scales.
We consider a logistic regression model for the binary phenotype and a linear regression model for the continuous phenotype. However, because of the different scales, we work with a latent variable representation of the binary phenotype to ensure that the two phenotypes share a common continuous underlying scale. As before, a common latent variable is used to induce correlation between the two phenotypes. We carry out extensive simulations to investigate this extension (named bivariate LBL-BC) and compare it with univariate LBL and with bivariate LBL for two binary phenotypes (bivariate LBL-2B). In most settings, bivariate LBL-BC performs the best. In only two situations does bivariate LBL-BC perform similarly to the other methods: when the two phenotypes are (1) weakly correlated or uncorrelated and the target haplotype affects only the binary phenotype, and (2) strongly positively correlated and the target haplotype affects both phenotypes in the positive direction. Finally, we apply the method to a dataset on lung cancer and nicotine dependence and detect several haplotypes, including a rare one.
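
The role of the shared latent variable can be illustrated with a small simulation. The sketch below is not the dissertation's model: it is a simplified, prospective data-generating example that omits the retrospective likelihood, the Bayesian LASSO priors, and the latent variable representation of the binary phenotype. The function names (`simulate`, `pearson`), the intercept of -1, the 10% carrier frequency, and the unit residual variance are all assumptions chosen for illustration. It only shows that a subject-level latent term `u`, added to two logistic predictors and one linear predictor, makes the resulting phenotypes correlated.

```python
import math
import random


def expit(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n, beta1, beta2, beta_c, sigma_u, seed=0):
    """Simulate two binary phenotypes and one continuous phenotype per subject.

    All three share the latent variable u ~ N(0, sigma_u^2).
    The intercept (-1), carrier frequency (0.1), and residual SD (1) are
    illustrative assumptions, not values taken from the abstract.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.1 else 0   # hypothetical haplotype carrier indicator
        u = rng.gauss(0.0, sigma_u)          # shared subject-level latent variable
        y1 = 1 if rng.random() < expit(-1.0 + beta1 * x + u) else 0  # binary phenotype 1
        y2 = 1 if rng.random() < expit(-1.0 + beta2 * x + u) else 0  # binary phenotype 2
        yc = beta_c * x + u + rng.gauss(0.0, 1.0)                    # continuous phenotype
        rows.append((x, y1, y2, yc))
    return rows


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    for sigma_u in (0.0, 2.0):
        rows = simulate(20000, beta1=0.0, beta2=0.0, beta_c=0.0, sigma_u=sigma_u, seed=1)
        y1 = [r[1] for r in rows]
        y2 = [r[2] for r in rows]
        yc = [r[3] for r in rows]
        print(f"sigma_u={sigma_u}: corr(y1, y2)={pearson(y1, y2):.3f}, "
              f"corr(y1, yc)={pearson(y1, yc):.3f}")
```

With `sigma_u = 0` the phenotypes are generated independently given the haplotype, so their sample correlations should be near zero. A larger `sigma_u` produces clearly positive correlations, both between the two binary phenotypes and between a binary and the continuous phenotype.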