Browsing by Author "Biswas, Swati"

A Bayesian Hierarchical Framework for Pathway Analysis in Genome-Wide Association Studies
(2018-08-31) Zhang, Lei; Biswas, Swati
Genome-wide association studies (GWAS) aim to identify genetic variants, typically single nucleotide polymorphisms (SNPs), associated with a disease/trait. A commonly used analytic strategy in GWAS is to test for association with one SNP at a time. However, such a strategy lacks power to detect associations that are caused by joint effects of multiple variants, each with a modest effect of its own. Pathway analysis jointly tests the combined effects of all SNPs in all genes belonging to a molecular pathway. This analysis is usually more powerful than single-SNP analyses for detecting joint effects of variants in a pathway. Moreover, due to the biological functionality of pathways, a significant result lends itself more easily to interpretation. In this dissertation, we develop a Bayesian hierarchical model that fully models the natural three-level hierarchy inherent in pathway structure, namely SNP-gene-pathway, unlike most other methods that use ad hoc ways of combining such information. We model the effects at each level conditional on the effects of the levels preceding them within the generalized linear model framework. This joint modeling allows not only detection of the associated pathways but also testing for association with genes and SNPs within significant pathways and significant genes in a hierarchical manner, which can be useful for follow-up studies. To deal with the high dimensionality of such a unified model, we regularize the regression coefficients through an appropriate choice of priors. We fit the model using a combination of Iteratively Weighted Least Squares and Expectation-Maximization algorithms to estimate the posterior modes and their standard errors. The inference is carried out in a hierarchical manner from pathways to genes to SNPs. Hierarchical false discovery rate (FDR) is used for multiplicity adjustment of the entire inference procedure.
We also explore the utility of effective number of parameters proposed in the Bayesian literature in our context of multiplicity adjustment using the hierarchical FDR. To study the proposed approach, we conduct simulations with samples generated under realistic linkage disequilibrium patterns obtained from the HapMap project. We find that our method has higher power than some standard approaches in several settings for identifying pathways that have multiple modest-sized variants. Moreover, it can also pinpoint associated genes once a pathway is implicated, a feature unavailable in other methods. We also find that the use of the effective number of parameters can boost the power to detect associated genes and helps in distinguishing them from the null genes. We apply the proposed method to two GWAS datasets on breast and renal cancer.
A Bayesian Latent Variable Approach to Aggregation of Partial and Top-Ranked Lists in Genomic Studies
(Wiley) Li, X.; Choudhary, Pankaj K.; Biswas, Swati; Wang, X.; 0000 0001 2704 188X (Biswas, S); 0000-0002-0398-7459 (Choudhary, PK)
In genomic research, it is becoming increasingly popular to perform meta-analysis, the practice of combining results from multiple studies that target a common essential biological problem. Rank aggregation, a robust meta-analytic approach, consolidates such studies at the rank level. There exists extensive research on this topic, and various methods have been developed in the past. However, these methods have two major limitations when they are applied in the genomic context. First, they are mainly designed to work with full lists, whereas partial and/or top-ranked lists prevail in genomic studies. Second, the component studies are often clustered, and the existing methods fail to utilize such information. To address the above concerns, a Bayesian latent variable approach, called BiG, is proposed to formally deal with partial and top-ranked lists and incorporate the effect of clustering. Various reasonable prior specifications for variance parameters in hierarchical models are carefully studied and compared. Simulation results demonstrate the superior performance of BiG compared with other popular rank aggregation methods under various practical settings. A non–small-cell lung cancer data example is analyzed for illustration.
Bivariate Logistic Bayesian LASSO for Detecting Rare Haplotype Association with Two Correlated Phenotypes
(2020-11-20) Yuan, Xiaochen; Biswas, Swati
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including a gain in statistical power and a better understanding of genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotype variants using case-control data, i.e., a single binary phenotype. Currently there is no rare haplotype association method that can handle multiple phenotypes. This dissertation aims to fill this gap by extending LBL to jointly model (i) two binary phenotypes and (ii) one binary and one continuous phenotype. First, we develop a bivariate LBL model for two binary phenotypes by considering two logistic regression models for the two phenotypes under the retrospective likelihood framework of LBL. The models share a common latent variable to induce correlation between the phenotypes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the original (univariate) LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two datasets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association dataset on lung cancer and smoking, and detect several associated rare haplotypes. Next, we extend bivariate LBL to model one binary and one continuous phenotype jointly. This is more challenging from a statistical point of view due to the different scales of the two phenotypes.
We consider a logistic and a linear regression model for the binary and continuous phenotypes, respectively. However, due to the different scales, we work with a latent variable representation of the binary phenotype to ensure that the two phenotypes share a common continuous underlying scale. As before, a common latent variable is used to induce correlation between the two phenotypes. We carry out extensive simulations to investigate this extension (named bivariate LBL-BC) and compare it with univariate LBL and bivariate LBL with two binary phenotypes (bivariate LBL-2B). In most settings, bivariate LBL-BC performs the best. In only two situations does bivariate LBL-BC have similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only and (2) strongly positively correlated and the target haplotype affects both phenotypes in a positive direction. Finally, we apply the method to a dataset on lung cancer and nicotine dependence and detect several haplotypes including a rare one.
Factors Associated with Attention Deficit/Hyperactivity Disorder Among US Children: Results from a National Survey
(BioMedCentral, 2012-05-14) Lingineni, R. K.; Biswas, Swati; Ahmad, N.; Jackson, B. E.; Bae, S.; Singh, K. P.; 0000 0001 2704 188X (Biswas, S)
Background: The purpose of this study was to investigate, in a comprehensive manner, the association between Attention Deficit/Hyperactivity Disorder (ADHD) and various factors using a representative sample of US children. This includes variables that have not been previously studied, such as watching TV/playing video games, computer usage, family members' smoking, and participation in sports. Methods: This was a cross-sectional study of 68,634 children, 5-17 years old, from the National Survey of Children's Health (NSCH, 2007-2008). We performed bivariate and multivariate logistic regression analyses with ADHD classification as the response variable and the following explanatory variables: sex, race, depression, anxiety, body mass index, healthcare coverage, family structure, socio-economic status, family members' smoking status, education, computer usage, watching television (TV)/playing video games, participation in sports, and participation in clubs/organizations. Results: Approximately 10% of the sample was classified as having ADHD. We found depression, anxiety, healthcare coverage, and male sex of the child to be associated with increased odds of being diagnosed with ADHD. One of the salient features of this study was observing a significant association between ADHD and variables such as TV usage, participation in sports, two-parent family structure, and family members' smoking status. Obesity was not found to be significantly associated with ADHD, contrary to some previous studies. Conclusions: The current study uncovered several factors associated with ADHD at the national level, including some that have not been studied earlier in such a setting. However, we caution that due to the cross-sectional and observational nature of the data, a cause-and-effect relationship between ADHD and the associated factors cannot be deduced from this study. Future research on ADHD should take these factors into consideration, preferably through a longitudinal study design.
© 2012 Lingineni et al.; licensee BioMed Central Ltd.
Impact of a Community Based Implementation of Reach II Program for Caregivers of Alzheimer's Patients
(PLOS, 2014-02-27) Lykens, Kristine; Moayad, Neda; Biswas, Swati; Reyes-Ortiz, Carlos; Singh, Karan P.; 0000 0001 2704 188X (Biswas, S)
Background: In 2009 an estimated 5.3 million people in the United States were afflicted with Alzheimer's disease, a degenerative form of dementia. The impact of this disease is not limited to the patient but also has significant impact on the lives and health of their family caregivers. The Resources for Enhancing Alzheimer's Caregiver Health (REACH II) program was developed and tested in clinical studies. The REACH II program is now being delivered by community agencies in several locations. This study examines the impact of the REACH II program on caregiver lives and health in a city in north Texas. Study design: Family caregivers of Alzheimer's patients were assessed using an instrument covering the multi-item domains of Caregiver Burden, Depression, Self-Care, and Social Support upon enrollment in the program and at the completion of the 6-month intervention. The domain scores were analyzed using a multivariate paired t-test and Bonferroni confidence intervals for the differences in pre- and post-service domain scores. Results: A total of 494 families were enrolled in the program during the period January 1, 2011 through June 30, 2012. Of these families, 177 completed the 6-month program and have pre- and post-service domain scores. The median age for the caregivers was 62 years. The domain scores for Depression and Caregiver Burden demonstrated statistically significant improvements upon program completion. Conclusion: The REACH II intervention was successfully implemented by a community agency with impacts comparable to those of the clinical trial, warranting wider-scale implementation.
An Improved Version of Logistic Bayesian Lasso for Detecting Rare Haplotype-Environment Interactions with Application to Lung Cancer
(Libertas Academica, 2015-02-09) Zhang, Yuan; Biswas, Swati; 0000 0001 2704 188X (Biswas, S)
The importance of haplotype association and gene-environment interactions (GxE) in the context of rare variants has been underlined in voluminous literature. Recently, software based on logistic Bayesian LASSO (LBL) was proposed for detecting GxE, where G is a rare (or common) haplotype variant (rHTV); it is called LBL-GxE. However, it required relatively long computation time and could handle only one environmental covariate with two levels. Here we propose an improved version of LBL-GxE, which is not only computationally faster but can also handle multiple covariates, each with multiple levels. We also discuss details of the software, including input, output, and some options. We apply LBL-GxE to a lung cancer dataset and find a rare haplotype with a protective effect for current smokers. Our results indicate that LBL-GxE, especially with the improvements proposed here, is a useful and computationally viable tool for investigating rare haplotype interactions.
Prediction of Individualized Risk of Contralateral Breast Cancer
(2018-05) Chowdhury, Marzana; Choudhary, Pankaj K.; Biswas, Swati
Women diagnosed with cancer in one breast are increasingly choosing to remove their other unaffected (contralateral) breast through a surgery called contralateral prophylactic mastectomy (CPM) to reduce the risk of contralateral breast cancer (CBC). Yet a large proportion of CPMs are believed to be medically unnecessary because the risk of CBC has, in fact, gone down substantially, mainly due to the availability of effective therapies for breast cancer (BC), which have a preventative effect on the contralateral breast. Thus, this dramatic rise in the rate of CPMs is a particularly disturbing trend. Research shows that many BC patients tend to substantially overestimate their CBC risk. Although CPM reduces the risk of CBC, there is no convincing evidence that it prolongs survival. The surgery also has a significant number of side effects and can have an adverse effect on a woman’s health and well-being. Thus, there is a pressing need to educate patients effectively on their CBC risk. For this task, physicians need a statistical model for risk prediction of CBC based on a patient’s personal risk factors. This dissertation is focused on filling this critical need. Although several risk factors for CBC are well established in the literature, one factor that is relatively less well-studied is mammographic breast density. This factor has come to the attention of the scientific community only recently and, in particular, it has been shown that increased breast density is a strong risk factor for first BC. Thus, it is of interest to study if it is associated with the risk of CBC as well. To this end, first we studied the relationship between breast density and CBC by analyzing data from the Breast Cancer Surveillance Consortium (BCSC), which is a large population-based source consisting of seven cancer registries across the US. We found that breast density is an independent and significant risk factor for development of CBC.
In particular, breast density has a dose-dependent effect on the risk of CBC, with increased breast density associated with increased risk. Next, we developed a CBC risk prediction model using data from BCSC and also Surveillance, Epidemiology, and End Results, another large population-based source. We explored numerous potential risk factors for inclusion into this model. The final model consists of eight risk factors: age at first BC diagnosis, anti-estrogen therapy, family history of BC, high-risk pre-neoplasia, estrogen receptor status, breast density, type of first BC, and age at first birth. Combining the relative risk estimates of these factors with the relevant hazard rates, our model, named CBCRisk, projects absolute risk of developing CBC over a given period. Finally, we validated CBCRisk on clinical datasets from the MD Anderson Cancer Center and Johns Hopkins University. We computed the relevant calibration and validation measures, and found that the model performs reasonably well for both datasets. With independent validation, CBCRisk can be used confidently in clinical settings in counseling BC patients by providing their individualized CBC risk. In turn, this may potentially help reduce the rate of medically unnecessary CPMs.
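As a rough illustration of how a relative risk can be combined with hazard rates to project an absolute risk over a period, the following is a minimal sketch of a generic competing-risks calculation with yearly piecewise-constant hazards. It is not taken from the dissertation and is not CBCRisk's actual formula; the function name `absolute_risk`, its parameters, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def absolute_risk(cause_hazards, competing_hazards, relative_risk):
    """Project the probability of the event of interest over consecutive years.

    cause_hazards[i]     : assumed baseline yearly hazard of the event in year i
    competing_hazards[i] : assumed yearly hazard of competing events in year i
    relative_risk        : assumed combined relative risk versus baseline
    """
    survival = 1.0  # probability of still being event-free at the start of the year
    risk = 0.0
    for h1, h2 in zip(cause_hazards, competing_hazards):
        total = relative_risk * h1 + h2
        if total > 0:
            # P(leaving the event-free state this year) times the share of
            # exits that are due to the event of interest
            risk += survival * (relative_risk * h1 / total) * (1.0 - math.exp(-total))
        survival *= math.exp(-total)
    return risk

# Hypothetical inputs: 5 years, baseline hazard 0.004 per year,
# competing hazard 0.01 per year, relative risk 1.8.
five_year_risk = absolute_risk([0.004] * 5, [0.01] * 5, 1.8)
print(f"Projected 5-year absolute risk: {five_year_risk:.4f}")
```

With no competing hazard and a constant baseline hazard h over n years, this reduces to 1 - exp(-RR * h * n), which is a convenient sanity check on the arithmetic.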
Recent Enhancements to the Genetic Risk Prediction Model BRCAPRO
(Libertas Academica, 2015-05-10) Mazzola, Emanuele; Blackford, Amanda; Parmigiani, Giovanni; Biswas, Swati; 0000 0001 2704 188X (Biswas, S)
BRCAPRO is a widely used model for genetic risk prediction of breast cancer. It is a function within the R package BayesMendel and is used to calculate the probabilities of being a carrier of a deleterious mutation in one or both of the BRCA genes, as well as the probability of being affected with breast and ovarian cancer within a defined time window. Both predictions are based on information contained in the counselee's family history of cancer. During the last decade, BRCAPRO has undergone several rounds of successive refinements: the current version is part of release 2.1 of BayesMendel. In this review, we showcase some of the most notable features of the software resulting from these recent changes. We provide examples highlighting each feature, using artificial pedigrees motivated by complex clinical examples. We illustrate how BRCAPRO is comprehensive software for genetic risk prediction with many useful features that allow users the flexibility to incorporate varying amounts of available information.