Browsing by Author "Choudhary, Pankaj K."
Now showing 1 - 12 of 12
Item A Bayesian Latent Variable Approach to Aggregation of Partial and Top-Ranked Lists in Genomic Studies (Wiley)
Li, X.; Choudhary, Pankaj K.; Biswas, Swati; Wang, X.; 0000 0001 2704 188X (Biswas, S); 0000-0002-0398-7459 (Choudhary, PK)
In genomic research, it is becoming increasingly popular to perform meta-analysis, the practice of combining results from multiple studies that target a common biological problem. Rank aggregation, a robust meta-analytic approach, consolidates such studies at the rank level. There is extensive research on this topic, and various methods have been developed. However, these methods have two major limitations when applied in the genomic context. First, they are mainly designed to work with full lists, whereas partial and/or top-ranked lists prevail in genomic studies. Second, the component studies are often clustered, and the existing methods fail to utilize such information. To address these concerns, a Bayesian latent variable approach, called BiG, is proposed to formally deal with partial and top-ranked lists and to incorporate the effect of clustering. Various reasonable prior specifications for the variance parameters in hierarchical models are carefully studied and compared. Simulation results demonstrate the superior performance of BiG compared with other popular rank aggregation methods under various practical settings. A non–small-cell lung cancer data example is analyzed for illustration.

Item Comparing Practical Approaches for Regression Models With Censored Covariates (August 2022)
Alyabs, Norah; Torabifard, Hedieh; Chiou, Sy Han; Chen, Min; Choudhary, Pankaj K.; Shin, Sunyoung
The accelerated failure time (AFT) model is a useful linear model for estimating the effect of covariates on the failure time.
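The AFT model just introduced is linear on the log-time scale: log T = b0 + b1·x + error. As a toy, hedged sketch only (invented coefficients, no censoring, not code from the dissertation), the coefficients can be recovered by ordinary least squares when every failure time is fully observed:

```python
import random

# Toy AFT data: log(T) = b0 + b1 * x + error, with invented values
# b0 = 1.0, b1 = 0.5 and Gaussian noise; no censoring in this sketch.
random.seed(42)
b0, b1 = 1.0, 0.5
x = [random.uniform(0, 10) for _ in range(2000)]
log_t = [b0 + b1 * xi + random.gauss(0, 0.3) for xi in x]

# Ordinary least squares for one covariate:
# slope = cov(x, log T) / var(x); intercept from the sample means.
n = len(x)
mx, my = sum(x) / n, sum(log_t) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, log_t))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

With censored or LOD-affected covariates or responses, this naive fit is no longer valid, which is exactly the complication the abstract goes on to address.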
Estimating the AFT model coefficients is challenging when there are missing values in a covariate due to a limit of detection (LOD) or when covariates are randomly censored. Removing subjects with missing observations, as in the complete-case (CC) analysis, is the usual approach because of the simplicity and consistency of the estimator. In small-sample studies with a high missing proportion in more than one covariate, dropping observations in the CC analysis may cause non-convergence. When the covariates are subject to an LOD, the missingness arises from the inability to measure values beyond the LODs. The missing indicator (MDI) approach can be a good alternative to the CC analysis. For small samples, the MDI approach is justified in simulation studies for the AFT model with covariates subject to lower, upper, and interval LODs. In the linear and Cox models, the MDI approach outperforms other approaches as well. When the covariates are randomly censored, imputing the censored values can be useful in parametric and semi-parametric AFT models. For the parametric AFT model, we propose a parametric imputation approach that takes advantage of the information available in the censored covariate. The parametric imputation approach outperforms the CC analysis under correct parametric assumptions. In the absence of a correct parametric assumption, and under the semi-parametric AFT model, the MDI approach can be a good alternative to the CC analysis that preserves the sample size. In application to the NCCTG lung cancer data, the bias of the MDI estimator is less than that of the CC estimator when some covariates are subject to an artificial LOD.

Item Contributions to Modeling and Analysis of Method Comparison Data (2018-12)
Kotinkaduwe Rankothgedara, Lak Nilusha; Choudhary, Pankaj K.
Method comparison studies compare a new method of measuring a continuous variable with an established method that serves as a reference.
Both methods have the same unit of measurement, and neither is considered error-free. The major goals in these studies are to quantify the degree of similarity and agreement between the two methods. The motivation behind the comparison is that if the two methods agree well, the cheaper, simpler, or less invasive one can be preferred, or both can be used interchangeably. Such studies are common in the biomedical sciences, with medical devices, assays, measurement protocols, or clinical observers serving as methods. The most popular design for conducting these studies is the paired measurements design, which yields one measurement by each method on every subject. Such paired method comparison data are often analyzed by modeling them with the classical measurement error model or a special case of it, a mixed-effects model. Motivated by real applications, this dissertation makes two contributions toward the modeling and analysis of these data. First, we develop a segmented measurement error model assuming equal error variances. This model extends the classical measurement error model to allow a piecewise linear relationship between the measurements. The changepoint at which the transition takes place is treated as an unknown parameter in the model. We provide an expectation conditional maximization (ECM) algorithm to fit the model and propose segment-specific evaluation of similarity and agreement using appropriate extensions of the existing measures. Bootstrapping and large-sample theory of maximum likelihood estimators are used to perform the relevant inferences. We also obtain an explicit expression for the Hessian matrix needed for this purpose. The proposed methodology is evaluated by simulation and is illustrated by analyzing a dataset containing measurements of digoxin concentration. This work is also generalized to allow unequal error variances in the segmented model.
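The piecewise linear relationship with an unknown changepoint posited by the segmented model can be sketched as a mean function that is continuous at the changepoint, with a different slope on each side (the parameter values below are invented for illustration; this is not the dissertation's ECM fitting code):

```python
def segmented_mean(x, a=0.2, b1=1.0, b2=0.6, c=1.5):
    """Piecewise linear mean: slope b1 up to the changepoint c,
    slope b2 after it, continuous at x = c (illustrative values)."""
    if x <= c:
        return a + b1 * x
    return a + b1 * c + b2 * (x - c)

# Continuity at the changepoint and the change of slope:
print(segmented_mean(1.5), segmented_mean(2.5))
```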
Second, we develop a Bayesian approach that uses informative priors for the error variances within a mixed-effects model framework. This approach takes advantage of information about error variances that may be available from previous studies, potentially leading to their improved estimation. Half-normal and hierarchical half-normal distributions are used as prior distributions for the error variances, and data from previous studies are used to estimate the hyperparameters of these distributions. We discuss strategies for posterior simulation to estimate the model parameters and their functions. The proposed methodology is compared with its likelihood-based counterpart in a simulation study. It is illustrated by analyzing a dataset containing oxygen saturation measurements.

Item Estimation of Covariance Structures in Functional Mixed Models with Application to Heritability Estimation (2020-08)
Patwary, Mohammad Shaha Alam; 0000-0002-3465-2184 (Patwary, MSA); Choudhary, Pankaj K.
When the response on a subject can be naturally viewed as a smooth curve or function, it is said to be a functional response. The response may be observed at a set of discrete times, possibly with noise. Functional data arise when observations of a functional response are available from a sample of subjects; thus, functional data essentially consist of a sample of curves. One example of such data is the usual longitudinal data, where a variable of interest is measured over time on a sample of subjects. Functional data arise in a variety of disciplines, including economics, environmental science, public health, medicine, and genetics, and their analysis is currently an active area of statistical research. Functional data are often analyzed by modeling them with a functional mixed model. This model commonly assumes that the within-subject errors are homoscedastic and uncorrelated.
But this assumption is often violated in practice, which may lead to misleading inferences. This is especially an issue if the object of inference is a function of both the random effect and error autocovariance functions. One such quantity is the heritability function, defined as the proportion of variance explained by the random effect. In genetics, the random effect can be interpreted as the additive genetic component of a quantitative trait; heritability is then the ratio of the additive genetic variance to the total phenotypic variance of the trait. It measures the extent to which individuals' phenotypes are determined by the genes transmitted from the parents, which makes heritability a fundamental quantity of interest in genetics. This dissertation makes three contributions toward estimating both the random effect and error covariance structures in a functional mixed model. First, it develops a methodology for modeling functional data from independent subjects that incorporates parametric models for the error covariance structure. The methodology is evaluated using a simulation study, and its application is illustrated by analyzing growth curve data. Next, this methodology is extended to family data, where subjects may be grouped into families and subjects from the same family are dependent. This methodology is also evaluated using simulation. Finally, it introduces the novel notion of a singular mixed model, whose further development may allow modeling the error covariance structure nonparametrically, enhancing the flexibility of functional mixed models.

Item Identification of Linear Control Systems via Gradient Descent (December 2021)
Gelir, Fatih; Ramakrishna, Viswanath; Bensoussan, Alain; Rugg, Elizabeth; Dabkowski, Mieczyslaw K.; Choudhary, Pankaj K.; Dragovic, Vladimir
In this dissertation, we use gradient descent and its variations, in the spirit of machine learning, to identify a linear control system.
When the full state is observable, the most natural least-squares cost function is convex. However, when the state is only partially observable, this is no longer the case. We propose two algorithms for the latter problem and show that the cost function decreases as the iterations proceed. Simulations are provided to support the theoretical results. We also perform a recursivity analysis as the amount of data increases. Finally, we provide an asymptotic analysis as a certain natural parameter goes to infinity.

Item Investigation Into Higher Dimensional Rotations (2022-12)
Bal, Sabindra Singh 1981-; Zakhidov, Anvar A.; Ramakrishna, Viswanath; Cao, Yan; Dabkowski, Mieczyslaw K.; Choudhary, Pankaj K.
The axis-angle representation provides efficient methods for studying three-dimensional rotations. The representation aids visualization, and thus the analysis, of a three-dimensional proper rotation by reducing its study to that of a two-dimensional one. In this dissertation, we accomplish a similar result for a five-dimensional proper rotation by reducing its study to that of either two- or four-dimensional proper rotations. For a matrix in SO(5, R), we compute a closed-form formula for the axis, which is the fixed-point set of the matrix, as well as a formula for the angle of the complementary proper rotation that the matrix performs in the orthogonal complement of the axis. In fact, two such derivations are provided. The first is based on properties of a matrix in SO(5, R), such as the special skew-palindromic structure of its characteristic polynomial, while the second utilizes the structure of the Lie algebra of the covering group. A closed-form formula for the logarithm in the covering group of SO(5, R) is also derived, as it is essential for the second method.
Further, we study indefinite rotations with signature (1,9) and come close to establishing that the group of such rotations is isomorphic to the group of 2x2 octonion matrices with determinant 1.

Item Magnetic and Catalytic Properties of Lanthanide Complexes (2021-05)
Miller, Justin Todd; Stefan, Mihaela C.; Choudhary, Pankaj K.; Biewer, Michael C.; Nielsen, Steven O.; Pantano, Paul
Lanthanides are an intriguing family of elements possessing unique properties useful in many diverse applications. The first chapter of this work describes the origins of some of these properties and their catalytic and magnetic applications. The second chapter highlights a highly unusual neodymium catalyst for diene polymerization. This coordination polymer catalyst contains no halides and makes use of no halide donor, yet produces desirable 96% 1,4-cis stereospecific material. The third chapter is concerned with the surprising formation and superparamagnetism of a neodymium-peroxide diimine cluster and the associated crystals. The cluster is formed by a rare example of anion-templated assembly in which the anion is derived from dissolved atmospheric oxygen. The resulting structural motif features an array of tight three-metal clusters separated by a distance long enough to prevent long-range magnetic order, which results in superparamagnetic behavior in the solid state. This is believed to be the first report of superparamagnetism in a bulk crystal state. The fourth and final chapter is concerned with MRI contrast agents and presents a new variety of potential next-generation agents composed of coordination polymers. The gadolinium diethylphosphate polymer features a far longer rotational correlation time than conventional small gadolinium complexes and thus offers dramatically improved T1 relaxation performance at the low fields common in clinical imaging applications. All of these lanthanide complexes are synthesized using an azeotropic distillation method.
This method avoids the need for strictly water-free techniques and occasionally allows novel structures to be obtained, as demonstrated in Chapter 3 in particular.

Item Mathematical Methods for Advanced Problems of Inventory Control (2021-05)
Helal, Md Abu; Ramakrishna, Viswanath; Bensoussan, Alain; Izen, Joseph M.; Dragovic, Vladimir; Dabkowski, Mieczyslaw K.; Choudhary, Pankaj K.
We study infinite-horizon stochastic inventory problems with general demand distributions and piecewise linear concave ordering costs. Such costs arise in the important cases of quantity discounts or multiple suppliers. We consider the case of concave costs involving two linear segments. This corresponds to the case of one supplier with a fixed cost, a variable cost up to a given order quantity, and a quantity discount beyond that, or equivalently, the case of two suppliers, one with a low fixed cost and a high variable cost and the other with a high fixed cost and a low variable cost. It is well understood that for a stochastic inventory control problem with a fixed cost and a per-unit variable cost, an (s, S) policy is optimal when there is only one supplier. In this work we address the case of multiple suppliers under several different scenarios. We provide rigorous mathematical proofs of the optimality of several inventory control models, which will help managers make better procurement decisions when facing multiple supply sources and/or quantity discounts for large purchases. Broadly, there are two main areas to explore in the realm of inventory control: lost sales and backlogged sales. Our study examines both. Our analysis concerns the generalization of the classical (s, S) policy for general demand distributions under a variety of modifications to the classical work of Scarf [36].
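The classical (s, S) policy referenced above has a one-line statement: whenever the inventory position drops to s or below, order up to S; otherwise order nothing. A minimal, hedged sketch (the thresholds, demand distribution, and zero lead time below are all invented simplifications, not the models analyzed in the dissertation):

```python
import random

def s_S_order(inventory, s=20, S=100):
    """Classical (s, S) rule: order up to S when stock is at or below s."""
    return S - inventory if inventory <= s else 0

# Simulate a few periods with made-up integer demand and lost sales.
random.seed(7)
inventory, levels = 60, []
for _ in range(12):
    inventory += s_S_order(inventory)                   # replenish (zero lead time)
    inventory -= min(inventory, random.randint(0, 30))  # demand; excess is lost
    levels.append(inventory)
print(levels)
```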
In particular, for the lost sales case, we show that certain three- and four-parameter generalizations of the classical (s, S) policy are optimal. Our contributions consist of generalizing the demand, solving a functional Bellman equation for the value function that arises in the infinite-horizon framework, and providing an explicit solution in the special case of an exponential demand density. We also give conditions under which our generalizations of the (s, S) policy reduce to the standard (s, S) policy, even though two suppliers are involved. Moreover, we provide an explicit solution for the three-parameter policy when the demand distribution is exponential. In the other situation, we are concerned with stochastic inventory control problems with backlogged sales during stockouts. As in the lost sales case, we consider both the scenario in which an optimal selection can be made between two suppliers and the scenario in which inventory can be purchased with incremental quantity discounts from a single supplier. We study the problem for arbitrary demand distributions over an infinite horizon. In this case, we first spell out conditions that guarantee the optimality of the (s, S) policy for the problem under consideration. When these conditions fail to hold, we demonstrate that a generalized three-parameter policy is optimal in two distinct situations.

Item Nonparametric Regression With the Scale Depending on Auxiliary Covariates and Missing Data (2022-05)
Jiang, Tian; Efromovich, Sam; Jia, Lin; Choudhary, Pankaj K.; Chen, Min; Chiou, Sy Han
Nonparametric curve estimation is a powerful statistical methodology that allows estimation of curves with no assumption about their shape. It provides useful insight into the nature of the data and may guide further inference for specific parametric models. The statistical problem considered here is nonparametric heteroscedastic regression with auxiliary covariates and missing data.
In this regression, a univariate component is of primary interest, while the scale function is allowed to depend on both the predictor and the auxiliary covariates. The missing mechanism is missing at random (MAR), and two settings, with missing responses or missing predictors, are considered. The MAR assumption means that the probability of missingness may depend on observed variables but not on missing ones. The developed asymptotic theory shows how the heteroscedasticity and the MAR mechanism affect the constant of minimax convergence under the mean integrated squared error criterion. Further, it is shown that a procedure ignoring the scale function is not efficient and does not attain the sharp constant in the minimax lower bound. Models with missing responses and missing predictors are considered separately because their theory and methodology differ. For the case of missing responses, a sharp minimax, data-driven procedure is developed that is based on estimation of an unknown nuisance scale function. The estimator adapts to the MAR response mechanism and to the unknown smoothness of the underlying regression function. Further, efficiency is preserved for a more general additive model with auxiliary covariates. A model with MAR predictors is dramatically more involved; here, classical regression estimators are no longer even consistent. For a model with MAR predictors, a novel data-driven estimator is suggested that takes the scale function into account. This estimator is adaptive and matches the performance of an oracle that knows all underlying nuisance functions. The asymptotic theory is extended to the case of a general additive model as well. The theory and methodology are tested using Monte Carlo simulation studies and real examples.
The results favor the proposed methodology and support the practical feasibility of the proposed methods for heteroscedastic regressions with missing data.

Item Prediction of Individualized Risk of Contralateral Breast Cancer (2018-05)
Chowdhury, Marzana; Choudhary, Pankaj K.; Biswas, Swati
Women diagnosed with cancer in one breast are increasingly choosing to remove their other, unaffected (contralateral) breast through a surgery called contralateral prophylactic mastectomy (CPM) to reduce the risk of contralateral breast cancer (CBC). Yet a large proportion of CPMs are believed to be medically unnecessary because the risk of CBC has, in fact, gone down substantially, mainly due to the availability of effective therapies for breast cancer (BC), which have a preventive effect on the contralateral breast. The dramatic rise in the rate of CPMs is thus a particularly disturbing trend. Research shows that many BC patients substantially overestimate their CBC risk. Although CPM reduces the risk of CBC, there is no convincing evidence that it prolongs survival. The surgery also has a significant number of side effects and can adversely affect a woman's health and well-being. Thus, there is a pressing need to educate patients effectively about their CBC risk. For this task, physicians need a statistical model for predicting CBC risk based on a patient's personal risk factors. This dissertation is focused on filling this critical need. Although several risk factors for CBC are well established in the literature, one factor that is relatively less well studied is mammographic breast density. This factor has come to the attention of the scientific community only recently; in particular, increased breast density has been shown to be a strong risk factor for a first BC. Thus, it is of interest to study whether it is associated with the risk of CBC as well.
To this end, we first studied the relationship between breast density and CBC by analyzing data from the Breast Cancer Surveillance Consortium (BCSC), a large population-based source consisting of seven cancer registries across the US. We found that breast density is an independent and significant risk factor for the development of CBC. In particular, breast density has a dose-dependent effect on the risk of CBC, with increased density associated with increased risk. Next, we developed a CBC risk prediction model using data from the BCSC and from Surveillance, Epidemiology, and End Results, another large population-based source. We explored numerous potential risk factors for inclusion in this model. The final model consists of eight risk factors: age at first BC diagnosis, anti-estrogen therapy, family history of BC, high-risk pre-neoplasia, estrogen receptor status, breast density, type of first BC, and age at first birth. Combining the relative risk estimates of these factors with the relevant hazard rates, our model, named CBCRisk, projects the absolute risk of developing CBC over a given period. Finally, we validated CBCRisk on clinical datasets from the MD Anderson Cancer Center and Johns Hopkins University. We computed the relevant calibration and validation measures and found that the model performs reasonably well for both datasets. With independent validation, CBCRisk can be used confidently in clinical settings to counsel BC patients by providing their individualized CBC risk.
In turn, this may help reduce the rate of medically unnecessary CPMs.

Item Risk-associated Inferences in Survival Analysis: A Study on Adequacy of the Cox Model and Isotonic Proportional Hazard Models (May 2023)
Chen, Huan 1992-; Tang, Chuan-Fa; Zhang, Fan; Chen, Min; Chiou, Sy Han Steven; Choudhary, Pankaj K.
Survival analysis is a powerful statistical methodology utilized in various disciplines to study time-to-event data, particularly when the time to the event of interest is censored. The Cox model, proposed by Cox (1972) and also called the proportional hazards model, has been one of the most frequently used models for survival analysis due to its flexibility in not specifying the error distribution. However, the proportionality and log-linearity assumptions limit the use of the model and have led to the development of more adaptable models, such as the isotonic proportional hazards model proposed by Chung et al. (2018), which relaxes log-linearity to the more flexible monotonicity assumption. This dissertation proposes goodness-of-fit tests for the Cox model under the constraint that the hazard function is monotonic with respect to continuous covariates. The proposed tests are diagnostics for the log-linearity assumption of the Cox model when the weaker assumption of a monotone relationship between the hazard and the covariate is satisfied. Three popular scenarios regarding the continuous covariate of interest are discussed in developing the test: a time-independent univariate covariate, a time-independent univariate covariate together with additional linear-effect covariates, and a time-dependent univariate covariate. We propose partial-likelihood-ratio-based tests with bootstrapped critical values for the Cox model versus the isotonic proportional hazards model. Simulation studies under each scenario show controlled type-I error and consistency of the test.
Several data examples are discussed to illustrate the goodness-of-fit tests, including heart failure clinical records data, lung cancer data from the North Central Cancer Treatment Group, breast cancer data from the German Breast Cancer Study Group, monoclonal gammopathy data, and the AIDS Clinical Trials Group 175 dataset.

Item Testing Quasi-independence With Survival Tree Approaches (2022-08)
Guo, Qi; Chiou, Sy Han; Pantano, Paul; Chen, Min; Choudhary, Pankaj K.; Shin, Sunyoung
This dissertation aims to address two common issues in survival analysis. First, we develop a powerful quasi-independence test for survival and recruitment times. The proposed test extends existing permutation tests by incorporating cutting-edge survival tree algorithms to achieve high power in detecting various quasi-dependence scenarios. Second, we explore tools from frailty models, recurrent event models, and meta-analysis, using the longitudinal information collected over the last decade, to understand the risk of injury and re-injury. The dissertation is organized as follows. Chapter 2 develops the tree-based method to test the quasi-independence between left truncation time and survival time. Individuals who experienced the event before the study began are left-truncated. Quasi-independence of truncation and event times refers to a factorization of their joint density in the observable region (where the event time is no less than the left truncation time). Unlike the assumption of independence between censoring time and event time, the assumption of quasi-independence between truncation time and event time can be tested. Existing tools such as the conditional Kendall's tau test are powerful in detecting monotonic dependencies, and extensions of existing quasi-independence tests have been proposed to detect non-monotonic alternatives. In particular, two minimum p-value tests (the minp1 and minp2 tests) have been proposed to test simple non-monotonic dependencies.
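The conditional Kendall's tau statistic mentioned above restricts the usual concordance count to "comparable" pairs, those whose truncation and event times could have been observed in either order. A simplified, hedged sketch for untied, uncensored data (the actual test also handles censoring and derives a null distribution; the function name and toy data are illustrative):

```python
def conditional_kendalls_tau(trunc, event):
    """Average concordance sign over comparable pairs of
    (truncation time, event time) observations; assumes no ties."""
    total, n_pairs = 0, 0
    n = len(trunc)
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is comparable if either ordering of the two event
            # times would have survived the left truncation.
            if max(trunc[i], trunc[j]) <= min(event[i], event[j]):
                n_pairs += 1
                concordant = (trunc[i] - trunc[j]) * (event[i] - event[j]) > 0
                total += 1 if concordant else -1
    return total / n_pairs if n_pairs else 0.0

print(conditional_kendalls_tau([1, 2, 3, 4], [10, 9, 11, 12]))
```

Under quasi-independence the statistic fluctuates around zero; systematic monotone dependence pushes it toward plus or minus one.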
In this paper, we develop a powerful and computationally fast tree-based p-value test for quasi-independence when the data have complicated non-monotonic dependencies. We also develop different permutation and bootstrap methods to approximate the null distribution of the tree-based test. Chapter 3 assesses the aggregated overall effect of different covariates on injury risk during eight competitive seasons for soccer players in the German Bundesliga. A standard approach to modeling risk in recurrent event data is to model the rate function. We start by stratifying the injury data by season, assuming that the eight Bundesliga seasons are mutually independent and that season does not interact with any of the covariates. We then apply the Andersen-Gill (AG) model to find the regression coefficients of the covariates without accounting for overlapping players, thus ignoring between-season dependence. To address this issue, we apply the AG model to each season and aggregate the results by meta-analysis. However, common meta-analysis models, such as the fixed-effects and random-effects models, assume that there are no overlapping subjects between studies. Since our data have overlapping players between seasons, we developed a novel meta-analysis model that can handle our injury data. This new model suggests that the presence of a pre-season injury and additional game time during a match might be associated with a lower risk of a regular-season injury, while a larger number of previous injuries might be associated with a higher risk. We also found that players in the Goalkeeper position are less likely to be injured than those in the Defender position. In this chapter, we additionally take into account that a soccer player's condition alternates between "healthy" and "injured" multiple times (alternating gap times) within a season.
A semiparametric estimation approach under the accelerated failure time (AFT) model was used to evaluate the covariate effects on the two alternating states. To obtain the aggregated overall effect, we applied a fixed-effects model. From this analysis, we find that the presence of pre-season injuries and frequent past injuries imply shorter healthy times. Additionally, increased game time during a season might be associated with longer healthy times and shorter injury times.
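For context, the fixed-effects aggregation used in this final analysis is typically an inverse-variance weighted average of the per-season estimates. A minimal sketch with invented numbers (this is the standard pooling formula, not the dissertation's overlap-aware meta-analysis model):

```python
import math

def fixed_effects_pool(estimates, std_errors):
    """Inverse-variance weighted fixed-effects pooling; returns the
    pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Made-up per-season coefficient estimates and standard errors.
est, se = fixed_effects_pool([0.30, 0.10, 0.20], [0.10, 0.20, 0.10])
print(f"pooled = {est:.4f}, se = {se:.4f}")
```

More precise seasons get larger weights, and the pooled standard error is smaller than any single season's, which is the usual payoff of meta-analytic aggregation.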