Testing Quasi-independence With Survival Tree Approaches





This dissertation addresses two common issues in survival analysis. First, we develop a powerful quasi-independence test for survival and recruitment times. The proposed test extends existing permutation tests by incorporating cutting-edge survival tree algorithms to achieve high power in detecting various quasi-dependence scenarios. Second, we use tools from frailty models, recurrent event models, and meta-analysis, together with longitudinal information collected over the last decade, to understand the risk of injury and re-injury. The dissertation is organized as follows.

Chapter 2 develops a tree-based method to test quasi-independence between the left truncation time and the survival time. Individuals who experienced the event before the study began are left-truncated. Quasi-independence of truncation and event times refers to a factorization of their joint density in the observable region (where the event time is no less than the left truncation time). Unlike the assumption of independence between censoring time and event time, the assumption of quasi-independence between truncation time and event time can be tested. Existing tools such as the conditional Kendall’s tau test are powerful in detecting monotonic dependencies. Extensions of existing quasi-independence tests have been proposed to detect non-monotonic alternatives; in particular, two minimum p-value tests (the minp1 and minp2 tests) have been proposed to test simple non-monotonic dependencies. In this chapter, we develop a powerful and computationally fast tree-based p-value test for quasi-independence when the data exhibit complicated non-monotonic dependencies. We also develop different permutation and bootstrap methods to approximate the null distribution of the tree-based test.

Chapter 3 assesses the aggregated overall effect of different covariates on injury risk during eight competitive seasons for soccer players in the German Bundesliga.
One of the standard approaches to modeling risk in recurrent event data is to model the rate function. We start by stratifying the injury data by season, assuming that the eight Bundesliga seasons are mutually independent and that season does not interact with any of the covariates. We then apply the Andersen-Gill (AG) model to estimate the regression coefficients of the covariates without accounting for players who appear in more than one season, thus ignoring between-season dependence. To address this issue, we fit the AG model to each season separately and aggregate the results by meta-analysis. However, common meta-analysis models, such as the fixed effects model and the random effects model, assume that there are no overlapping subjects between studies. Since our data has overlapping players between seasons, we develop a novel meta-analysis model that accommodates overlapping subjects. This new model suggests that the presence of a pre-season injury and additional game-time during a match might be associated with a lower risk of a regular season injury. As the number of previous injuries increases, there might be a higher risk of a regular season injury. We find that goalkeepers are less likely to be injured than defenders. In this chapter, we also take into account that the playing condition of a soccer player may alternate between “healthy” and “injured” multiple times within a season (alternating gap times). A semiparametric estimation approach under the accelerated failure time (AFT) model is used to evaluate the covariate effects on the two alternating states. To obtain the aggregated overall effect, we apply a fixed effects model. From this analysis, we find that the presence of pre-season injuries and frequent injuries in the past are associated with shorter healthy times. Additionally, increased game-time during a season might be associated with longer healthy times and shorter injury times.
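
As a point of reference for the aggregation step, the textbook fixed effects model pools per-season coefficient estimates by inverse-variance weighting. The sketch below is a minimal Python illustration of that standard estimator only; it is not the novel meta-analysis model developed in this dissertation. The function name `fixed_effects_pool` and the input numbers are hypothetical, and the estimator assumes the per-season estimates are independent, so it does not account for players who appear in several seasons.

```python
import math


def fixed_effects_pool(estimates, std_errors):
    """Pool study-level estimates with standard inverse-variance weights.

    Assumes the studies (here, seasons) are independent; overlapping
    subjects between studies violate this assumption.
    Returns (pooled_estimate, pooled_standard_error).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical per-season log rate ratios and standard errors,
    # made up for illustration; they are not results from the dissertation.
    season_estimates = [-0.30, -0.10, -0.25]
    season_std_errors = [0.15, 0.20, 0.10]
    pooled, pooled_se = fixed_effects_pool(season_estimates, season_std_errors)
    print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Seasons with smaller standard errors receive more weight, and the pooled standard error shrinks as seasons are added. When the same players contribute to several seasons, the independence assumption fails and this standard error is generally not valid, which is the limitation that motivates a model for overlapping subjects.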