Robust Analysis of Non-Parametric Space-Time Clustering
The rapid growth of remote sensing technologies has spurred interest in space-time data mining, particularly in clustering environmental time series and spatio-temporal processes. Dynamic data-driven clustering procedures for space-time data, which allow the number, shape, and distributional properties of clusters to vary, have attracted particular attention in recent years. The price of this flexibility, however, is usually a set of user-specified parameters that control clustering performance: for instance, the value similarity threshold in TRUST, the maximum neighborhood radius Eps in DBSCAN, the steepness parameter ξ in OPTICS, and the kernel smoothing parameter h in DENCLUE. The choice of these parameters can noticeably affect the number and shape of detected clusters and should ideally be made in an objective manner. The goal of this dissertation is to address these challenges by developing new nonparametric data-driven approaches to space-time clustering. First, we propose a new data-driven procedure, referred to as DR, for optimal selection of these tuning parameters in dynamic clustering algorithms, based on the notion of a stability probe. We study the finite-sample performance of DR in conjunction with DBSCAN and TRUST, applied to clustering synthetic time series and yearly temperature records in Central Germany. We also apply DR to study ecological trends and water quality in Chesapeake Bay and legislative rhetoric data from the U.S. Senate. Second, optimal selection of tuning parameters in density-based clustering procedures such as DBSCAN, OPTICS, and DENCLUE raises additional problems that need to be addressed, such as clusters with varied densities and the presence of outliers.
Therefore, we develop a new density-based clustering algorithm, CRAD, built on a new neighbor-searching function that uses a robust data depth as the dissimilarity measure. Our experiments indicate that CRAD is highly competitive in detecting clusters with varying densities compared with existing algorithms such as DBSCAN, OPTICS, and DBCA.
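As a small illustration of the parameter sensitivity that motivates this work, the following sketch runs a minimal textbook DBSCAN on a toy one-dimensional data set containing two dense groups and one sparse group. It is not the DR procedure or CRAD, and it does not use any data from the dissertation; the function names (dbscan, summarize), the toy points, and the Eps values are assumptions chosen only for demonstration.

```python
from math import dist  # Python 3.8+


def dbscan(points, eps, min_pts):
    """Minimal textbook DBSCAN. Returns one label per point (-1 = noise)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Eps-neighborhood of point i (includes i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len({l for l in labels if l != -1}), labels.count(-1)


# Hypothetical toy data: two dense groups (spacing 0.1), one sparse group (spacing 0.5).
points = [(x,) for x in (0.0, 0.1, 0.2, 0.3, 0.4,
                         2.0, 2.1, 2.2, 2.3, 2.4,
                         5.0, 5.5, 6.0, 6.5, 7.0)]

results = {eps: summarize(dbscan(points, eps, min_pts=3))
           for eps in (0.15, 0.6, 3.0)}

for eps, (n_clusters, n_noise) in results.items():
    print(f"Eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

On this toy data, the smallest Eps finds the two dense groups and labels the sparse group as noise, the intermediate value recovers all three groups, and the largest value merges everything into a single cluster. No single Eps is obviously correct without an objective selection criterion, and the sparse group shows why clusters of varied density are difficult for a single global radius.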