The Use Of Spatially Imputed Variables: Four Papers Addressing Implications Of Imputation-Based Measurement Error In A Spatial Context



Missing data imply a loss of information relative to the complete dataset. Across disciplines, extensive studies emphasize the difference between real values and imputed values (i.e., the imputation-based measurement error). The best imputation model is usually selected by comparing goodness of fit and/or error measures. However, even the best imputation model may not handle spatial data with a very high missing rate (90% or more). Substituting imputed values for missing data may lead to misleading regression inferences. In the aspatial context, imputation-based measurement error models illustrate these challenges. To our knowledge, few studies assess the impacts of spatial imputation. On the one hand, making the best use of spatial autocorrelation (SA) may compensate for the data loss error (i.e., real values − available data). On the other hand, specifying SA may also introduce the imputation specification error (i.e., correctly specified imputed values − misspecified imputed values). The major contribution of this dissertation is to formulate a spatial measurement error model. The goal is to explain how spatially imputed values influence (spatial) regression analysis relative to the complete data. The key challenge in assessing spatial imputation is the imputation specification error. In the aspatial model, the data loss error causes biased results when imputed values are used in a response variable, whereas results using imputed values in a covariate remain unbiased. When the imputation specification error is introduced, the results may become biased. Theoretically, the imputation specification error behaves like classical measurement error because it arises from incorrect measurements. Combining the imputation specification error with the data loss error further complicates the assessment. As in the aspatial model, the position at which spatially imputed values enter the (spatial) regression analysis, in a covariate or in a response variable, is influential.
In the spatial model, the relative importance of the two error sources differs depending on whether spatially imputed values are used in a covariate and/or a response variable. In addition to theoretical derivations, a few empirical experiments support the combined influences. Fortunately, effective information, summarized from previous literature reporting good imputation performance, can reduce bias. The results also confirm that effective information helps adjust biased results, especially when missing spatial data in a response variable are substituted. However, when spatially imputed values are used in a response variable, misspecification of SA may introduce bias that ancillary variables cannot fix, even though ancillary variables can otherwise reduce bias. This influence resembles an increase in the data loss error rather than the imputation specification error. When spatially imputed values are used in a covariate, the greater challenge is the imputation specification error that arises from different strategies for specifying ancillary variables. SA specifications also introduce the imputation specification error, but they have little influence on the biased results. In sum, this dissertation presents the influences of spatially imputed values on (spatial) regression analysis. In particular, substituting missing data in a response variable relies on a good choice of spatial imputation. In contrast, the choice of spatial imputation for missing data in a covariate is not influential.
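
The aspatial covariate case described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the dissertation's model: the data-generating process, the ancillary variable `z`, the coefficient `beta`, and the noise variances are all hypothetical, every covariate value is treated as imputed, and no spatial autocorrelation is included. It contrasts a correctly specified conditional-mean imputation with classical measurement error added to the covariate.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of a least-squares fit of y on x with an intercept."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Hypothetical aspatial setup (assumption, not taken from the dissertation).
rng = np.random.default_rng(0)
n = 20_000
beta = 2.0

z = rng.normal(size=n)                    # ancillary variable, fully observed
x = z + rng.normal(size=n)                # true covariate, variance 2
y = 1.0 + beta * x + rng.normal(size=n)   # response built from the true covariate

# Correctly specified imputation: the conditional mean E[x | z] = z.
x_imputed = z

# Classical measurement error: independent noise (variance 2) added to x.
x_classical = x + rng.normal(scale=np.sqrt(2.0), size=n)

slope_complete = ols_slope(x, y)
slope_imputed = ols_slope(x_imputed, y)
slope_classical = ols_slope(x_classical, y)

print(f"complete data:        {slope_complete:.3f}")
print(f"conditional-mean fit: {slope_imputed:.3f}")
print(f"classical error:      {slope_classical:.3f}")
```

Under these assumptions, the slope estimated from the conditional-mean imputation has expectation `beta`, while classical error attenuates the slope toward `beta * var(x) / (var(x) + var(noise))`, which is `beta / 2` here. The sketch covers only the covariate case; it does not reproduce the response-variable or spatial results.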



Missing observations (Statistics), Autocorrelation (Statistics), Spatial analysis (Statistics)