Browsing by Author "Griffith, Daniel A."

A Spatial-Filtering Zero-Inflated Approach to the Estimation of the Gravity Model of Trade
(MDPI) Metulini, Rodolfo; Patuelli, Roberto; Griffith, Daniel A.; 0000-0001-5125-6450 (Griffith, DA); 14855602 (Griffith, DA); Griffith, Daniel A.
Nonlinear estimation of the gravity model with Poisson-type regression methods has become popular for modelling international trade flows, because it permits a better accounting for zero flows and extreme values in the distribution tail. Nevertheless, as trade flows are not independent from each other due to spatial and network autocorrelation, these methods may lead to biased parameter estimates. To overcome this problem, eigenvector spatial filtering (ESF) variants of the Poisson/negative binomial specifications have been proposed in the literature on gravity modelling of trade. However, no specific treatment has been developed for cases in which many zero flows are present. This paper contributes to the literature in two ways. First, we employ a stepwise selection criterion for spatial filters that is based on robust (sandwich) p-values and does not require likelihood-based indicators. In this respect, we develop an ad hoc backward stepwise function in R. Second, using this function, we select a reduced set of spatial filters that properly accounts for importer-side and exporter-side specific spatial effects, as well as network effects, in both the count and the logit processes of zero-inflated methods. Applying this estimation strategy to a cross-section of bilateral trade flows between a set of 64 countries for the year 2000, we find that our specification outperforms the benchmark models in terms of model fitting, both in terms of the AIC and in predicting zero (and small) flows.
Eigenvector Spatial Filtering for Large Data Sets: Fixed and Random Effects Approaches
(Wiley, 2018-03-25) Murakami, Daisuke; Griffith, Daniel A.; 0000-0001-5125-6450 (Griffith, DA); 14855602 (Griffith, DA); Griffith, Daniel A.
Eigenvector spatial filtering (ESF) is a spatial modeling approach that has been applied in urban and regional studies, ecological studies, and other fields. However, it is computationally demanding, and may not be suitable for large data modeling. The objective of this study is to develop fast ESF and random effects ESF (RE-ESF), which are capable of handling very large samples. To achieve this, we accelerate eigen-decomposition and parameter estimation, which make ESF and RE-ESF slow. The former is accelerated by utilizing the Nyström extension, whereas the latter is accelerated by small matrix tricks. The resulting fast ESF and fast RE-ESF are compared with nonapproximated ESF and RE-ESF in Monte Carlo simulation experiments. The results show that, while ESF and RE-ESF are slow for several thousand samples, fast ESF and RE-ESF require only several seconds for such samples. It is also suggested that the proposed approaches effectively remove positive spatial dependence in the residuals with very small approximation errors when the number of eigenvectors considered is 200 or more. Note that these approaches cannot deal with negative spatial dependence. The proposed approaches are implemented in an R package "spmoran."
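As a rough illustration of the eigen-decomposition step that the abstract above describes as slow, the following minimal sketch extracts Moran eigenvectors from the doubly centered connectivity matrix MCM, where M = I - 11'/n. It is the plain, nonapproximated construction, not the Nyström-based method of the paper. The 5-by-5 grid, the rook adjacency rule, and the function names `rook_adjacency` and `moran_eigenvectors` are assumptions made only for this example.

```python
import numpy as np


def rook_adjacency(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (illustrative assumption)."""
    n = nrows * ncols
    C = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:          # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < nrows:          # neighbor below
                C[i, i + ncols] = C[i + ncols, i] = 1.0
    return C


def moran_eigenvectors(C):
    """Eigen-decompose MCM, with M = I - 11'/n, sorted by descending eigenvalue."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    values, vectors = np.linalg.eigh(M @ C @ M)   # symmetric matrix, so eigh applies
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


C = rook_adjacency(5, 5)
eigenvalues, eigenvectors = moran_eigenvectors(C)

# Eigenvectors paired with positive eigenvalues describe positive spatial
# dependence; a filter would use a selected subset of them as covariates.
positive = eigenvectors[:, eigenvalues > 1e-8]
print(positive.shape)
```

The full decomposition costs on the order of n cubed operations, which is why approximations matter once n reaches many thousands.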
Geovisualizing Attribute Uncertainty of Interval and Ratio Variables: A Framework and an Implementation for Vector Data
(2017-12-14) Koo, Hyeongmo; Chun, Yongwan; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 14855602 (Griffith, DA); Griffith, Daniel A.
This is a prototype implementation for attribute uncertainty visualization based on bivariate. Specifically, the uncertainty visualizations are implemented in three different ways. First, an overlaid symbols on a choropleth map (OSCM) strategy is implemented to visualize attribute uncertainty. A choropleth map is used to represent attributes at the ratio scale, and additional overlaid symbols, such as textures (spacing), circles (size), and bars (size), visualize attribute uncertainty. Second, a coloring properties to proportional symbols (CPPS) strategy is applied. A proportional symbol map is more appropriate to represent raw counts or frequencies, and attribute uncertainty can be represented by color saturation and color value in the hue-saturation-value (HSV) color model of proportional symbols. Finally, a composite symbols (CS) strategy is utilized to represent the possible range of an attribute value with its confidence interval. Symbols in CS are constructed with three different sizes of symbol overlaid for each individual location. Two of these symbols represent uncertainty by visualizing the upper and lower limits of attribute values for a given confidence level. Thus, the CS strategy allows users to directly compare uncertainties with corresponding attribute values and their confidence intervals. The ESRI ArcGIS add-in installation file is compatible with ArcGIS 10.x, and was developed in .NET Framework 4.5 and ArcObjects 10.5. It requires Microsoft Windows Vista or higher.
Implementing Moran Eigenvector Spatial Filtering for Massively Large Georeferenced Datasets
(Taylor And Francis Ltd.) Griffith, Daniel A.; Chun, Yongwan; 0000-0001-5125-6450 (Griffith, DA); 0000-0002-4957-1379 (Chun, Y); 14855602 (Griffith, DA); 297769863 (Chun, Y); Griffith, Daniel A.; Chun, Yongwan
Moran eigenvector spatial filtering (MESF) furnishes an alternative method to account for spatial autocorrelation in linear regression specifications describing georeferenced data, although spatial auto-models also are widely used. The utility of this MESF methodology is even more impressive for the non-Gaussian models because its flexible structure enables it to be easily applied to generalized linear models, which include Poisson, binomial, and negative binomial regression. However, the implementation of MESF can be computationally challenging, especially when the number of geographic units, n, is large, or massive, such as with a remotely sensed image. This intensive computation aspect has been a drawback to the use of MESF, particularly for analyzing a remotely sensed image, which can easily contain millions of pixels. Motivated by Curry, this paper proposes an approximation approach to constructing eigenvector spatial filters (ESFs) for a large spatial tessellation. This approximation is based on a divide-and-conquer approach. That is, it constructs ESFs separately for each sub-region, and then combines the resulting ESFs across an entire remotely sensed image. This paper, employing selected specimen remotely sensed images, demonstrates that the proposed technique provides a computationally efficient and successful approach to implement MESF for large or massive spatial tessellations. ©2019 Informa
Spatial Autocorrelation for Massive Spatial Data: Verification of Efficiency and Statistical Power Asymptotics
(Springer Verlag) Luo, Q.; Griffith, Daniel A.; Wu, H.; 0000-0001-5125-6450 (Griffith, DA); 14855602 (Griffith, DA); Griffith, Daniel A.
Spatial data containing massive numbers of observations have been a hot topic in recent years, and many studies have been conducted with them. Because initial developments for classical spatial autocorrelation statistics are based on rather small sample sizes, in the context of massive spatial datasets, this paper presents extensions to efficiency and statistical power comparisons between the Moran coefficient and the Geary ratio for different variable distribution assumptions and selected geographic neighborhood definitions. The question addressed asks whether or not earlier results for small n extend to large and massively large n, especially for non-normal variables; implications established are relevant to big spatial data. To achieve these comparisons, this paper summarizes proofs of limiting variances, also called asymptotic variances, to do the efficiency analysis, and derives the relationship function between the two statistics to compare their statistical power at the same scale. Visualization of this statistical power analysis employs an alternative technique that already appears in the literature, furnishing additional understanding and clarity about these spatial autocorrelation statistics. Results include: the Moran coefficient is more efficient than the Geary ratio for most surface partitionings, because this index has a relatively smaller asymptotic as well as exact variance, and the superior power of the Moran coefficient vis-à-vis the Geary ratio for positive spatial autocorrelation depends upon the type of geographic configuration, with this power approaching one as sample sizes become increasingly large. Because spatial analysts usually calculate these two statistics for interval/ratio data, this paper also includes comments about the join count statistics used for nominal data. ©2019 Springer-Verlag GmbH Germany, part of Springer Nature.
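For readers unfamiliar with the two statistics compared in the abstract above, the following minimal sketch computes the Moran coefficient and the Geary ratio from their textbook definitions. The four-location chain, its values, and the function names `moran_coefficient` and `geary_ratio` are assumptions for illustration only; nothing here reproduces the paper's efficiency or power analysis.

```python
import numpy as np


def moran_coefficient(x, W):
    """MC = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / W.sum()) * (z @ W @ z) / (z @ z)


def geary_ratio(x, W):
    """GR = ((n - 1) / (2 S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    squared_diffs = (x[:, None] - x[None, :]) ** 2
    return ((x.size - 1) / (2.0 * W.sum())) * (W * squared_diffs).sum() / (z @ z)


# Hypothetical example: four locations in a chain, each linked to its neighbors.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
x = [1.0, 2.0, 3.0, 4.0]   # a smooth trend, i.e. positive spatial autocorrelation

mc = moran_coefficient(x, W)
gr = geary_ratio(x, W)
print(mc, gr)
```

For this trend surface the Moran coefficient lies above its expected value of -1/(n - 1) and the Geary ratio lies below 1, so both point to positive spatial autocorrelation.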
Spatially Explicit Machine Learning Approaches for House Price Models
(May 2023) Chen, Meifang 1989-; Cordell, Rebecca; Chun, Yongwan; Griffith, Daniel A.; Kim, Dohyeong; Qiu, Fang
Spatial data or georeferenced data are special in that they have a spatial reference, meaning that they are linked with geographic coordinates on Earth. The spatial component allows for the identification of spatial patterns, relationships and trends among spatial objects. Spatial objects are usually not randomly or independently distributed, but spatially autocorrelated. In spatial data analysis, spatial autocorrelation has been well recognized through the advocacy of spatial statistical techniques, such as spatial clustering, spatial interpolation, spatial regression, and spatial simulation. However, spatial effects or spatial context are largely absent in mainstream machine learning methods. With the popularity of machine learning in various applications in both industry and academia, a new research area has emerged in the spatial community: spatially explicit machine learning. It refers to the use of machine learning algorithms to analyze and predict spatial data with the explicit integration of spatial effects or patterns. It is expected to improve model accuracy and prediction by incorporating spatial relationships or patterns in the data that have not been captured by traditional machine learning models and, subsequently, to yield a better understanding of the data generation mechanism. This research utilizes Franklin County, OH residential house transaction data to explore three different data-driven approaches to integrate spatial perspectives into traditional machine learning algorithms: 1) imposing spatial constraints on unsupervised learning to delineate spatially constrained housing submarkets; 2) integrating spatial weights into the cost function of supervised learning to improve house price prediction accuracy; and 3) enhancing data input using spatial feature engineering in tree-based ensemble learning for modeling multiscale spatial effects. It intends to contribute new insights for spatially explicit machine learning to the literature.
Overall, three studies explore spatially explicit machine learning methods from three different aspects, and the empirical results show that spatially explicit machine learning methods are preferred over traditional machine learning methods when data have strong positive spatial autocorrelation, or, more generally, when data include spatial information that is important for classification, clustering, or prediction tasks.
Spatially Simplified Scatterplots for Large Raster Datasets
(Taylor & Francis, 2016-05-24) Bin, Li; Griffith, Daniel A.; Becker, Brian; 0000 0001 0872 2508 (Griffith, DA); Griffith, Daniel A.
Scatterplots are essential tools for data exploration. However, they scale poorly with data size, with overplotting and excessive delay being the main problems. Generalization methods in the attribute domain focus on visual manipulations, but do not take into account the inherent nature of information redundancy in most geographic data. These methods may also result in alterations of statistical properties of data. Recent developments in spatial statistics, particularly the formulation of effective sample size and the fast approximation of the eigenvalues of a spatial weights matrix, make it possible to assess the information content of a georeferenced data-set, which can serve as the basis for resampling such data. Experiments with both simulated data and actual remotely sensed data show that an equivalent scatterplot consisting of point clouds and fitted lines can be produced from a small subset extracted from a parent georeferenced data-set through spatial resampling. The spatially simplified data subset also maintains key statistical properties as well as the geographic coverage of the original data.
The Importance of Scale in Spatially Varying Coefficient Modeling
(Routledge Journals, Taylor & Francis Ltd, 2018-02) Murakami, Daisuke; Lu, Binbin; Harris, Paul; Brunsdon, Chris; Charlton, Martin; Nakaya, Tomoki; Griffith, Daniel A.; 0000-0001-5125-6450 (Griffith, DA); 14855602 (Griffith, DA); Griffith, Daniel A.
Although spatially varying coefficient (SVC) models have attracted considerable attention in applied science, they have been criticized as being unstable. The objective of this study is to show that capturing the "spatial scale" of each data relationship is crucially important to make SVC modeling more stable and, in doing so, adds flexibility. Here, the analytical properties of six SVC models are summarized in terms of their characterization of scale. Models are examined through a series of Monte Carlo simulation experiments to assess the extent to which spatial scale influences model stability and the accuracy of their SVC estimates. The following models are studied: (1) geographically weighted regression (GWR) with a fixed distance or (2) an adaptive distance bandwidth (GWRa); (3) flexible bandwidth GWR (FB-GWR) with fixed distance or (4) adaptive distance bandwidths (FB-GWRa); (5) eigenvector spatial filtering (ESF); and (6) random effects ESF (RE-ESF). Results reveal that the SVC models designed to capture scale dependencies in local relationships (FB-GWR, FB-GWRa, and RE-ESF) most accurately estimate the simulated SVCs, where RE-ESF is the most computationally efficient. Conversely, GWR and ESF, where SVC estimates are naively assumed to operate at the same spatial scale for each relationship, perform poorly. Results also confirm that the adaptive bandwidth GWR models (GWRa and FB-GWRa) are superior to their fixed bandwidth counterparts (GWR and FB-GWR).
The Use Of Spatially Imputed Variables: Four Papers Addressing Implications Of Imputation-Based Measurement Error In A Spatial Context
(2020-11-19) Liau, Yan-Ting; Griffith, Daniel A.
Missing data indicate information loss relative to the complete dataset. Across disciplines, extensive studies emphasize the difference between real values and imputed values (i.e., the imputation-based measurement error). The best imputation model is typically selected by comparing goodness of fit and/or error. However, the best imputation model may not handle large amounts of missing spatial data (a missing rate of 90% or more). Substituting imputed values for missing data may cause misleading regression inferences. In the aspatial context, the imputation-based measurement error models illustrate the challenges. To our knowledge, few studies assess the impacts of spatial imputations. On the one hand, making the best use of spatial autocorrelation (SA) may compensate for the data loss error (i.e., real values − available data). On the other hand, specifying SA may also introduce the imputation specification error (i.e., correctly specified imputed values − misspecified imputed values). The major contribution of this dissertation is to formulate the spatial measurement error model. The goal is to explain how spatially imputed values influence (spatial) regression analysis, compared to the complete data. The key challenge for assessing spatial imputation is the imputation specification error. In the aspatial model, the data loss error causes biased results when imputed values are used in a response variable. The results using imputed values in a covariate remain unbiased. When the imputation specification error is introduced, the results may become biased. Theoretically, the imputation specification error behaves like the classical measurement error because of incorrect measurements. Combining the imputation specification error with the data loss error further complicates the assessments. As in the aspatial model, the place in which the spatially imputed values are used in (spatial) regression analysis is influential.
In the spatial model, the two error sources compare differently when spatially imputed values are used in a covariate and/or a response variable. In addition to theoretical derivations, a few empirical experiments support the combined influences. Fortunately, effective information, summarized from previous literature with good imputation performance, can reduce biased results. The results also show that effective information can adjust biased results. This effect is especially pronounced when substituting missing spatial data in a response variable. However, when spatially imputed values are used in a response variable, even though ancillary variables can reduce biased results, misspecifications of SA may introduce biased results, which ancillary variables cannot fix. This influence resembles an increasing data loss error rather than the imputation specification error. When spatially imputed values are used in a covariate, the more considerable challenge is the imputation specification error due to different strategies for specifying ancillary variables. Although SA specifications also introduce the imputation specification error, their influence on the biased results is not substantial. To sum up, this dissertation presents the influences of spatially imputed values for (spatial) regression analysis. In particular, substituting missing data in a response variable relies on a good choice of spatial imputation. Nevertheless, the choice of spatial imputation for missing data in a covariate is not influential.
Uncertainty and Context in GIScience and Geography: Challenges in the Era of Geospatial Big Data
(Taylor & Francis Ltd, 2019-01-17) Chun, Yongwan; Kwan, Mei-Po; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 0000-0001-5125-6450 (Griffith, DA); 297769863 (Chun, Y); 14855602 (Griffith, DA); Chun, Yongwan; Griffith, Daniel A.
No abstract available.
Uncertainty in the Effects of the Modifiable Areal Unit Problem under Different Levels of Spatial Autocorrelation: A Simulation Study
(Taylor & Francis Ltd, 2018-11-13) Lee, Sang-Il; Lee, Monghyeon; Chun, Yongwan; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 0000-0001-5125-6450 (Griffith, DA); 297769863 (Chun, Y); 14855602 (Griffith, DA); Chun, Yongwan; Griffith, Daniel A.
The objective of this paper is to investigate uncertainties surrounding relationships between spatial autocorrelation (SA) and the modifiable areal unit problem (MAUP) with an extensive simulation experiment. In particular, this paper aims to explore how differently the MAUP behaves across levels of SA, focusing on how the initial level of SA at the finest spatial scale makes a significant difference to the MAUP effects on sample statistics such as means, variances, and Moran coefficients (MCs). The simulation experiment utilizes a random spatial aggregation (RSA) procedure and adopts Moran spatial eigenvectors to simulate different SA levels. The main findings are as follows. First, there are no substantive MAUP effects for means. However, the initial level of SA plays a role for the zoning effect, especially when extreme positive SA is present. Second, there is a clear and strong scale effect for the variances. However, the initial SA level plays a non-negligible role in how this scale effect unfolds. Third, the initial SA level plays a crucial role in the nature and extent of the MAUP effects on MCs. A regression analysis confirms that the initial SA level makes a substantial difference to the variability of the MAUP effects.
Validation of a Remote Sensing Model to Identify Simulium damnosum s.l. Breeding Sites in Sub-Saharan Africa
(2013-07-25) Jacob, Benjamin G.; Novak, Robert J.; Toe, Laurent D.; Sanfo, Moussa; Griffith, Daniel A., 1948-; Lakwo, Thomson L.; Habomugisha, Peace; Katabarwa, Moses N.; Unnasch, Thomas R.; Griffith, Daniel A.
Background: Recently, most onchocerciasis control programs have begun to focus on elimination. Developing an effective elimination strategy relies upon accurately mapping the extent of endemic foci. In areas of Africa that suffer from a lack of infrastructure and/or political instability, developing such accurate maps has been difficult. Onchocerciasis foci are localized near breeding sites for the black fly vectors of the infection. The goal of this study was to conduct ground validation studies to evaluate the sensitivity and specificity of a remote sensing model developed to predict S. damnosum s.l. breeding sites. Methodology/Principal Findings: Remote sensing images from Togo were analyzed to identify areas containing signature characteristics of S. damnosum s.l. breeding habitat. All 30 sites with the spectral signature were found to contain S. damnosum larvae, while 0/52 other sites judged as likely to contain larvae were found to contain larvae. The model was then used to predict breeding sites in Northern Uganda. This area is hyper-endemic for onchocerciasis, but political instability had precluded mass distribution of ivermectin until 2009. Ground validation revealed that 23/25 sites with the signature contained S. damnosum larvae, while 8/10 sites examined lacking the signature were larvae free. Sites predicted to have larvae contained significantly more larvae than those that lacked the signature. Conclusions/Significance: This study suggests that a signature extracted from remote sensing images may be used to predict the location of S. damnosum s.l. breeding sites with a high degree of accuracy. This method should be of assistance in predicting communities at risk for onchocerciasis in areas of Africa where ground-based epidemiological surveys are difficult to implement.
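The Northern Uganda counts reported in the abstract above can be turned into standard diagnostic measures, as in the following minimal sketch. It assumes that the ground survey is the reference truth and that the 35 examined sites form the whole validation sample; the function name `diagnostic_metrics` is hypothetical. The 23/25 and 8/10 figures are, as stated, proportions among predicted-positive and predicted-negative sites (predictive values); sensitivity and specificity are derived from the same four cells.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Summary measures for a 2x2 validation table (counts must be positive)."""
    return {
        "sensitivity": tp / (tp + fn),   # larvae sites that carried the signature
        "specificity": tn / (tn + fp),   # larvae-free sites that lacked it
        "ppv": tp / (tp + fp),           # signature sites that had larvae
        "npv": tn / (tn + fn),           # non-signature sites that were larvae free
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts taken from the abstract: 23 of 25 signature sites contained larvae,
# and 8 of 10 sites lacking the signature were larvae free.
uganda = diagnostic_metrics(tp=23, fp=2, tn=8, fn=2)

for name, value in uganda.items():
    print(f"{name}: {value:.3f}")
```

Because the false positives and false negatives happen to be equal here (two each), sensitivity coincides with the positive predictive value and specificity with the negative predictive value; that would not hold in general.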