Browsing by Author "Chun, Yongwan"


Building Recognizability in Urban Environments
(2019-05) Tsang, Yuen Ting Yolanda; Chun, Yongwan
One of the main aims of geospatial information sciences is to provide the tools and techniques needed to better understand the interaction between humans and their surrounding environments. Because recognizing buildings in an environment shapes how humans interact with their surroundings, it can also be studied through the lens of geospatial information sciences. This paper uses a survey and regression analysis to demonstrate a quantitative approach to identifying factors that influence the visual recognition, or recognizability, of buildings in an urban environment. Distance from buildings, presence of vegetation, frequent downtown visits, and the physical forms of buildings contribute significantly to the visual recognition of urban buildings. The results can help urban planners, architects, urban geographers, and city tourism boards better integrate vegetation and buildings in a cityscape. The ultimate goal of understanding people’s visual recognition and perception of urban objects is to raise inhabitants’ satisfaction, capture their attention, and make strong impressions of the city.
Cloud Detection and PM2.5 Estimation Using Machine Learning
(2021-12) Yu, Xiaohe; Lary, David J.; Hicks, Donald A.; Chun, Yongwan; Yuan, May; Qiu, Fang
Earth observation (EO) is the gathering of information about the physical, chemical, and biological systems of the planet via remote-sensing technologies, supplemented by Earth-surveying techniques, and encompasses the collection, analysis, and presentation of data. Research on effective methods for analyzing Earth observation data has grown over the years because of the increasing amount of data generated by Earth observation systems, such as remote sensing imagery and weather radars. Researchers have therefore taken an interest in machine learning, a technique that allows computer algorithms to learn from samples. In general, the more comprehensive the training samples are, the better the machine learning performance will be. This property makes machine learning well suited to analyzing Earth observation data. Fine particulate matter, such as particulate matter 2.5 (PM2.5), poses a severe health risk to humans and is associated with many different health problems. PM2.5 concentrations are influenced by factors such as meteorological conditions, local population density, and the geographic context. Because of the large quantity of information they provide, Earth observation data are a valuable tool for studying PM2.5. However, these data are huge and come from different platforms, with different spatial and temporal resolutions and in different formats, which challenges existing approaches to PM2.5 studies. This dissertation shows how machine learning methods can be used to address these challenges in three subtopics connected to the modeling and estimation of PM2.5. Satellite-based remote sensing products provide important variables that can be used to study regional and global PM2.5, such as the Aerosol Optical Depth (AOD). Nevertheless, AOD products cannot be retrieved in cloudy areas, and the quality of AOD data near clouds cannot be guaranteed. Accordingly, the first study aims to detect cloud pixels in remote sensing images.
This study investigates cloud detection with a set of machine learning models on four subsets of 88 Landsat 8 images that have been carefully labelled by analysts. The four subsets of training data are used to train 16 machine learning models with different input feature selections. The performance of these models is then compared with that of the Fmask algorithm, which is widely used for cloud detection. When testing on the 88 annotated images, the best performance was observed with a model that incorporates unsupervised self-organizing map (SOM) classification results among the input features. In comparison with Fmask 4.0, the model improves the correctness by 10.11% and reduces the cloud omission error by 6.39%. On the other 8 independent validation images, which were never sampled as part of the model training, the model trained on the second largest training subset with 5 additional input features has the best overall performance. Compared with Fmask 4.0, this model improves the overall correctness by 3.26% and reduces the cloud omission error by 1.28%. In the second study, high temporal resolution PM2.5 models are developed based on data from weather radar systems and meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF). A dataset covering the period from July 2019 to June 2021 was collected for model training, which included Next Generation Weather Radar (NEXRAD) data retrieved from a repository on Amazon Web Services (AWS), meteorological data from ECMWF, and PM2.5 ground observations from 31 sensors deployed across Dallas, Collin, and Tarrant counties. The models are classified in groups to demonstrate the effectiveness of NEXRAD in high temporal resolution PM2.5 modeling. The model utilizing NEXRAD data achieves a coefficient of determination (R2) of 0.855, while the model without NEXRAD has an R2 of 0.7 for PM2.5.
The third study establishes a nationwide PM2.5 estimation model using high temporal resolution AOD data from the GOES-16 geostationary satellite, meteorological variables from ECMWF, and a set of ancillary data from a variety of sources. The model achieves a mean absolute error (MAE) of 3.0 µg/m3 and a root mean square error (RMSE) of 5.8 µg/m3. Model performance is then further evaluated by time, elevation, soil order, population density, and lithology. Historical PM2.5 estimation surfaces are then reconstructed, and the PM2.5 surfaces during the Santa Clara Unit (SCU) Lightning Complex fires in California are demonstrated.
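The accuracy measures quoted in this abstract (R2, MAE, RMSE) follow standard definitions. The sketch below is a minimal illustration of those definitions only; it is not the dissertation's code, and the observed and predicted values are hypothetical numbers invented for the example.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: average size of the residuals."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error: penalizes large residuals more than MAE."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical PM2.5 values in µg/m3 (assumption, for illustration only).
observed = [8.0, 12.0, 20.0, 35.0]
predicted = [10.0, 11.0, 18.0, 30.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

RMSE is never smaller than MAE for the same residuals, which is consistent with the 3.0 and 5.8 µg/m3 figures reported above.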
Environment and Anthropogenic Activities Influence Cetacean Habitat Use in Southeastern Brazil
(Inter-Research, 2019-05-09) Tardin, R. H.; Chun, Yongwan; Jenkins, C. N.; Maciel, I. S.; Simão, S. M.; Alves, M. A. S.; 0000-0002-4957-1379 (Chun, Y); 297769863 (Chun, Y)
Investigating the influence of coastal development on marine environments is a priority to maintain healthy seas. Cetaceans are top predators, keystone and umbrella species and thus are good candidate models to evaluate the extent of anthropogenic impacts on coastal habitats. We employed a generalized linear model with spatial eigenvector mapping (SEV-GLM) to understand the influence of environmental and anthropogenic activities on migrant (humpback whale Megaptera novaeangliae) and non-migrant (Bryde’s whale Balaenoptera brydei and common bottlenose dolphin Tursiops truncatus) cetacean habitat use off Cabo Frio, Rio de Janeiro, Brazil. We hypothesized that both environmental and anthropogenic activities influence their habitat use. Data were collected during 118 boat trips between December 2010 and June 2014. The best SEV-GLM predicted humpback whales would increase linearly with distance to coast, with minimum sea surface temperature (SST) around 19.4-19.8°C and maximum SST around 25.5-26°C, with low variations in chlorophyll a (chl a) concentrations. The model also predicted that humpback whales would occur up to 10 km from diving areas, increasing linearly with distance to fishing grounds. The best non-migrant cetacean SEV-GLM predicted that they would occur more frequently around depths from 30-60 m, increasing with low SST and high chl a concentration. For the anthropogenic component, the model predicted that non-migrant cetaceans would occur up to 10 km from fishing grounds. Our study modeled the influence of anthropogenic activities on cetaceans, and indicates specific priority areas for cetacean conservation, contributing at a local and national scale. © Inter-Research 2019
Geovisualizing Attribute Uncertainty of Interval and Ratio Variables: A Framework and an Implementation for Vector Data
(2017-12-14) Koo, Hyeongmo; Chun, Yongwan; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 14855602 (Griffith, DA)
This is a prototype implementation of attribute uncertainty visualization based on bivariate mapping. Specifically, the uncertainty visualizations are implemented in three different ways. First, an overlaid symbols on a choropleth map (OSCM) strategy is implemented to visualize attribute uncertainty. A choropleth map is used to represent attributes at the ratio scale, and additional overlaid symbols, such as textures (spacing), circles (size), and bars (size), visualize attribute uncertainty. Second, a coloring properties to proportional symbols (CPPS) strategy is applied. A proportional symbol map is more appropriate for representing raw counts or frequencies, and attribute uncertainty can be represented by the color saturation and color value of the proportional symbols in the hue-saturation-value (HSV) color model. Finally, a composite symbols (CS) strategy is utilized to represent the possible range of an attribute value with its confidence interval. Symbols in CS are constructed from three symbols of different sizes overlaid at each individual location. Two of these symbols represent uncertainty by visualizing the upper and lower limits of attribute values for a given confidence level. Thus, the CS strategy allows users to directly compare uncertainties with corresponding attribute values and their confidence intervals. The ESRI ArcGIS add-in installation file is compatible with ArcGIS 10.x and was developed with .NET Framework 4.5 and ArcObjects 10.5. It requires Microsoft Windows Vista or higher.
Impacts of Location Uncertainty on Statistical Modeling of Georeferenced Data
(2017-12) Lee, Monghyeon; Griffith, Daniel A; Chun, Yongwan
Uncertainty in data analysis has been a critical topic in numerous fields, such as public health, medicine, civil engineering, ecology and other natural sciences, and many of the social sciences, including geospatial information sciences. It may occur in any step of a study, such as collecting, recording, and analyzing data, and interpreting analysis results. Uncertainty is often propagated to analysis outcomes. Outcomes that inherit serious uncertainty are likely to be inaccurate and to yield misleading conclusions about a phenomenon. Locational uncertainty, which is the difference between a true and a represented location, is a unique source of uncertainty in spatial data analysis. Furthermore, locational uncertainty may interact with uncertainties from other sources (e.g., measurement, specification, sampling, or stochastic noise), making outcomes more unreliable. Propagation of uncertainty has been widely investigated. However, locational uncertainty propagation and the combination of uncertainties from different sources merit more attention, because the propagation and combination of uncertainties are quite complicated and can seriously corrupt analysis outcomes. This research examines uncertainty in spatial data analysis using two sources of public health data: Florida cancer data and Syracuse blood lead level data. The research 1) presents a study of how locational uncertainty propagates through an analysis involving an urban hierarchy in terms of spatial relationships between poverty and cancer using the Florida cancer data, 2) explores relationships and propagations of location and measurement uncertainties using pediatric blood lead level data for Syracuse, New York, and 3) examines a reverse transformation (i.e., a geometric centerline recovery method) from a kernel density surface to points using the Florida cancer data.
Implementing Moran Eigenvector Spatial Filtering for Massively Large Georeferenced Datasets
(Taylor And Francis Ltd.) Griffith, Daniel A.; Chun, Yongwan; 0000-0001-5125-6450 (Griffith, DA); 0000-0002-4957-1379 (Chun, Y); 14855602 (Griffith, DA); 297769863 (Chun, Y)
Moran eigenvector spatial filtering (MESF) furnishes an alternative method to account for spatial autocorrelation in linear regression specifications describing georeferenced data, although spatial auto-models also are widely used. The utility of this MESF methodology is even more impressive for the non-Gaussian models because its flexible structure enables it to be easily applied to generalized linear models, which include Poisson, binomial, and negative binomial regression. However, the implementation of MESF can be computationally challenging, especially when the number of geographic units, n, is large, or massive, such as with a remotely sensed image. This intensive computation aspect has been a drawback to the use of MESF, particularly for analyzing a remotely sensed image, which can easily contain millions of pixels. Motivated by Curry, this paper proposes an approximation approach to constructing eigenvector spatial filters (ESFs) for a large spatial tessellation. This approximation is based on a divide-and-conquer approach. That is, it constructs ESFs separately for each sub-region, and then combines the resulting ESFs across an entire remotely sensed image. This paper, employing selected specimen remotely sensed images, demonstrates that the proposed technique provides a computationally efficient and successful approach to implement MESF for large or massive spatial tessellations. ©2019 Informa
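To make the computational burden concrete, the sketch below shows the standard MESF building block: the eigenvectors of the doubly centered connectivity matrix MCM, where M = I - 11'/n. It is a minimal illustration that assumes NumPy is available and uses a hypothetical 4-by-4 rook-adjacency grid; it does not implement the paper's divide-and-conquer approximation, and the helper names are invented for the example.

```python
import numpy as np

def rook_adjacency(rows, cols):
    """Binary connectivity matrix C for a regular grid (rook neighbors)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:          # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:          # neighbor below
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def moran_eigenvectors(C):
    """Eigen-decompose M C M with M = I - 11'/n, sorted by eigenvalue."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    values, vectors = np.linalg.eigh(M @ C @ M)   # symmetric matrix
    order = np.argsort(values)[::-1]              # largest eigenvalue first
    return values[order], vectors[:, order]

C = rook_adjacency(4, 4)
values, vectors = moran_eigenvectors(C)

# The Moran coefficient of an eigenvector equals (n / 1'C1) times its eigenvalue.
n = C.shape[0]
print(n / C.sum() * values[0])
```

The full decomposition works on an n-by-n matrix, so its cost grows roughly with the cube of n. That is why an image with millions of pixels motivates constructing the filters per sub-region, as the abstract describes.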
Measuring Global Spatial Autocorrelation with Data Reliability Information
(Routledge) Koo, Hyeongmo; Wong, D. W. S.; Chun, Yongwan
Assessing spatial autocorrelation (SA) of statistical estimates such as means is a common practice in spatial analysis and statistics. Popular SA statistics implicitly assume that the reliability of the estimates is irrelevant. Users of these SA statistics also ignore the reliability of the estimates. Using empirical and simulated data, we demonstrate that current SA statistics tend to overestimate SA when errors of the estimates are not considered. We argue that when assessing SA of estimates with error, one is essentially comparing distributions in terms of their means and standard errors. Using the concept of the Bhattacharyya coefficient, we proposed the spatial Bhattacharyya coefficient (SBC) and suggested that it should be used to evaluate the SA of estimates together with their errors. A permutation test is proposed to evaluate its significance. We concluded that the SBC more accurately and robustly reflects the magnitude of SA than traditional SA measures by incorporating errors of estimates in the evaluation.
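The idea of comparing estimates as distributions can be illustrated with the ordinary Bhattacharyya coefficient between two normal distributions, each described by a mean and a standard error. The sketch below uses the standard closed form for univariate normals; it is not the paper's SBC, which the abstract does not define, and the input numbers are hypothetical.

```python
import math

def bhattacharyya_coefficient(mean_a, se_a, mean_b, se_b):
    """Overlap (0 to 1) between N(mean_a, se_a^2) and N(mean_b, se_b^2)."""
    var_a, var_b = se_a ** 2, se_b ** 2
    # Bhattacharyya distance for two univariate normal distributions.
    distance = (0.25 * (mean_a - mean_b) ** 2 / (var_a + var_b)
                + 0.5 * math.log((var_a + var_b) / (2.0 * se_a * se_b)))
    return math.exp(-distance)

# Hypothetical estimates for two neighboring areas (illustration only).
print(bhattacharyya_coefficient(10.0, 1.0, 12.0, 1.0))  # small errors
print(bhattacharyya_coefficient(10.0, 4.0, 12.0, 4.0))  # large errors
```

With the same two means, larger standard errors give a larger overlap, which shows why a measure that ignores errors and one that includes them can disagree about how similar two neighboring estimates are.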
Negative Spatial Autocorrelation and its Impacts on Georeferenced Data Analyses : With Case Studies of Cancer Incidences
(2020-05) Hu, Lan; Chun, Yongwan
Spatial autocorrelation has been a popular research topic in spatial analysis for decades, mainly because of its frequent detection in georeferenced phenomena. In addition, the presence of spatial autocorrelation complicates statistical analysis, because it violates the independence assumption in conventional statistics. However, most research to date focuses on positive spatial autocorrelation, while work on negative spatial autocorrelation is relatively scant. Negative spatial autocorrelation has long been neglected in the literature, largely because it is rarely observed in empirical data. This dissertation aims to contribute to the understanding of negative spatial autocorrelation with two major goals. One goal is to examine the impacts of spatial autocorrelation on statistical random variables, with both positive and negative spatial autocorrelation being assessed and contrasted with each other. The literature is replete with acknowledgments that positive spatial autocorrelation inflates the variance of a random variable, and it also may alter other distributional properties of a random variable. Moreover, due to different quantifications of negative and positive spatial autocorrelation, their impacts on random variables are expected to differ. The other goal is to explore the simultaneous materialization of negative and positive spatial autocorrelation in empirical data, and a potential treatment of this spatial autocorrelation mixture in spatial statistical analysis. The Moran scatterplot and local Moran statistics furnish efficient methods to uncover spatial autocorrelation mixture patterns. Other statistical methodologies are also employed to identify and capture negative spatial autocorrelation, including a spatial autoregressive model with two spatial autocorrelation parameters, the mixed regressive spatial autoregressive moving average model, and the Moran eigenvector spatial filtering method.
Privacy Compliance in U.S. Universities
(2021-12) Royal, K; Harrington, James; Chun, Yongwan; Sabharwal, Meghna; Maxwell, Sarah; McCaskill, John
Privacy law and compliance with those laws is a complex undertaking. This paper uses a mixed methods approach to review the scope and breadth of compliance with privacy laws at four-year universities in the United States. Starting with a Delphi method with privacy professionals defining the triggers for privacy laws, the laws most important for U.S. universities, and then the elements of a successful privacy program along with the risk factors for noncompliance, the researcher then examines publicly available information on a sample population of universities and lastly performs a legal review based on the Delphi findings and the Document Analysis. Both scholars and practitioners should find the paper useful. The outcomes identify what data subjects and activities trigger privacy laws at U.S. universities, what programmatic elements are required for a privacy compliance program to be successful, and what risk factors universities face in their privacy compliance efforts. All of this is reviewed through the Complexity Theory lens, considering both universities and privacy laws as complex adaptive systems.
Remote Sensing Videography: Potentials, Methods, and Applications
(2020-12) Shi, Fan; Qiu, Fang; Elliot, Euel; Briggs, Ronald; Chun, Yongwan; Cummings, Anthony
In the last several years, new satellite sensors capable of capturing videos have been developed and launched. Unlike satellite images, the temporal resolution of a satellite video is determined by its frame rate. For example, the sensors onboard SkySat satellites can film panchromatic videos at 1-meter spatial resolution with a frame rate of 30 frames per second (fps), which means the sensors’ temporal resolution is approximately 0.03 seconds. With the potential to detect and track moving objects, satellite videography provides a new perspective for Earth observation and enables important applications that may not be possible with traditional remote sensing images. Moving object detection and tracking has been a hot topic in remote sensing. However, such research using satellite video data remains scarce. The first objective of this study was to utilize a satellite video to detect and track airplanes. To achieve this, two different methods have been developed. The first method includes an Improved Gaussian-based Background Subtractor (IPGBBS) algorithm for moving airplane detection and a Primary Scale Invariant Feature Transform keypoint matching (P-SIFT KM) algorithm for moving airplane tracking. The second method includes a Normalized Frame Difference Labeling (NFDL) algorithm for moving airplane detection and template matching with improved similarity measures (TM-ISMs) for moving airplane tracking. The second objective of this study is to achieve traffic monitoring with satellite video data, which involves moving vehicle detection and tracking, vehicle motion property extraction, and traffic property extraction. The performance of the developed methods for moving airplane detection and tracking was compared with state-of-the-art approaches. Experimental results show that IPGBBS possesses higher detection accuracy than state-of-the-art approaches, and NFDL exhibits the highest detection accuracy.
Moreover, TM-ISMs achieve notably higher tracking accuracy than TMTSMs. The developed method for traffic monitoring was tested on a satellite video of an urban area and demonstrated high accuracy for moving vehicle detection and tracking, which contributed to the effective extraction of both vehicle motion properties and traffic properties.
Spatially Explicit Machine Learning Approaches for House Price Models
(2023-05) Chen, Meifang 1989-; Cordell, Rebecca; Chun, Yongwan; Griffith, Daniel A.; Kim, Dohyeong; Qiu, Fang
Spatial data, or georeferenced data, are special in that they have a spatial reference, meaning that they are linked with geographic coordinates on Earth. The spatial component allows for the identification of spatial patterns, relationships, and trends among spatial objects. Spatial objects are usually not randomly or independently distributed, but spatially autocorrelated. In spatial data analysis, spatial autocorrelation has been well recognized through the advocacy of spatial statistical techniques, such as spatial clustering, spatial interpolation, spatial regression, and spatial simulation. However, spatial effects and spatial context are largely absent in mainstream machine learning methods. With the popularity of machine learning in various applications in both industry and academia, a new research area has emerged in the spatial community: spatially explicit machine learning. It refers to the use of machine learning algorithms to analyze and predict spatial data with the explicit integration of spatial effects or patterns. It is expected to improve model accuracy and prediction by incorporating spatial relationships or patterns in the data that have not been captured by traditional machine learning models and, subsequently, to yield a better understanding of the data generation mechanism. This research utilizes Franklin County, OH residential house transaction data to explore three different data-driven approaches to integrating spatial perspectives into traditional machine learning algorithms: 1) imposing spatial constraints on unsupervised learning to delineate spatially constrained housing submarkets; 2) integrating spatial weights into the cost function of supervised learning to improve house price prediction accuracy; and 3) enhancing data input using spatial feature engineering in tree-based ensemble learning for modeling multiscale spatial effects. It intends to contribute new insights on spatially explicit machine learning to the literature.
Overall, the three studies explore spatially explicit machine learning methods from three different aspects, and the empirical results show that spatially explicit machine learning methods are preferred over traditional machine learning methods when data have strong positive spatial autocorrelation or, more generally, when data include spatial information that is important for classification, clustering, or prediction tasks.
Spatio-Temporal Modeling of Crime in Urban Environments : Three Case Studies in Seoul, South Korea and Dallas, TX
(2019-11-19) Jung, Yeondae; Chun, Yongwan
Increasing availability of georeferenced data provides researchers a new source they can use to study criminal behavior and law enforcement in space and time dimensions. Although these studies can help broaden our understanding of crime and criminal justice and examine criminological theories, researchers or practitioners analyzing patterns of crime in space and time should be aware of the characteristics of the data and how to handle it with the proper quantitative method. Since spatio-temporal data may violate the independence assumption in conventional regression due to a spatial and/or a temporal structure, an analysis ignoring these effects can result in statistically misleading inferences. Thus, this dissertation is devoted to exploring three issues in the analysis of crime and law enforcement, which may need a methodological adjustment to account for structures in both space and time dimensions. The first chapter introduces the topics, the rationales, and the methodologies in the three papers detailed in Chapters 2 through 4 in this dissertation. Chapter 2 investigates how temperature, as well as socio-economic factors, are associated with crime in an urban environment. Using a Bayesian analysis on three years of monthly data in Seoul, South Korea, this study shows that an association of temperature to assaults varies with economic status and commercial land use of an area. Chapter 3 investigates crime density in four time periods in a day with two types of population measures and environmental variables with a case study in a sub-district of Seoul. The results show that ambient population better explains the variations of assaults for all time periods than residential population. In addition, socio-economic factors that are also significantly associated with the assault are identified, even after population factors are accounted for. 
Chapters 2 and 3 compare the results of models with different spatial and/or temporal structures and find that the model accounting for both structures better explains the data. Chapter 4 connects crime clearance rates to installed public surveillance cameras using four years of data from Dallas, TX. Focusing on the interaction between pre/post installation and camera distance, the study shows that crime clearance rates are higher after camera installation. However, the effects of surveillance cameras are shown to be dependent on crime types. Finally, Chapter 5 summarizes the main findings and implications of each paper, and discusses the delimitations and limitations to be addressed in future research.
Uncertainty and Context in GIScience and Geography: Challenges in the Era of Geospatial Big Data
(Taylor & Francis Ltd, 2019-01-17) Chun, Yongwan; Kwan, Mei-Po; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 0000-0001-5125-6450 (Griffith, DA); 297769863 (Chun, Y); 14855602 (Griffith, DA)
No abstract available.
Uncertainty in the Effects of the Modifiable Areal Unit Problem under Different Levels of Spatial Autocorrelation: A Simulation Study
(Taylor & Francis Ltd, 2018-11-13) Lee, Sang-Il; Lee, Monghyeon; Chun, Yongwan; Griffith, Daniel A.; 0000-0002-4957-1379 (Chun, Y); 0000-0001-5125-6450 (Griffith, DA); 297769863 (Chun, Y); 14855602 (Griffith, DA)
The objective of this paper is to investigate uncertainties surrounding relationships between spatial autocorrelation (SA) and the modifiable areal unit problem (MAUP) with an extensive simulation experiment. In particular, this paper explores how the MAUP behaves at different levels of SA, focusing on how the initial level of SA at the finest spatial scale changes the MAUP effects on sample statistics such as means, variances, and Moran coefficients (MCs). The simulation experiment utilizes a random spatial aggregation (RSA) procedure and adopts Moran spatial eigenvectors to simulate different SA levels. The main findings are as follows. First, there are no substantive MAUP effects for means. However, the initial level of SA plays a role in the zoning effect, especially when extreme positive SA is present. Second, there is a clear and strong scale effect for the variances. However, the initial SA level plays a non-negligible role in how this scale effect unfolds. Third, the initial SA level plays a crucial role in the nature and extent of the MAUP effects on MCs. A regression analysis confirms that the initial SA level makes a substantial difference to the variability of the MAUP effects.
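The statistics named in this abstract can be illustrated on a toy example. The sketch below computes the Moran coefficient for six units arranged on a line and shows how one fixed pairwise aggregation changes the variance under positive and negative SA. It is a minimal illustration with hypothetical values and a single hand-picked zoning, not the paper's random spatial aggregation procedure.

```python
def moran_coefficient(values, neighbors):
    """MC = (n / S0) * sum_ij c_ij z_i z_j / sum_i z_i^2, binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = sum(z[i] * z[j] for i, js in neighbors.items() for j in js)
    s0 = sum(len(js) for js in neighbors.values())   # total number of links
    return (n / s0) * cross / sum(d * d for d in z)

def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def aggregate_pairs(values):
    """Merge adjacent units two at a time and keep the mean of each pair."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]

# Six units on a line; each unit neighbors the units beside it.
line_neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
trend = [1, 2, 3, 4, 5, 6]          # positive SA
alternating = [1, 0, 1, 0, 1, 0]    # negative SA

for name, data in (("trend", trend), ("alternating", alternating)):
    print(name, moran_coefficient(data, line_neighbors),
          variance(data), variance(aggregate_pairs(data)))
```

In this toy case aggregation leaves the mean unchanged for both patterns, while the variance shrinks slightly for the trend and collapses to zero for the alternating pattern, which is in line with the abstract's point that the initial SA level shapes the scale effect.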
Visualizing and Modeling Spatial Data Uncertainty
(2018-05) Koo, Hyeongmo; 0000-0002-5446-1668 (Koo, H); Chun, Yongwan
This dissertation extends the understanding of spatial data uncertainty, which inevitably exists in any process of Geographic Information Sciences involving measuring, representing, and modeling the world. This dissertation consists of three specific sub-topics in visualizing and modeling spatial data uncertainty. First, a framework for attribute uncertainty visualization is suggested based on bivariate mapping techniques, and this framework is implemented in a popular GIS environment. The framework and implementation support many visual variables that have been investigated in the literature. This research outcome can provide flexibility to enhance communication and visualization effectiveness for uncertainty visualization. The second sub-topic is the development of optimal map classification methods that simultaneously consider attribute estimates and their uncertainty. This study expands the discussion of constructing an optimal map classification result in which data uncertainty is incorporated in the map classification process. This method solves a shortest path problem in an acyclic network based on dissimilarity measures with various cost and objective functions. Finally, modeling positional uncertainty acquired through street geocoding is investigated to understand potential factors of the uncertainty and then to identify impacts of the uncertainty on spatial analysis results. This study accounts for spatial autocorrelation among geocoded points in the modeling process, which has rarely been included in this type of modeling. This research contributes to improving explanation and to extending geocoding uncertainty modeling by suggesting additional covariates and considering spatial autocorrelation.
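The shortest-path view of map classification mentioned in the second sub-topic can be sketched as a dynamic program over sorted values, where each candidate class is an edge whose cost measures within-class dissimilarity. The sketch below assumes a plain sum of squared deviations as the edge cost, in the style of Fisher-Jenks classification; the dissertation's cost functions also incorporate uncertainty, which this example does not attempt, and the function names and data are hypothetical.

```python
def class_cost(sorted_values, start, end):
    """Edge cost: sum of squared deviations within sorted_values[start:end]."""
    members = sorted_values[start:end]
    mean = sum(members) / len(members)
    return sum((v - mean) ** 2 for v in members)

def optimal_breaks(values, k):
    """Split values into k contiguous classes with minimum total cost."""
    data = sorted(values)
    n = len(data)
    inf = float("inf")
    # best[c][j]: minimum cost of splitting the first j values into c classes,
    # i.e. the shortest path from node 0 to node j using exactly c edges.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = best[c - 1][i] + class_cost(data, i, j)
                if cost < best[c][j]:
                    best[c][j] = cost
                    back[c][j] = i
    # Walk the stored predecessors backwards to recover the classes.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        classes.append(data[i:j])
        j = i
    return classes[::-1], best[k][n]

classes, total = optimal_breaks([30, 1, 11, 2, 31, 3, 10, 12], 3)
print(classes, total)
```

Because edges only run from a lower to a higher position in the sorted list, the network has no cycles, so the optimum can be found exactly without a general-purpose shortest-path search.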