Show simple item record

dc.contributor.advisor: Lary, David J.
dc.creator: Wu, Daji
dc.date.accessioned: 2018-06-21T20:31:48Z
dc.date.available: 2018-06-21T20:31:48Z
dc.date.created: 2018-05
dc.date.issued: 2018-05
dc.date.submitted: May 2018
dc.identifier.uri: http://hdl.handle.net/10735.1/5872
dc.description.abstract: The concentration of airborne particulate matter (PM$_{2.5}$) is a significant environmental and health issue. Many tools have been used to examine the relationship between PM$_{2.5}$ abundance and meteorological variables. Some of these relationships are non-linear, non-Gaussian, or even unknown. Machine learning provides a broad range of practical solutions to help examine and provide physical insights into these relationships. In this thesis we have used a variety of machine learning approaches. Unsupervised machine learning was used to classify the morphology of PM$_{2.5}$ seasonal cycles in East Asia. Machine learning is able to classify the seasonal cycles objectively and, without a priori assumptions, to clearly distinguish between urban and rural areas. We show an example of this in the Sichuan Basin of China. Further, a supervised machine learning approach, random forest, is able to identify the key factors associated with each distinct shape of the seasonal cycle, such as the key role played by the surface type and the built environment. Random forests can be improved upon by using an optimized ensemble of machine learning approaches (boosting and bagging), which explores a variety of ensemble methods and chooses the algorithm with the best performance after hyperparameter tuning. This optimized approach automatically provides the most important meteorological and surface variables associated with PM$_{2.5}$ concentration. The variables highlighted by optimized machine learning were then examined together with five traditional meteorological features via multiple linear regression (MLR) models, which provide comprehensive physical mechanistic insights into the effect of these variables on the variation of the PM$_{2.5}$ annual cycles, e.g., how these environmental variables interact with PM$_{2.5}$ in specific areas.
Lastly, SHapley Additive exPlanation (SHAP) values, a consistent measure of individualized feature attributions in ensemble tree models, were employed to obtain more information about the impacts of those environmental variables. SHAP provides individualized attributions of the predictors to the final output. The SHAP values were calculated from the ensemble tree models and, unlike MLR, do not assume any linear relationship between the predictors and PM$_{2.5}$ concentration. The impacts given by SHAP were consistent with those from MLR, but are more generally applicable.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.rights: Copyright ©2018 is held by the author. Digital access to this material is made possible by the Eugene McDermott Library. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
dc.subject: Air—Pollution—Research
dc.subject: Machine learning
dc.subject: Air—Pollution—Meteorological aspects
dc.subject: Self-organizing maps
dc.title: Providing Physical Insights into the Morphology of Spatial and Temporal Distributions of Atmospheric Aerosols Using Machine Learning
dc.type: Dissertation
dc.date.updated: 2018-06-21T20:31:50Z
dc.type.material: text
dc.contributor.ORCID: 0000-0002-1791-1009 (Daji, W)
thesis.degree.grantor: The University of Texas at Dallas
thesis.degree.department: Physics
thesis.degree.level: Doctoral
thesis.degree.name: PHD
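
Illustrative note on the abstract: the statement that SHAP attributions agree with MLR while assuming no linearity can be made concrete with a small sketch. The code below is not taken from the thesis. It computes exact Shapley values by brute force for two toy models, and every name and number in it (the two features "temperature" and "wind speed", the coefficients, the background rows) is a hypothetical assumption chosen only for illustration. Tree-based SHAP implementations use much faster algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley attributions of model(x), measured against the mean
    prediction over `background`. Features outside a coalition are replaced
    by background values (an independence assumption)."""
    n = len(x)

    def value(subset):
        # Mean model output with the features in `subset` fixed to x
        # and all other features taken from each background row.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy models of PM2.5 as a function of [temperature, wind speed].
def linear_model(v):
    return 40.0 - 0.5 * v[0] - 3.0 * v[1]


def nonlinear_model(v):
    # Includes a temperature/wind interaction that an MLR fit cannot express.
    return 20.0 + 30.0 / (1.0 + v[1]) * (1.0 + 0.02 * v[0])


background = [[10.0, 2.0], [20.0, 4.0], [30.0, 6.0]]  # hypothetical samples
x = [25.0, 1.0]                                       # sample to explain

phi_linear = shapley_values(linear_model, x, background)
phi_nonlinear = shapley_values(nonlinear_model, x, background)

# For a linear model, each attribution reduces to coefficient * (x_i - mean_i).
means = [sum(col) / len(col) for col in zip(*background)]
expected_linear = [-0.5 * (x[0] - means[0]), -3.0 * (x[1] - means[1])]

# Attributions always sum to (prediction - mean background prediction).
base_nonlinear = sum(nonlinear_model(r) for r in background) / len(background)
gap_nonlinear = nonlinear_model(x) - base_nonlinear

print("linear model attributions:   ", phi_linear)
print("nonlinear model attributions:", phi_nonlinear)
```

In the linear case the attributions coincide with the regression terms, which is the sense in which SHAP and MLR can agree. In the nonlinear case the attributions still add up to the difference between the prediction and the background mean, without any linearity assumption.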

