Using a Comprehensive Characterization of the Physical Environment and Machine Learning to Forecast the Abundance of Airborne Pollen







Approximately 50 million Americans have allergic diseases, and airborne pollen is a significant trigger for several of them. Among pollen sources, Ambrosia (ragweed) is known for its abundant pollen production and its potent allergenic effect. It is prevalent in North America and in northern temperate regions generally. Estimating and predicting the daily atmospheric concentration of pollen, ragweed pollen in particular, is therefore useful both for people with allergies and for the health professionals who care for them. In this study we show that a suite of meteorological and land surface parameters, together with atmospheric trajectory analysis, can be combined with machine learning to successfully estimate (forecast) the daily pollen concentration. Our main data sources are MERRA (Modern-Era Retrospective analysis for Research and Applications), ECMWF (European Centre for Medium-Range Weather Forecasts), and NEXRAD (NEXt-generation weather RADar). We used supervised machine learning methods including linear models such as Bayesian ridge regression, ensemble tree-based learners (random forest and gradient boosting), simple and deep neural networks, and support vector machines. The performance of each method is independently validated on a test set partitioned from the total dataset by the holdout method. Additionally, to estimate pollen over a large spatial scale, we developed neural network and random forest models that estimate pollen over a 300 km × 300 km region centered at a NEXRAD radar site at a resolution of 0.5 km × 0.5 km. In this case the models are developed over a 10 km × 10 km area solely on the basis of NEXRAD parameters and applied to all pixels having enough NEXRAD measurements.
Demonstrating the feasibility of estimating the daily pollen concentration using only NEXRAD radar data and machine learning would lay the foundation for forecasting daily pollen at a fine spatial resolution over the contiguous United States. The results show that, despite the very different approaches of the neural network and the random forest, the two methods highlighted high levels of pollen in approximately the same regions. Several independent variable-importance estimation techniques were used to calculate and rank the relative importance of the available weather and land surface parameters for estimating pollen abundance. Surface albedo, soil temperature, vegetation greenness fraction, and wind speed were among the most influential parameters for forecasting allergenic pollen. Among the NEXRAD measurements, reflectivity and wind direction were the top predictors. The physical interpretation of each predictor variable and its influence on the prediction of allergenic pollen are presented towards the end of the dissertation.
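
As a rough illustration of the workflow summarized above (a holdout split, several supervised regressors, and a variable-importance ranking), the following minimal sketch uses scikit-learn on synthetic data. It is not the dissertation's pipeline: the feature names, the synthetic target, the 80/20 split, the hyperparameters, and the choice of permutation importance as the ranking technique are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): NOT the MERRA/ECMWF/NEXRAD dataset.
# The feature names are hypothetical labels borrowed from the abstract.
feature_names = [
    "surface_albedo",
    "soil_temperature",
    "greenness_fraction",
    "wind_speed",
    "noise",
]
n_days = 600
X = rng.normal(size=(n_days, len(feature_names)))

# Hypothetical target with linear, nonlinear, and interaction terms.
y = (
    2.0 * X[:, 0]
    + np.sin(X[:, 1])
    + 0.5 * X[:, 2] * X[:, 3]
    + rng.normal(scale=0.1, size=n_days)
)

# Holdout method: set aside 20% of the samples as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two of the model families named in the abstract.
models = {
    "bayesian_ridge": BayesianRidge(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit on the training partition, score on the held-out partition only.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Rank predictors by permutation importance, computed on the test set.
result = permutation_importance(
    models["random_forest"], X_test, y_test, n_repeats=10, random_state=0
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in scores.items():
    print(f"{name}: held-out R^2 = {score:.3f}")
for name, importance in ranking:
    print(f"{name}: permutation importance = {importance:.3f}")
```

Scoring and ranking on the held-out partition, rather than on the training data, keeps both the performance estimate and the importance ranking independent of the samples the models were fitted to.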



Pollen--Allergenicity, Allergy, Ragweeds, Weather forecasting, Machine learning, Atmospheric pressure


©2019 Gebreab K. Zewdie. All Rights Reserved.