Ensemble Post-Processing Using Random Forests

Date: Jan 8, 2020 | 1:00 - 2:00 pm
Location: National Center for Atmospheric Research, Boulder, Foothills Lab Building 2, Room 1001
Speaker: Eric Loken, University of Oklahoma

Ensembles are important forecast tools because they provide users with uncertainty information. However, ensembles frequently suffer from systematic biases, forecast displacement errors, under-dispersion, and/or sub-optimal reliability. For some areas of prediction, such as severe weather, even convection-allowing ensembles may not be able to explicitly simulate the field of interest (e.g., severe hail or tornadoes), making skillful probabilistic forecasts more difficult to produce. Machine learning techniques, such as the random forest (RF), can improve ensemble-based forecasts during post-processing by non-linearly relating ensemble forecast variables to observed weather. 
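As a rough illustration of how an RF can map ensemble forecast variables to event probabilities, the sketch below trains a tiny forest in plain Python. It is not the speaker's implementation: the two predictors, the synthetic "observed event" rule, and the function names (fit_stump, fit_forest, predict_proba) are all assumptions made for the example, and each tree is only a depth-one stump, far shallower than a practical forest.

```python
import random

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature, scored by squared error."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        p_left = sum(left) / len(left)
        p_right = sum(right) / len(right)
        sse = (sum((yi - p_left) ** 2 for yi in left)
               + sum((yi - p_right) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, t, p_left, p_right)
    if best is None:  # no valid split: fall back to the base rate
        base = sum(y) / len(y)
        return (feature, float("inf"), base, base)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=50, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(fit_stump(Xb, yb, feature))
    return forest

def predict_proba(forest, x):
    """Forest probability = mean of the event frequencies in each tree's leaf."""
    votes = [p_left if x[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)

# Synthetic data (assumption): two predictors scaled to [0, 1], standing in for
# quantities such as ensemble-mean precipitation and ensemble-max updraft helicity.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
# Hypothetical non-linear rule for the observed event, for illustration only.
y = [1 if (a > 0.6 and b > 0.3) else 0 for a, b in X]

forest = fit_forest(X, y)
p_high = predict_proba(forest, [0.9, 0.9])  # both predictors large
p_low = predict_proba(forest, [0.1, 0.1])   # both predictors small
print(p_high, p_low)
```

Averaging many trees trained on resampled data is what yields a probability rather than a yes/no answer; an operational system would more likely use deeper trees from an established library.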

This talk demonstrates that an RF can improve next-day (12-36 h) ensemble-derived probabilistic precipitation and severe weather forecasts over the contiguous United States. Unlike previous studies, the RFs here use temporally aggregated, grid-point-based ensemble forecasts as predictors. RF forecasts are shown to outperform skillful baselines, such as Gaussian smoothing of raw ensemble probabilities and (for severe weather) updraft helicity-based proxies. Remarkably, RF probabilistic severe weather forecasts derived from Storm Scale Ensemble of Opportunity (SSEO) output also outperform the corresponding Storm Prediction Center human forecasts. Overall, the results suggest that RFs are valuable ensemble post-processing tools that enhance forecast reliability and resolution while reducing displacement errors. The RF approach performs best for convection-parameterizing ensembles and more common weather events. It also benefits severe weather forecasts more than precipitation forecasts, presumably because severe weather cannot be explicitly simulated. Promisingly, the RF technique is inexpensive to run in real time (after training) and requires less than a year of training data to attain a high degree of skill.
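The Gaussian-smoothing baseline mentioned above can be sketched as follows, again in plain Python. The grid, the three toy ensemble members, the 5 mm threshold, the smoothing width, and the use of the Brier score as the verification metric are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def raw_probability(members, threshold):
    """Fraction of ensemble members exceeding the threshold at each grid point."""
    n = len(members)
    rows, cols = len(members[0]), len(members[0][0])
    return [[sum(m[i][j] > threshold for m in members) / n
             for j in range(cols)] for i in range(rows)]

def gaussian_smooth(field, sigma):
    """2-D Gaussian smoothing; weights are renormalized near the grid edges."""
    rows, cols = len(field), len(field[0])
    radius = int(3 * sigma)  # truncate the kernel at about 3 sigma
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = weight_sum = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        w = math.exp(-(di * di + dj * dj) / (2 * sigma * sigma))
                        total += w * field[ii][jj]
                        weight_sum += w
            out[i][j] = total / weight_sum
    return out

def brier_score(prob, obs):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    rows, cols = len(prob), len(prob[0])
    return sum((prob[i][j] - obs[i][j]) ** 2
               for i in range(rows) for j in range(cols)) / (rows * cols)

def empty_grid(rows=5, cols=5):
    return [[0.0] * cols for _ in range(rows)]

# Toy ensemble (assumption): each member puts 10 mm at a slightly different point,
# mimicking displacement differences between members.
members = []
for col in (1, 2, 3):
    grid = empty_grid()
    grid[2][col] = 10.0
    members.append(grid)

observed = empty_grid()
observed[2][2] = 1.0  # the event was observed at the central grid point

raw = raw_probability(members, threshold=5.0)
smoothed = gaussian_smooth(raw, sigma=1.0)
bs_raw = brier_score(raw, observed)
bs_smoothed = brier_score(smoothed, observed)
print(bs_raw, bs_smoothed)
```

Smoothing spreads non-zero probability to neighboring grid points, which is how this baseline accounts for displacement uncertainty; whether it lowers the Brier score depends on the case and on the choice of sigma.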