Transitions Newsletter

Bridges to Operations: Informing NCEP Legacy Operational Model Retirement Through Scorecards

Contributed by Michelle Harrold (NCAR and DTC) and Jeff Beck (NOAA GSL, CU CIRES and DTC)
Winter 2023

NOAA is undergoing a massive, community-driven initiative to unify the NCEP operational model suite under the Unified Forecast System (UFS) umbrella. A key component of this effort is transitioning from the legacy systems to unified Finite-Volume Cubed-Sphere (FV3)-based deterministic and ensemble operational global and regional systems. For the UFS, the goal is to consolidate operational models around a common software framework, reduce the complexity of the NCEP operational suite, and maximize available HPC resources, which is especially imperative with a shift toward using ensemble-based operational systems. As such, a number of current operational systems are slated to be retired; however, before systems can be phased out, the upcoming systems need to perform on par with, or better than, the systems they are replacing.

To address this evaluation requirement, the DTC was charged with creating performance summary “scorecards” to inform model developers, key stakeholders, and decision makers on the retirement readiness of legacy systems, as well as highlight areas that can be targeted for improvement in future versions of UFS-based operational systems. Scorecards are a graphical synthesis tool that allows users to objectively assess statistically significant differences between two models (e.g., the UFS-based Rapid Refresh Forecast System (RRFS) and one of the operational systems) for user-defined combinations of forecast variables, thresholds, and levels for select verification measures.

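The idea behind each scorecard cell can be sketched as follows: for one combination of variable, level, threshold, and lead time, the paired score differences between the two models are tested for statistical significance. This is a minimal, hypothetical illustration using a simple bootstrap confidence interval on the mean paired difference; the actual scorecards were produced with the METplus Analysis Suite, and the function name and significance method here are assumptions, not the operational implementation.

```python
import random

def scorecard_cell(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Classify one scorecard cell: is model A significantly better or
    worse than model B for one (variable, level, threshold, lead-time)
    combination?  Hypothetical sketch using a bootstrap CI on the mean
    paired score difference; assumes higher score = better."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement and collect means.
    means = sorted(
        sum(rng.choice(diffs) for _ in diffs) / len(diffs)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    if lo > 0:
        return "A significantly better"
    if hi < 0:
        return "B significantly better"
    return "no significant difference"

# Illustrative use with made-up scores for 24 cases:
print(scorecard_cell([0.62] * 24, [0.55] * 24))  # → "A significantly better"
```

A real scorecard simply colors each cell by this three-way outcome, giving a compact visual summary across dozens of variable/threshold/lead-time combinations.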

This exercise focused on evaluating the UFS-based Global Forecast System (GFS) against the North American Mesoscale (NAM) model and Rapid Refresh (RAP) model, as well as the UFS-based Global Ensemble Forecast System (GEFS) against the Short-Range Ensemble Forecast (SREF) system. The eventual goal is to replace the NAM and RAP with the GFS for medium-range forecasting, and replace the SREF with the GEFS as a medium-range ensemble-based system. The scorecards were created with the METplus Analysis Suite using verification output from April 2021 - March 2022, provided by NOAA/EMC (special thanks to Logan Dawson and Binbin Zhou at EMC for facilitating the data transfer!). This output allowed for deterministic, ensemble, and probabilistic grid-to-grid and grid-to-point evaluations over four seasonal aggregations.

Key results indicate that the GFS shows promise against the NAM, while the RAP remains largely competitive. When evaluating the GFS against the NAM, precipitation is consistently better forecast in the GFS. For upper-air fields, the GFS generally performs as well as or better than the NAM; however, convective-season upper-air forecasts could be a target for improvement in the GFS, as the NAM was most competitive during this period. When evaluating the GFS against the RAP, surface-based and low-to-mid-level verification for the GFS is generally on par with or worse than the RAP. The GFS scores well aloft and with cold-season precipitation, but warm-season precipitation should be an area of focused improvement. When considering the GEFS versus the SREF, GEFS performance was slightly better overall in the fall and winter seasons, but worse in the spring and summer seasons. Results have been shared through presentations at the weekly NOAA/EMC Model Evaluation Group (MEG) meeting as well as a UFS Short-Range Weather/Convective-Allowing Model (SRW/CAM) Application Team meeting. Examples of the scorecards created during this evaluation are provided in Figures 1 and 2.


Figure 1. Scorecard of Gilbert Skill Score (GSS) for 3-h accumulated precipitation at specified thresholds and forecast lead times for the 00 UTC initializations over the period July 1, 2021 - Sept. 30, 2021. Results indicate that after the 12-h forecast lead time, when there are statistically significant differences, the GFS outperforms the NAM.
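The Gilbert Skill Score shown in Figure 1 is computed from a 2x2 contingency table of forecast and observed threshold exceedances. A minimal sketch of the standard formula (also known as the Equitable Threat Score), with illustrative counts that are not taken from this evaluation:

```python
def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """Gilbert Skill Score (Equitable Threat Score):
    GSS = (hits - hits_chance) / (hits + misses + false_alarms - hits_chance),
    where hits_chance is the number of hits expected by random chance.
    Perfect score = 1; scores <= 0 indicate no skill."""
    total = hits + misses + false_alarms + correct_negatives
    hits_chance = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_chance) / (hits + misses + false_alarms - hits_chance)

# Illustrative counts for one precipitation threshold (not real data):
print(round(gilbert_skill_score(50, 20, 30, 900), 3))  # → 0.47
```

Because the chance-hit term is subtracted, the GSS rewards skill beyond what random forecasts of a rare event would achieve, which is why it is a common choice for thresholded precipitation verification.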

Figure 2. Scorecard of Continuous Ranked Probability Score (CRPS), bias, and root-mean-square error (RMSE) for 2-m temperature, 2-m relative humidity, U-component of the wind, V-component of the wind, pressure reduced to mean sea level (PRMSL), and total cloud for various forecast lead times for the 12/15 UTC initializations over the period April 1, 2021 - June 30, 2021. Results are mixed, with GEFS having slightly worse performance overall; however, GEFS does frequently outperform SREF at 06-, 24-, 30-, 48-, 54-, 72-, and 78-h forecast lead times.
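The CRPS in Figure 2 generalizes absolute error to an ensemble forecast: it compares the forecast distribution, rather than a single value, to the observation. A minimal sketch of the standard empirical (fair-in-the-limit) estimator for an ensemble, with made-up member values:

```python
def ensemble_crps(members, obs):
    """Empirical CRPS for an ensemble forecast:
    CRPS = mean|x_i - obs| - 0.5 * mean|x_i - x_j|,
    where the second mean runs over all member pairs.
    Lower is better; for a single member it reduces to absolute error."""
    n = len(members)
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    error_term = sum(abs(m - obs) for m in members) / n
    return error_term - spread_term

# Illustrative 3-member ensemble (not real data):
print(round(ensemble_crps([1.0, 2.0, 3.0], 2.0), 4))  # → 0.2222
```

The spread term penalizes overconfident, tightly clustered ensembles that miss the observation, which is why CRPS is a natural headline score for a system like the GEFS.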