Hierarchical Model Development and Single Column Models

Summer 2018

Earth system models connect the atmosphere, ocean, and land, and depend on proper representations of dynamics and physics, initial conditions, and interactions among these processes to predict future conditions. Standard meteorological variables are used to validate typical numerical weather prediction models, but they are gross measures of these countless interactions, which limits their usefulness for guiding model improvement. Some fraction of the error in these metrics can be the result of specific physical parameterizations, but the source can be difficult to trace. One solution is to isolate these parameterizations and compare each with something measurable. Such process-level metrics can help us understand and address the systematic biases in a given parameterization before we tackle the root causes of systematic biases in a more fully-coupled model.

Single Column Model (SCM) testing is part of the hierarchical model development approach of the Global Model Test Bed (GMTB) under the Developmental Testbed Center (DTC).  DTC/GMTB is a joint effort between NCAR and NOAA/ESRL, in collaboration with their external research-to-operations partners, and is led by personnel in NCAR/RAL/JNT and NOAA/ESRL/GSD.  SCMs are an excellent way to evaluate the performance of a set of model physics because many physical processes interact primarily in the vertical, with horizontal transport handled by the dynamics.  In an SCM, the model physical parameterizations are connected (as a column) and are provided with the necessary initial conditions and forcing to investigate the evolution of the profile.  SCM forcing may come from model, observational (e.g., field programs), or idealized/synthetic data sets, allowing exploration of the response of the physics under different conditions, as well as “stress tests” of parameterizations. In addition, the computational resources required to run an SCM are orders of magnitude smaller than for a fully-coupled model, so an SCM may run in seconds on a laptop.  SCMs with options to turn various parameterizations on and off then allow for the examination of the interactions of those parameterizations, e.g., land plus surface-layer turbulence plus atmospheric boundary layer.
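To make the column idea concrete, here is a minimal sketch of an SCM-style integration under our own simplifying assumptions: the profile, the relaxation "physics," and the prescribed forcing are invented for illustration and are not GMTB code.

```python
# Minimal sketch of a single-column integration: a vertical profile stepped
# forward by a physics tendency plus prescribed large-scale forcing standing
# in for the dynamics. Illustrative only, not GMTB code.
import numpy as np

nz, dt, nsteps = 60, 300.0, 288             # levels, timestep (s), one day
T = np.linspace(300.0, 210.0, nz)           # initial temperature profile (K)
T_ref = T.copy()                            # reference state for the toy physics

def physics_tendency(T):
    """Stand-in for the column physics suite: relax toward the reference."""
    return (T_ref - T) / 86400.0            # K/s

def large_scale_forcing(step):
    """Prescribed forcing, e.g. from a field campaign or a 3D model run."""
    return np.full_like(T, -2.0 / 86400.0)  # uniform advective cooling, K/s

for step in range(nsteps):
    T += dt * (physics_tendency(T) + large_scale_forcing(step))
```

Swapping in a different `physics_tendency` while holding the forcing fixed is the column analogue of testing one parameterization suite against another.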

The question to answer is, “do we obtain the same performance when the parameterizations are run separately as we do when they are coupled?”  A model can be tuned to obtain some required level of performance, but the more complex the system, the more the tuning may be accommodating a number of compensating errors rather than improving the model physics.  What we are ultimately after is "getting the right answers for the right reasons": first testing a parameterization in isolation, then progressively adding parameterization interactions, up to a full SCM. Using SCMs can also enhance interactions with the Research-to-Operations (R2O) community, whose members often work on physics development but may not have the focus or the computing resources for fully-coupled model runs, which involve data flow, data assimilation, model output post-processing, etc.

Note that at higher resolutions (model grid boxes on the order of 5-10 km or less), where circulations are induced between grid boxes, the evaluation of some physics (most notably convection and convective systems) requires at least a limited-area model to examine processes and identify systematic biases.  This is part of the hierarchy of model testing and development, where the follow-on steps are regional, continental, and global-scale models, which have more traditional NWP metrics of performance. One must still get the physics right with process-level metrics of performance. We must “look under the hood” to see what is really going on if we are to make real improvements in the performance of Earth system and numerical weather prediction models.

Important local land-atmosphere interactions for conditions of daytime surface heating, where arrows indicate model processes for radiation, boundary-layer, and land.  Solid arrows indicate the direction of feedbacks that are normally positive (leading to an increase of the recipient variable).  Dashed arrows indicate negative feedbacks.  Two consecutive negative feedbacks make a positive feedback.

 

“Building a Weather-Ready Nation by Transitioning Academic Research to NOAA Operations” Workshop

Spring 2018

The “Building a Weather-Ready Nation by Transitioning Academic Research to NOAA Operations” Workshop was held at the NOAA Center for Weather and Climate Prediction in College Park, Maryland, on November 1-2, 2017. NOAA and UCAR organized the meeting, which drew more than one hundred participants from universities, government laboratories, operational centers, and the private sector. Members of the organizing committee included Reza Khanbilvardi, City College of New York; Chandra Kondragunta, NOAA/OAR; Jennifer Mahoney, NOAA/ESRL; Fred Toepfer, NOAA/NWS; and Hendrik Tolman, NOAA/NWS. John Cortinas, NOAA/OAR, and Bill Kuo, UCAR, served as Co-Chairs. A draft of the Workshop Report is available online.


“The workshop was informative, stimulating and productive, as it allowed the academic community to have a direct dialogue with NOAA on research to operations transition issues.”

The workshop was designed to inform the academic community about NOAA’s transition policies and processes and to encourage the academic community to actively participate in transitioning research to improve NOAA’s weather operations. It was also an opportunity to strengthen engagement between the research and operational communities.

The first day of the workshop consisted of a series of invited presentations on the policies, needs, requirements, gaps, successes, and challenges of NOAA transitions. During the second day, working groups discussed issues and made recommendations to improve the process of transitioning Research to Operations (R2O). In particular, the discussions led to several interesting suggestions on the participation of the academic community in NOAA R2O activities:

  1. NOAA needs to recognize that the academic community has a different rewards system from that of an operational organization. Scientific publication is critical for the career advancement of university professors and students. Therefore, the academic community will be much more interested in research that can lead to publication.

  2. The availability of computing resources is critical for successful R2O in weather and climate modeling. Given the limited NOAA computing resources and the challenges of obtaining security clearance to use NOAA computing facilities, an alternative solution is needed. Making NOAA operational models, data, and computing resources available through the cloud is an attractive solution.

  3. The academic community cannot work for free. Therefore, appropriate funding to support their participation in R2O is critical. Good examples include the Hurricane Forecast Improvement Project (HFIP), Next Generation Global Prediction System (NGGPS), and Joint Technology Transfer Initiative (JTTI) announcements of opportunity.

Through NOAA support, several students were invited to participate in the workshop. One Ph.D. student from the University of Maryland shared that her eyes were opened to real-life challenges and issues that are confronting our field, something she was not able to learn from her classes. Interacting with these enthusiastic next-generation scientists, who are not afraid to tackle the challenging problems in our field, was the most rewarding part of the workshop.

Many participants commented that the workshop was informative, stimulating and productive, as it allowed the academic community to have a direct dialogue with NOAA on research to operations transition issues. The participants agreed that it would be desirable to have such a workshop once every two years.

 


The Global Model Test Bed: Bringing the U.S. scientific community into NCEP global forecast model development

Autumn 2017
"Entraining a vibrant, diverse external community into NCEP global model development will bring broader dividends. ... Surely the U.S. can marshal its intellectual resources to do even better and create the world's best unified modeling system using the GMTB as a collaborative platform."

The DTC is at the core of an exciting new effort to more effectively bring the U.S. scientific community into the development of our national global weather forecast model, the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS).  This effort is part of NOAA’s ongoing Next Generation Global Prediction System (NGGPS) Program, which started in 2014.  NGGPS is a multi-million-dollar effort to support implementation, testing, and refinement of community-driven improvements that aim to transform the GFS into a unified weather and seasonal forecast system (UFS) with world-leading forecast skill.

Up to now, the GFS has been developed primarily within NCEP’s Environmental Modeling Center (EMC) in College Park, MD.  This has naturally led to barriers to effective participation by the external community, including a lack of documentation about how to run the model and about the implementation of its equations and parameterizations, limited model diagnostics and metrics, and the lack of a well-organized, accessible code repository.  Two further issues are software engineering that does not easily support testing major changes in the model physics and dynamics, and complications in accessing NOAA high-performance computing resources for model testing.

DTC’s Global Model Test Bed (GMTB), led by Ligia Bernardet and Grant Firl, is an ambitious project to make GFS/UFS model development much more user-friendly, catalyzing partnerships between EMC and research groups in national laboratories and academic institutions.  The GMTB aims to implement transparent and community-oriented approaches to software engineering, metrics, documentation, code access, and model testing and evaluation.

A first step in this direction, in collaboration with EMC, has been the design of an Interoperable Physics Driver and of standard interfaces for a Common Community Physics Package.  These software frameworks allow easy interchange of dynamical cores or physical parameterizations.  For instance, the suite of physical parameterizations used in the GFDL or CESM climate models, or a new cumulus or microphysical parameterization, can be tried out within the GFS.

At present, the GMTB supports the use of a single-column version of the global model, which is useful for running case studies that isolate particular physical processes such as stable boundary layers or tropical oceanic deep cumulus convection, as well as global atmospheric simulations with a specified geographical distribution of sea-surface temperatures.  Global hindcast simulations can be evaluated using the same set of metrics currently used at EMC for weather forecasts, which focus on forecast lead times of 10 days or less.  GMTB has already performed an evaluation of an alternative cumulus parameterization scheme within the GFS using this approach, and may soon be testing alternative microphysical or boundary-layer parameterizations.

To realize the vision of a unified model that can be used out to seasonal timescales, the GFS must also be systematically tested at lower grid resolution in an ocean-coupled mode.  A ‘test harness’ of hindcast cases must be implemented for evaluating model performance in that setting, in which skill in forecasting modes of low-frequency variability such as ENSO, the Madden-Julian Oscillation, and the North Atlantic Oscillation, as well as land-atmosphere coupling, becomes paramount.  Metrics of two-week to 6-month forecast skill must be agreed upon by EMC and the broader community and balanced with more typical measures of shorter-range weather forecast skill.   GMTB will need to implement both the test harness of coupled model simulations and the unified metrics suite.

Over the long term, GMTB will need to address a variety of other nontrivial challenges to be successful.  The most important is maintaining a close working relationship with EMC, such that the codes, metrics, and cases that EMC uses for evaluating new model developments for operational readiness are the same as those used by outside developers.  GMTB also needs streamlined access to dedicated high-performance computing such that a new user can quickly work on modifying and running GFS without lengthy delays in obtaining needed approvals and resources.  The above vision also makes GMTB responsible for serving as the help desk for outside GFS/UFS model developers, which will require adequate trained staff and extensive improvement of model documentation. GMTB will also need to play an important role in model evaluation, promoting transparent, trusted decision-making about which model developments are ready to be considered for operational testing and implementation (though NCEP will have the final word on what gets implemented for operations).   Lastly, an important issue for the future scope of GMTB is whether and how to bring data assimilation, another key element of the forecast process, into this vision.

Entraining a vibrant, diverse external community into NCEP global model development will bring broader dividends.  More eyes will lead to more insight into model strengths and weaknesses, and young scientists will naturally learn about GFS and provide a talent pool for making it a world-leading model. The framework of interoperability could be broadened to include climate models such as CESM, allowing further cross-talk between the weather and climate modeling communities. The UK Met Office has demonstrated the strength of this approach; surely the U.S. can marshal its intellectual resources to do even better and create the world’s best unified modeling system using the GMTB as a collaborative platform.

 


Sample results for Hurricane Matthew from the GMTB evaluation of an alternative cumulus parameterization scheme, showing the daily averaged Upward Short Wave Radiative Flux (USWRF) and cloudiness for 0000 UTC 2 October 2016. Left: control GFS run; middle: experimental run; right: difference in the low cloud coverage between the two runs.

Evaluation of New Cloud Verification Methods

Winter 2017
“These metrics can provide very succinct information about many aspects of forecast performance without having to resort to complicated, computationally expensive techniques.”

The DTC has been tasked by the US Air Force to investigate new approaches to evaluate cloud forecast predictions. Accurate cloud forecasts are critical to the Air Force national intelligence mission because clouds can mask key targets, obscure sensors, and are a hazard to Remotely Piloted Aircraft. This work will help forecast users and developers understand the characteristics of these predictions and suggest ways to make them more accurate.

Clouds have significant impacts on many kinds of decisions.  Among other applications, accurate cloud forecasts are critical to the national intelligence mission. Clouds can mask key targets, obscure sensors, and are a hazard to Remotely Piloted Aircraft. The locations of clouds, as well as other cloud characteristics (e.g., bases, tops), are difficult to predict because clouds are three-dimensional and form and dissipate quickly at multiple levels in the atmosphere. In addition, cloud predictions integrate across multiple components of numerical weather prediction systems. Evaluation of cloud predictions is not straightforward for many of the same reasons.


The DTC effort, in collaboration with staff at the Air Force’s 557th Weather Wing, focuses on testing a variety of verification approaches: traditional verification methods for continuous and categorical forecasts that provide a baseline evaluation of quality (e.g., Mean Error, Mean Absolute Error, Gilbert Skill Score, Probability of Detection); spatial methods (e.g., the Method for Object-based Diagnostic Evaluation [MODE]) and field deformation approaches that provide greater diagnostic information about cloud prediction capabilities; and new distance metrics that characterize the distances between forecast and observed cloud features. This evaluation will help identify new tools, including a cloud-centric NWP index, to consider for implementation and operational application in the Model Evaluation Tools (MET) verification software suite.

For the evaluation, the team is focusing initially on forecast and observed total cloud amount (TCA) datasets for six cloud products, covering one-week periods in each of four seasons:

  • WorldWide Merged Cloud Analysis (WWMCA) developed by the Air Force;
  • A WWMCA reanalysis (WWMCAR) product that includes latent observations not included in the real-time version of WWMCA;
  • Forecasts (out to 72 h) of TCA from the USAF Global Air Land Weather Exploitation Model (GALWEM), which is the Air Force implementation of the United Kingdom’s Unified Model;
  • TCA forecasts (out to 72 h) from the NCEP Global Forecast System (GFS) model;
  • Bias-corrected versions of the GALWEM and GFS predictions (GALWEM-DCF and GFS-DCF); and
  • Short-term TCA predictions (out to 9 h) from the Advective cloud model (ADVCLD).


Methods and results

Results of the application of the various verification methods indicated that continuous approaches are not very meaningful for evaluating cloud predictions, particularly because of the discontinuous nature of clouds.  In contrast, categorical approaches can provide information that is potentially quite useful, particularly when applied to thresholds that are relevant for AF decision-making (e.g., overcast, clear conditions), and when the results are presented using a multivariate approach such as the performance diagrams first applied by Roebber (WAF, 2009).  The MODE spatial method also shows great promise for diagnosing errors in cloud predictions (e.g., size biases, displacements).  However, more effort is required to identify optimal configurations of the MODE tool for application to clouds for AF decision-making.

Initial testing of field deformation methods indicated that these approaches are potentially useful for evaluation of cloud forecasts. Field deformation methods evaluate how much a forecast would have to change in order to best match the observed field. Information about the amount and type of deformation required can be estimated, along with the resulting reduction in error.

The results also indicated that, in general, cloud amount forecasts lend themselves to verification through binary image metrics, because a cloud’s presence or absence can be ascertained through categories of cloud amount thresholds. These metrics can provide very succinct information about many aspects of forecast performance without resorting to complicated, computationally expensive techniques. For example, Baddeley’s ∆ metric gives a useful overall summary of how well two cloud-amount products compare in terms of size, shape, orientation, and location of clouds, and the Mean Error Distance (MED) gives meaningful information about misses and false alarms, but is sensitive to small changes in the field. In addition to the distance metrics, a geometric index that measures three geometric characteristics (area, connectivity, and shape) could potentially provide additional useful information, especially when the cloud field is not too complex (i.e., is comprised of a small number of features).
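To illustrate how two of the metrics above can be computed, here is a hedged sketch built on SciPy distance transforms; the helper names and toy fields are ours, not MET's implementation.

```python
# Sketch of two binary-image distance metrics: Mean Error Distance (MED) and
# Baddeley's Delta. Assumes forecast and observed cloud masks are boolean
# numpy arrays on a common grid. Illustrative only, not MET code.
import numpy as np
from scipy.ndimage import distance_transform_edt

def dist_to_nearest(mask):
    """Distance (grid units) from every pixel to the nearest True pixel."""
    # distance_transform_edt measures distance to the nearest zero, so invert.
    return distance_transform_edt(~mask)

def mean_error_distance(fcst, obs):
    """Mean distance from each observed cloudy pixel to the nearest forecast
    cloudy pixel; large values indicate misses."""
    return dist_to_nearest(fcst)[obs].mean()

def baddeley_delta(fcst, obs, p=2, c=np.inf):
    """Baddeley's Delta: p-norm of the difference between (cutoff) distance
    maps, summarizing size, shape, orientation, and location differences."""
    d1 = np.minimum(dist_to_nearest(fcst), c)
    d2 = np.minimum(dist_to_nearest(obs), c)
    return (np.abs(d1 - d2) ** p).mean() ** (1.0 / p)

# Example: a forecast cloud object displaced from the observed one.
obs = np.zeros((100, 100), dtype=bool); obs[40:60, 40:60] = True
fcst = np.zeros((100, 100), dtype=bool); fcst[45:65, 50:70] = True
print(mean_error_distance(fcst, obs), baddeley_delta(fcst, obs))
```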

Ongoing and future efforts

Ongoing efforts on this project are focused on extending the methods to global cloud amounts (the initial work focused on North America), and on further refinements and tests of the methods.  For example, MODE configurations are being identified in collaboration with the AF 557th Weather Wing, to ensure the configurations are relevant for AF decision-making.  In addition, canonical evaluations (i.e., with “artificial” but realistic cloud distributions) of the distance metrics [1] are being examined to determine whether any unknown biases or poor behavior exist that would influence the application of these methods. As these extensions are completed, a set of tools will be identified that provide meaningful – and complete – information about performance of TCA forecasts.  Further efforts will focus on other cloud parameters such as cloud bases and tops.

[1] The canonical evaluations apply only to the distance metrics, not to all of the methods.

Community Modeling Workshop Outcome

Summer 2017
“The most common feedback from the workshop participants noted the increase in transparency within the EMC and NOAA at large, the increasing effort to engage the entire community, and the general sense of positive momentum of the community coming together to embrace the opportunity to use NGGPS as a foundation to build a true community modeling resource for the Nation.”

DTC Article on NOAA Community Modeling Workshop and SIP Working Group meetings

The NOAA Community Modeling Workshop and meetings of the Strategic Implementation Plan (SIP) Working Groups were held 18-20 April 2017 at the National Center for Weather and Climate Prediction in College Park, Maryland.  The goal of the meetings was to engage the Earth system science community in forming and shaping the nascent unified modeling community being built upon the Next Generation Global Prediction System (NGGPS), and to consider how best to execute shared infrastructure, support, management, and governance.  Other topics addressed included identifying “best practices,” discussing how a community-based unified modeling system will actually work, and evolving and coordinating between SIP/NGGPS Working Groups (WGs). A complete set of documents for the meeting, including the agenda, participant list, presentations, and summary reports, can be found on the workshop webpage: https://ral.ucar.edu/events/2017/community-modeling-workshop. For more information on the SIP effort, see the “Director’s Corner” article in the Winter 2017 issue of DTC Transitions.



The NOAA Community Modeling Workshop, which ran from 18 April through noon on 19 April, was designed to interact with the broader model R&D community.  As such, this portion was completely open to the public and included a dial-in capability for the plenary sessions.  The opening talks set the stage by describing the approach and goals of NGGPS and summarizing the SIP and its goals and objectives.  These opening talks were followed by a panel discussion of senior leaders from the weather enterprise, including the Directors of the NWS and NOAA Research, the UCAR President, and senior leaders from academia, the private sector, NASA, the National Science Foundation, and DoD (Navy).  Each was asked to provide their perspective on three items:

  1. What aspects of a NOAA-led community effort to develop a next-generation unified modeling system would your organization and sector find advantageous?  In other words, how do you think your organization/sector would benefit?
  2. For which parts of a community unified modeling effort would your organization or sector be best able (and most likely) to contribute? In other words, what do you feel is the best role for your organization/sector to play?
  3. From the perspective of your organization or sector, what do you see as the greatest challenges to be overcome (or barriers that must be broken down) to make this a successful community enterprise?

The remainder of the presentations were panel discussions featuring co-chairs from 12 active SIP WGs, each of whom provided their perspective on the ongoing activities of their WG and the overall effort to migrate the NGGPS global model, under development within NOAA, into a community-based unified modeling system.

The workshop concluded on the morning of 19 April with a series of parallel break-out groups, each of which was asked, based on what they saw and heard during the presentations, to identify two categories of items:

  1. Best practices: What are the major things that we’re getting right?
  2. Gaps: What are the major things that we’re missing, or heading down the wrong track?

Note: Reports from these breakout sessions can be found in the workshop summary.

The SIP Working Group meeting, which ran from the afternoon of 19 April through the end of 20 April, consisted of a series of meetings between the various SIP Working Groups (WGs) aimed at advancing the technical planning within each WG and ensuring that this planning is well-coordinated across WGs.  These meetings, also referred to as Cross-WG meetings, were designed to identify areas of overlap versus gaps between the WGs, and to help facilitate technical exchange.

Each WG was asked to provide (1) an overall assessment of the effectiveness of the workshop, (2) a summary of “immediate needs” they felt had to be addressed as soon as possible to ensure long-term success, and (3) the most important “critical path” items upon which others depended.  A summary of the “immediate needs” and “critical path” items is provided in the SIP meeting summary, which includes the full reports from each WG.

The overall consensus of the meeting participants for both portions of the workshop was very positive, with the most common feedback noting the increase in transparency within the Environmental Modeling Center and NOAA at large, the increasing effort to engage the entire community, and the general sense of positive momentum of the community coming together to embrace the opportunity to use NGGPS as a foundation to build a true community modeling resource for the Nation.

U.S. Air Force Weather Modeling and the DTC

Spring 2016
“During the next decade, the Air Force will be working with our national and international modeling partners toward a goal of consolidated capabilities ...”

The longstanding mission of the U.S. Air Force Weather (AFW) enterprise is to maximize America’s power through the exploitation of timely, accurate, and relevant weather information, anytime, everywhere. To meet this mission, the Air Force has operated a broad range of numerical weather models to analyze and predict environmental parameters that impact military operations. The internally developed Global Spectral Model (GSM, a separate effort from the NCEP GSM), implemented in the early 1980s, was the first operational model run by the Air Force. The GSM was replaced by the Relocatable Window Model (RWM) in 1990, and in the late 1990s the Mesoscale Model 5 (MM5) went into operations. In 2006, the Weather Research and Forecasting (WRF) model became the mainstay for Air Force operations and has remained so for most of the last decade. On 1 October 2015, the new Global Air-Land Weather Exploitation Model (GALWEM), based on the United Kingdom Met Office’s (UKMO) Unified Model, was implemented as the Air Force’s primary weather model to meet the warfighter’s global requirements. (The figure below provides a timeline of USAF weather model evolution.)


Timeline of Air Force weather model evolution

The Air Force is a Charter member of the DTC, as well as of a number of other interagency partnerships working toward a shared goal of rapidly and cost-effectively advancing U.S. weather modeling capabilities. These include the National Unified Operational Prediction Capability (NUOPC), the National Earth System Prediction Capability (ESPC), and the Joint Center for Satellite Data Assimilation (JCSDA). The science insertion, validation studies, and user product improvements developed through these partnerships have benefited AFW significantly.


Sample of the USAF GALWEM model output used by AFW.

The Air Force’s contributions to the DTC have focused on verification and tuning of the WRF model for a range of domains around the world; support for the standardization, documentation, and baseline management of the Gridpoint Statistical Interpolation (GSI) data assimilation system used by all of the DTC partners; and enhancements to the community Model Evaluation Tools (MET) to more effectively verify clouds and other aviation parameters and to improve ensemble verification techniques.

During the next decade, the Air Force will be working with our national and international modeling partners toward a goal of consolidated capabilities to assimilate, analyze and predict parameters critical to military operations in a single model solution, independent from the model(s) used downstream of the DA system. The converged solution is expected to improve overall efficiency and reduce costs while streamlining new science insertion. We look forward to working closely with the DTC to achieve this goal.



Expanding Capability of DTC Verification

Winter 2016

Robust testing and evaluation of research innovations is a critical component of the Research-to-Operations (R2O) process and is performed for NCEP by the Developmental Testbed Center (DTC).
At the foundation of the DTC testing and evaluation (T&E) system is the Model Evaluation Tools (MET) package, which the DTC also supports for the community. The verification team within the DTC has been working closely with other DTC teams, as well as the research and operational communities (e.g., the NOAA HIWPP program and NCEP/EMC, respectively), to enhance MET to better support both internal T&E activities and testing performed at NOAA Centers and Testbeds.

METv5.1 was released to the community in October 2015. It includes a multitude of enhancements to the already extensive capabilities. The additions can be grouped into new tools, enhanced controls over pre-existing capabilities, and new statistics. It may be fair to say there is something new for everyone.

New tools: Sometimes user needs drive the addition of new tools through the development process. This was the case for the METv5.1 release. The concept of automated regridding within the tools was first brought up during a discussion with the Science Advisory Board, and was embraced as a way to make the Mesoscale Model Evaluation Testbed (MMET) more accessible to researchers. The MET team took it one step further and not only added the capability to all MET tools that ingest gridded data, but also developed a stand-alone tool (regrid_data_plane) to facilitate regridding, especially of NetCDF files.

For those who use or would like to use the Method for Object-based Diagnostic Evaluation (MODE) tool in MET, a new tool (MODE-Time Domain, or MTD) that tracks objects through time has been developed. In the past, many MET users have performed separate MODE runs at a series of forecast valid times and analyzed the resulting object attributes, matches, and merges as functions of time in an effort to incorporate temporal information in assessments of forecast quality. MTD was developed to address this need more systematically: most of the information obtained from such multiple coordinated MODE runs can be obtained more simply from MTD. As in MODE, MTD applies a convolution operation and a threshold to define the space-time objects. It also computes single 3D object attributes (e.g., centroid, volume, and velocity) and paired 3D object attributes (e.g., centroid distance, volume ratio, speed difference).
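As a rough illustration of the attribute idea (not MTD's actual algorithm), the centroid, volume, and a drift velocity of a single space-time object can be estimated from its voxel indices:

```python
# Sketch of single-object space-time attributes: centroid, volume, and a
# drift velocity estimated from the object's (t, y, x) voxel indices.
# Illustrative only, not MTD code.
import numpy as np

def object_attributes(voxels):
    """voxels: (N, 3) integer array of (t, y, x) indices for one object."""
    volume = len(voxels)                    # number of voxels
    centroid = voxels.mean(axis=0)          # (t, y, x) centroid
    t = voxels[:, 0]
    vy = np.polyfit(t, voxels[:, 1], 1)[0]  # least-squares drift per time step
    vx = np.polyfit(t, voxels[:, 2], 1)[0]
    return centroid, volume, (vy, vx)

# A small object drifting one grid point per step in y, two in x:
vox = np.array([[0, 10, 10], [1, 11, 12], [2, 12, 14]])
print(object_attributes(vox))  # velocity approximately (1.0, 2.0)
```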



To address the needs of the Gridpoint Statistical Interpolation (GSI) data assimilation community, the DTC Data Assimilation and MET teams worked together to develop a set of tools to read the GSI binary diagnostic files. The files contain useful information about how a single observation was used in the analysis, providing details such as the innovation (O-B), observation values, observation error, adjusted observation error, and quality control information. When MET reads GSI diagnostic files, the innovation (O-B, generated prior to the first outer loop) or analysis increment (O-A, generated after the final outer loop) is split into separate values for the observation (OBS) and the forecast (FCST), where the forecast value corresponds to the background (O-B) or analysis (O-A). This information is then written into the MET matched pair format. Traditional statistics (e.g., Bias, Root Mean Square Error) may then be calculated using the MET Stat-Analysis tool. Support for ensemble-based DA methods is also included. Currently, three observation types are supported: Conventional, AMSU-A, and AMSU-B.
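The OBS/FCST split described above amounts to simple arithmetic on each observation-innovation pair; a minimal sketch follows, with names that are illustrative rather than the actual GSI record layout.

```python
# Given an observation value and the innovation (O-B), the background
# "forecast" value is recovered by subtraction. Names are illustrative.
def split_innovation(obs_value, innovation):
    """Return (OBS, FCST), where FCST is the background implied by O-B."""
    return obs_value, obs_value - innovation

# An observed temperature of 285.2 K with an innovation of +1.5 K implies
# a background (forecast) value of 283.7 K.
obs, fcst = split_innovation(285.2, 1.5)
```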

Enhanced Controls: Working with DTC teams and end users usually provides plenty of opportunities to identify optimal ways to enhance existing tools. One example occurred during the METv5.1 release: finer controls of thresholding were added to several tools to allow for more complex definitions of the events used in the formulation of categorical statistics. This option is useful if a user would like to look at a particular subset of data without computing multi-categorical statistics (e.g., the skill for predicting precipitation between 25.4 mm and 76.2 mm). The thresholding may also now be applied to the computation of continuous statistics. This option is useful when assessing model skill for a subset of weather conditions (e.g., during freezing conditions, or on cloudy days as indicated by a low amount of incoming shortwave radiation).
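A minimal sketch of such a conditional continuous statistic, in the spirit of the bias-above-300-K example shown in the accompanying figures (illustrative code, not MET's):

```python
# Conditional continuous verification: compute bias only where the
# observation exceeds a threshold. Illustrative only, not MET code.
import numpy as np

def conditional_bias(fcst, obs, thresh):
    """Mean forecast-minus-observation error restricted to obs > thresh."""
    sel = obs > thresh
    return np.mean(fcst[sel] - obs[sel])

fcst = np.array([301.2, 298.4, 305.0, 296.1])
obs  = np.array([302.0, 299.0, 306.3, 295.5])
print(conditional_bias(fcst, obs, 300.0))  # bias over the warm subset only
```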

Another example includes combining several tools, such as Gen_Poly_Mask and Gen_Circle_Mask, into a more generalized tool, Gen_Vx_Mask. The Gen_Vx_Mask tool may be run to create a bitmap verification masking region to be used by the MET statistics tools. This tool enables the user to generate a masking region once for a domain and apply it to many cases. The ability to compute the union, intersection, or symmetric difference of two masks was also added to Gen_Vx_Mask to provide finer control over a verification region. Gen_Vx_Mask now supports the following types of masking definitions:


MET’s Conditional Continuous Verification. The panel above shows the geographic bias of temperature at surface weather stations in 0.5 K increments, from -4 K (purple) to +4 K (red). The mean bias over the entire dataset is -0.43 K.

MET’s Conditional Continuous Verification. The panel above shows the bias for all stations observed to be greater than 300 K. The mean bias for the warmer temperatures shows a greater cold bias of -0.87 K.

  1. Polyline (poly) masking reads an input ASCII file containing Lat/Lon locations. This option is useful when defining geographic sub-regions of a domain.
  2. Circle (circle) masking reads an input ASCII file containing Lat/Lon locations and, for each grid point, computes the minimum great-circle arc distance in kilometers to those points. This option is useful when defining areas within a certain radius of radar locations (see the sketch after this list).
  3. Track (track) masking reads an input ASCII file containing Lat/Lon locations of a “track” and, for each grid point, computes the minimum great-circle arc distance in kilometers to the track. This option is useful when defining the area within a certain distance of a hurricane track.
  4. Grid (grid) masking reads an input gridded data file and uses its grid definition to define the mask area. This option is useful when using a model nest to define the corresponding area of the parent domain.
  5. Data (data) masking reads an input gridded data file, extracts the field specified, and applies a threshold. This option is useful when thresholding topography to define a mask based on elevation, or when thresholding land use to extract a particular category.

Additional examples of enhanced controls include allowing the user to define a rapid intensification / rapid weakening event for a tropical cyclone in a more generic way with TC-Stat. This capability was then included in the Stat-Analysis tool to allow for identification of ramp events for renewables, or extreme change events for other areas of study.


“The MET team strives to provide the NWP community with a state-of-the-art verification package where MET incorporates newly developed and advanced verification methodologies.”

New Statistics: In support of the need for expanded probabilistic verification capability for both regional and global ensembles, the MET team added a “climo_mean” specification to the Grid-Stat, Point-Stat, and Ensemble-Stat configuration files. If a climatological mean is included, the Anomaly Correlation is reported in the continuous statistics output. If a climatological or reference probability field is provided, the Brier Skill Score and Continuous Ranked Probability Score are reported in the probabilistic output. Additionally, a decomposition of the Mean Square Error was included in the continuous statistics computations. These options are particularly useful to the global NWP community and were added to address the needs of the NCEP/EMC Global Climate and Weather Prediction Branch.
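For reference, the anomaly correlation enabled by “climo_mean” can be sketched as a centered correlation of departures from climatology (illustrative only, not MET's code):

```python
# Centered anomaly correlation: correlate forecast and observed departures
# from a climatological mean field. Illustrative only, not MET code.
import numpy as np

def anomaly_correlation(fcst, obs, climo):
    """Centered anomaly correlation of two fields given a climatology."""
    fa, oa = fcst - climo, obs - climo
    fa = fa - fa.mean()
    oa = oa - oa.mean()
    return (fa * oa).sum() / np.sqrt((fa ** 2).sum() * (oa ** 2).sum())
```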

In conclusion, the MET team strives to provide the NWP community with a state-of-the-art verification package. “State-of-the-art” means that MET will incorporate newly developed and advanced verification methodologies, including new methods for diagnostic and spatial verification, but will also utilize and replicate the capabilities of existing systems for verification of NWP forecasts. We encourage those in the community to share your requirements, ideas, and algorithms with our team so that MET may better serve the entire verification community. Please contact us at met_help@ucar.edu.

NOAA Selects GFDL’s Dynamical Core

Autumn 2016

In August 2014, numerical weather prediction modelers attended a workshop to discuss dynamical core requirements and attributes for the NGGPS, and developed a battery of tests to be conducted in three phases over 18 months. Six existing dynamical cores were identified as potential candidates for NGGPS.

During Phase 1, a team of evaluators ran benchmarks to look at performance, both meteorological and computational, and the stability of the core. The performance benchmark measured the speed of each candidate model at the resolution currently run in National Centers for Environmental Prediction (NCEP) operations, and at a much higher resolution expected to be run operationally within 10 years. They also evaluated the ability of the models to scale across many tens of thousands of processor cores.



Assessment of the test outcomes from Phase 1 resulted in the recommendation to reduce the candidate pool to two cores, NCAR’s Model for Prediction Across Scales (MPAS) and GFDL’s Finite-Volume on a Cubed Sphere (FV3), prior to Phase 2.

In Phase 2, the team evaluated the two remaining candidates on meteorological performance using both idealized physics and the operational GFS physics package. Using initial conditions from operational analyses produced by NCEP’s Global Data Assimilation System (GDAS), each dynamical core ran retrospective forecasts covering the entire 2015 calendar year at the current operational 13 km horizontal resolution. In addition, two cases, Hurricane Sandy in October 2012 and the May 18-20, 2013 tornado outbreak in the Great Plains, were run with enhanced resolution (approximately 3 km) over North America. The team assessed the ability of the dynamical cores to predict severe convection without a deep convective parameterization, using operational initial conditions and high-resolution orography.

The results of the Phase 2 tests showed that GFDL’s FV3 satisfied all the criteria, had a high level of readiness for operational implementation, and was computationally highly efficient. As a result, the panel of experts recommended to NOAA leadership that FV3 become the atmospheric dynamical core of the NGGPS. NOAA announced the selection of FV3 on July 27, 2016.

Phase 3 of the project, getting underway now, will involve integrating the FV3 dynamical core with the rest of the operational global forecast system, including the data assimilation and post-processing systems. See results at https://www.weather.gov/sti/stimodeling_nggps_implementation_atmdynamics.

Contributed by Jeff Whitaker.


Hindcast of the 2008 hurricane season, simulated by the FV3-powered GFDL model at 13 km resolution.

NGGPS Dynamical Core: Phase 1 Evaluation Criteria

  • Simulate important atmospheric dynamical phenomena, such as baroclinic and orographic waves, and simple moist convection
  • Restart execution and produce bit-reproducible results on the same hardware, with the same processor layout (using the same executable with the same model configuration)
  • High computational performance (8.5 min/day, i.e., 8.5 minutes of compute time per forecast day) and scalability to NWS operational CPU processor counts needed to run the 13 km and higher resolutions expected by 2020
  • Extensible, well-documented software that is performance portable
  • Execution and stability at high horizontal resolution (3 km or less) with realistic physics and orography
  • Evaluate level of grid imprinting for idealized atmospheric flows

Phase 2 Evaluation Criteria

  • Plan for relaxing the shallow atmosphere approximation (deep atmosphere dynamics) to support tropospheric and space-weather requirements.
  • Accurate conservation of mass, tracers, total energy, and entropy, which are of particular importance for weather and climate applications
  • Robust model solutions under a wide range of realistic atmospheric initial conditions, including strong hurricanes, sudden stratospheric warmings, and intense upper-level fronts with associated strong jet-stream wind speeds using a common (GFS) physics package
  • Computational performance and scalability of dynamical cores with GFS physics
  • Demonstrated variable resolution and/or nesting capabilities, including physically realistic simulations of convection in the high-resolution region
  • Stable, conservative long integrations with realistic climate statistics
  • Code adaptable to the NOAA Environmental Modeling System (NEMS) / Earth System Modeling Framework (ESMF)
  • Detailed dycore (dynamical core) documentation, including documentation of the vertical grid, numerical filters, time-integration scheme, and variable resolution and/or nesting capabilities
  • Performance in cycled data assimilation tests to uncover issues that might arise when cold-started from another assimilation system
  • Implementation plan including costs

The need for a Common Community Physics Package

Summer 2016

While national modeling centers can benefit from the expertise in the broader community of parameterization developers, the social and technical barriers to a community researcher implementing and testing a new parameterization or set of parameterizations (a physics suite) in an operational model are high.

Physical parameterization codes are often implemented so that they are strongly linked to a particular model dynamical core, with dependencies on grid structure, prognostic variables, and even the time-stepping scheme. Dependencies among schemes are also common. For example, information from a deep convection scheme may be needed in a gravity wave drag scheme. While these dependencies are generally justified on computational efficiency grounds, they complicate the replacement of parameterizations and of suites, marginalizing tremendous scientific talent.

To address these difficulties, and to engage the broad community of physics developers in the National Weather Service’s Next-Generation Global Prediction System (NGGPS), the DTC’s Global Model Test Bed (GMTB) is participating in developing the Common Community Physics Package (CCPP). The schematic (below) shows the DTC’s proposed modeling meta-structure for NGGPS, with the CCPP shown in the gray box. The specific parameterizations in the CCPP shown here are examples only; other parameterizations or sets of parameterizations could be displayed in the blue boxes.

Although requirements are sure to evolve depending on priorities and funding, an initial set is in place to inform the CCPP design. They reflect the following vision for the CCPP: (1) a low barrier to entry for physics researchers to test their ideas in a sandbox, (2) a hierarchy of testing capabilities, ranging from unit tests to global model tests, (3) a set of peer-reviewed metrics for validation and verification, and (4) a community process by which new or modified parameterizations become supported within the CCPP. We recognize that an easier technical implementation path for a physical parameterization does not replace the scientific expertise necessary to ensure that it functions correctly or works well as part of a suite. A test environment intended to ease that process is also under development at GMTB, beginning with a single-column model linked to the GFS physics.

The low barrier to entry implies highly modular code and clear dependencies. Dependencies, and the interface between the physics and a dynamical core, will be handled by a thin “driver” layer (dark green box in the schematic). Variables are defined in the driver and exchanged between the model dynamics and the various parameterizations. The current driver, being used by the NGGPS dynamical core test participants to run their models with the GFS physics suite, is a descendant of the National Unified Operational Prediction Capability (NUOPC) physics driver. Going forward it will be called the Interoperable Physics Driver. Continuing NUOPC input is critical to success, and driver development is proceeding with the NUOPC physics group’s knowledge and input.
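A hypothetical sketch of the thin-driver idea follows; all names here are invented, and the actual Interoperable Physics Driver differs in language and detail.

```python
# Hypothetical sketch: every parameterization exposes a uniform entry point
# operating on named state variables, so schemes can be swapped without
# touching the dynamical core. Names are invented for illustration.
from typing import Dict, List, Protocol
import numpy as np

State = Dict[str, np.ndarray]  # e.g. {"T": ..., "qv": ...} on model levels

class Parameterization(Protocol):
    def run(self, state: State, dt: float) -> State:
        """Return tendencies (per second) keyed by variable name."""
        ...

class SimpleCooling:
    """Toy radiation stand-in: uniform cooling of the temperature field."""
    def run(self, state: State, dt: float) -> State:
        return {"T": np.full_like(state["T"], -1.5 / 86400.0)}  # K/s

def driver_step(state: State, suite: List[Parameterization], dt: float) -> State:
    """Apply each scheme to the same input state and accumulate tendencies."""
    new = {name: field.copy() for name, field in state.items()}
    for scheme in suite:
        for name, tendency in scheme.run(state, dt).items():
            new[name] += tendency * dt
    return new

state = {"T": np.full(64, 288.0)}
state = driver_step(state, [SimpleCooling()], dt=600.0)
```

Because each scheme touches only named state variables through one entry point, replacing a suite reduces to changing the list passed to the driver.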

The DTC is uniquely qualified to fulfill a leading role in physics development and community support, and the emerging CCPP is a critical element to bridge research and operations. The result will be a capability for operational centers to more rapidly adopt codes that reflect evolving scientific knowledge, and an operationally relevant environment for the broad community of physics developers to test ideas.


NITE: NWP Information Technology Environment

Summer 2015

Over the years, the DTC has put in place several mechanisms to facilitate the use of operational models by the general community, mostly by supporting operational codes (for data assimilation, forecasting, postprocessing etc.) and organizing workshops and tutorials.

This stimulation of the use of operational codes by the research community, composed of universities, NCAR, and government laboratories, has helped transition several new NWP developments to NCEP operations. However, in spite of the DTC's relative success, significant gaps remain in the collaboration between research and operational groups. The NITE project focuses on infrastructure design elements that can facilitate this collaborative environment.



During the past year, the DTC received funding from NOAA to create a design for an infrastructure to facilitate development of NCEP numerical models by scientists both within and outside of EMC. Requirements for NITE are based on a survey of potential users and developers of NCEP models, information obtained during site visits to the NOAA Environmental Modeling Center, the UK Meteorological Office, and the European Centre for Medium-Range Weather Forecasts, discussions with focus groups, and reviews of various existing model development systems.

The NITE design has been developed with the following goals in mind: 

  • make modeling experiments easier to run;
  • provide a single system available to NCEP and collaborators;
  • produce results relevant for R2O;
  • ensure reproducibility and records of experiments; and
  • be general to any NCEP modeling suite.

The following elements are included in the system design:

Data management and experiment database: Scientists need access to input datasets (model and observations), a mechanism for storing selected output from all experiments, and tools for browsing, interrogating, subsetting, and easily retrieving data. To facilitate sharing information, key aspects of the experiment setup, such as provenance of source code and scripts, configuration files, and namelist parameters, need to be recorded in a searchable database.


“NWP Information Technology Environment (NITE): an infrastructure to facilitate development of NCEP numerical models.”

Source code management and build systems: Source code repositories for all workflow components need to be available and accessible to the community. Fast, parallel build systems should be implemented to efficiently build all workflow components of a suite before experiments are conducted.

Suite definition and configuration tools: All configurable aspects of a suite are abstracted to files that can be edited to create the experiments. Predefined suites are provided as a starting point for creating experiments, with scientists also having the option to compose their own suites.

Scripts: The scripting is such that each workflow component (e.g., data assimilation) is associated with a single script, regardless of which suite is being run.

Workflow automation system: The workflow automation system handles all job submission activity. Hence, the scripts used to run workflow components do not contain job submission commands.

Documentation and training: Documentation and training on all workflow components and suites are readily available through electronic means.

In addition to the elements above, standardized tools for data visualization and forecast verification need to be available to all scientists.
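To make the suite/script/automation separation described above concrete, here is a hypothetical sketch; all names are invented, and NITE itself does not prescribe this implementation.

```python
# Hypothetical sketch of the separation of concerns: a suite abstracted to
# configuration data, one script per workflow component with no job-submission
# logic, and a workflow automation layer that owns all submission.
suite = {
    "name": "experiment_001",
    "components": ["data_assimilation", "forecast", "post_processing"],
    "config": {"forecast": {"length_h": 72, "resolution_km": 13}},
}

def run_component(name, config):
    """One script per component; knows nothing about the scheduler."""
    print(f"running {name} with {config}")

def workflow_automation(suite):
    """All job submission is handled here, not in the component scripts."""
    for component in suite["components"]:
        run_component(component, suite["config"].get(component, {}))

workflow_automation(suite)
```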

Next steps for NITE:  Modernization of the modeling infrastructure at NCEP is very important for community involvement with all NCEP suites, and with the Next Generation Global Prediction System (NGGPS) in particular. The recommended implementation approach for NITE includes several phases, to minimize disruption to operational systems, and limit implementation costs, while providing useful, incremental capabilities that will encourage collaboration. Ongoing discussions between EMC and DTC, especially in the context of NGGPS infrastructure modernization, will likely lead to NITE implementation in the coming years.

See http://dtcenter.org/eval/NITE


The NITE design: a software infrastructure.

DTC: The Next Ten Years

Winter 2015

The transition of research advances into operations (abbreviated as R2O), particularly those operations involving numerical weather prediction, satellite meteorology, and severe weather forecasting, has always been a major challenge for the atmospheric science community.

With a preeminent mission to facilitate R2O in mind, NOAA and NCAR established the DTC in 2003. Since then, the DTC has worked toward this goal in three specific ways: by providing community support for operational NWP systems, by performing testing and evaluation of promising NWP innovations, and by promoting interactions between the research and operational NWP communities via workshops, a newsletter, and a robust visitor program. Early DTC activities, which were primarily focused on evaluating opportunities afforded by the then-new Weather Research and Forecasting (WRF) model, included the testing and evaluation of two WRF model dynamic cores (one developed at NCAR and the other at EMC), rapid refresh applications, and a real-time high-resolution winter forecast experiment. As a neutral party not involved with the development of either core, the DTC played a vital, independent role in these tests, especially their planning, their evaluation, and the provision of statistical results to all parties.



In its other role, that of community support, the DTC began providing users of the operational WRF-NMM model with documentation, tutorials, and help desk access in 2005. Since then, this DTC activity has grown in extent and complexity, and today also includes community support for the HWRF end-to-end tropical cyclone prediction system, the Unified Post Processor (UPP), the Gridpoint Statistical Interpolation (GSI) and GSI ensemble hybrid data assimilation systems, and the Model Evaluation Tools (MET) verification system. In April 2015, the DTC will host its first Nonhydrostatic Multiscale Model on the B-grid (NMMB) tutorial at College Park, MD. Since its inception, the DTC has organized or co-sponsored 27 community workshops, and has hosted 49 visitor projects selected on the basis of their potential to facilitate interaction between the operational and research NWP communities. The accompanying figures illustrate the distribution and evolution of DTC visitors and users of DTC-supported systems.


“The DTC has organized or co-sponsored 27 community workshops and has hosted 49 visitor projects.”

These activities have so far been primarily focused on regional and national weather modeling. Now, with continued advances in computing technology, global operational NWP using nonhydrostatic models at cloud-permitting resolution is within reach. With this possibility in mind, all major international operational centers are actively developing advanced global models. The United States National Weather Service, for example, initiated a major R2O project in 2014 to develop a Next-Generation Global Prediction System (NGGPS) that would reach mesoscale resolution. The boundary between regional and global modeling at these scales becomes murky indeed, and previous work of the DTC (testing of model physics in regional models, for example) becomes very relevant to global models as well. Recognizing this opportunity, the DTC Executive Committee unanimously voted earlier this year to expand the DTC’s scope to include global modeling. This decision marks a change that will have a profound impact on the direction of the DTC for the next ten years. Here, I offer my perspective on what, in this new context, the DTC should be focusing on in the future.



Storm-scale NWP. While significant progress has been made in NWP over the past decade, society’s expectations have often exceeded improvements. An excellent example is the recent January blizzard forecast for New York City, for which the inability to adequately convey forecast uncertainties in numerical guidance was widely recognized. In a previous but related report, the UCAR Community Advisory Committee for NCEP (or UCACN) pointed out that NCEP does not have an operational ensemble prediction system at convection-permitting (that is, storm-scale) resolution. The development and operation of a prediction system of this kind is a major undertaking, with significant computing demands and challenging scientific and technical issues. Among them are questions concerning initial condition perturbations, model perturbations, calibration, post-processing, and verification, just to name a few. These are also areas of active research attracting the interest of a significant fraction of the 24,000 registered WRF users. Since convection-resolving ensemble prediction is in fact a theme that cross-cuts all its current task areas, the DTC should be well positioned to facilitate R2O toward this end that is useful to both operations and research.

Unified modeling. From an R2O perspective, it is highly beneficial to reduce the number of operational systems, thereby allowing the research community to focus on a smaller number of systems.  Unified modeling (UM), which seeks to limit the proliferation of competing modeling elements, has been recognized worldwide as the most cost-effective approach to deal with the increased number and complexity of numerical weather, climate, and environmental prediction systems at all space and time scales. A UM framework also allows sharing of modeling efforts (e.g., improvements in physical parameterizations) across different modeling systems. The UCACN has urged NCEP to migrate toward a UM approach for its future model development, and has suggested an interim goal of reducing NCEP modeling systems to only two: a global weather and climate system (GFS/CFS) and a very-high-resolution convection-resolving system. With nesting capability, the global high-resolution nonhydrostatic model planned for the NGGPS project could be a suitable candidate for a UM framework at NCEP.  It is true that migration toward UM is a significant challenge for any operational center, involving as it does a major culture change in addition to numerous technical issues. In its capacity for testing and evaluation, the DTC can help facilitate such a transition at NCEP.


“When fully developed, the global system will be an earth modeling system with fully coupled atmosphere, ocean, ice, land, waves, and aerosol components.”

Earth system modeling. When fully developed, the NGGPS will be an earth modeling system with fully coupled atmosphere, ocean, ice, land, waves, and aerosol components. The interactions between these components will require compatibility within the NOAA Environmental Modeling System (NEMS) and the Earth System Modeling Framework (ESMF). The NGGPS is expected to provide improved forecasts at a wide range of time scales, from a few hours to 30 days. For this major undertaking to be successful, the community at large will have to contribute at every step of its development. The DTC can encourage and facilitate these contributions to NGGPS code development by managing that code in a way that allows effective access by external developers, and by performing independent testing and evaluation of system upgrades proposed by the external community.

NWP IT Environment. For each NWP system it supports, the DTC typically maintains a community repository separate from the repository maintained at operational centers. Maintaining a separate community repository is a mixed blessing. On the one hand, a separate repository shields operations from potentially problematic code changes that have not been fully tested. On the other hand, ensuring proper synchronization between the two repositories (a necessary step if the research community is to take advantage of the latest developments at operational centers) becomes a greater challenge. Taking advantage of experience at other operational centers (e.g., ECMWF and UKMO), the DTC, in collaboration with EMC, has started exploring the possibility of developing an NWP IT Environment (NITE) concept for community support for operational systems. The basic idea of NITE is to maintain an IT infrastructure at the operational center itself (i.e., at EMC) that supports the development, testing, and evaluation of operational models by scientists both within and outside the center. Given the complexity of the NGGPS system, maintaining duplicate systems (repositories) for its many modeling components is neither feasible nor cost effective. This leaves a NITE infrastructure as perhaps the only viable option. The DTC should continue to work with EMC to support NITE development, which has the potential to profoundly change how R2O in NWP is conducted over the coming decade.

Microphysics, from Water Vapor to Precipitation

Summer 2014

NCAR-RAL has a long track record of transitioning numerical weather prediction (NWP) model cloud microphysical schemes from research to operations.

Beginning in the 1990s, a scheme by Reisner et al. (1998) was created within MM5 (Fifth-Generation Penn State/NCAR Mesoscale Model) and also transitioned to the Rapid Update Cycle (RUC) model. A few years later, the scheme was modified and updated for both MM5 and RUC by Thompson et al. (2004). Then, as the Rapid Refresh (RAP) model was replacing the RUC, an entirely rewritten microphysics scheme by Thompson et al. (2008) was created for operational use in the Weather Research and Forecasting (WRF) and RAP models. A primary goal of each of these efforts was to improve the explicit prediction of supercooled liquid water and aircraft icing while also improving quantitative precipitation forecasts (QPF) and surface sensible weather elements such as precipitation type.

The established pathway for transitioning the Thompson et al. (2008) microphysics scheme to operations is greatly facilitated by the WRF code repository and a continuing collaboration with NOAA's Earth System Research Laboratory (ESRL) Global Systems Division (GSD), especially the team led by Stan Benjamin. Various improvements to the scheme are rapidly implemented into prototype operations at NOAA-GSD for further testing before they eventually transition into the fully operational RAP model at the National Centers for Environmental Prediction (NCEP) Environmental Modeling Center (EMC).


The two-panel figure shows a 48-hour forecast of lowest-model-level radar reflectivity valid at 0000 UTC 02 Feb 2011 from the WRF-ARW model (top panel) and the NEMS-NMMB model (bottom panel).

A more recent DTC effort has included testing and evaluating the Thompson et al. (2008) microphysics scheme within the Hurricane WRF (HWRF) model to see if it improves tropical cyclone track and intensity forecasts. The scheme's developers had not previously worked in the area of tropical cyclone prediction, having focused instead on mid-latitude weather. The current test may reveal potential improvements to tropical storm prediction, or shortcomings in the microphysics scheme that could lead to future improvements.

A second DTC effort is the incorporation of the Thompson et al. (2008) microphysics scheme into NCEP's NEMS-NMMB (NOAA Environmental Modeling System-Nonhydrostatic Multiscale Model on B-grid) model, which is also the current North American Model (NAM). As the NAM transitions to higher and higher resolution, the potential use of alternative microphysics schemes is being considered. To achieve this goal, a number of structural code changes were made to the NEMS-NMMB model so it could accept the larger number of water species used by the Thompson et al. (2008) scheme, as compared to the number of species in the operational microphysics scheme. The changes required within the microphysics module itself, however, differed only minimally from the existing WRF code, which greatly facilitates future transitions of WRF code to the NEMS-NMMB.
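To make the species bookkeeping concrete, the sketch below lists the prognostic water variables carried by the Thompson et al. (2008) scheme alongside a simpler operational-style scheme. This is an illustration based on the published scheme description, not the operational source code, and the single-condensate entry for the operational scheme is a simplified assumption.

    # Illustrative sketch (not operational code): prognostic water species.
    # Thompson et al. (2008) is partially two-moment, so it advects number
    # concentrations ("n...") in addition to mixing ratios ("q...").
    thompson_2008 = [
        "qv",  # water vapor mixing ratio
        "qc",  # cloud water
        "qr",  # rain
        "qi",  # cloud ice
        "qs",  # snow
        "qg",  # graupel
        "ni",  # cloud ice number concentration (two-moment ice)
        "nr",  # rain number concentration (two-moment rain)
    ]

    # Simplified stand-in for an operational scheme that carries vapor plus
    # a combined condensate array partitioned diagnostically.
    operational = ["qv", "cwm"]  # cwm = total condensate mixing ratio

    extra = len(thompson_2008) - len(operational)
    print(f"Additional 3-D tracer arrays to allocate and advect: {extra}")

Each extra species is another three-dimensional array that the host model must allocate, advect, and pass through its physics interface, which is why the structural changes fell mostly outside the microphysics module itself.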

The two-panel figure above shows a 48 hour forecast of model lowest level radar reflectivity valid at 0000 UTC 02 Feb 2011 made by the WRF-ARW (top panel) model and NEMS-NMMB model (bottom panel). Particularly evident in a comparison of the two model cores are sporadic low-value dBZ forecasts seen in broad areas of the NMMB and to a much lesser degree in the WRF, suggesting a much greater presence of drizzling clouds in the NMMB. Also shown in the figure at the beginning of the article (page 1) is the WRF-predicted explicit precipitation type with blue/pink/green shades representing snow, graupel, and rain, respectively, along with an overlay of colored symbols to represent the surface weather observations of various precipitation types. The notable lack of graupel observations vis-à-vis forecasts likely reflects deficiencies of automated observations.

References: Thompson et al. (2008), http://journals.ametsoc.org/doi/abs/10.1175/2008MWR2387.1; Thompson et al. (2014), http://journals.ametsoc.org/doi/abs/10.1175/JAS-D-13-0305.1

Keeping up with Model Testing & Evaluation Advances: New Verification Displays

Winter 2014

As numerical model predictions and functions proliferate and move toward ever higher resolution, verification techniques and procedures must also advance and adapt.

The ability to consolidate and integrate numerous verification results that are increasingly differentiated in intent and type largely depends on the effectiveness of graphical displays. In response to these needs, several new kinds of displays have recently been added to the DTC and MET arsenal, or are in the process of development and assessment at the DTC.



An example is the regional display of verification scores in the figure above, where results from relatively long verification periods at point locations are shown (in this case, dewpoint temperature bias at surface observation sites). Although time resolution is sacrificed, these plots represent an important way to assess topographic, data-density, and other geographic effects on model accuracy. In the first figure, for instance, the cluster of red symbols (portraying too-high dewpoints) in the mountains of Colorado and along the east coast offers clues useful for assessing model inaccuracies. The opposite tendency (low-biased dewpoints, or too-dry forecasts) is pronounced over Texas and Oklahoma, and in the Central Valley of California.

The figure below is an example of new utilities used by the Ensemble Task to compute and display ensemble-relevant verification results. In this case, it is one way to present the spread-skill relationship, an important characteristic of ensemble systems. As is commonly seen, these particular CONUS-based ensemble members are under-dispersive; creating ensemble systems that accurately represent natural variability remains a difficult challenge.
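As a rough illustration of what a spread-skill comparison measures, the following minimal Python sketch contrasts the average ensemble spread with the RMSE of the ensemble mean. The data and shapes are synthetic assumptions, and MET's actual computation differs in detail; the point is only that spread falling below RMSE signals the under-dispersion described above.

    import numpy as np

    rng = np.random.default_rng(0)
    n_members, n_cases = 7, 500
    truth = rng.normal(0.0, 2.0, n_cases)
    # Toy under-dispersive ensemble: members sample too little uncertainty.
    forecasts = truth + rng.normal(0.0, 0.5, (n_members, n_cases))
    # Observations include error sources the ensemble does not represent.
    obs = truth + rng.normal(0.0, 1.0, n_cases)

    ens_mean = forecasts.mean(axis=0)
    spread = forecasts.std(axis=0, ddof=1)          # per-case ensemble spread
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))  # skill of the ensemble mean

    # For a well-calibrated ensemble, mean spread ~ RMSE of the ensemble mean;
    # mean spread < RMSE indicates under-dispersion.
    print(f"mean spread = {spread.mean():.2f}, RMSE = {rmse:.2f}")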

Among ongoing and future product directions are display options for time-series evaluation of forecast consistency, in particular for “revision series” of hurricane track locations (figure below). The objective of this kind of graphic is to examine the consistency of a model’s track prediction with its own earlier forecasts valid at the same time. For many users, this consistency in forecasts through time is a desirable quality; if successive forecasts change substantially or often, users may perceive them as low quality, possibly even random. For instance, in the figure, the model shows consistent updates in the Caribbean, and inconsistent (zigzagging) ones as the storm moves northward. These latter forecasts of hurricane location might thus be considered less reliable.
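One simple way to quantify a revision series is to measure how far each forecast cycle moves the predicted storm position, for a fixed valid time, relative to the previous cycle. The sketch below uses hypothetical positions and a standard haversine distance; it illustrates the idea rather than the DTC's display code.

    import math

    def gc_distance_km(lat1, lon1, lat2, lon2):
        """Great-circle (haversine) distance in km."""
        R = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = p2 - p1
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * R * math.asin(math.sqrt(a))

    # Hypothetical successive forecasts of storm position, all valid at the
    # same time, from cycles 24, 18, 12, and 6 hours before the valid time.
    revisions = [(18.0, -66.0), (18.4, -66.5), (17.9, -65.8), (18.5, -66.6)]

    # The revision series: how far each update moved the storm relative to
    # the previous cycle. Large, alternating jumps correspond to the
    # zigzagging (inconsistent) behavior described in the text.
    for a, b in zip(revisions, revisions[1:]):
        print(f"revision distance: {gc_distance_km(*a, *b):7.1f} km")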


Evaluating WRF performance over time

Autumn 2014

As modifications and additions are made to WRF code and released to the community, users often ask, “Is WRF really improving?”

Time series plot of 2 m temperature (C) bias across the CONUS domain over the warm season for WRF versions 3.4 (green), 3.4.1 (blue), 3.5 (red), 3.5.1 (orange), and 3.6 (purple). Median values of the distribution are plotted with 99% confidence intervals. The gray boxes around forecast hours 30 and 42 correspond to the times shown in the next figure.

This is a hard question to answer, largely because “WRF” means something different to each user, with a specific model configuration for each application. With the numerous options available in WRF, it is difficult to test all possible combinations, and resulting improvements and/or degradations of the system may differ for each particular configuration. Prior to a release, the WRF code is run through a large number of regression tests to ensure it successfully runs with a wide variety of options; however, extensive testing to investigate the skill of the resulting forecasts is not routinely performed. In addition, code enhancements or additions that are meant to improve one aspect of the forecast may have an inadvertent negative impact on another.

In an effort to provide unbiased information regarding the progression of WRF code through time, the DTC has tested one particular configuration of the Advanced Research WRF (ARW) dynamic core for several releases of WRF (versions 3.4, 3.4.1, 3.5, 3.5.1, and 3.6). For each test, the end-to-end modeling system components were the same: WPS, WRF, the Unified Post Processor (UPP), and the Model Evaluation Tools (MET). Testing was conducted over two three-month periods (a warm season during July-September 2011 and a cool season during January-March 2012), effectively capturing model performance over a variety of weather regimes. To isolate the impacts of the WRF model code itself, 48-h cold-start forecasts were initialized every 36 h over a 15-km North American domain.
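For concreteness, the short sketch below generates the initialization cadence just described for the warm season (the exact start and end dates are assumed here for illustration). Launching every 36 h alternates initializations between 00 and 12 UTC.

    from datetime import datetime, timedelta

    # Assumed warm-season window: 48-h cold starts launched every 36 h.
    init = datetime(2011, 7, 1, 0)
    season_end = datetime(2011, 10, 1, 0)
    inits = []
    while init < season_end:
        inits.append(init)
        init += timedelta(hours=36)

    print(f"{len(inits)} forecasts; first three initializations:")
    for t in inits[:3]:
        print(t.strftime("%Y-%m-%d %H UTC"))
    # From a 00 UTC initialization, a 30-h lead verifies at 06 UTC and a
    # 42-h lead at 18 UTC, matching the valid times quoted below.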

The particular physics suite used in these tests is the Air Force Weather Agency (AFWA) operational configuration, which includes WSM5 (microphysics), Dudhia/RRTM (shortwave/longwave radiation), Monin-Obukhov (surface layer), Noah (land surface model), YSU (planetary boundary layer), and Kain-Fritsch (cumulus). To highlight the differences in forecast performance with model progression, objective model verification statistics are produced for surface and upper-air temperature, dewpoint temperature, and wind speed for the full CONUS domain and 14 sub-regions across the U.S. Examples of the results (in this case, 2 m temperature bias) are shown in the figures. A consistent cold bias is seen for most lead times during the warm season for all versions (figure on page 1). While there was a significant degradation in performance during the overnight hours with versions 3.4.1 and newer, a significant improvement is noted for the most recent version (3.6). Examining the distribution of 2 m temperature bias spatially by observation site (figure below), it is clear that for the 30-hour forecast lead time (valid at 06 UTC), version 3.6 is noticeably colder over the eastern CONUS. However, for the 42-hour forecast lead time (valid at 18 UTC), version 3.4 is significantly colder across much of the CONUS. For the full suite of verification results, please visit the WRF Version Testing website at www.dtcenter.org/eval/meso_mod/version_tne
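Holding this suite fixed is what allows score differences to be attributed to the WRF code itself. As a hedged sketch, the configuration can be written out as a WRF namelist physics block; the integer option codes below follow common WRF conventions but should be verified against the specific version being run.

    # Sketch of the AFWA-style physics suite as WRF namelist options.
    # The option codes are standard WRF conventions, not taken from the
    # DTC test configuration files; check them against your WRF version.
    afwa_physics = {
        "mp_physics": 4,          # WSM5 microphysics
        "ra_sw_physics": 1,       # Dudhia shortwave
        "ra_lw_physics": 1,       # RRTM longwave
        "sf_sfclay_physics": 1,   # MM5 Monin-Obukhov surface layer
        "sf_surface_physics": 2,  # Noah land surface model
        "bl_pbl_physics": 1,      # YSU planetary boundary layer
        "cu_physics": 1,          # Kain-Fritsch cumulus
    }

    # Write a namelist fragment; keeping it identical across WRF versions
    # isolates changes in the model code from changes in configuration.
    with open("physics.nml", "w") as f:
        f.write(" &physics\n")
        for key, val in afwa_physics.items():
            f.write(f"   {key} = {val},\n")
        f.write(" /\n")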


The four-panel figure shows average 2 m temperature (C) bias by observation station over the warm season for WRF version 3.4 (left) and 3.6 (right) at forecast hour 30 (top) and 42 (bottom).

Mesoscale Model Evaluation Testbed

Spring 2013

The DTC provides a common framework for researchers to demonstrate the merits of new developments through the Mesoscale Model Evaluation Testbed (MMET).

Established in the Fall of 2012, MMET provides initialization and observation data sets for several case studies and week-long extended periods that can be used by the entire numerical weather prediction (NWP) community for testing and evaluation. The MMET data sets also include baseline results generated by the DTC for select operational configurations.

To date, MMET includes nine cases that are of interest to the National Centers for Environmental Prediction/Environmental Modeling Center (NCEP/EMC). A brief description of each case, along with access to the full data sets, is available at http://www.dtcenter.org/eval/mmet. Researchers are encouraged to run several case studies spanning multiple weather regimes to illustrate the versatility of their innovations for operational use.



“Researchers are encouraged to run several case studies to illustrate the versatility of the system.”

One particular case available in MMET is 28 February 2009, when nearly 7 inches of snow fell in Memphis, TN. A squall line marched through the Southeast along the leading edge of a cold front, prompting three tornado reports and several high-wind reports. Over the next two days (1-2 March), snow fell from Atlanta to New York, dropping up to a foot in some areas. The figure above shows the two-day precipitation accumulation. This case is of interest to NCEP/EMC because the North American Mesoscale (NAM) model quantitative precipitation forecast valid 1 March shifted precipitation too far north, missing a rain/snow mix in Georgia and falsely predicting snow in western parts of the Carolinas.

If improved forecast accuracy is demonstrated through objective verification results with MMET cases, the technique can be submitted for further extensive testing by the DTC.

Community users can nominate innovations for more extensive DTC testing by filling out the nomination form (http://www.dtcenter.org/eval/mmet/candidates/form_submission.php).

As MMET continues to mature, additional cases will be made available to broaden the variety of events in the collection. Submissions for additional cases to be included in MMET are accepted at http://www.dtcenter.org/eval/mmet/cases/form_submission.php. For more information on the testing protocol process defined to accelerate the transition of mesoscale modeling techniques from research to operations, please see http://www.dtcenter.org/eval/mmet/testing_protocol.pdf.

Comments and questions regarding MMET or any stage of the testing protocol process can be directed to Jamie Wolff (jwolff@ucar.edu).

The 2013 Hurricane WRF

Summer 2013

As the 2013 hurricane season continues in the North Atlantic and eastern North Pacific basins, a newly minted HWRF model is providing forecasts for the National Hurricane Center (NHC) on a new machine and with significant code additions. On July 24, the operational HWRF went live on the Weather and Climate Operational Supercomputing System (WCOSS). A research version for testing continues to run on the Jet computers at the NOAA ESRL Global Systems Division. New, more efficient code promises quicker processing, allowing timely forecasts and the opportunity to use more sophisticated physics routines.


HWRF simulated satellite image of TC Dorian

This year’s HWRF has several new features and options. Among the most significant are:

1. New data assimilation options. The HWRF can now assimilate wind information from the tail Doppler radar (TDR) on hurricane reconnaissance aircraft.

2. Use of a hybrid data assimilation system, which allows better use of observations to initialize the model.

3. Increased code efficiency, which allows physics packages to run at 30-second intervals, compared to last year’s 90 seconds.


“Ambitious plans for HWRF in 2014 and beyond include new data and multiple moving nests.”

Additionally, this year’s HWRF public release, for the first time, supports idealized tropical cyclone simulations and hurricane forecasts in basins beyond the eastern North Pacific and North Atlantic.

The DTC conducts testing and evaluation of HWRF, and also serves as an accessible repository for HWRF code. Software version control assures that HWRF developers at EMC, GSD, and other operational and research institutions obtain consistent results. Particular attention has been paid to facilitating the inclusion of new research developments into the operational configuration of all HWRF components. For instance, updated model routines for the Princeton Ocean Model for Tropical Cyclones (POM-TC), developed at the University of Rhode Island, can be seamlessly introduced.

Ambitious plans for the HWRF in 2014 and beyond include code allowing multiple moving nests in the same run, additional targets for data assimilation (dropsondes, etc.), and post-landfall forecasts of ocean surge, waves, precipitation, and inundation.

See the HWRF v3.5a public release announcement in this issue.

SREF and the Impact of Resolution and Physics Changes

Autumn 2013

As operational centers move inexorably toward ensemble-based probabilistic forecasting, the role of the DTC as a bridge between research and operations has expanded to include testing and evaluation of ensemble forecasting systems.

In 2010, the DTC’s ensemble task area was established with the ultimate goal of providing an environment in which extensive testing and evaluation of ensemble-related techniques developed by the NWP community could be conducted. Because these results should be immediately relevant to the operational centers (e.g., NCEP/EMC and AFWA), the planning and execution of these DTC evaluation activities have been closely coordinated with those centers. All of the specific components of the ensemble system have been subject to evaluation, including ensemble design, post-processing, products, and verification. More information about the DTC Ensemble Task organization and goals can be found at http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-11-00209.1


"It appears that finer resolution improves SREF forecast performance more than changes in microphysics.”

Recently, efforts of the DTC Ensemble team have included evaluating the impact that changes in the National Centers for Environmental Prediction/Environmental Modeling Center (NCEP/EMC) Short-Range Ensemble Forecast (SREF) configuration have had on its performance. The focus has been on two areas: the impact of increased horizontal resolution and the impact of changes in the model microphysical schemes. In an initial experiment, SREF performance using 16 km horizontal grid spacing (the current operational setting) was compared with the performance of SREF using a potential future horizontal grid spacing of 9 km. In the second experiment, the focus was on changes in microphysical parameterizations.



In the current operational version of SREF, only one microphysical scheme (Ferrier) is used. That version has now been compared with results from an experimental ensemble configuration that includes two other microphysics options (called WSM6 and Thompson). Although these preliminary tests have used only SREF members from one WRF core (WRF-ARW), future tests will add NMMB members into the analysis. The comparison ensemble systems each consisted of seven members: a control and three pairs of members with varied initial perturbations. This preliminary study was performed over the transition month of May 2013 and over the continental US domain. By good fortune, the time period captured one of the most active severe weather months in recent history, promising an interesting dataset for future in-depth studies.

Verification for the set of runs was performed using the DTC’s Model Evaluation Tools (MET) for both single-value and probabilistic measures, aggregated over the entire month of study. Some of the relevant results are illustrated in the accompanying figures, each of which displays arithmetic means from the corresponding ensemble system. The first figure shows box plots of bias-corrected root mean square error (BCRMSE) of 850 mb temperature at the analysis time and two forecast lead times for the operational 16 km SREF (yellow), a parallel configuration with a different combination of microphysics (red), and the experimental 9 km setting (purple). For this preliminary run, it appears that finer resolution improves SREF forecast performance more than changes in microphysics. Indeed, as shown in the second figure, the pairwise differences between the 16 km and 9 km SREF forecasts at the 24 hr lead time are statistically significant, albeit for a limited data sample. Additional detailed analyses of an expanded set of these data are under way.
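For readers unfamiliar with the statistic, BCRMSE removes the systematic bias from the error before computing the RMSE, so it reflects error variability rather than a constant offset. The minimal Python sketch below follows the common MET-style definition, BCRMSE = sqrt(MSE - ME^2); the toy data are assumptions for illustration.

    import numpy as np

    def bcrmse(forecast: np.ndarray, obs: np.ndarray) -> float:
        """Bias-corrected RMSE: remove the mean error, then take the RMSE."""
        err = forecast - obs
        me = err.mean()                       # mean error (bias)
        mse = np.mean(err ** 2)
        return float(np.sqrt(mse - me ** 2))  # sqrt(MSE - ME^2)

    # Toy example: a +1 K systematic bias plus 0.5 K random error. The bias
    # inflates RMSE but is removed by the bias correction.
    rng = np.random.default_rng(1)
    obs = rng.normal(280.0, 3.0, 1000)        # e.g., 850 mb temperature (K)
    forecast = obs + 1.0 + rng.normal(0.0, 0.5, 1000)
    print(f"RMSE   = {np.sqrt(np.mean((forecast - obs) ** 2)):.2f}")
    print(f"BCRMSE = {bcrmse(forecast, obs):.2f}")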