Grid-Stat

Grid-Stat cindyhg Thu, 04/25/2019 - 10:24

Grid-Stat Functionality

The Grid-Stat tool provides verification statistics for a matched forecast and observation grid. All of the forecast gridpoints in the region of interest are matched to observation gridpoints on the same grid. All the matched gridpoints falling inside a verification masking region defined by the user are used to compute the verification statistics. The Grid-Stat tool functions in much the same way as the Point-Stat tool, except that no interpolation is required because the forecasts and observations are on the same grid. However, the interpolation parameters may be used to perform a smoothing operation on the input data prior to verifying it. The output statistics generated by Grid-Stat are largely the same as those generated by Point-Stat.
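A minimal Python sketch of that masking step follows. It assumes the matched grids have been flattened into lists and that each masking region is a list of True/False flags, one per grid point. The region names, values, and helper functions are hypothetical. They only illustrate that each region's statistic uses just the points inside it, with the mean error standing in for any statistic.

```python
def masked_pairs(fcst, obs, mask):
    """Keep only the matched (forecast, observation) pairs inside the mask."""
    return [(f, o) for f, o, inside in zip(fcst, obs, mask) if inside]


def mean_error(pairs):
    """Mean of (forecast - observation) over the supplied pairs."""
    return sum(f - o for f, o in pairs) / len(pairs)


# Hypothetical 2x3 grid, flattened row by row, and two hypothetical regions.
fcst = [1.0, 4.0, 0.0, 8.0, 3.0, 2.0]
obs = [0.0, 5.0, 0.0, 6.0, 3.0, 0.0]
masks = {
    "WHOLE_GRID": [True] * 6,
    "LEFT_TWO_COLUMNS": [True, True, False, True, True, False],
}

results = {name: mean_error(masked_pairs(fcst, obs, m)) for name, m in masks.items()}
print(results)
```

Because the same grid point can fall inside several regions, each region yields its own set of statistics.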

Grid-Stat Usage

View the usage statement for Grid-Stat by simply typing the following:

grid_stat

At a minimum, the input gridded fcst_file, the input gridded obs_file, and the configuration config_file must be passed in on the command line.

The forecast and observation fields must be on a common grid before Grid-Stat can compute statistics. As of version 5.1, the MET tools are able to regrid data on the fly using the regrid section in the configuration files. Alternatively, users may choose to regrid their entire GRIB1 or GRIB2 files using the copygb (GRIB1) and/or wgrib2 (GRIB2) utilities.

Configure

The behavior of Grid-Stat is controlled by the contents of the configuration file passed to it on the command line. The default Grid-Stat configuration file may be found in the $MET_BASE/config/GridStatConfig_default file. The configurations used by the test script may be found in the met-8.0/scripts/config/GridStatConfig* files. Prior to modifying the configuration file, users are advised to make a copy of the default:

cp $MET_BASE/config/GridStatConfig_default $MET_TUTORIAL_DATA/config/GridStatConfig_APCP_12
cp $MET_BASE/config/GridStatConfig_default $MET_TUTORIAL_DATA/config/GridStatConfig_POP_12

The configurable items for Grid-Stat are used to specify how the verification is to be performed. The Grid-Stat configuration file should look very similar to the one for Point-Stat. The configurable items include specifications for the following:

  • The verification domain.
  • The forecast fields to be verified at the specified vertical level or accumulation interval.
  • The threshold values to be applied.
  • The economic cost-loss value ratios to be evaluated.
  • The reference climatological dataset.
  • The areas over which to aggregate statistics - as predefined grids or configurable lat/lon polylines.
  • The confidence interval methods to be used.
  • The smoothing methods to be applied (as opposed to interpolation methods).
  • The types of verification methods to be used.

You may find a complete description of the configurable items in the MET Users Guide or in the $MET_BASE/config/README file. Please take some time to review them.

For this tutorial, we'll run Grid-Stat twice - once to verify the 12-hour accumulated precipitation output of PCP-Combine and once to apply the probabilistic verification methods to a 12-hour probability of precipitation forecast. In the first run, we'll use NetCDF for both the forecast and observation files. In the second run, we'll use a GRIB forecast file and a NetCDF observation file. While we'll use Grid-Stat to verify only one field at a time, it may be configured to verify more than one field at a time.

Open up the $MET_TUTORIAL_DATA/config/GridStatConfig_APCP_12 file for editing with your preferred text editor and edit it as follows:

  • In the fcst dictionary, set
       field = [
         {
           name       = "APCP_12";
           level      = "(*,*)";
           cat_thresh = [ >0, >=5.0, >=10.0 ];
         }
       ];

    To verify the NetCDF variable of that name and apply the 3 thresholds listed. Accumulated precipitation is in millimeters.

  • Set obs = fcst;
    To use the settings from the fcst dictionary above.
  • In the mask dictionary, set grid = [ "G212" ];
    To accumulate statistics over the NCEP Grid 212 domain.
  • In the mask dictionary, set
       poly = [ "${MET_TUTORIAL_DATA}/output/gen_vx_mask/CONUS_G212_poly.nc",
                "${MET_BASE}/poly/EAST.poly",
                "${MET_BASE}/poly/WEST.poly" ];

    To accumulate statistics over the entire CONUS using the NetCDF output of the Gen-Vx-Mask tool and over the regions defined by the EAST and WEST polyline files.

  • In the nbrhd dictionary, set width = [ 3, 5 ];
    To select two neighborhood sizes over which to accumulate neighborhood statistics.
  • In the nbrhd dictionary, set cov_thresh = [ >=0.5, >=0.75 ];
    To define the fractional coverage threshold values of interest.
  • Set
    output_flag = {
       fho    = NONE;
       ctc    = BOTH;
       cts    = BOTH;
       mctc   = NONE;
       mcts   = NONE;
       cnt    = BOTH;
       sl1l2  = BOTH;
       sal1l2 = NONE;
       vl1l2  = NONE;
       val1l2 = NONE;
       vcnt   = NONE;
       pct    = NONE;
       pstd   = NONE;
       pjc    = NONE;
       prc    = NONE;
       eclv   = NONE;
       nbrctc = BOTH;
       nbrcts = BOTH;
       nbrcnt = BOTH;
       grad   = NONE;
    };

    To indicate that contingency table counts (CTC), contingency table statistics (CTS), continuous statistics (CNT), scalar partial sums (SL1L2), neighborhood contingency table counts (NBRCTC), neighborhood contingency table statistics (NBRCTS), and neighborhood continuous statistics (NBRCNT) should be output.

  • In the nc_pairs_flag dictionary, check that diff = true;
    To indicate that the NetCDF difference field should be output.
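Each cat_thresh entry turns the matched fields into yes/no events. The following Python sketch (not MET code) shows how one threshold string such as ">=5.0" yields the four counts of a 2x2 contingency table, labeled with the FY_OY, FY_ON, FN_OY, and FN_ON names used in the CTC output. The helper names and the toy precipitation values are hypothetical.

```python
import operator

# MET-style threshold symbols mapped to Python comparisons.
# Two-character symbols are listed first so ">=" is not read as ">".
OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
       ("!=", operator.ne), (">", operator.gt), ("<", operator.lt)]


def parse_thresh(text):
    """Split a threshold string such as '>=5.0' into (comparison, value)."""
    for symbol, compare in OPS:
        if text.startswith(symbol):
            return compare, float(text[len(symbol):])
    raise ValueError(f"unrecognized threshold: {text}")


def ctc_counts(fcst, obs, thresh):
    """2x2 contingency table counts for one threshold over matched pairs."""
    compare, value = parse_thresh(thresh)
    counts = {"FY_OY": 0, "FY_ON": 0, "FN_OY": 0, "FN_ON": 0}
    for f, o in zip(fcst, obs):
        f_part = "FY" if compare(f, value) else "FN"
        o_part = "OY" if compare(o, value) else "ON"
        counts[f_part + "_" + o_part] += 1
    return counts


fcst = [0.0, 2.0, 6.0, 12.0]  # hypothetical 12-hour accumulations (mm)
obs = [0.0, 0.0, 7.5, 4.0]
for thresh in [">0", ">=5.0", ">=10.0"]:
    print(thresh, ctc_counts(fcst, obs, thresh))
```

Each threshold produces its own table, which is why the three thresholds appear as separate CTC and CTS lines.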

Note that we are not requesting multi-category contingency table output (the MCTC and MCTS lines). Although we are specifying multiple thresholds (>0, >=5.0, >=10.0), they are not all of the same type (">" versus ">="), and requesting MCTC or MCTS output with mixed threshold types would cause an error.
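The rule in that note can be expressed as a small check. This hypothetical Python helper (not part of MET) only tests whether every threshold string in a list uses the same comparison operator; it does not reproduce MET's own validation logic.

```python
SYMBOLS = (">=", "<=", "==", "!=", ">", "<")  # two-character operators first


def operator_of(thresh):
    """Return the comparison operator at the start of a threshold string."""
    for symbol in SYMBOLS:
        if thresh.startswith(symbol):
            return symbol
    raise ValueError(f"unrecognized threshold: {thresh}")


def same_threshold_type(thresholds):
    """True when all thresholds share one comparison operator."""
    return len({operator_of(t) for t in thresholds}) == 1


print(same_threshold_type([">0", ">=5.0", ">=10.0"]))  # prints False
```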

Save and close this file and open up the $MET_TUTORIAL_DATA/config/GridStatConfig_POP_12 file for editing with your preferred text editor and edit it as follows:

  • In the fcst dictionary, set
       field = [
         {
           name       = "POP";
           level      = "Z0";
           prob       = TRUE;
           cat_thresh = [ >=0.0, >=0.25, >=0.50, >=0.75, >=1.0 ];
         }
       ];

    To verify the 12-hour probability of precipitation forecast from the input GRIB file and apply the probabilistic thresholds listed.

  • Set the obs dictionary to
    obs = {
       field = [
         {
           name       = "APCP_12";
           level      = "(*,*)";
           cat_thresh = [ >0.0 ];
         }
       ];
    };

    To verify against the NetCDF variable of that name in the observation file and define the probabilistic event as any non-zero precipitation.

  • In the mask dictionary, set grid = [ "G212" ];
    To accumulate statistics over the NCEP Grid 212 domain.
  • In the mask dictionary, set
       poly = [ "${MET_TUTORIAL_DATA}/output/gen_vx_mask/CONUS_G212_poly.nc",
                "${MET_BASE}/poly/EAST.poly",
                "${MET_BASE}/poly/WEST.poly" ];

    To accumulate statistics over the entire CONUS using the NetCDF output of the Gen-Vx-Mask tool and over the regions defined by the EAST and WEST polyline files.

  • Set
    output_flag = {
       fho    = NONE;
       ctc    = NONE;
       cts    = NONE;
       mctc   = NONE;
       mcts   = NONE;
       cnt    = NONE;
       sl1l2  = NONE;
       sal1l2 = NONE;
       vl1l2  = NONE;
       val1l2 = NONE;
       vcnt   = NONE;
       pct    = BOTH;
       pstd   = BOTH;
       pjc    = BOTH;
       prc    = BOTH;
       eclv   = NONE;
       nbrctc = NONE;
       nbrcts = NONE;
       nbrcnt = NONE;
       grad   = NONE;
    };

    To indicate that probability contingency table counts (PCT), probabilistic statistics (PSTD), joint and conditional factorization statistics (PJC), and probabilistic ROC curve points (PRC) should be output.

  • In the nc_pairs_flag dictionary, check that diff = true;
    To indicate that the NetCDF difference field should be output.

Save and close this file.
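Among the probabilistic outputs requested above, the PSTD line includes the Brier score. As a reference point only, here is the textbook definition in Python, computed directly from forecast probabilities and 0/1 outcomes. MET may compute its value from the binned probability table instead, so the numbers need not match exactly. The function name and sample values are hypothetical.

```python
def brier_score(probs, events):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    outcomes = [1.0 if e else 0.0 for e in events]
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


probs = [0.0, 0.25, 1.0, 0.5]
events = [False, True, True, False]
print(brier_score(probs, events))  # 0.203125
```

A perfect forecast (probability 1 when the event occurs, 0 otherwise) scores 0; larger values are worse.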

Run

Next, we'll run Grid-Stat twice on the command line using the following two commands:

grid_stat \
$MET_TUTORIAL_DATA/output/pcp_combine/sample_fcst_24L_2005080800V_12A.nc \
$MET_TUTORIAL_DATA/output/pcp_combine/sample_obs_2005080800V_12A.nc \
$MET_TUTORIAL_DATA/config/GridStatConfig_APCP_12 \
-outdir $MET_TUTORIAL_DATA/output/grid_stat \
-v 2

grid_stat \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080312/pop5km_2005080312F096.grib_G212 \
$MET_TUTORIAL_DATA/output/pcp_combine/sample_obs_2005080800V_12A.nc \
$MET_TUTORIAL_DATA/config/GridStatConfig_POP_12 \
-outdir $MET_TUTORIAL_DATA/output/grid_stat \
-v 2

With bootstrap confidence intervals turned off (n_rep = 0; in the boot dictionary), these Grid-Stat commands should run very quickly, in a matter of seconds.

In the first command, which verifies a precipitation forecast, Grid-Stat performs 28 verification tasks. The 28 tasks are a result of: (1 field (12-hour APCP) * 4 masking regions) + (1 field * 2 neighborhood sizes * 3 raw thresholds * 4 masking regions) = 4 + 24 = 28.

In the second command, which verifies a probability of precipitation forecast, Grid-Stat performs only 4 verification tasks. The 4 tasks are a result of: (1 field (12-hour POP) * 4 masking regions). Note that the neighborhood verification methods are not applied to probability forecasts.
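The task arithmetic for the two commands can be checked with a few lines of Python. The function below is a hypothetical bookkeeping helper that encodes only the counting rule stated in the text; it is not how MET schedules its work internally.

```python
def count_tasks(n_fields, n_masks, n_widths=0, n_raw_thresh=0):
    """Tasks per the tutorial's counting rule: one per field and masking region,
    plus one per field, neighborhood width, raw threshold, and masking region
    when neighborhood output is requested (n_widths > 0)."""
    standard = n_fields * n_masks
    neighborhood = n_fields * n_widths * n_raw_thresh * n_masks
    return standard + neighborhood


n_masks = 1 + 3  # the G212 grid mask plus the three poly entries
apcp_tasks = count_tasks(1, n_masks, n_widths=2, n_raw_thresh=3)
pop_tasks = count_tasks(1, n_masks)  # no neighborhood methods for probabilities
print(apcp_tasks, pop_tasks)  # 28 4
```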

In general, the MET tools check the output flag values to determine which verification methods to apply. Only those methods required to produce the output statistics requested are performed.

The output file names from these two commands are of the form:

  • For the first command:
ls $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_240000L_20050808_000000V*
  • For the second command:
ls $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_1080000L_20050808_000000V*

Note how similar these output filenames are. Since they're both verified against the same observation file, the valid times listed (20050808_000000V) are the same. The only difference is in the forecast lead times, 24 hours for the first command and 108 hours for the second command. If you'd like to differentiate these output file names in a more descriptive way, you could set the output_prefix parameter in the configuration files. For example, setting the output_prefix parameter to APCP in the first configuration file and POP in the second configuration file would result in the following naming conventions:

grid_stat_APCP_240000L_20050808_000000V* for the first command
grid_stat_POP_1080000L_20050808_000000V* for the second command

The output_prefix configuration option may be used to generate unique output file names.
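The naming pattern above can be reproduced with Python's datetime module. This sketch assumes whole-hour lead times and the grid_stat_[PREFIX_]HHMMSSL_YYYYMMDD_HHMMSSV stem seen in these examples. The helper name is hypothetical, and the valid time is derived from the POP forecast's 2005080312 initialization (taken from its input directory) plus its 108-hour lead.

```python
from datetime import datetime, timedelta


def grid_stat_stem(lead_hours, valid, prefix=""):
    """Build grid_stat_[PREFIX_]HHMMSSL_YYYYMMDD_HHMMSSV for a whole-hour lead."""
    lead = f"{lead_hours:02d}0000L"
    head = f"grid_stat_{prefix}_" if prefix else "grid_stat_"
    return head + lead + "_" + valid.strftime("%Y%m%d_%H%M%S") + "V"


init = datetime(2005, 8, 3, 12)      # initialization time of the POP forecast
valid = init + timedelta(hours=108)  # 2005-08-08 00:00
print(grid_stat_stem(24, valid, "APCP"))
print(grid_stat_stem(108, valid, "POP"))
```

Both runs share the same valid time, so only the lead time (and any prefix) distinguishes them.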

Output

The output of Grid-Stat is one or more ASCII files containing statistics summarizing the verification performed and a NetCDF file containing difference fields. In this example, the output is written to the $MET_TUTORIAL_DATA/output/grid_stat directory as we requested on the command line. That output directory should now contain 15 files, 9 from the first Grid-Stat command and 6 from the second.

The first command generates CTC, CTS, CNT, SL1L2, NBRCTC, NBRCTS, and NBRCNT ASCII files, a STAT file, and a NetCDF difference fields file. The second command generates PCT, PSTD, PJC, and PRC ASCII files, a STAT file, and a NetCDF difference fields file.
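Scalar partial sums are useful because they can be combined across cases and converted back into continuous statistics afterward. The Python sketch below assumes plain lists of matched pairs and uses the TOTAL count and the FBAR, OBAR, FOBAR, FFBAR, and OOBAR mean-product names from the SL1L2 line. The function names and sample data are hypothetical.

```python
import math

MEAN_KEYS = ("FBAR", "OBAR", "FOBAR", "FFBAR", "OOBAR")


def sl1l2(fcst, obs):
    """Scalar partial sums: pair count plus the means of f, o, f*o, f*f, o*o."""
    n = len(fcst)
    return {
        "TOTAL": n,
        "FBAR": sum(fcst) / n,
        "OBAR": sum(obs) / n,
        "FOBAR": sum(f * o for f, o in zip(fcst, obs)) / n,
        "FFBAR": sum(f * f for f in fcst) / n,
        "OOBAR": sum(o * o for o in obs) / n,
    }


def combine(a, b):
    """Merge two partial-sum records by TOTAL-weighted averaging of each mean."""
    n = a["TOTAL"] + b["TOTAL"]
    merged = {"TOTAL": n}
    for key in MEAN_KEYS:
        merged[key] = (a[key] * a["TOTAL"] + b[key] * b["TOTAL"]) / n
    return merged


def me_and_rmse(s):
    """Mean error and RMSE recovered from the partial sums alone."""
    me = s["FBAR"] - s["OBAR"]
    mse = s["FFBAR"] - 2.0 * s["FOBAR"] + s["OOBAR"]
    return me, math.sqrt(max(mse, 0.0))  # guard against tiny negative round-off


part1 = sl1l2([1.0, 2.0], [1.0, 1.0])  # hypothetical first case
part2 = sl1l2([3.0], [5.0])            # hypothetical second case
print(me_and_rmse(combine(part1, part2)))
```

Combining the two records gives the same result as computing the partial sums over all three pairs at once.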

The format of the CTC, CTS, CNT, and SL1L2 ASCII files is the same as described for the Point-Stat tool. What's new for the Grid-Stat tool is the neighborhood method output (NBRCTC, NBRCTS, and NBRCNT) and the probability methods output (PCT, PSTD, PJC, and PRC). While Point-Stat is also able to use the probabilistic verification methods, it is NOT able to use the neighborhood verification methods since the observations are not gridded. Neighborhood verification is only available in Grid-Stat.

For the neighborhood methods, rather than comparing forecast/observation values at individual grid points, areas of forecast values are compared to areas of observation values. At each grid point, a fractional coverage value is computed for each field as the proportion of grid points within the neighborhood (centered on the current grid point) that meet the specified raw threshold. The forecast/observation fractional coverage values are then compared rather than the raw values themselves.
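The fractional coverage calculation can be sketched in pure Python. This sketch assumes a 2D list for the field and an odd neighborhood width, and it simply skips neighbors that fall off the edge of the grid; MET's own treatment of grid edges and missing data may differ. A width of 3 or 5 covers width*width = 9 or 25 points, the neighborhood sizes used in this tutorial. The function name and sample field are hypothetical.

```python
def fractional_coverage(field, width, event):
    """For each grid point, return the fraction of points in the centered
    width x width neighborhood for which event(value) is True.
    Neighbors that fall off the grid are skipped (an assumption)."""
    ny, nx = len(field), len(field[0])
    half = width // 2
    cover = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            hits = total = 0
            for jj in range(j - half, j + half + 1):
                for ii in range(i - half, i + half + 1):
                    if 0 <= jj < ny and 0 <= ii < nx:
                        total += 1
                        hits += 1 if event(field[jj][ii]) else 0
            cover[j][i] = hits / total
    return cover


precip = [[0.0, 0.0, 0.0],
          [0.0, 6.0, 0.0],
          [0.0, 0.0, 0.0]]
cover = fractional_coverage(precip, 3, lambda v: v >= 5.0)
print(cover[1][1], cover[0][0])  # center uses all 9 points, corner only 4
```

Applying a coverage threshold such as >=0.5 to both coverage fields then gives the yes/no events that fill the neighborhood contingency tables.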

For the probability methods, the probabilistic forecast values are thresholded using multiple thresholds between 0 and 1 to define a multi-row contingency table. The observation field is also thresholded to define a binary yes/no field. The pairs of probabilistic forecast values and binary yes/no observation values are used to fill the multi-row contingency table. The output probability counts and statistics are derived from this multi-row contingency table.
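The multi-row table can be sketched in Python as per-bin counts of observed "yes" (OY) and "no" (ON) events. This assumes the five thresholds from the POP configuration act as bin edges, with each bin including its lower edge and the last bin also including 1.0; that edge convention is an assumption, and MET's exact rule may differ. The function name and sample pairs are hypothetical.

```python
def pct_counts(probs, events, thresholds):
    """Count observed yes (OY) and no (ON) events in each probability bin.
    Bin i holds thresholds[i] <= p < thresholds[i + 1]; the last bin also
    includes p equal to the final edge (an assumed convention)."""
    n_bins = len(thresholds) - 1
    oy, on = [0] * n_bins, [0] * n_bins
    for p, event in zip(probs, events):
        for i in range(n_bins):
            upper_ok = p < thresholds[i + 1] or (i == n_bins - 1 and p <= thresholds[i + 1])
            if thresholds[i] <= p and upper_ok:
                if event:
                    oy[i] += 1
                else:
                    on[i] += 1
                break
    return oy, on


edges = [0.0, 0.25, 0.50, 0.75, 1.0]
probs = [0.1, 0.3, 0.6, 0.9, 1.0, 0.2]
events = [False, True, True, True, True, False]  # observed precip > 0.0
print(pct_counts(probs, events, edges))
```

Five thresholds define four probability bins, so each bin contributes one (OY, ON) pair to the table.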

Since the lines of data in these ASCII files are so long, we strongly recommend configuring your text editor to NOT use dynamic word wrapping. The files will be much easier to read that way.

Open up the $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_240000L_20050808_000000V_nbrctc.txt NBRCTC file using the text editor of your choice and note the following:

  • The format of this file is almost identical to that of the CTC file.
  • The INTERP_MTHD column is set to NBRHD, indicating that the neighborhood method was applied.
  • The INTERP_PNTS column is set to 9 or 25, indicating that the neighborhood was defined over a 3-by-3 or 5-by-5 square.
  • The LINE_TYPE column is set to NBRCTC, indicating that the columns to follow contain neighborhood contingency table counts.
  • The COV_THRESH column lists the coverage threshold (>=0.5 or >=0.75) that was applied to the coverage fields to define these contingency tables.

The same types of differences exist between the CTS and the NBRCTS files.

Close this file, open up the $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_240000L_20050808_000000V_nbrcnt.txt NBRCNT file, and note the following:

  • The format of this file is NOT very similar to that of the CNT files.
  • The two statistics included in this file are the Fractions Skill Score (FSS column) and the Fractions Brier Score (FBS column) and their corresponding confidence intervals. See the MET Users Guide for a description of the neighborhood methods.
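For reference, the standard definitions of these two scores can be written in a few lines of Python, assuming the forecast and observation coverage fields are already flattened into lists of fractions. The function name and values are hypothetical, and MET may treat special cases (such as no events anywhere) differently.

```python
def fbs_fss(f_cover, o_cover):
    """Fractions Brier Score and Fractions Skill Score for two coverage fields
    given as flat lists of fractions between 0 and 1."""
    n = len(f_cover)
    fbs = sum((f - o) ** 2 for f, o in zip(f_cover, o_cover)) / n
    # Reference score: the largest FBS possible with these fractions (no overlap).
    worst = (sum(f * f for f in f_cover) + sum(o * o for o in o_cover)) / n
    fss = 1.0 - fbs / worst if worst > 0 else float("nan")  # undefined with no events
    return fbs, fss


print(fbs_fss([0.0, 0.5, 1.0], [0.0, 0.5, 0.5]))
```

FBS is 0 for a perfect match and FSS ranges from 0 (no skill) to 1 (perfect).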

Close this file and use the ncview utility (if available on your machine) to view the NetCDF output of Grid-Stat:

ncview $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_240000L_20050808_000000V_pairs.nc &

Click through the variable names in the ncview window to see plots of the forecast, observation, and difference fields for each masking region. Now dump the header using the ncdump utility (if available on your machine):

ncdump -h $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_240000L_20050808_000000V_pairs.nc

View the NetCDF header to see how the variable names are defined.

Next, open up the $MET_TUTORIAL_DATA/output/grid_stat/grid_stat_1080000L_20050808_000000V_pct.txt PCT probabilistic output file using the text editor of your choice and note the following:

  • The LINE_TYPE column is set to PCT, indicating that the columns to follow contain information about the probability contingency table counts.
  • Since the number of forecast thresholds the user may choose is variable, the number of columns in this line (and the other probability lines) is variable. This line contains columns named THRESH_i, OY_i, and ON_i, repeated for each probability bin defined by the thresholds chosen (five thresholds in this example).

Close this file and see the MET Users Guide for a description of the other output probability line types.
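One of those other line types, PRC, holds points on the ROC curve. As a hedged sketch, the Python below derives (probability of false detection, probability of detection) pairs from per-bin OY/ON counts by treating every forecast at or above each threshold as a "yes" forecast. The function name and counts are hypothetical, and the exact PRC column layout is described in the User's Guide.

```python
def roc_points(oy, on):
    """For each bin edge i, treat forecasts in bins i..n-1 as 'yes' and return
    (POFD, POD) pairs. Assumes both observed yes and no events are present."""
    total_yes, total_no = sum(oy), sum(on)
    points = []
    for i in range(len(oy)):
        hits = sum(oy[i:])          # observed events forecast at or above edge i
        false_alarms = sum(on[i:])  # non-events forecast at or above edge i
        points.append((false_alarms / total_no, hits / total_yes))
    return points


print(roc_points([0, 1, 1, 2], [2, 0, 0, 0]))
```

Raising the threshold moves along the curve toward fewer false detections and fewer hits.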