Ensemble-Stat

Ensemble-Stat cindyhg Thu, 04/25/2019 - 10:46

Ensemble-Stat Functionality

The Ensemble-Stat tool may be used to derive several summary fields, such as the ensemble mean, spread, and relative frequencies of events (i.e. similar to a probability). The summary fields produced by Ensemble-Stat may then be verified using the other MET statistics tools. Ensemble-Stat may also be used to verify the ensemble directly by comparing it to gridded and/or point observations. Statistics are then derived using those observations, such as rank histograms and the continuous ranked probability score.

Ensemble-Stat Usage

View the usage statement for Ensemble-Stat by simply typing the following:

ensemble_stat

At a minimum, the input gridded ensemble files and the configuration file (config_file) must be passed in on the command line. You can specify the list of ensemble files to be used either as a count of the number of ensemble members followed by the file name for each (n_ens ens_file_1 ... ens_file_n) or as an ASCII file containing the names of the ensemble files to be used (ens_file_list). Choose whichever way is most convenient for you. The optional -grid_obs and -point_obs command line options may be used to specify gridded and/or point observations to be used for computing rank histograms and other ensemble statistics.
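
The two ways of naming the ensemble files can be sketched in Python as follows. This is only an illustration, not part of MET: the helper names and member file names are hypothetical, and the one-name-per-line layout of the list file is an assumption.

```python
def args_by_count(files):
    """Form 1: the member count followed by each ensemble file name."""
    return [str(len(files))] + list(files)

def file_list_text(files):
    """Form 2: contents of an ASCII list file.
    One file name per line is an assumed layout."""
    return "".join(name + "\n" for name in files)

# Hypothetical member file names, for illustration only.
members = ["member_1.grib", "member_2.grib", "member_3.grib"]
count_args = args_by_count(members)   # arguments for the n_ens form
list_text = file_list_text(members)   # text to save as the ens_file_list file
```

With the first form the arguments go directly on the command line. With the second, the text would be saved to a file and that file name passed instead.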

As with the other MET statistics tools, all ensemble data and gridded verifying observations must be interpolated to a common grid prior to processing. This may be done using the automated regrid feature in the Ensemble-Stat configuration file or by running copygb and/or wgrib2 first.

Configure

The behavior of Ensemble-Stat is controlled by the contents of the configuration file passed to it on the command line. The default Ensemble-Stat configuration file may be found in the $MET_BASE/config/EnsembleStatConfig_default file. The configuration used by the test script may be found in the met-8.0/scripts/config/EnsembleStatConfig file. Prior to modifying the configuration file, users are advised to make a copy of the default:

cp $MET_BASE/config/EnsembleStatConfig_default $MET_TUTORIAL_DATA/config/EnsembleStatConfig_APCP_24

The configurable items for Ensemble-Stat are broken out into two sections. The first section specifies how the ensemble should be processed to derive summary fields, such as the ensemble mean and spread. The second section specifies how the ensemble should be verified directly, such as the computation of rank histograms and spread/skill. The configurable items include specifications for the following:

  • Section 1: Ensemble Processing (ens dictionary)
    • The ensemble fields to be summarized at the specified vertical level or accumulation interval.
    • The threshold values to be applied in computing ensemble relative frequencies (e.g. the percent of ensemble members exceeding some threshold at each point).
    • Thresholds to specify how many of the ensemble members must actually be present with valid data.
  • Section 2: Verification (fcst and obs dictionaries)
    • The forecast and observation fields to be verified at the specified vertical level or accumulation interval.
    • The matching time window for point observations.
    • The type of point observations to be matched to the forecasts.
    • The areas over which to aggregate statistics - as predefined grids or configurable lat/lon polylines.
    • The interpolation or smoothing methods to be used.

You may find a complete description of the configurable items in the MET Users Guide or in the $MET_BASE/config/README file. Please take some time to review them.

For this tutorial, we'll configure Ensemble-Stat to summarize and verify 24-hour accumulated precipitation. While we'll run Ensemble-Stat on a single field, please note that it may be configured to operate on multiple fields. The ensemble we're verifying consists of 6 members defined over the west coast of the United States.

Open up the $MET_TUTORIAL_DATA/config/EnsembleStatConfig_APCP_24 file and edit it as follows:

In the ens dictionary, set

   field = [
     {
       name       = "APCP";
       level      = [ "A24" ];
       cat_thresh = [ >0, >=5.0, >=10.0 ];
     }
   ];

To read 24-hour accumulated precipitation from the input GRIB files and compute ensemble relative frequencies for the thresholds listed.
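
As a rough illustration of what an ensemble relative frequency means for these thresholds, the Python sketch below counts the fraction of members meeting each one at a single grid point. It is not MET code: the member values are made up and the function name is hypothetical.

```python
import operator

# Thresholds mirroring cat_thresh = [ >0, >=5.0, >=10.0 ].
THRESHOLDS = [
    (">0", operator.gt, 0.0),
    (">=5.0", operator.ge, 5.0),
    (">=10.0", operator.ge, 10.0),
]

def relative_frequencies(members):
    """Fraction of members satisfying each threshold at one grid point."""
    n = len(members)
    return {
        label: sum(1 for m in members if op(m, value)) / n
        for label, op, value in THRESHOLDS
    }

# Six made-up 24-hour precipitation values at one grid point.
freqs = relative_frequencies([0.0, 1.5, 5.0, 7.2, 10.0, 12.4])
```

Here five of the six members exceed 0, four are at least 5.0, and two are at least 10.0, so the relative frequencies are 5/6, 4/6, and 2/6.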

In the fcst dictionary, set

   field = [
     {
       name       = "APCP";
       level      = [ "A24" ];
     }
   ];

To also verify the 24-hour accumulated precipitation fields.

  • In the fcst dictionary, set message_type = [ "ADPSFC" ];
    To verify against surface observations.
  • In the mask dictionary, set grid = [ "FULL" ];
    To accumulate statistics over the full model domain.

In the mask dictionary, set

  poly = [ "${MET_BASE}/poly/NWC.poly",
            "${MET_BASE}/poly/SWC.poly" ];

To also verify over the northwest coast (NWC) and southwest coast (SWC) subregions.

In the output_flag dictionary, set

   output_flag = {
      ecnt  = BOTH;
      rhist = BOTH;
      phist = BOTH;
      orank = BOTH;
      ssvar = BOTH;
      relp  = BOTH;
   }

To write to the ".stat" output file as well as the optional "_type.txt" file, a more readable ASCII file sorted by line type.

Save and close this file.

Run

Next, run Ensemble-Stat on the command line using the following command:

ensemble_stat \
6 $MET_TUTORIAL_DATA/input/sample_fcst/2009123112/*gep*/d01_2009123112_02400.grib \
$MET_TUTORIAL_DATA/config/EnsembleStatConfig_APCP_24 \
-grid_obs $MET_TUTORIAL_DATA/input/sample_obs/ST4/ST4.2010010112.24h \
-point_obs $MET_TUTORIAL_DATA/output/ascii2nc/precip24_2010010112.nc \
-outdir $MET_TUTORIAL_DATA/output/ensemble_stat -v 2

The command above uses the ASCII2NC output that was generated by make test. Please ensure that it has been run.

Ensemble-Stat is now performing the tasks we requested in the configuration file. Note that we've passed the input ensemble data directly on the command line by specifying the number of ensemble members (six) followed by their names using wildcards. We've also specified one gridded StageIV analysis field (-grid_obs) and one file containing point rain gauge observations (-point_obs) to be used in computing rank histograms. This tool should run pretty quickly.

When Ensemble-Stat is finished, it will have created nine output files in the $MET_TUTORIAL_DATA/output/ensemble_stat directory: seven ASCII statistics files (.stat, _ecnt.txt, _rhist.txt, _phist.txt, _orank.txt, _ssvar.txt, and _relp.txt), a NetCDF file with gridded fields of ensemble forecast values (_ens.nc), and a NetCDF file with gridded fields of observation ranks (_orank.nc), which is written only if gridded observations are provided.
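
One quick way to confirm that all nine files were written is sketched below in Python. The helper names are hypothetical, and the default file prefix is taken from the output file names used later in this tutorial.

```python
import os

# Suffixes named in this tutorial: seven ASCII files and two NetCDF files.
SUFFIXES = [
    ".stat", "_ecnt.txt", "_rhist.txt", "_phist.txt",
    "_orank.txt", "_ssvar.txt", "_relp.txt", "_ens.nc", "_orank.nc",
]

def missing_outputs(found_names, prefix="ensemble_stat_20100101_120000V"):
    """Return the expected output file names absent from found_names."""
    found = set(found_names)
    return [prefix + s for s in SUFFIXES if prefix + s not in found]

def missing_in_dir(outdir):
    """Same check, applied to the contents of an existing directory."""
    return missing_outputs(os.listdir(outdir))
```

For example, missing_in_dir(os.path.expandvars("$MET_TUTORIAL_DATA/output/ensemble_stat")) would return an empty list once the run has completed successfully.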

Output

The output of Ensemble-Stat is a NetCDF file containing the gridded fields of ensemble forecast values, a NetCDF file containing the gridded fields of observation ranks (if gridded observations are provided), and one or more ASCII files containing statistics summarizing the verification performed. In this example, the output is written to the $MET_TUTORIAL_DATA/output/ensemble_stat directory as we requested on the command line.

All of the ASCII statistics output is written to the file ending in .stat. While other MET statistics tools write many output line types, Ensemble-Stat currently writes six: ECNT, RHIST, PHIST, ORANK, SSVAR, and RELP. The ECNT line type contains continuous ensemble statistics. The RHIST line type contains counts for a ranked histogram. The PHIST line type contains counts for a probability integral transform histogram. The ORANK line type is similar to the matched pair (MPR) output of Point-Stat. For each point observation value, one ORANK line is written out containing the observation value, its rank, and the corresponding ensemble values for that point. The SSVAR line type contains binned spread/skill information. For each observation location, the ensemble variance is computed at that point. Those variance values are binned based on the ens_ssvar_bin_size configuration setting, and one SSVAR line is written for each bin summarizing the data it contains. Lastly, the RELP line type contains the relative position, which is similar to the rank histogram line type, but no ranking is done.
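
The variance binning behind the SSVAR lines can be pictured with the short Python sketch below. It is a simplified illustration, not MET's implementation: the function name is hypothetical, and both the use of the sample variance and the rule for assigning bin edges are assumptions.

```python
import math
import statistics
from collections import defaultdict

def bin_variances(ensembles, bin_size):
    """Count observation locations per variance bin.

    ensembles: one list of member values per observation location.
    Returns {bin lower edge: number of locations in that bin}.
    """
    counts = defaultdict(int)
    for members in ensembles:
        var = statistics.variance(members)  # sample variance (assumption)
        # Assumed rule: a bin starts at the nearest multiple of bin_size
        # at or below the variance.
        lower = math.floor(var / bin_size) * bin_size
        counts[lower] += 1
    return dict(counts)

# Three made-up observation locations with three members each.
bins = bin_variances(
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1.0, 3.0, 5.0]], bin_size=2.0
)
```

The three variances here are 0, 1, and 4, so two locations fall in the bin starting at 0 and one in the bin starting at 4. Each populated bin would correspond to one SSVAR line.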

The STAT file contains all the ASCII output while the _ecnt.txt, _rhist.txt, _phist.txt, _orank.txt, _ssvar.txt, and _relp.txt files contain the same data but sorted by line type. Since so much data can be written for the ORANK line type, we recommend disabling the output of the optional text file using the output_flag parameter in the configuration file.

Since the lines of data in these ASCII files are so long, we strongly recommend configuring your text editor to NOT use dynamic word wrapping. The files will be much easier to read that way.
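
As an alternative to scrolling through long lines, the columns can be pulled apart with a few lines of Python. This sketch assumes the file has a single header row and whitespace-separated columns with no embedded spaces. The function name is hypothetical and the sample values are made up.

```python
def read_columns(lines):
    """Turn header-plus-data text lines into a list of {column: value} dicts.

    lines may be an open file or any iterable of strings. Values are kept
    as strings. Assumes one header row and whitespace-separated columns.
    """
    lines = iter(lines)
    header = next(lines).split()
    return [dict(zip(header, line.split())) for line in lines if line.strip()]

# Made-up sample resembling a header row and two data rows.
sample = [
    "VX_MASK CRPS IGN\n",
    "FULL    1.25 0.80\n",
    "NWC     0.95 0.60\n",
]
rows = read_columns(sample)
```

To read a real file, pass the open file object, for example: with open(path) as f: rows = read_columns(f). A single column can then be inspected with [row["CRPS"] for row in rows].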

Open up the $MET_TUTORIAL_DATA/output/ensemble_stat/ensemble_stat_20100101_120000V_rhist.txt RHIST file using the text editor of your choice and note the following:

  • There are 6 lines in this output file resulting from using 3 verification regions (FULL, NWC, and SWC) and two observation datasets (ADPSFC point observations and gridded observations).
  • Each line contains columns for the observation ranks (RANK_#), the continuous ranked probability score (CRPS), and the ignorance score (IGN).
  • There is output for 7 ranks - since we verified a 6-member ensemble, there are 7 possible ranks the observation values could attain.
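
The reason a 6-member ensemble gives 7 possible ranks can be seen in the Python sketch below. It is a minimal illustration with made-up values and hypothetical function names, and it ignores ties between the observation and the members.

```python
def observation_rank(obs, members):
    """Rank of obs among the members, ignoring ties.

    Rank 1 means obs is below every member and rank len(members) + 1
    means it is above every member.
    """
    return 1 + sum(1 for m in members if m < obs)

def rank_histogram(pairs, n_members):
    """Count how often each of the n_members + 1 ranks occurs."""
    counts = [0] * (n_members + 1)
    for obs, members in pairs:
        counts[observation_rank(obs, members) - 1] += 1
    return counts

ensemble = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # made-up member values
pairs = [(0.5, ensemble), (3.5, ensemble), (7.0, ensemble)]
hist = rank_histogram(pairs, n_members=6)
```

The three made-up observations fall below all members, between the third and fourth, and above all members, so they land in ranks 1, 4, and 7.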

Close this file, and open up the $MET_TUTORIAL_DATA/output/ensemble_stat/ensemble_stat_20100101_120000V_phist.txt PHIST file, and note the following:

  • There are 5 lines in this output file rather than 6. The 3 verification regions (FULL, NWC, and SWC) and two observation datasets (ADPSFC point observations and gridded observations) would give 6 lines, but the ADPSFC point observations for the SWC region were all zeros, for which the probability integral transform is not defined.
  • Each line contains columns for the BIN_SIZE and counts for each bin. The bin size is set in the configuration file using the ens_phist_bin_size field. In this case, it was set to 0.05, therefore creating 20 bins (1/ens_phist_bin_size).
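
The relationship between the bin size and the number of PHIST bins can be sketched in Python as follows. The function names are hypothetical, and the rule for placing a probability integral transform value into a bin is an assumption rather than MET's documented behavior.

```python
def n_phist_bins(bin_size):
    """Number of bins covering [0, 1] for a given bin size."""
    return round(1.0 / bin_size)

def pit_bin_index(pit, bin_size):
    """Zero-based bin for a PIT value in [0, 1].

    Assumed rule: equal-width bins, with a value of exactly 1.0 placed in
    the last bin.
    """
    return min(int(pit / bin_size), n_phist_bins(bin_size) - 1)

n_bins = n_phist_bins(0.05)
```

With a bin size of 0.05 this gives 20 bins, and a PIT value of 0.52 would fall in the eleventh bin (index 10).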

Close this file, and open up the $MET_TUTORIAL_DATA/output/ensemble_stat/ensemble_stat_20100101_120000V_orank.txt ORANK file, and note the following:

  • This file contains 1866 lines, 1 line for each observation value falling inside each verification region (vx_mask).
  • Each line contains 42 columns, including header information, the observation location and value, its rank, and the 6 values for the ensemble members at that point.
  • When there are ties, Ensemble-Stat randomly assigns a rank from all the possible choices. This can be seen in the SWC masking region where all of the observed values are 0 and the ensemble forecasts are 0 as well. Ensemble-Stat randomly assigns a rank between 1 and 7.
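
Random tie-breaking of this kind can be sketched in Python as shown below. This is an illustration of the idea only, not MET's implementation, and the function name is hypothetical.

```python
import random

def rank_with_ties(obs, members, rng=random):
    """Rank of obs among members, choosing randomly among tied positions."""
    below = sum(1 for m in members if m < obs)
    ties = sum(1 for m in members if m == obs)
    # With ties, any rank from below + 1 to below + ties + 1 is possible.
    return rng.randint(below + 1, below + ties + 1)

# All-zero case like the SWC region: any rank from 1 to 7 may be drawn.
rng = random.Random(42)  # seeded only to make the illustration repeatable
ranks = [rank_with_ties(0.0, [0.0] * 6, rng) for _ in range(100)]
```

When there are no ties the range collapses to a single value, so the result matches the ordinary rank.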

Close this file. You have opened the _rhist.txt, _phist.txt, and _orank.txt output files, but not the _ecnt.txt, _relp.txt, and _ssvar.txt files. For more detailed information regarding those files, take a look at Table 9.2, Table 9.5, and Table 9.7, respectively, in the METv8.0 User's Guide.

Use the ncview utility to view the NetCDF ensemble fields file:

ncview $MET_TUTORIAL_DATA/output/ensemble_stat/ensemble_stat_20100101_120000V_ens.nc

This file contains variables for the following:

  1. Ensemble Mean
  2. Ensemble Standard Deviation
  3. Ensemble Mean minus 1 Standard Deviation
  4. Ensemble Mean plus 1 Standard Deviation
  5. Ensemble Minimum
  6. Ensemble Maximum
  7. Ensemble Range
  8. Ensemble Valid Data Count
  9. Ensemble Relative Frequency (for 3 thresholds)

The output of any of these summary fields may be disabled using the output_flag parameter in the configuration file.
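
For a single grid point, the first eight summary variables listed above can be sketched in Python as follows (the relative frequency is omitted here). This is not MET code: the function name and the min_valid parameter are hypothetical, the member values are made up, and the use of the sample standard deviation is an assumption.

```python
import statistics

def summary_fields(members, min_valid=2):
    """Summary values at one grid point; None marks a missing member value."""
    valid = [m for m in members if m is not None]
    # The standard deviation needs at least two valid values.
    if len(valid) < max(min_valid, 2):
        return None
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)  # sample standard deviation (assumption)
    return {
        "mean": mean,
        "stdev": sd,
        "minus_1sd": mean - sd,
        "plus_1sd": mean + sd,
        "min": min(valid),
        "max": max(valid),
        "range": max(valid) - min(valid),
        "valid_count": len(valid),
    }

# Six made-up member values, one of them missing.
fields = summary_fields([0.0, 2.0, 4.0, None, 8.0, 10.0])
```

The missing member is skipped, so the valid data count is 5 and the other values are computed from the remaining members.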

Use the ncview utility to view the NetCDF gridded observation rank file:

ncview $MET_TUTORIAL_DATA/output/ensemble_stat/ensemble_stat_20100101_120000V_orank.nc

This file is only created when you've verified using gridded observations and have requested its output using the output_flag parameter in the configuration file. Click through the variables in this file. Note that for each of the three verification areas (FULL, NWC, and SWC) this file contains 4 variables:

  1. The gridded observation value
  2. The observation rank
  3. The probability integral transform
  4. The ensemble valid data count

Close this file.