METplus Practical Session Guide (Version 5.0) | MET Tool: Ensemble-Stat > Output

The output from Ensemble-Stat is one or more ASCII files containing statistics summarizing the verification performed, and a NetCDF file containing the gridded matched pairs.

All of the line types are written to the file ending in .stat. The Ensemble-Stat tool currently writes 12 output line types: ECNT, RPS, RHIST, PHIST, RELP, SSVAR, PCT, PSTD, PJC, PRC, ECLV, and ORANK.

  1. The ECNT line type contains continuous ensemble statistics such as spread and skill. Ensemble-Stat uses assumed observation errors to compute both perturbed and unperturbed versions of these statistics. Statistics to which observation error has been applied can be found in columns that include the _OERR (for observation error) suffix.
  2. The RPS line type contains the Ranked Probability Score, as well as the number of ensembles that were used to calculate the score, the Ranked Probability Skill Score, and the Ranked Probability Score decomposed into its terms of reliability, resolution, and uncertainty.
  3. The RHIST line type contains counts for a ranked histogram. This ranks each observation value relative to the ensemble member values. Ideally, observation values would fall equally across all available ranks, yielding a flat rank histogram. In practice, ensembles are often under-dispersive (U shape) or over-dispersive (inverted U shape). In the event of ties, ranks are randomly assigned.
  4. The PHIST line type contains counts for a probability integral transform histogram. This scales the observation ranks to a range of values between 0 and 1 and allows ensembles of different sizes to be compared. Similarly, when ensemble members drop out, RHIST lines cannot be aggregated together but PHIST lines can.
  5. The RELP line type contains the relative position, which indicates how often each ensemble member's value was closest to the observation's value. In the event of ties, credit is divided equally among the tied members.
  6. The PCT line type contains the contingency table counts for probabilistic forecasts.
  7. The PSTD line type contains the probabilistic statistics for dichotomous outcomes for derived ensemble relative frequencies.
  8. The PJC line type contains the joint and conditional factorization for derived ensemble relative frequencies.
  9. The PRC line type contains the receiver operating characteristic for derived ensemble relative frequencies.
  10. The ECLV line type contains the economic cost/loss relative value for derived ensemble relative frequencies.
  11. The ORANK line type is similar to the matched pair (MPR) output of Point-Stat. For each point observation value, one ORANK line is written out containing the observation value, its rank, and the corresponding ensemble values for that point. When verifying against a gridded analysis, the ranks can be written to the NetCDF output file.
  12. The SSVAR line type contains binned spread/skill information. For each observation location, the ensemble variance is computed at that point. Those variance values are binned based on the ens_ssvar_bin_size configuration setting. The skill is determined by comparing the ensemble mean value to the observation value. One SSVAR line is written for each bin, summarizing all the observation/ensemble mean pairs that it contains.
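
The tie handling described for RELP and the binning described for SSVAR can be sketched in a few lines of Python. This is only an illustration of the ideas in the list above, not MET's implementation: the function names relative_position and spread_skill_bins are hypothetical, the use of the sample variance is an assumption, and the per-bin summary (count, mean variance, RMSE of the ensemble mean) is a simplified stand-in for the real SSVAR columns.

```python
import math
import statistics
from collections import defaultdict


def relative_position(members, obs):
    """Return one credit per member; the closest members share 1.0 equally."""
    dists = [abs(m - obs) for m in members]
    best = min(dists)
    winners = [i for i, d in enumerate(dists) if d == best]
    credit = [0.0] * len(members)
    for i in winners:
        credit[i] = 1.0 / len(winners)
    return credit


def spread_skill_bins(pairs, bin_size):
    """Group (members, obs) pairs by ensemble variance and summarize each bin.

    bin_size plays the role of ens_ssvar_bin_size. The sample variance is
    used here as an assumption; MET may compute the variance differently.
    """
    bins = defaultdict(list)
    for members, obs in pairs:
        var = statistics.variance(members)
        err = statistics.fmean(members) - obs  # ensemble mean minus observation
        bins[math.floor(var / bin_size)].append((var, err))

    summary = {}
    for idx, items in sorted(bins.items()):
        n = len(items)
        summary[idx] = {
            "n": n,
            "var_mean": sum(v for v, _ in items) / n,
            "rmse": math.sqrt(sum(e * e for _, e in items) / n),
        }
    return summary
```

Calling relative_position([1.0, 3.0, 5.0], 2.0) splits the credit between the first two members, since both are equally close to the observation.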

The STAT file contains all the ASCII output while the _ecnt.txt, _rhist.txt, _phist.txt, _orank.txt, _ssvar.txt, and _relp.txt files contain the same data but sorted by line type. Since so much data can be written for the ORANK line type, we recommend disabling the output of the optional text file using the output_flag parameter in the configuration file.
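
If the optional text files are disabled, the same per-line-type view can be recovered from the .stat file. The following is a minimal sketch, assuming the file is whitespace-delimited, begins with a header row that includes a LINE_TYPE column, and has no fields with embedded spaces. The function name split_by_line_type and the abbreviated sample text are hypothetical and do not reproduce the real column layout.

```python
from collections import defaultdict


def split_by_line_type(stat_text):
    """Group the data rows of a .stat-style text by their LINE_TYPE value."""
    lines = [ln for ln in stat_text.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("LINE_TYPE")  # assumed to be present in the header row

    groups = defaultdict(list)
    for ln in lines[1:]:
        fields = ln.split()
        groups[fields[col]].append(fields)
    return dict(groups)


# Synthetic, heavily abbreviated sample; real files have many more columns.
sample = """VERSION MODEL LINE_TYPE TOTAL
Vx DEMO RHIST 10
Vx DEMO PHIST 10
Vx DEMO RHIST 12
"""

groups = split_by_line_type(sample)
print({line_type: len(rows) for line_type, rows in groups.items()})
```

To use it on real output, read the .stat file into a string and pass it to the function in place of the sample.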

Since the lines of data in these ASCII files are so long, we strongly recommend configuring your text editor to NOT use dynamic word wrapping. The files will be much easier to read that way.
Open up the ensemble_stat_20100101_120000V_rhist.txt RHIST file using the text editor of your choice and note the following:
vi ensemble_stat_20100101_120000V_rhist.txt
  • There are 6 lines in this output file resulting from using 3 verification regions in the VX_MASK column (FULL, NWC, and SWC) and two observation datasets in the OBTYPE column (ADPSFC point observations and gridded observations).
  • Each line contains columns for the observation ranks (RANK_#) and a handful of ensemble statistics (CRPS, CRPSS, IGN, and SPREAD).
  • There is output for 7 ranks - since we verified a 6-member ensemble, there are 7 possible ranks the observation values could attain.
Close this file, and open up the ensemble_stat_20100101_120000V_phist.txt PHIST file, and note the following:
vi ensemble_stat_20100101_120000V_phist.txt
  • There are 5 lines in this output file rather than 6. Using 3 verification regions (FULL, NWC, and SWC) and two observation datasets (ADPSFC point observations and gridded observations) would give 6 lines, but the ADPSFC point observations for the SWC region were all zeros, for which the probability integral transform is not defined.
  • Each line contains columns for the BIN_SIZE and counts for each bin. The bin size is set in the configuration file using the ens_phist_bin_size field. In this case, it was set to 0.05, therefore creating 20 bins (1/ens_phist_bin_size).
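
The relationship between the bin size and the number of bins can be checked with a short sketch. It assumes the probability integral transform values are already available as numbers between 0 and 1. The function name phist_counts is hypothetical, and placing a value of exactly 1.0 in the last bin is an assumption rather than documented MET behavior.

```python
def phist_counts(pit_values, bin_size=0.05):
    """Count PIT values per bin; the number of bins is 1 / bin_size."""
    n_bins = round(1.0 / bin_size)
    counts = [0] * n_bins
    for pit in pit_values:
        # Clamp so that a PIT of exactly 1.0 lands in the last bin (assumption).
        idx = min(int(pit * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


counts = phist_counts([0.01, 0.02, 0.48, 0.99, 1.0])
print(len(counts))  # number of bins for a bin size of 0.05
```
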
Close this file, and open up the ensemble_stat_20100101_120000V_orank.txt ORANK file, and note the following:
vi ensemble_stat_20100101_120000V_orank.txt
  • This file contains 1866 lines, 1 line for each observation value falling inside each verification region (VX_MASK).
  • Each line contains 44 columns, including header information, the observation location and value, its rank, and the 6 values for the ensemble members at that point.
  • When there are ties, Ensemble-Stat randomly assigns a rank from all the possible choices. This can be seen in the SWC masking region where all of the observed values are 0 and the ensemble forecasts are 0 as well. Ensemble-Stat randomly assigns a rank between 1 and 7.
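
The random rank assignment for ties can be illustrated with a small sketch. It is not MET's code: the function name observation_rank is hypothetical, and drawing uniformly among the possible ranks is an assumption about what "randomly assigns" means here.

```python
import random


def observation_rank(members, obs, rng=random):
    """Return the 1-based rank of obs among the ensemble member values.

    With no ties the rank is (number of members below obs) + 1. Each tied
    member adds one more possible rank, and one is drawn at random.
    """
    below = sum(1 for m in members if m < obs)
    ties = sum(1 for m in members if m == obs)
    return below + 1 + rng.randint(0, ties)


# Six members and an observation that are all zero, as in the SWC region:
rng = random.Random(0)
ranks = [observation_rank([0.0] * 6, 0.0, rng) for _ in range(10)]
print(ranks)  # each value is somewhere between 1 and 7
```

With six distinct member values and no ties, the same function always returns a single deterministic rank between 1 and 7.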

 

Use the ncview utility to view the NetCDF gridded observation rank file:

ncview ensemble_stat_20100101_120000V_orank.nc &

This file is only created when you've verified using gridded observations and have requested its output using the output_flag parameter in the configuration file. Click through the variables in this file. Note that for each of the three verification areas (FULL, NWC, and SWC) this file contains 4 variables:

  1. The gridded observation value
  2. The observation rank
  3. The probability integral transform
  4. The ensemble valid data count

In ncview, the random assignment of tied ranks is evident in areas of zero precipitation.

Close this file.

Feel free to explore using this dataset. Some options to try are:

  • Try setting skip_const = TRUE; in the config file to discard points where all ensemble members and the observation are tied (e.g. zero precip). If you want to save the output to a different file, make sure you set output_prefix to something meaningful, such as "run2" or "skip-constant".
  • Try setting obs_thresh = [ >0.01 ]; in the config file to only consider points where the observation meets this threshold. How does this differ from using skip_const?
  • Use wgrib to inventory the input files and add additional entries to the ens.field list. Can you process 10-meter U and V wind?
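
Before rerunning the tool, it can help to reason about the first two options on a handful of made-up points. The sketch below reflects one reading of the descriptions above and is not MET's logic: apply_skip_const drops a point only when every member equals the observation, while apply_obs_thresh looks at the observation alone. Both function names and all values are hypothetical.

```python
def apply_skip_const(pairs):
    """Drop points where every ensemble member equals the observation."""
    return [(m, o) for m, o in pairs if not all(v == o for v in m)]


def apply_obs_thresh(pairs, thresh=0.01):
    """Keep points whose observation exceeds the threshold (here > 0.01)."""
    return [(m, o) for m, o in pairs if o > thresh]


# Made-up (members, observation) pairs for illustration only.
pairs = [
    ([0.0] * 6, 0.0),                       # everything tied at zero
    ([0.0, 0.0, 0.5, 0.0, 0.0, 0.0], 0.0),  # zero observation, one wet member
    ([2.0] * 6, 2.0),                       # everything tied at a nonzero value
    ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 1.5),  # no ties
]

print(len(apply_skip_const(pairs)), len(apply_obs_thresh(pairs)))
```

Comparing which points each filter keeps is a useful check against the line counts in the two sets of output files.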