Series-Analysis


Series-Analysis Functionality

The Series-Analysis tool accumulates statistics separately for each horizontal grid location over a series. Often, the series is defined over time or height, though any type of series is possible. This differs from the Grid-Stat tool which computes statistics aggregated over spatial masking regions. The Series-Analysis tool computes statistics for each individual grid point and can be used to quantify how model performance varies over the domain.

Series-Analysis Usage

View the usage statement for Series-Analysis by simply typing the following:

series_analysis

At a minimum, the following must be passed in on the command line: the input gridded forecast file(s), or an ASCII file listing their names; the input gridded observation file(s), or an ASCII file listing their names; the output NetCDF file for the computed statistics; and the configuration file containing the desired settings.
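
For example, a minimal command takes the following form, where fcst_file_list and obs_file_list are placeholder ASCII files listing the input file names:

series_analysis \
-fcst fcst_file_list \
-obs obs_file_list \
-out series_analysis_output.nc \
-config SeriesAnalysisConfig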

As with the other MET statistics tools, all gridded forecast and observation data must be interpolated to a common grid prior to processing. This may be done using the automated regrid feature in the Series-Analysis configuration file or by running copygb and/or wgrib2 first.
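
For example, to have Series-Analysis regrid the forecast data to the observation grid automatically, the regrid dictionary in the configuration file could be set as follows (a sketch; the interpolation method and width shown are illustrative):

regrid = {
   to_grid = OBS;    // put the forecast data on the observation grid
   method  = BILIN;  // bilinear interpolation
   width   = 2;      // 2x2 interpolation neighborhood
}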

General


Stat-Analysis Functionality

The Stat-Analysis tool ties together results from the MET statistics tools by providing a way to filter their STAT output files and aggregate the results through time and/or space. The Stat-Analysis tool processes the STAT output of the Point-Stat, Grid-Stat, Wavelet-Stat, and Ensemble-Stat tools and performs one or more analysis jobs on the data. The Stat-Analysis tool may be run by specifying a single analysis job on the command line or multiple analysis jobs using a configuration file. The analysis job types are summarized below:

  • The filter job simply filters out STAT lines from STAT files that meet the filtering options specified.
  • The summary job operates on one column of data from a single STAT line type. It produces summary information for that column of data: mean, standard deviation, min, max, and the 10th, 25th, 50th, 75th, and 90th percentiles. For example, it can be used to look at the summary of a statistic like RMSE across many cases in time.
  • The aggregate job operates on a single line type and aggregates the STAT data in the lines which meet the filtering criteria. It dumps out a line containing the aggregated data. For example, it can be used to sum contingency table counts (CTC) across many cases and dump out the aggregated counts (CTC). The input line type is the same as the output line type.
  • The aggregate_stat job performs almost the same function as the aggregate job, but the output line type differs from the input line type. For example, it can be used to aggregate contingency table counts (CTC) across many cases and dump out statistics generated from the aggregated contingency table (CTS).
  • The go_index job computes the GO Index, a performance metric used primarily by the United States Air Force. The GO Index is a specific application of the more general Skill Score Index (ss_index) job type which is the weighted mean of skill scores computed for a user-defined set of variables, levels, lead times, and statistics.
  • The ramp job operates on a time-series of forecast and observed values and is analogous to the RIRW (Rapid Intensification and Weakening) job supported by the tc_stat tool. The amount of change from one time to the next is computed for forecast and observed values. Those changes are thresholded to define events which are used to populate a 2x2 contingency table.
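
For example, a summary job for the RMSE column of the continuous statistics (CNT) line type might be run on the command line as follows (the -lookin path and filtering options shown are illustrative):

stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/point_stat \
-job summary -line_type CNT -column RMSE -fcst_var TMP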

The Stat-Analysis tool performs two main steps:

  1. Filter the input STAT lines using the filtering parameters set in the configuration file and/or on the job command line and write the results to a temporary file.
  2. For each analysis job, read filtered data from the temporary file and perform the job.

When processing a large amount of data with Stat-Analysis, grouping similar jobs into a configuration file is more efficient than running them separately on the command line.
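
In the Stat-Analysis configuration file, the analysis jobs are listed in the jobs entry, and the filtering options set at the top of the file apply to all of them. A minimal sketch (the job options shown are illustrative):

jobs = [
   "-job summary   -line_type CNT -column RMSE",
   "-job aggregate -line_type CTC"
];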

Stat-Analysis Usage

View the usage statement for Stat-Analysis by simply typing the following:

stat_analysis

At a minimum, you must specify at least one directory or file in which to find STAT data (using the -lookin path command line option) and either a configuration file (using the -config config_file command line option) or a job command on the command line.

When -lookin is set to one or more explicit file names, Stat-Analysis reads them regardless of their suffix. When -lookin is set to a directory, Stat-Analysis searches it recursively for files with the .stat suffix.
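
For example, both of the following forms are valid (the paths and job shown are illustrative):

# read one explicitly named file, regardless of its suffix
stat_analysis -lookin point_stat_output.stat -job summary -line_type CNT -column RMSE

# search a directory recursively for files with the .stat suffix
stat_analysis -lookin $MET_TUTORIAL_DATA/output/point_stat -job summary -line_type CNT -column RMSE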

The more data you pass to Stat-Analysis, the longer it will take to run. When possible, users should limit the input data to what is required to perform the desired analysis.

Configure


The behavior of Series-Analysis is controlled by the contents of the configuration file passed to it on the command line. The default Series-Analysis configuration file may be found in the $MET_BASE/config/SeriesAnalysisConfig_default file. Prior to modifying the configuration file, users are advised to make a copy of the default:

cp $MET_BASE/config/SeriesAnalysisConfig_default $MET_TUTORIAL_DATA/config/SeriesAnalysisConfig_tutorial

The configurable items for Series-Analysis are used to specify how the verification is to be performed. The configurable items include specifications for the following:

  • The verification domain.
  • The forecast fields to be verified at the specified vertical level or accumulation interval.
  • The threshold values to be applied.
  • The area over which to limit the computation of statistics - as predefined grids or configurable lat/lon polylines.
  • The confidence interval methods to be used.
  • The smoothing methods to be applied.
  • The types of statistics to be computed.

You may find a complete description of the configurable items in the MET User's Guide or in the $MET_BASE/config/README file. Please take some time to review them.

For this tutorial, we'll run Series-Analysis to verify a time series of 3-hour accumulated precipitation. We'll use GRIB1 for the forecast files and NetCDF for the observation files.

Open up the $MET_TUTORIAL_DATA/config/SeriesAnalysisConfig_tutorial file for editing with your preferred text editor and edit it as follows:

  • Set
      cat_thresh  = [ >0.0, >=5.0 ];
  • Change obs = fcst; to
    obs = {
       field = [
          {
            name  = "APCP_03";
            level = [ "A3" ];
          }
       ];
    };
  • In the mask dictionary, set grid = "G212";
    to limit the computation of statistics to the NCEP Grid 212 domain.
  • In the output_stats dictionary, set
       fho    = [ "F_RATE", "O_RATE" ];
       ctc    = [ "FY_OY", "FN_ON" ];
       cts    = [ "CSI", "GSS" ];
       mctc   = [];
       mcts   = [];
       cnt    = [ "RMSE" ];
       sl1l2  = [];
       sal1l2 = [];
       pct    = [];
       pstd   = [];
       pjc    = [];
       prc    = [];

    These settings indicate that the forecast rate (FHO: F_RATE), observation rate (FHO: O_RATE), number of forecast yes and observation yes pairs (CTC: FY_OY), number of forecast no and observation no pairs (CTC: FN_ON), critical success index (CTS: CSI), and Gilbert skill score (CTS: GSS) should be output for each threshold, along with the root mean squared error (CNT: RMSE).

Save and close this file.

The configurable block_size entry specifies the number of grid points to be processed concurrently. The total number of data points is determined by the number of grid points (Nx * Ny) and the number of series entries. Setting block_size too high consumes too much memory, while setting it too low requires many passes through the data; both make the tool run slowly. The default value of 1024 is quite low for most modern machines.

Consider increasing the default block_size to speed up the Series-Analysis tool.
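
For example, increasing block_size in the configuration file as follows may speed things up considerably (the value shown is illustrative; memory permitting, a value approaching Nx * Ny lets all grid points be processed in a single pass):

block_size = 10000;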

Run


Next, we'll run Series-Analysis on the command line using the following command:

series_analysis \
-fcst $MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_03.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_06.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_09.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_12.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_15.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_18.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_21.tm00_G212 \
$MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_24.tm00_G212 \
-obs $MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080703V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080706V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080709V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080712V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080715V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080718V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080721V_03A.nc \
$MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_2005080724V_03A.nc \
-out $MET_TUTORIAL_DATA/output/series_analysis/series_analysis_2005080700_2005080800_3A.nc \
-config $MET_TUTORIAL_DATA/config/SeriesAnalysisConfig_tutorial \
-v 2

These statistics are produced separately for each grid location, accumulated over a time series of eight 3-hour accumulations spanning a 24-hour period.
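
As noted in the usage section, the -fcst and -obs options also accept an ASCII file listing the input file names, which keeps long commands manageable. For example, assuming the wildcard patterns below match exactly the eight forecast and eight observation files listed above:

ls $MET_TUTORIAL_DATA/input/sample_fcst/2005080700/wrfprs_ruc13_*.tm00_G212 > fcst_file_list
ls $MET_TUTORIAL_DATA/input/sample_obs/ST2ml_3h/sample_obs_*V_03A.nc > obs_file_list

series_analysis \
-fcst fcst_file_list \
-obs obs_file_list \
-out $MET_TUTORIAL_DATA/output/series_analysis/series_analysis_2005080700_2005080800_3A.nc \
-config $MET_TUTORIAL_DATA/config/SeriesAnalysisConfig_tutorial \
-v 2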

Output


The output of Series-Analysis is one NetCDF file containing the requested output statistics for each grid location on the same grid as the input files.

You may view the output NetCDF file that Series-Analysis wrote using the ncdump utility (if available on your machine). Run the following command to view the header of the NetCDF output file:

ncdump -h $MET_TUTORIAL_DATA/output/series_analysis/series_analysis_2005080700_2005080800_3A.nc

In the NetCDF header, we see that the file contains 13 arrays of data. For each threshold (>0.0 and >=5.0), there are values for the requested statistics: F_RATE, O_RATE, FY_OY, FN_ON, CSI, and GSS. The file also contains the requested RMSE for each grid location, computed over the 24-hour period.
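
To inspect the values of a single variable rather than just the header, pass its name to ncdump with the -v option. For example, to dump the RMSE field:

ncdump -v series_cnt_RMSE $MET_TUTORIAL_DATA/output/series_analysis/series_analysis_2005080700_2005080800_3A.nc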

Next, run the ncview utility to display the contents of the NetCDF output file:

ncview $MET_TUTORIAL_DATA/output/series_analysis/series_analysis_2005080700_2005080800_3A.nc

Click through the different variables to see how the performance varies over the domain. Looking at the series_cnt_RMSE variable, are the errors larger in the southeastern or northwestern regions of the United States?