General

griggs, Wed, 04/24/2019 - 15:21

Stat-Analysis Functionality

The Stat-Analysis tool ties together results from the MET statistics tools by providing a way to filter their STAT output files and aggregate the results through time and/or space. The Stat-Analysis tool processes the STAT output of the Point-Stat, Grid-Stat, Wavelet-Stat, and Ensemble-Stat tools and performs one or more analysis jobs on the data. The Stat-Analysis tool may be run by specifying a single analysis job on the command line or multiple analysis jobs using a configuration file. The analysis job types are summarized below:

  • The filter job selects the STAT lines from STAT files that meet the specified filtering options.
  • The summary job operates on one column of data from a single STAT line type. It produces summary information for that column of data: mean, standard deviation, min, max, and the 10th, 25th, 50th, 75th, and 90th percentiles. For example, it can be used to look at the summary of a statistic like RMSE across many cases in time.
  • The aggregate job operates on a single line type and aggregates the STAT data in the lines which meet the filtering criteria. It dumps out a line containing the aggregated data. For example, it can be used to sum contingency table counts (CTC) across many cases and dump out the aggregated counts (CTC). The input line type is the same as the output line type.
  • The aggregate_stat job performs almost the same function as the aggregate job, but the output line type differs from the input line type. For example, it can be used to aggregate contingency table counts (CTC) across many cases and dump out statistics generated from the aggregated contingency table (CTS).
  • The go_index job computes the GO Index, a performance metric used primarily by the United States Air Force. The GO Index is a specific application of the more general Skill Score Index (ss_index) job type which is the weighted mean of skill scores computed for a user-defined set of variables, levels, lead times, and statistics.
  • The ramp job operates on a time-series of forecast and observed values and is analogous to the RIRW (Rapid Intensification and Weakening) job supported by the tc_stat tool. The amount of change from one time to the next is computed for forecast and observed values. Those changes are thresholded to define events, which are used to populate a 2x2 contingency table.
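As a rough illustration of what the summary, aggregate, aggregate_stat, and ramp jobs compute, the Python sketch below works on plain numbers rather than real STAT files. Everything in it is an assumption made for illustration: the function names, the linear-interpolation percentile rule, the use of the sample standard deviation, the choice of PODY, FAR, and CSI as example statistics, and the `>=` event test in the ramp function. It is not the Stat-Analysis implementation, and the tool's own conventions may differ.

```python
import math

def percentile(sorted_vals, p):
    """Percentile by linear interpolation (an assumed convention)."""
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)

def summary_job(values):
    """summary: describe one column of data, e.g. RMSE from many cases."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    vals = sorted(values)
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)  # sample variance
    out = {"mean": mean, "stdev": math.sqrt(var), "min": vals[0], "max": vals[-1]}
    for p in (10, 25, 50, 75, 90):
        out[f"p{p}"] = percentile(vals, p)
    return out

def aggregate_ctc(cases):
    """aggregate: sum 2x2 counts across cases (CTC in, CTC out)."""
    keys = ("fy_oy", "fy_on", "fn_oy", "fn_on")
    return {k: sum(c[k] for c in cases) for k in keys}

def ctc_to_stats(ctc):
    """aggregate_stat: statistics from aggregated counts (CTC in, CTS-like out).

    Assumes every denominator is non-zero.
    """
    hits, false_alarms, misses = ctc["fy_oy"], ctc["fy_on"], ctc["fn_oy"]
    return {
        "pody": hits / (hits + misses),
        "far": false_alarms / (hits + false_alarms),
        "csi": hits / (hits + false_alarms + misses),
    }

def ramp_table(fcst, obs, thresh):
    """ramp: threshold step-to-step changes and fill a 2x2 table."""
    if len(fcst) != len(obs):
        raise ValueError("forecast and observed series must have equal length")
    table = {"fy_oy": 0, "fy_on": 0, "fn_oy": 0, "fn_on": 0}
    for i in range(1, len(fcst)):
        f_event = (fcst[i] - fcst[i - 1]) >= thresh
        o_event = (obs[i] - obs[i - 1]) >= thresh
        key = ("fy" if f_event else "fn") + "_" + ("oy" if o_event else "on")
        table[key] += 1
    return table
```

For instance, `ctc_to_stats(aggregate_ctc(cases))` mirrors the aggregate_stat idea of summing counts first and computing statistics from the pooled table, rather than averaging per-case statistics.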

The Stat-Analysis tool performs two main steps:

  1. Filter the input STAT lines using the filtering parameters set in the configuration file and/or on the job command line and write the results to a temporary file.
  2. For each analysis job, read filtered data from the temporary file and perform the job.
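The two steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's code: the `run_jobs` helper, the `keep` predicate, and the three-field records are hypothetical (real STAT lines carry many more header columns), and the sketch assumes one shared filter applied before any job runs.

```python
import os
import tempfile

def run_jobs(stat_lines, keep, jobs):
    """Filter once into a temporary file, then run every job on that file."""
    # Step 1: write the lines that pass the filter to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".stat", delete=False) as tmp:
        for line in stat_lines:
            if keep(line):
                tmp.write(line + "\n")
        tmp_path = tmp.name
    try:
        # Step 2: each job reads the same filtered data.
        results = {}
        for name, job in jobs.items():
            with open(tmp_path) as fh:
                rows = [ln.rstrip("\n") for ln in fh]
            results[name] = job(rows)
        return results
    finally:
        os.remove(tmp_path)

# Hypothetical, heavily simplified records: variable, line type, value.
lines = ["TMP CNT 2.0", "TMP CNT 4.0", "UGRD CNT 9.0"]
results = run_jobs(
    lines,
    keep=lambda ln: ln.split()[0] == "TMP",
    jobs={
        "count": len,
        "mean": lambda rows: sum(float(r.split()[-1]) for r in rows) / len(rows),
    },
)
print(results)
```

In this sketch both jobs reuse a single filtering pass over the input, so adding a job costs only one more read of the smaller filtered file.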

When processing a large amount of data with Stat-Analysis, grouping similar jobs into a configuration file is more efficient than running them separately on the command line.

Stat-Analysis Usage

View the usage statement for Stat-Analysis by typing the following:

stat_analysis

At a minimum, you must specify one directory or file in which to find STAT data (using the -lookin path command line option) and either a configuration file (using the -config config_file command line option) or a job command on the command line.
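The minimum-argument rule can be captured in a small Python helper that assembles the argument list without running anything. The helper name `build_stat_analysis_cmd` is hypothetical. It reads "either ... or" strictly and rejects calls that supply both a configuration file and a job command, which is a conservative assumption rather than documented behavior. The job options in the example (`-job summary -line_type CNT -column RMSE`) and the input path are likewise assumptions and should be checked against the usage statement.

```python
def build_stat_analysis_cmd(lookin, config=None, job_args=None):
    """Assemble a stat_analysis argument list (does not execute it)."""
    if not lookin:
        raise ValueError("at least one -lookin path is required")
    if (config is None) == (not job_args):
        # Assumption: exactly one of config file or job command is given.
        raise ValueError("give either a config file or a job command")
    cmd = ["stat_analysis"]
    for path in lookin:
        cmd += ["-lookin", path]
    if config is not None:
        cmd += ["-config", config]
    else:
        cmd += list(job_args)
    return cmd

# Hypothetical input directory and an assumed job command.
cmd = build_stat_analysis_cmd(
    ["out/point_stat"],
    job_args=["-job", "summary", "-line_type", "CNT", "-column", "RMSE"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where `stat_analysis` is installed.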

When -lookin is set to one or more explicit file names, Stat-Analysis reads them regardless of their suffix. When -lookin is set to a directory, Stat-Analysis searches it recursively for files with the .stat suffix.
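The file-selection rule described above can be mimicked in a few lines of Python. This is only a sketch of the stated behavior: the `resolve_lookin` name, the sorted ordering, the error raised for a missing path, and all file names in the demo are assumptions.

```python
import tempfile
from pathlib import Path

def resolve_lookin(paths):
    """Return the files a -lookin style search would pick up."""
    found = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            # Directory: recursive search, .stat suffix only.
            found.extend(sorted(q for q in p.rglob("*.stat") if q.is_file()))
        elif p.is_file():
            # Explicit file: accepted regardless of suffix.
            found.append(p)
        else:
            raise FileNotFoundError(str(p))
    return found

# Demo on a throwaway directory tree (all file names are hypothetical).
with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / "run" / "day1").mkdir(parents=True)
    (base / "run" / "day1" / "point_stat.stat").write_text("")
    (base / "run" / "notes.txt").write_text("")
    (base / "extra_ctc.txt").write_text("")
    files = resolve_lookin([base / "run", base / "extra_ctc.txt"])
    names = [f.name for f in files]
print(names)  # ['point_stat.stat', 'extra_ctc.txt']
```

Here `notes.txt` is skipped because it sits inside a searched directory without the .stat suffix, while `extra_ctc.txt` is kept because it was named explicitly.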

The more data you pass to Stat-Analysis, the longer it will take to run. When possible, limit the input data to what the desired analysis requires.