MET Tool: Stat-Analysis

Stat-Analysis Tool: General

Stat-Analysis Functionality

The Stat-Analysis tool reads the ASCII output files from the Point-Stat, Grid-Stat, Wavelet-Stat, and Ensemble-Stat tools. It provides a way to filter their STAT data and summarize the statistical information they contain. If you pass it the name of a directory, Stat-Analysis searches that directory recursively and reads any .stat files it finds. Alternatively, if you pass it an explicit file name, it reads the contents of the file regardless of the suffix, enabling it to read the optional _LINE_TYPE.txt files. Stat-Analysis runs one or more analysis jobs on the input data, specified either as a single job on the command line or as multiple jobs in a configuration file. The analysis job types are summarized below:

  • The filter job selects lines from one or more STAT files that meet the specified filtering options and writes them to the file named by the -dump_row argument (see the sketch after this list).
  • The summary job operates on one column of data from a single STAT line type. It produces summary information for that column of data: mean, standard deviation, min, max, and the 10th, 25th, 50th, 75th, and 90th percentiles (also shown in the sketch below).
  • The aggregate job aggregates STAT data across multiple time steps or masking regions. For example, it can be used to sum contingency table data or partial sums across multiple lines of data. The -line_type argument specifies the line type to be summed.
  • The aggregate_stat job also aggregates STAT data, like the aggregate job above, but then derives statistics from that aggregated STAT data. For example, it can be used to sum contingency table data and then write out a line of the corresponding contingency table statistics. The -line_type and -out_line_type arguments specify the input and output line types for the conversion.
  • The ss_index job computes a skill-score index, of which the GO Index (go_index) is a special case. The GO Index is a performance metric used primarily by the United States Air Force.
  • The ramp job processes a time series of data and identifies rapid changes in the forecast and observation values. These forecast and observed ramp events are used to populate a 2x2 contingency table from which categorical statistics are derived.
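
As a sketch, the filter and summary job types, which are not exercised later in this tutorial, could each be run as a single command line job. The input path, output file name, and column choice below are placeholders to adapt to your own data; note that a filter job requires the -dump_row option and a summary job requires the -column option:

stat_analysis \
-lookin /path/to/stat/files \
-job filter \
-line_type CNT \
-dump_row filter_cnt_lines.stat

stat_analysis \
-lookin /path/to/stat/files \
-job summary \
-line_type CNT \
-column RMSE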

Stat-Analysis Usage

View the usage statement for Stat-Analysis by simply typing the following:

stat_analysis
Usage: stat_analysis  
  -lookin path Space-separated list of input paths where each is a _TYPE.txt file, STAT file, or directory which should be searched recursively for STAT files. Allows the use of wildcards (required).
  [-out filename] Output path or specific filename to which output should be written rather than the screen (optional).
  [-tmp_dir path] Override the default temporary directory to be used (optional).
  [-log file] Outputs log messages to the specified file (optional).
  [-v level] Level of logging (optional).
  [-config config_file] | [JOB COMMAND LINE] (Note: "|" means "or")
  [-config config_file] STATAnalysis config file containing Stat-Analysis jobs to be run.
  [JOB COMMAND LINE] All the arguments necessary to perform a single Stat-Analysis job. See the MET Users Guide for complete description of options.

At a minimum, you must specify one or more directories or files in which to find STAT data (using the -lookin path command line option) and either a configuration file (using the -config config_file command line option) or a job command on the command line.
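
Schematically, the two modes of invocation look like this, where the input path and configuration file name are placeholders:

stat_analysis -lookin /path/to/stat/files -config STATAnalysisConfig

stat_analysis -lookin /path/to/stat/files -job aggregate -line_type SL1L2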

Stat-Analysis Tool: Configure

Start by making an output directory for Stat-Analysis and changing directories:
mkdir -p ${METPLUS_TUTORIAL_DIR}/output/met_output/stat_analysis
cd ${METPLUS_TUTORIAL_DIR}/output/met_output/stat_analysis

The behavior of Stat-Analysis is controlled by the contents of the configuration file or the job command passed to it on the command line. The default Stat-Analysis configuration may be found in the data/config/STATAnalysisConfig_default file.

Copy the default configuration file into your working directory and rename it:
cp ${MET_BUILD_BASE}/share/met/config/STATAnalysisConfig_default STATAnalysisConfig_tutorial
Open up the STATAnalysisConfig_tutorial file for editing with your preferred text editor.
vi STATAnalysisConfig_tutorial

You will see that most options are left blank, so the tool will process all of the input data it finds unless the options are overridden in the configuration file or on the job command line. If you go down to the jobs[] section, you will see a list of the jobs run for the test scripts.
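
For reference, the filtering options near the top of the file are empty lists, along the lines of this abridged excerpt (the exact set of entries may vary by MET version):

model     = [];
fcst_lead = [];
fcst_var  = [];
obs_var   = [];
vx_mask   = [];
line_type = [];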

Remove those existing jobs and add the following two analysis jobs:
jobs = [
"-job aggregate -line_type CTC -fcst_thresh >273.0 -vx_mask FULL -interp_mthd NEAREST",
"-job aggregate_stat -line_type CTC -out_line_type CTS -fcst_thresh >273.0 -vx_mask FULL -interp_mthd NEAREST"
];

The first job listed above will select only the contingency table count (CTC) lines where the threshold applied is >273.0 over the FULL masking region. This should result in 2 lines, one for pressure levels P850-500 and one for pressure levels P1050-850. So this job aggregates contingency table counts across vertical levels.

The second job listed above performs the same aggregation as the first, but then writes out the corresponding contingency table statistics (CTS) derived from the aggregated counts.

Close the file. You will run these jobs on the next page.

Stat-Analysis Tool: Run on Point-Stat output

Now, run Stat-Analysis on the command line using the following command:
stat_analysis \
-config STATAnalysisConfig_tutorial \
-lookin ../point_stat \
-v 2

The output for these two jobs is printed to the screen.

Try redirecting their output to a file by adding the -out command line argument:
stat_analysis \
-config STATAnalysisConfig_tutorial \
-lookin ../point_stat \
-v 2 \
-out aggr_ctc_lines.out

The output was written to aggr_ctc_lines.out. We'll look at this file in the next section.

Next, try running the first job again, but entirely on the command line without a configuration file:
stat_analysis \
-lookin ../point_stat \
-v 2 \
-job aggregate \
-line_type CTC \
-fcst_thresh ">273.0" \
-vx_mask FULL \
-interp_mthd NEAREST

Note that we had to put double quotes (") around the forecast threshold string; otherwise, the shell would interpret > as output redirection.

Next, run the same command but add the -dump_row command line option, which redirects all of the STAT lines used by the job to a file. Also add the -out_stat command line option, which writes a full STAT output file, including the 22 header columns:
stat_analysis \
-lookin ../point_stat \
-v 2 \
-job aggregate \
-line_type CTC \
-fcst_thresh ">273.0" \
-vx_mask FULL \
-interp_mthd NEAREST \
-dump_row aggr_ctc_job.stat \
-out_stat aggr_ctc_job_out.stat
Open up the file aggr_ctc_job.stat to see the 2 STAT lines used by this job.
vi aggr_ctc_job.stat
Open up the file aggr_ctc_job_out.stat to see the single output STAT line. Notice that the FCST_LEV and OBS_LEV columns contain the input strings concatenated together.
vi aggr_ctc_job_out.stat
Try re-running this job using -set_hdr FCST_LEV P1050-500 and -set_hdr OBS_LEV P1050-500. How does that affect the output?
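
That re-run might look like the following sketch, where the -out_stat file name is hypothetical:

stat_analysis \
-lookin ../point_stat \
-v 2 \
-job aggregate \
-line_type CTC \
-fcst_thresh ">273.0" \
-vx_mask FULL \
-interp_mthd NEAREST \
-set_hdr FCST_LEV P1050-500 \
-set_hdr OBS_LEV P1050-500 \
-out_stat aggr_ctc_job_set_hdr.stat
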
The use of the -dump_row option is highly recommended to ensure that your analysis jobs run on the exact set of data that you intended. It's easy to make mistakes here!

Stat-Analysis Tool: Output

On the previous page, we generated the output file aggr_ctc_lines.out by using the -out command line argument.

Open that file using the text editor of your choice, and be sure to turn word-wrapping off.
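
If you prefer to stay in the terminal, the less pager with its -S option truncates long lines rather than wrapping them:

less -S aggr_ctc_lines.out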

This file contains the output for the two jobs we ran through the configuration file. The output for each job consists of 3 lines, as follows (see the sketch after this list):

  1. The JOB_LIST line contains the job filtering parameters applied for this job.
  2. The COL_NAME line contains the column names for the data to follow in the next line.
  3. The third line consists of the line type generated (CTC and CTS in this case) followed by the values computed for that line type.
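
Schematically, the three lines for the first (CTC) job look something like the following. The CTC columns hold the contingency table cell counts; the actual values are elided here:

JOB_LIST: -job aggregate -line_type CTC -fcst_thresh >273.0 -vx_mask FULL -interp_mthd NEAREST
COL_NAME: TOTAL FY_OY FY_ON FN_OY FN_ON
CTC:      ...   ...   ...   ...   ...
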
Next, try running the Stat-Analysis tool on the output file ../point_stat/point_stat_run2_360000L_20070331_120000V.stat. Start by running the following job:
stat_analysis \
-lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat \
-v 2 \
-job aggregate \
-fcst_var TMP \
-fcst_lev Z2 \
-vx_mask EAST -vx_mask WEST \
-interp_pnts 1 \
-line_type CTC \
-fcst_thresh ">278.0"

This job should aggregate 2 CTC lines for 2-meter temperature across the EAST and WEST regions.

Next, try creating your own Stat-Analysis command line jobs to do the following:
  1. Do the same aggregation as above but for the 5x5 interpolation output (i.e. 25 points instead of 1 point).
  2. Do the aggregation listed in (1) but compute the corresponding contingency table statistics (CTS) line. Hint: you will need to change the job type to aggregate_stat and specify the desired -out_line_type.
    How do the scores change when you increase the number of interpolation points? Did you expect this?
  3. Aggregate the scalar partial sums lines (SL1L2) for 2-meter temperature across the EAST and WEST masking regions.
    How does aggregating the East and West domains affect the output?
  4. Do the aggregation listed in (3) but compute the corresponding continuous statistics (CNT) line. Hint: use the aggregate_stat job type.
  5. Run an aggregate_stat job directly on the matched pair data (MPR lines), and use the -out_line_type command line argument to select the type of output to be generated. You'll likely have to supply additional command line arguments depending on what computation you request.

Now answer this question about the Stat-Analysis output:
  1. How do the scores compare to the original (separated by level) scores? What information is gained by aggregating the statistics?
When doing the exercises above, don't forget to use the -dump_row command line option to verify that you're running the job over the STAT lines you intended.
If you get stuck on any of these exercises, you may refer to the exercise answers on the next page. We will return to the Stat-Analysis tool in future practical sessions.

Exercise Answers
  1. Job Number 1:
    stat_analysis \
    -lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat -v 2 \
    -job aggregate -fcst_var TMP -fcst_lev Z2 -vx_mask EAST -vx_mask WEST -interp_pnts 25 -fcst_thresh ">278.0" \
    -line_type CTC \
    -dump_row job1_ps.stat
  2. Job Number 2:
    stat_analysis \
    -lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat -v 2 \
    -job aggregate_stat -fcst_var TMP -fcst_lev Z2 -vx_mask EAST -vx_mask WEST -interp_pnts 25 -fcst_thresh ">278.0" \
    -line_type CTC -out_line_type CTS \
    -dump_row job2_ps.stat
  3. Job Number 3:
    stat_analysis \
    -lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat -v 2 \
    -job aggregate -fcst_var TMP -fcst_lev Z2 -vx_mask EAST -vx_mask WEST -interp_pnts 25 \
    -line_type SL1L2 \
    -dump_row job3_ps.stat
  4. Job Number 4:
    stat_analysis \
    -lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat -v 2 \
    -job aggregate_stat -fcst_var TMP -fcst_lev Z2 -vx_mask EAST -vx_mask WEST -interp_pnts 25 \
    -line_type SL1L2 -out_line_type CNT \
    -dump_row job4_ps.stat
  5. This MPR job recomputes contingency table statistics for 2-meter temperature over G212 using a new threshold of ">=285":
    stat_analysis \
    -lookin ../point_stat/point_stat_run2_360000L_20070331_120000V.stat -v 2 \
    -job aggregate_stat -fcst_var TMP -fcst_lev Z2 -vx_mask G212 -interp_pnts 25 \
    -line_type MPR -out_line_type CTS \
    -out_fcst_thresh ge285 -out_obs_thresh ge285 \
    -dump_row job5_ps.stat