TC-Stat

TC-Stat griggs Wed, 04/24/2019 - 16:36

TC-Stat Functionality

The TC-Stat tool filters the output of TC-Pairs based on search parameters and performs analysis jobs on the filtered subset of data. TC-Stat is similar to the STAT-Analysis tool but operates on .tcst files rather than .stat files. It writes summary statistics to ASCII output.

Like STAT-Analysis, TC-Stat supports multiple analysis job types, described below:

  • The filter job simply selects the TCMPR lines and tracks that meet the specified filtering options.
  • The summary job operates on one or more columns of data and produces summary information: mean, standard deviation, min, max, and the 10th, 25th, 50th, 75th, and 90th percentiles. When passed a time series of data, it assesses the independence of the time series, and it can also be used to assess the frequency of superior performance. The data to be summarized may be defined as a named column, the difference between two named columns, or the absolute value of a column or difference of columns.
  • The rirw job looks for the occurrence of rapid intensification or weakening in the ADeck and BDeck tracks. Those events are used to populate a 2x2 contingency table from which statistics are derived. The RI/RW event definition and matching logic are highly configurable.
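To make the three column forms accepted by the summary job concrete, here is a rough shell sketch that computes a few of the same statistics outside of TC-Stat. It is not how TC-Stat is implemented: the summarize function, the toy file layout (whitespace-delimited, first line naming the columns), skipping rows that contain NA, and the linear-interpolation median are all assumptions made for illustration. TC-Stat itself reports more percentiles plus the time series and superior-performance diagnostics that this sketch omits.

```shell
#!/bin/sh
# Hypothetical helper (not part of MET): summarize one column expression
# from a whitespace-delimited table whose first line names the columns.
# SPEC may be COL, COL1-COL2, or ABS(COL1-COL2), mirroring the three
# forms accepted by the summary job. Rows with NA in a used column are
# skipped.
summarize() {
  file=$1
  spec=$2
  awk -v spec="$spec" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      e = spec
      absval = 0
      # Strip an ABS(...) wrapper and remember to take absolute values.
      if (e ~ /^ABS[(].*[)]$/) { absval = 1; e = substr(e, 5, length(e) - 5) }
      n = split(e, parts, "-")
      if (!(parts[1] in col) || (n == 2 && !(parts[2] in col))) {
        print "unknown column in " spec > "/dev/stderr"
        exit 1
      }
      a = col[parts[1]]
      b = (n == 2) ? col[parts[2]] : 0
      next
    }
    {
      if ($a == "NA" || (b && $b == "NA")) next
      v = b ? $a - $b : $a + 0
      if (absval && v < 0) v = -v
      print v
    }' "$file" |
  sort -n |
  awk '
    { x[NR] = $1; s += $1; ss += $1 * $1 }
    END {
      if (NR == 0) { print "TOTAL=0"; exit }
      mean = s / NR
      var = (NR > 1) ? (ss - NR * mean * mean) / (NR - 1) : 0
      if (var < 0) var = 0   # guard against round-off
      # Median by linear interpolation between neighboring sorted values.
      p = 0.5 * (NR - 1) + 1
      lo = int(p)
      med = (lo < NR) ? x[lo] + (p - lo) * (x[lo + 1] - x[lo]) : x[lo]
      printf "TOTAL=%d MEAN=%.2f STDEV=%.2f MIN=%g MEDIAN=%g MAX=%g\n",
        NR, mean, sqrt(var), x[1], med, x[NR]
    }'
}
```

For example, summarize toy.txt 'ABS(AMAX_WIND-BMAX_WIND)' summarizes the absolute intensity error; the quotes keep the shell from interpreting the parentheses.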

TC-Stat Usage

View the usage statement for TC-Stat by simply typing the following:

tc_stat

At a minimum, you must specify one or more directories or files in which to find TCST data (using the -lookin command line option) and either a configuration file (using the -config command line option) or a single job command on the command line.

When -lookin is set to one or more explicit file names, TC-Stat reads them regardless of their suffix. When -lookin is set to a directory, TC-Stat searches it recursively for files with the .tcst suffix.
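The -lookin rule above can be previewed with a small shell function that lists the files it would select. This is a hypothetical helper that only mimics the stated rule (explicit files taken as-is, directories searched recursively for the .tcst suffix); it does not call TC-Stat, and its sorted output order is a choice made here, not necessarily the order in which TC-Stat reads files.

```shell
#!/bin/sh
# Hypothetical helper (not part of MET): list the files a set of -lookin
# arguments would select under the rule described above.
lookin_files() {
  for path in "$@"; do
    if [ -d "$path" ]; then
      # Directories: recursive search, .tcst suffix only.
      find "$path" -type f -name '*.tcst' | sort
    else
      # Explicit file names are taken regardless of suffix.
      printf '%s\n' "$path"
    fi
  done
}
```

For example, lookin_files $MET_TUTORIAL_DATA/output/tc_pairs shows which .tcst files a directory-based -lookin would pick up.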

Run

Run griggs Wed, 04/24/2019 - 16:38

Like the STAT-Analysis tool, TC-Stat may be run with or without a configuration file. When running multiple analysis jobs over the same subset of data, using a configuration file is most efficient. However, when running simple jobs to quickly explore your data, using the command line is more convenient. For this tutorial, we'll run command line jobs.

Filter Job

The TC-Stat filter job subsets your data and writes that subset to an output file. TC-Stat supports the following types of filtering:

  • Header string columns using -amodel, -bmodel, -storm_id, -basin, -cyclone, and -storm_name. Multiple values may be specified as a comma-separated list or using multiple switches. When multiple values are specified, the output will contain their union.
  • Timing information using -init_beg, -init_end, -init_inc, -init_exc, -init_hour, similar switches for valid times, and -lead for the lead time.
  • The -init_mask, -valid_mask, and -track_watch_warn options filter by the corresponding data columns.
  • The -column_thresh option specifies the name of the column followed by a threshold to apply (e.g. -column_thresh TK_ERR gt10).
  • The -column_str option specifies the name of the column followed by a list of one or more strings to match (e.g. -column_str LEVEL HU,TS).
  • The -init_thresh and -init_str options work the same way but are only applied to the initial forecast track point (i.e. LEAD equals 0).
  • The -water_only option excludes any points where the distance to land is <= 0.
  • The -rirw and -landfall options subset the tracks down to the RIRW or landfall points, respectively.
  • The -event_equal and -event_equal_lead options control the logic for subsetting down to a homogeneous sample.
  • The -out_init_mask and -out_valid_mask options define lat/lon polyline regions. The initial forecast track point (i.e. LEAD equals 0) must fall within the -out_init_mask while the entire track must fall within -out_valid_mask.
  • If your TC-Pairs output includes all track points, the -match_points option subsets tracks down to common times.
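Threshold strings such as gt150 in -column_thresh pair a comparison operator with a value. The hypothetical thresh_ok function below sketches that comparison for a single value. It handles only the two-letter operators lt, le, eq, ne, ge, and gt, treats NA as failing, and is not MET's threshold parser; MET's own threshold syntax is richer.

```shell
#!/bin/sh
# Hypothetical helper (not MET's threshold parser): succeed when VALUE
# passes THRESH, where THRESH is one of lt, le, eq, ne, ge, gt followed
# by a number, e.g. gt150. NA never passes.
thresh_ok() {
  awk -v v="$1" -v t="$2" 'BEGIN {
    if (v == "NA") exit 1
    op = substr(t, 1, 2)
    x = substr(t, 3) + 0
    v = v + 0
    if      (op == "lt") ok = (v <  x)
    else if (op == "le") ok = (v <= x)
    else if (op == "eq") ok = (v == x)
    else if (op == "ne") ok = (v != x)
    else if (op == "ge") ok = (v >= x)
    else if (op == "gt") ok = (v >  x)
    else { print "unsupported threshold: " t > "/dev/stderr"; exit 2 }
    exit (ok ? 0 : 1)
  }'
}
```

For example, thresh_ok 151 gt150 succeeds while thresh_ok 150 gt150 fails, matching the strict "greater than" meaning of gt.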

Next, run the following jobs:

  • Select data for the official forecast only (i.e. AMODEL equals OFCL). After you run the job, inspect the output file:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job filter -dump_row $MET_TUTORIAL_DATA/output/tc_stat/OFCL_sandy.tcst \
-amodel OFCL
  • The input TCST file includes tracks for 00, 06, 12, and 18Z initializations. Select only initialization hour 00 and inspect the output file:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job filter -dump_row $MET_TUTORIAL_DATA/output/tc_stat/INIT_sandy.tcst \
-init_hour 00
  • Select hurricane strength lines where the track error exceeds 150 nm and inspect the output file:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job filter -dump_row $MET_TUTORIAL_DATA/output/tc_stat/TKERR_sandy.tcst \
-column_str LEVEL HU \
-column_thresh TK_ERR gt150
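Because the input holds 00, 06, 12, and 18Z initializations, the single-hour filter shown earlier can be repeated for each hour. The hypothetical init_hour_jobs function prints one tc_stat command per hour using only the options shown above; the INIT_<HH>_sandy.tcst output names are invented for this example. Printing rather than running the commands lets you review them first.

```shell
#!/bin/sh
# Hypothetical helper (not part of MET): print one TC-Stat filter command
# per initialization hour so that each hour's lines land in their own
# file. Commands are printed, not run. Output file names are made up.
init_hour_jobs() {
  input=$1
  outdir=$2
  for hh in 00 06 12 18; do
    printf "tc_stat -lookin '%s' -job filter -dump_row '%s/INIT_%s_sandy.tcst' -init_hour %s\n" \
      "$input" "$outdir" "$hh" "$hh"
  done
}
```

Once the commands look right, pipe them to a shell, e.g. init_hour_jobs $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst $MET_TUTORIAL_DATA/output/tc_stat | sh.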

Summary Job

Next, we'll run some summary jobs, applying additional filtering criteria as well. Just like the STAT-Analysis tool, TC-Stat supports the -by job command option, which is a very convenient way to run the same job over multiple subsets of data:

  • Summarize all of the track (TK_ERR) and intensity (AMAX_WIND-BMAX_WIND) error values:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job summary \
-column TK_ERR -column AMAX_WIND-BMAX_WIND
  • Now use the -by option to run the same job for each unique combination of model name (AMODEL) and lead time (LEAD):
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job summary -by AMODEL,LEAD \
-column TK_ERR -column AMAX_WIND-BMAX_WIND
  • That's a lot of output, but we could filter it down using the -lead option to select particular lead times of interest:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job summary -by AMODEL,LEAD -lead 00,24,48,72 \
-column TK_ERR -column AMAX_WIND-BMAX_WIND
  • Run that same job one more time but use event equalization to compare three specific models (OFCL, OCD5, and HWRF) over a homogeneous set of cases:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job summary -by AMODEL,LEAD -lead 00,24,48,72 \
-amodel OFCL,OCD5,HWRF -event_equal TRUE \
-column TK_ERR -column AMAX_WIND-BMAX_WIND

Notice that the counts (TOTAL column) are now constant across all models for each lead time.
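Event equalization keeps only the cases that every requested model forecast, which is why the TOTAL counts line up. The awk sketch below illustrates the idea on a toy whitespace-delimited table with a header line. The event_equal function and its choice of case key (STORM_ID, INIT, and LEAD) are assumptions for illustration, not TC-Stat's code; consult the MET documentation for exactly which columns TC-Stat matches on.

```shell
#!/bin/sh
# Hypothetical sketch of event equalization (not TC-Stat's code).
# Keeps rows whose case key appears for every model in MODELS.
# Assumption: a case is identified by STORM_ID, INIT, and LEAD.
event_equal() {
  file=$1
  models=$2   # comma-separated list, e.g. OFCL,OCD5,HWRF
  # The file is read twice: pass 1 records which models have each case,
  # pass 2 prints the header and the rows of complete cases.
  awk -v models="$models" '
    FNR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      c_model = col["AMODEL"]; c_storm = col["STORM_ID"]
      c_init = col["INIT"]; c_lead = col["LEAD"]
      if (NR == 1) nm = split(models, m, ",")
      else print
      next
    }
    NR == FNR {
      seen[$c_storm SUBSEP $c_init SUBSEP $c_lead, $c_model] = 1
      next
    }
    {
      key = $c_storm SUBSEP $c_init SUBSEP $c_lead
      keep = 0
      for (j = 1; j <= nm; j++) if ($c_model == m[j]) keep = 1
      for (j = 1; j <= nm && keep; j++) if (!((key, m[j]) in seen)) keep = 0
      if (keep) print
    }' "$file" "$file"
}
```

Rows from models outside the list are dropped, as are cases missing any listed model, so every remaining model contributes the same number of rows per case.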

By default, TC-Stat writes its job output to the screen, but the output can easily be redirected to a file using the -out option.

RIRW Job

Next, we'll run a sample Rapid Intensification job:

  • Run job for each unique model using all the default settings:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job rirw -by AMODEL
  • By default, TC-Stat dumps the contingency table counts (RIRW_CTC) and contingency table statistics (RIRW_CTS). Notice that there are differing counts in the TOTAL column. Let's rerun, turning off the RIRW_CTS output and event equalizing three models:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job rirw -by AMODEL -amodel OFCL,OCD5,HWRF -event_equal TRUE \
-out_line_type CTC
  • Notice that the TOTAL column remains constant, meaning that event equalization worked as expected. By default, rapid intensification is defined as an increase of 30 kts in exactly 24 hours, which is a rather rare event. Let's try changing that to a maximum increase of 20 kts in 24 hours:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job rirw -by AMODEL -amodel OFCL,OCD5,HWRF -event_equal TRUE \
-out_line_type CTC -rirw_exact FALSE -rirw_thresh ge20
  • When populating the contingency table, we only get a hit when the rapid intensification occurs at exactly the same time in both tracks. But how do the scores change if we only require that the events be within 12 hours of each other for a hit?
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job rirw -by AMODEL -amodel OFCL,OCD5,HWRF -event_equal TRUE \
-out_line_type CTC -rirw_exact FALSE -rirw_thresh ge20 -rirw_window 12
  • Lastly, rerun but write all possible line types (CTC, CTS, and MPR) to an output file:
tc_stat \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-job rirw -by AMODEL -amodel OFCL,OCD5,HWRF -event_equal TRUE \
-out_line_type CTC,CTS,MPR -rirw_exact FALSE -rirw_thresh ge20 -rirw_window 12 \
-out $MET_TUTORIAL_DATA/output/tc_stat/RIRW_sandy.txt

Open the output $MET_TUTORIAL_DATA/output/tc_stat/RIRW_sandy.txt file and inspect the results.
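The CTC line holds the four counts of the 2x2 contingency table, and the CTS statistics are derived from them. The hypothetical cts_from_ctc function below recomputes a handful of common scores from those counts (hits, false alarms, misses, and correct negatives, which MET's CTC lines label FY_OY, FY_ON, FN_OY, and FN_ON) so you can sanity-check values in RIRW_sandy.txt. It is a sketch using the standard formulas, not TC-Stat's code, and it omits the confidence intervals and many other statistics found on the CTS line.

```shell
#!/bin/sh
# Hypothetical helper (not part of MET): derive a few contingency table
# statistics from the four 2x2 counts, given in the order
# hits, false alarms, misses, correct negatives.
# Ratios with a zero denominator are reported as NA.
cts_from_ctc() {
  awk -v hit="$1" -v fa="$2" -v miss="$3" -v cn="$4" '
    function ratio(num, den) { return (den > 0) ? sprintf("%.3f", num / den) : "NA" }
    BEGIN {
      total = hit + fa + miss + cn
      printf "TOTAL=%d BASER=%s POD=%s FAR=%s CSI=%s\n", total,
        ratio(hit + miss, total),        # base rate: observed event frequency
        ratio(hit, hit + miss),          # probability of detection
        ratio(fa, hit + fa),             # false alarm ratio
        ratio(hit, hit + fa + miss)      # critical success index
    }'
}
```

For example, cts_from_ctc 2 1 2 15 reports a POD of 0.500 and a FAR of 0.333 for a table with 2 hits, 1 false alarm, 2 misses, and 15 correct negatives.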

Output

Output griggs Wed, 04/24/2019 - 16:40

By default, TC-Stat writes its job output to the screen, but the output can easily be redirected to a file using the -out option. The -out option is specific to each job. When run on the command line, TC-Stat executes a single job, so using the -out option once is sufficient. When run with a configuration file, the -out option should be specified once for each job defined in the jobs array.

The MET tarball also includes an Rscript which automates calls to TC-Stat and the generation of graphics. The scripts/Rscripts/plot_tcmpr.R Rscript performs two main steps: it calls TC-Stat to filter the track data, and then it creates plots of the filtered track data. If Rscript or the R boot package is not available on your system, you may skip this step.

Plotting TCMPR lines

The plot_tcmpr.R Rscript requires that the MET_BUILD_BASE environment variable be set to the top-level MET source code directory and that TC-Stat be found in your path. From the top-level MET source code directory, run the following commands, depending on whether you are using the C-Shell or the Bourne Shell:

  • Determine your shell:
echo $SHELL
  • For C-Shell:
setenv MET_BUILD_BASE `pwd`
setenv PATH `pwd`/bin:$PATH
  • For Bourne Shell:
export MET_BUILD_BASE=`pwd`
export PATH=`pwd`/bin:$PATH
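Before running the Rscript, it can help to confirm the environment. The hypothetical check_met_env function below, written for the Bourne shell, tests that MET_BUILD_BASE names an existing directory and that tc_stat and Rscript are on your PATH. The commands that follow also reference $RSCRIPTS_BASE, which this page does not set; the check assumes it should point to the directory containing plot_tcmpr.R (presumably scripts/Rscripts under the MET source tree, per the description above).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of MET) for plot_tcmpr.R.
# Reports each problem on stderr and returns non-zero if any were found.
check_met_env() {
  status=0
  if [ -z "${MET_BUILD_BASE:-}" ] || [ ! -d "$MET_BUILD_BASE" ]; then
    echo "MET_BUILD_BASE is unset or not a directory" >&2
    status=1
  fi
  # Assumption: RSCRIPTS_BASE should name the directory holding plot_tcmpr.R.
  if [ ! -f "${RSCRIPTS_BASE:-}/plot_tcmpr.R" ]; then
    echo "plot_tcmpr.R not found under RSCRIPTS_BASE" >&2
    status=1
  fi
  for tool in tc_stat Rscript; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found on PATH" >&2
      status=1
    fi
  done
  return "$status"
}
```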

Listed below are several examples of running this Rscript. The output printed to the screen lists the names of the output files. You may display them using a web browser, the display command, or any other graphics program.

  • Run the Rscript with no arguments to see the usage:
Rscript $RSCRIPTS_BASE/plot_tcmpr.R
  • Specify an input file and output directory:
Rscript $RSCRIPTS_BASE/plot_tcmpr.R \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-outdir $MET_TUTORIAL_DATA/output/tc_pairs
  • By default, the script creates boxplots of event equalized track errors (TK_ERR). Display the output image:
display $MET_TUTORIAL_DATA/output/tc_pairs/TK_ERR_boxplot.png
  • Turn off event equalization, plot only 3 models (OFCL, OCD5, and HWRF), and create several plot types for track and intensity errors:
Rscript $RSCRIPTS_BASE/plot_tcmpr.R \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-outdir $MET_TUTORIAL_DATA/output/tc_pairs \
-no_ee \
-filter '-amodel OFCL,OCD5,HWRF' \
-dep TK_ERR,ABS\(AMAX_WIND-BMAX_WIND\) \
-plot BOXPLOT,MEAN,MEDIAN

The -no_ee option disables event equalization. The -filter options are passed directly to the TC-Stat filter job. The -dep option specifies the column(s) of data to be plotted. The -plot option specifies the desired plot types. Display the output images.
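The -dep value ABS\(AMAX_WIND-BMAX_WIND\) is escaped because parentheses are special to the shell. The short demonstration below uses a hypothetical show_args helper that prints each argument it receives. It shows that backslash escapes and single quotes deliver the same single argument, and that the quotes around the -filter value keep its options together as one argument for the Rscript to pass on to TC-Stat.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line in brackets
# so that argument boundaries (and embedded spaces) are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Backslash escapes and single quotes yield the same single -dep value.
escaped=$(show_args -dep TK_ERR,ABS\(AMAX_WIND-BMAX_WIND\))
quoted=$(show_args -dep 'TK_ERR,ABS(AMAX_WIND-BMAX_WIND)')
if [ "$escaped" = "$quoted" ]; then echo "same arguments"; fi

# The quotes keep the -filter options together as one argument:
# prints [-filter] then [-amodel OFCL,OCD5,HWRF].
show_args -filter '-amodel OFCL,OCD5,HWRF'
```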

  • Plot HWRF track errors broken down by storm intensity level:
Rscript $RSCRIPTS_BASE/plot_tcmpr.R \
-lookin $MET_TUTORIAL_DATA/output/tc_pairs/tc_pairs_sandy.tcst \
-outdir $MET_TUTORIAL_DATA/output/tc_pairs \
-no_ee \
-filter '-amodel HWRF' \
-dep TK_ERR \
-plot MEAN \
-series LEVEL \
-title "HWRF Mean Track Error for Sandy by Intensity Level"

The -series option overrides the default plotting by the AMODEL column. Use -title to override the default plot title. Display the output image:

display $MET_TUTORIAL_DATA/output/tc_pairs/TK_ERR_mean.png

These examples are meant as a brief introduction to this plotting script. We encourage you to run it many more times, plotting different columns of data or passing different filtering options.