MET Tool: Series-Analysis

Series-Analysis Tool: General

Series-Analysis Functionality

The Series-Analysis Tool accumulates statistics separately for each horizontal grid location over a series. Usually, the series is defined as a time series; however, any type of series is possible, including a series of vertical levels. This differs from the Grid-Stat tool in that Grid-Stat computes statistics aggregated over a spatial masking region at a single point in time. The Series-Analysis Tool computes statistics for each individual grid point and can be used to quantify how the model performance varies over the domain.

Series-Analysis Usage

View the usage statement for Series-Analysis by typing the following:

series_analysis

Usage: series_analysis
	-fcst file_1 ... file_n	Gridded forecast files or ASCII file containing a list of file names.
	-obs file_1 ... file_n	Gridded observation files or ASCII file containing a list of file names.
	[-both file_1 ... file_n]	Sets the -fcst and -obs options to the same list of files (e.g. the NetCDF matched pairs files from Grid-Stat).
	[-paired]	Indicates that the -fcst and -obs file lists are already matched up (i.e. the n-th forecast file matches the n-th observation file).
	-out file	NetCDF output file name for the computed statistics.
	-config file	SeriesAnalysisConfig file containing the desired configuration settings.
	[-log file]	Outputs log messages to the specified file
	[-v level]	Level of logging (optional).
	[-compress level]	NetCDF compression level (optional).

At a minimum, the -fcst, -obs (or -both), -out, and -config settings must be passed in on the command line. All forecast and observation fields must be interpolated to a common grid prior to running Series-Analysis.

cindyhg Mon, 06/24/2019 - 14:28

Configure

Series-Analysis Tool: Configure

Start by making an output directory for Series-Analysis and changing directories:

mkdir -p ${METPLUS_TUTORIAL_DIR}/output/met_output/series_analysis

cd ${METPLUS_TUTORIAL_DIR}/output/met_output/series_analysis

The behavior of Series-Analysis is controlled by the contents of the configuration file passed to it on the command line. The default Series-Analysis configuration file, SeriesAnalysisConfig_default, may be found in the ${MET_BUILD_BASE}/share/met/config directory.

Prior to modifying the configuration file, users are advised to make a copy of the default:

cp ${MET_BUILD_BASE}/share/met/config/SeriesAnalysisConfig_default SeriesAnalysisConfig_tutorial

The configurable items for Series-Analysis are used to specify how the verification is to be performed. The configurable items include specifications for the following:

The forecast fields to be verified at the specified vertical level or accumulation interval.
The threshold values to be applied.
The area over which to limit the computation of statistics - as predefined grids or configurable lat/lon polylines.
The confidence interval methods to be used.
The smoothing methods to be applied.
The types of statistics to be computed.

You may find a complete description of the configurable items in the series_analysis configuration file section of the MET User's Guide. Please take some time to review them.

For this tutorial, we'll run Series-Analysis to verify a time series of 3-hour accumulated precipitation. We'll use GRIB1 for the forecast files and NetCDF for the observation files. Because the forecast and observation files use different formats, we'll specify the name and level information for them slightly differently.

Open up the SeriesAnalysisConfig_tutorial file for editing with your preferred text editor and edit it as follows:

vi SeriesAnalysisConfig_tutorial

Set the fcst dictionary to

fcst = {
   field = [
      {
        name = "APCP";
        level = [ "A3" ];
      }
   ];
}

This requests the GRIB abbreviation for precipitation (APCP) accumulated over 3 hours (A3).

Delete obs = fcst; and insert

obs = {
   field = [
      {
        name = "APCP_03";
        level = [ "(*,*)" ];
      }
   ];
}

This requests the NetCDF variable named APCP_03, whose two dimensions are the gridded dimensions (*,*).

Look a few lines above the fcst dictionary and set

cat_thresh = [ >0.0, >=5.0 ];

This defines the categorical thresholds of interest. Because they are defined at the top level of the config file, these thresholds are applied to both the fcst and obs settings.

In the mask dictionary, set

grid = "G212";

This limits the computation of statistics to only those grid points falling inside the NCEP Grid 212 domain.

Set

block_size = 10000;

This processes 10,000 grid points in each pass through the data. A larger block_size should make the tool run faster but use more memory.

In the output_stats dictionary, set

   fho    = [ "F_RATE", "O_RATE" ];
   ctc    = [ "FY_OY", "FN_ON" ];
   cts    = [ "CSI", "GSS" ];
   mctc   = [];
   mcts   = [];
   cnt    = [ "TOTAL", "RMSE" ];
   sl1l2  = [];
   pct    = [];
   pstd   = [];
   pjc    = [];
   prc    = [];

For each line type, you can select statistics to be computed at each grid point over the series. These are the column names from those line types. Here, we select the forecast rate (FHO: F_RATE), observation rate (FHO: O_RATE), number of forecast yes and observation yes (CTC: FY_OY), number of forecast no and observation no (CTC: FN_ON), critical success index (CTS: CSI), and the Gilbert Skill Score (CTS: GSS) for each threshold, along with the total number of matched pairs (CNT: TOTAL) and the root mean squared error (CNT: RMSE).

Save and close this file.
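Before moving on, it can help to confirm that the edits landed where intended. The sketch below greps the tutorial config for three of the settings changed above. check_config is a hypothetical helper name used only for this example; it is not part of MET.

```shell
# Sketch: list the lines that set cat_thresh, grid, and block_size.
# check_config is a hypothetical helper for this example, not a MET command.
check_config() {
  grep -n -E '^[[:space:]]*(cat_thresh|grid|block_size)[[:space:]]*=' "$1"
}

# Only inspect the tutorial config if it exists in the current directory.
if [ -f SeriesAnalysisConfig_tutorial ]; then
  check_config SeriesAnalysisConfig_tutorial
fi
```

Each matching line is printed with its line number, so a missed edit is easy to spot and fix in the editor.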

johnhg Thu, 07/25/2019 - 23:03

Run

Series-Analysis Tool: Run

First, we need to prepare our observations by putting 1-hourly StageII precipitation observations into 3-hourly buckets. Create an output directory:

mkdir -p sample_obs/ST2ml_3h

Run the following PCP-Combine commands to prepare the observations:

pcp_combine -sum 00000000_000000 01 20050807_030000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080703V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_060000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080706V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_090000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080709V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_120000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080712V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_150000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080715V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_180000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080718V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050807_210000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080721V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

pcp_combine -sum 00000000_000000 01 20050808_000000 03 \
sample_obs/ST2ml_3h/sample_obs_2005080800V_03A.nc \
-pcpdir ${METPLUS_DATA}/met_test/data/sample_obs/ST2ml

Note that the previous set of PCP-Combine commands could easily be run by looping through times in METplus Wrappers! The MET tools are often run using METplus Wrappers rather than typing individual commands by hand. You'll learn more about automation using the METplus Wrappers throughout the tutorial.
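As a rough illustration of what looping through times looks like, the eight commands above can be generated by a plain shell loop. This is only a sketch and not the METplus Wrappers approach. DRY_RUN, run, and make_obs are hypothetical names introduced for this example; the pcp_combine arguments are copied from the commands above.

```shell
# Sketch: generate the eight PCP-Combine commands from a list of valid times.
# DRY_RUN, run(), and make_obs() are hypothetical helpers, not MET options.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them
OBS_IN="${METPLUS_DATA}/met_test/data/sample_obs/ST2ml"
OBS_OUT="sample_obs/ST2ml_3h"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then echo "$@"; else "$@"; fi
}

make_obs() {
  for vt in 20050807_03 20050807_06 20050807_09 20050807_12 \
            20050807_15 20050807_18 20050807_21 20050808_00; do
    # vt is YYYYMMDD_HH; the output file name uses YYYYMMDDHH.
    run pcp_combine -sum 00000000_000000 01 "${vt}0000" 03 \
        "${OBS_OUT}/sample_obs_${vt%_*}${vt#*_}V_03A.nc" \
        -pcpdir "${OBS_IN}"
  done
}

make_obs
```

With the default DRY_RUN=1 the loop only prints the commands so they can be compared against the list above. Setting DRY_RUN=0 executes them instead.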

Next, we'll run Series-Analysis using the following command:

series_analysis \
-fcst ${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_03.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_06.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_09.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_12.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_15.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_18.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_21.tm00_G212 \
${METPLUS_DATA}/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_24.tm00_G212 \
-obs sample_obs/ST2ml_3h/sample_obs_2005080703V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080706V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080709V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080712V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080715V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080718V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080721V_03A.nc \
sample_obs/ST2ml_3h/sample_obs_2005080800V_03A.nc \
-out series_analysis_2005080700_2005080800_3A.nc \
-config SeriesAnalysisConfig_tutorial \
-v 2

The statistics we requested in the configuration file will be computed separately for each grid location and accumulated over a time series of eight three-hour accumulations over a 24-hour period. Each grid point will have up to 8 matched pair values.

Note how long this command line is. Imagine how long it would be for a series of 100 files! Instead of listing all of the input files on the command line, you can list them in an ASCII file and pass that to Series-Analysis using the -fcst and -obs options.
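A minimal sketch of the file-list approach follows. The list file names fcst_files.txt and obs_files.txt are hypothetical, and writing one file name per line is an assumption about the list layout. The input paths and the remaining options are the ones used in the command above.

```shell
# Sketch: write the forecast and observation file names into two ASCII lists.
# fcst_files.txt and obs_files.txt are hypothetical names for this example.
FCST_DIR="${METPLUS_DATA}/met_test/data/sample_fcst/2005080700"

: > fcst_files.txt
for lead in 03 06 09 12 15 18 21 24; do
  echo "${FCST_DIR}/wrfprs_ruc13_${lead}.tm00_G212" >> fcst_files.txt
done

: > obs_files.txt
for vt in 2005080703 2005080706 2005080709 2005080712 \
          2005080715 2005080718 2005080721 2005080800; do
  echo "sample_obs/ST2ml_3h/sample_obs_${vt}V_03A.nc" >> obs_files.txt
done

# Pass the two lists in place of the individual files (only if MET is installed).
if command -v series_analysis >/dev/null 2>&1; then
  series_analysis \
    -fcst fcst_files.txt \
    -obs obs_files.txt \
    -out series_analysis_2005080700_2005080800_3A.nc \
    -config SeriesAnalysisConfig_tutorial \
    -v 2
fi
```

Both lists are written in the same order, so the n-th forecast file lines up with the n-th observation file.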

johnhg Fri, 07/26/2019 - 11:44

Output

Series-Analysis Tool: Output

The output of Series-Analysis is one NetCDF file containing the requested output statistics for each grid location on the same grid as the input files.

You may view the output NetCDF file that Series-Analysis wrote using the ncdump utility. Run the following command to view the header of the NetCDF output file:

ncdump -h series_analysis_2005080700_2005080800_3A.nc

In the NetCDF header, we see that the file contains many arrays of data. For each threshold (>0.0 and >=5.0), there are values for the requested statistics: F_RATE, O_RATE, FY_OY, FN_ON, CSI, and GSS. The file also contains the requested RMSE and TOTAL number of matched pairs for each grid location over the 24-hour period.

Next, run the ncview utility to display the contents of the NetCDF output file:

ncview series_analysis_2005080700_2005080800_3A.nc &

Click through the different variables to see how the performance varies over the domain. Looking at the series_cnt_RMSE variable, are the errors larger in the southeastern or northwestern regions of the United States?

Why does the extent of missing data increase for CSI for the higher threshold? Compare series_cts_CSI_gt0.0 to series_cts_CSI_ge5.0. (Hint: Find the definition of Critical Success Index (CSI) in the MET User's Guide and look closely at the denominator.)

Try running Plot-Data-Plane to visualize the observation rate variable for non-zero precipitation (i.e. series_fho_O_RATE_gt0.0). Since the valid range of values for this data is 0 to 1, use that to set the -plot_range option.
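One possible form of that command is sketched below. The output file name is a hypothetical choice, and the field string assumes the variable is addressed the same way as the NetCDF observation field in the config file, with a name and a (*,*) level.

```shell
# Sketch: plot the observation rate for the >0.0 threshold with a 0-1 range.
IN="series_analysis_2005080700_2005080800_3A.nc"
OUT="series_fho_O_RATE_gt0.0.ps"   # hypothetical output file name
FIELD='name="series_fho_O_RATE_gt0.0"; level="(*,*)";'

if command -v plot_data_plane >/dev/null 2>&1 && [ -f "${IN}" ]; then
  plot_data_plane "${IN}" "${OUT}" "${FIELD}" -plot_range 0 1
else
  # MET or the input file is not available here, so only print the command.
  echo "plot_data_plane ${IN} ${OUT} '${FIELD}' -plot_range 0 1"
fi
```

The single quotes around the field string keep the shell from interpreting the embedded double quotes, parentheses, and asterisks.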

Setting block_size to 10000 still required 3 passes through our 185x129 grid (= 23865 grid points). What happens when you increase block_size to 24000 and re-run? Does it run slower or faster?
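The pass counts can be checked with a little integer arithmetic. This sketch assumes the number of passes is the grid point count divided by block_size, rounded up, which is consistent with the 3 passes noted above. The passes function is a hypothetical helper for this example.

```shell
# Sketch: estimate passes through the data as ceil(grid points / block_size).
# passes() is a hypothetical helper; the rounding rule is an assumption.
passes() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

nx=185
ny=129
npts=$(( nx * ny ))

echo "grid points: ${npts}"
echo "passes with block_size=10000: $(passes "${npts}" 10000)"
echo "passes with block_size=24000: $(passes "${npts}" 24000)"
```

Under that assumption, any block_size of at least 23865 covers this grid in a single pass.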

johnhg Fri, 07/26/2019 - 14:29