GSI Tools

griggs Thu, 04/25/2019 - 16:18

GSI Tools Functionality

The Gridpoint Statistical Interpolation (GSI) system is a variational data assimilation system often run with the WRF model. GSI processes several observation data types and matches them to the model grid. GSI has the option of generating binary diagnostic files which contain information about that paired data. The GSI Tools in MET were designed to read those binary diagnostic files and dump their contents to ASCII files formatted so they can be further processed by the STAT-Analysis tool.

The GSI Tools are able to read conventional and radiance diagnostic files. Support for additional data types will be added in future releases. There are two GSI Tools: GSID2MPR and GSIDENS2ORANK. The GSID2MPR tool reads one or more GSI diagnostic file(s) and writes output in the MET matched pair (MPR) line type. It writes several extra columns of data to the end of the MPR lines based on input data type. Run this tool to create MPR data and pass it to the STAT-Analysis tool to compute additional statistics.

The GSIDENS2ORANK tool reads an ensemble of GSI diagnostic files and writes output in the MET observation rank (ORANK) line type. It writes several extra columns of data to the end of the ORANK lines based on input data type. Run this tool to create ensemble ORANK data and pass it to the STAT-Analysis tool to compute additional statistics.

GSI Tools Usage

View the usage statement for GSI Tools by simply typing the following:

gsid2mpr
gsidens2orank
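
These commands print a usage statement only if the tools can be found on your PATH. As a minimal sketch, assuming a POSIX-like shell, the check below reports any tool that cannot be found. The helper name check_tools is hypothetical and not part of MET.

```shell
# check_tools: hypothetical helper (not part of MET).
# Prints the name of each listed command that is missing from PATH
# and returns non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools gsid2mpr gsidens2orank stat_analysis ||
  echo "Set up your MET environment before continuing."
```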

You must pass one or more GSI diagnostic file names to GSID2MPR. For each input file, it writes an output file to the -outdir directory using the .stat suffix or the string specified by the -suffix option.

You must pass the GSI diagnostic file names for all ensemble members to GSIDENS2ORANK and specify the output file name using the -out option. Use -ens_mean to specify the ensemble mean data; otherwise, the mean for each observation location is computed on the fly from the member values.
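
The GSID2MPR output naming can be sketched as follows. This assumes the suffix string is appended verbatim to the input file's base name, which matches the file names used in this tutorial (diag_conv_ges.mem001 becomes diag_conv_ges.mem001.stat) but is otherwise an assumption. The helper expected_output and the .txt suffix are hypothetical.

```shell
# expected_output: hypothetical helper (not part of MET).
# Usage: expected_output OUTDIR INPUT_FILE [SUFFIX]
# Assumes the output name is the input base name plus the suffix.
expected_output() {
  suffix=${3:-.stat}
  echo "$1/$(basename "$2")$suffix"
}

# Default suffix
expected_output output/gsid2mpr input/gsi_data/diag_conv_ges.mem001
# prints output/gsid2mpr/diag_conv_ges.mem001.stat

# Custom suffix, as would be passed with -suffix
expected_output output/gsid2mpr input/gsi_data/diag_conv_ges.mem001 .txt
# prints output/gsid2mpr/diag_conv_ges.mem001.txt
```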

Run

griggs Thu, 04/25/2019 - 16:20

Next, we'll run GSID2MPR on the command line using the following command:

gsid2mpr \
input/gsi_data/diag_conv_ges.mem* \
-outdir $MET_TUTORIAL_DATA/output/gsid2mpr \
-swap

Here, we've reformatted five conventional GSI diagnostic files. The -swap option indicates that the endianness (byte order) of the data must be switched. For each input file, a .stat output file is written to the directory specified by the -outdir option. Notice in the log messages that the tool checks for and skips duplicate entries in the data. The -no_check_dup option disables this duplicate check. Next, run a similar command for satellite radiance data:

gsid2mpr \
input/gsi_data/diag_amsua_n18_ges.mem* \
-outdir $MET_TUTORIAL_DATA/output/gsid2mpr \
-swap
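
To see why -swap may be needed, it can help to look at the raw bytes. The sketch below assumes, as is typical for Fortran sequential unformatted output, that each diagnostic file begins with a 4-byte record-length marker. That layout is an assumption here, not something stated in this tutorial, and the function names are hypothetical. For a small record length, leading zero bytes suggest big-endian data and trailing zero bytes suggest little-endian data.

```shell
# first_marker_hex: print the first four bytes of a file as 8 hex digits.
first_marker_hex() {
  od -A n -t x1 -N 4 "$1" | tr -d ' \t\n'
}

# guess_byte_order: heuristic only. Assumes the first four bytes are a
# small Fortran record-length marker (an assumption, see the text above).
guess_byte_order() {
  case "$(first_marker_hex "$1")" in
    00000000) echo unknown ;;
    00??????) echo big ;;
    ??????00) echo little ;;
    *)        echo unknown ;;
  esac
}

# Report the guess for each conventional diagnostic file, if present.
for f in input/gsi_data/diag_conv_ges.mem*; do
  if [ -f "$f" ]; then
    echo "$f: $(guess_byte_order "$f")"
  fi
done
```

If the guessed byte order differs from that of the machine running MET, -swap is likely required.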

Next, run very similar commands using the GSIDENS2ORANK tool to process the conventional and satellite radiance data as a 5-member ensemble, rather than as individual files. This generates ORANK output lines instead of MPR lines:

gsidens2orank \
input/gsi_data/diag_conv_ges.mem* \
-out $MET_TUTORIAL_DATA/output/gsidens2orank/diag_conv_ORANK.stat \
-swap
gsidens2orank \
input/gsi_data/diag_amsua_n18_ges.mem* \
-out $MET_TUTORIAL_DATA/output/gsidens2orank/diag_amsua_ORANK.stat \
-swap
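
GSIDENS2ORANK is passed every file matched by the wildcard, so it is worth confirming that each pattern matches the five members expected here. A minimal sketch, with count_matches as a hypothetical helper:

```shell
# count_matches: hypothetical helper. Prints how many existing files
# match the glob pattern given as the first argument.
count_matches() {
  n=0
  # $1 is left unquoted on purpose so the shell expands the pattern.
  # This also splits on whitespace, so paths with spaces are not supported.
  for f in $1; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

echo "conventional members: $(count_matches 'input/gsi_data/diag_conv_ges.mem*')"
echo "radiance members:     $(count_matches 'input/gsi_data/diag_amsua_n18_ges.mem*')"
```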

Output

griggs Thu, 04/25/2019 - 16:22

Open up and examine the following output files we just generated:

  • Conventional MPR data:
$MET_TUTORIAL_DATA/output/gsid2mpr/diag_conv_ges.mem001.stat
  • Radiance MPR data:
$MET_TUTORIAL_DATA/output/gsid2mpr/diag_amsua_n18_ges.mem001.stat
  • Conventional ORANK data:
$MET_TUTORIAL_DATA/output/gsidens2orank/diag_conv_ORANK.stat
  • Radiance ORANK data:
$MET_TUTORIAL_DATA/output/gsidens2orank/diag_amsua_ORANK.stat

Look at the header lines for the GSID2MPR output files and notice that the conventional and radiance headers are the same through the OBS_QC column. However, the remaining columns differ and are specific to the data type. The GSIDENS2ORANK output files include the standard ORANK header columns followed by extra columns specific to the data type. Next, we'll run STAT-Analysis aggregate_stat jobs to derive statistics from this data.
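
One way to see where the conventional and radiance headers diverge is to print each header as a list of column names and view the two lists side by side. The sketch below assumes the first line of each .stat file is a whitespace-delimited header. The helper list_columns is hypothetical, and the comparison only runs once both files exist.

```shell
# list_columns: hypothetical helper. Prints the header (first line) of a
# whitespace-delimited file as one column name per line.
list_columns() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) print $i; exit }' "$1"
}

# Empty prefix if MET_TUTORIAL_DATA is unset; the check below then skips.
out_dir="${MET_TUTORIAL_DATA:-}/output/gsid2mpr"
conv="$out_dir/diag_conv_ges.mem001.stat"
rad="$out_dir/diag_amsua_n18_ges.mem001.stat"

# Show the two headers side by side, one numbered row per column position.
if [ -f "$conv" ] && [ -f "$rad" ]; then
  tmpdir=$(mktemp -d)
  list_columns "$conv" > "$tmpdir/conv"
  list_columns "$rad"  > "$tmpdir/rad"
  paste "$tmpdir/conv" "$tmpdir/rad" | nl
  rm -rf "$tmpdir"
fi
```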

GSID2MPR Statistics

Run the following STAT-Analysis jobs to compute statistics using the GSID2MPR output:

  • Read MPR lines and compute continuous statistics for each variable present. Write the output to the specified .stat file:
stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/gsid2mpr/diag_conv_ges.mem001.stat \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-by FCST_VAR -out_stat $MET_TUTORIAL_DATA/output/gsid2mpr/conv.mem001_CNT.stat
  • Open up $MET_TUTORIAL_DATA/output/gsid2mpr/conv.mem001_CNT.stat and notice that multiple observation type values are written as a comma-separated list in the OBTYPE column. Rerun this command using the -set_hdr option to define the output for that column:
stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/gsid2mpr/diag_conv_ges.mem001.stat \
-job aggregate_stat -line_type MPR -out_line_type CNT \
-by FCST_VAR -out_stat $MET_TUTORIAL_DATA/output/gsid2mpr/conv.mem001_CNT.stat \
-set_hdr OBTYPE ALL
  • The OBTYPE column in the output should now be set to ALL. With the right set of options, STAT-Analysis may be used to filter this data in any way you would like and derive many different types of statistics. For example, process only temperature data (-fcst_var t), using only the pairs that were actually assimilated (-column_thresh ANLY_USE eq1), and threshold them to define a contingency table (-out_line_type CTC -out_thresh ge273):
stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/gsid2mpr/diag_conv_ges.mem001.stat \
-job aggregate_stat -line_type MPR -out_line_type CTC \
-fcst_var t -out_thresh ge273 -column_thresh ANLY_USE eq1 \
-out_stat $MET_TUTORIAL_DATA/output/gsid2mpr/conv.mem001_TMP_CTC.stat \
-set_hdr OBTYPE ALL
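
The jobs above process only the mem001 file. As a minimal sketch, the same CNT job could be repeated for every member file written by GSID2MPR. The output naming mirrors the conv.mem001_CNT.stat name used above. The helper cnt_out_name is hypothetical, and the loop only calls stat_analysis if it is found on the PATH.

```shell
# cnt_out_name: hypothetical helper. Maps an input such as
# diag_conv_ges.mem001.stat to the output name conv.mem001_CNT.stat.
cnt_out_name() {
  base=$(basename "$1" .stat)   # e.g. diag_conv_ges.mem001
  member=${base##*.}            # e.g. mem001
  echo "conv.${member}_CNT.stat"
}

# Empty prefix if MET_TUTORIAL_DATA is unset; the loop then does nothing.
out_dir="${MET_TUTORIAL_DATA:-}/output/gsid2mpr"

for f in "$out_dir"/diag_conv_ges.mem*.stat; do
  # Skip quietly if the pattern matched nothing or MET is not available.
  if [ -f "$f" ] && command -v stat_analysis >/dev/null 2>&1; then
    stat_analysis \
      -lookin "$f" \
      -job aggregate_stat -line_type MPR -out_line_type CNT \
      -by FCST_VAR -out_stat "$out_dir/$(cnt_out_name "$f")" \
      -set_hdr OBTYPE ALL
  fi
done
```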

GSIDENS2ORANK Statistics

As we saw above, STAT-Analysis can read MPR lines and derive a variety of output line types (SL1L2, CNT, CTC, CTS, MCTC, MCTS, and so on). Similarly, STAT-Analysis can read the ORANK lines generated by GSIDENS2ORANK and derive ensemble output line types, such as RHIST, PHIST, and SSVAR. Run the following STAT-Analysis jobs to compute statistics using the GSIDENS2ORANK output:

  • Read radiance ORANK lines and compute ranked histograms for each variable present and write the output to the screen:
stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/gsidens2orank/diag_amsua_ORANK.stat \
-job aggregate_stat -line_type ORANK -out_line_type RHIST \
-by FCST_VAR
  • Now process conventional ORANK lines and derive PHIST output with a 0.10 bin size for each unique combination of the FCST_VAR and N_USE columns:
stat_analysis \
-lookin $MET_TUTORIAL_DATA/output/gsidens2orank/diag_conv_ORANK.stat \
-job aggregate_stat -line_type ORANK -out_line_type PHIST -out_bin_size 0.10 \
-by FCST_VAR,N_USE