3. Running GSI

This chapter describes how to run GSI. It starts by introducing the input data required to run GSI, then explains an example GSI run script in detail and introduces the files produced by a successful GSI run. It concludes with some frequently used options from the GSI namelist.

Input Data Required to Run GSI

In most cases, three types of input data (background, observations, and fixed files) must be available before running GSI. In some special idealized cases, such as a pseudo single observation test, GSI can be run without any observations. If running GSI with the 3D EnVar hybrid option, global or regional ensemble forecasts are also needed.

Background or First Guess Field

As with other data analysis systems, the background or first guess fields may come from a model forecast conducted separately or from a previous data assimilation cycle. The following is a list of the types of background files that can be used by this release version of GSI:

  1. WRF-NMM input fields in binary format
  2. WRF-NMM input fields in NetCDF format
  3. WRF-ARW input fields in binary format
  4. WRF-ARW input fields in NetCDF format
  5. GFS input fields in binary format or through NEMS I/O
  6. NEMS-NMMB input fields
  7. RTMA input files (2-dimensional binary format)
  8. WRF-Chem GOCART input fields with NetCDF format
  9. CMAQ binary file

The Weather Research and Forecasting (WRF) community modeling system includes two dynamical cores: the Advanced Research WRF (ARW) and the Nonhydrostatic Mesoscale Model (NMM). The GFS (Global Forecast System), NEMS (National Environmental Modeling System)-NMMB (Nonhydrostatic Mesoscale Model B-Grid), and RTMA (Real-Time Mesoscale Analysis) are operational systems at the National Centers for Environmental Prediction (NCEP). The DTC mainly supports GSI for regional WRF applications. Therefore, most of the multiple-platform tests were conducted using WRF NetCDF background files (item 4 in the list above). The DTC also supports GSI in global and chemical applications with limited resources. The following backgrounds have been tested for this release:

  1. ARW NetCDF (item 4) was tested with multiple cases
  2. GFS (item 5) was tested with multiple NCEP cases
  3. WRF-Chem NetCDF (item 8) was tested with a single case
  4. NEMS-NMMB (item 6) was tested with a single case

Observations

GSI can analyze many types of observational data, including conventional data, satellite radiance observations, GPS Radio Occultations, and radar data, among others. The default observation file names are given in the released GSI namelist, with corresponding observations included in each file. Sample BUFR files available for download from the NCEP website are listed in table [t31].

Observational data are complex, and many observations need format conversion and quality control before GSI can use them. GSI ingests observations saved in BUFR format (with NCEP-specified features). The NCEP-processed PrepBUFR and BUFR files can be used directly. Users who need to introduce their own data into GSI should check the following website for the Users Guide and examples of BUFR/PrepBUFR processing:

http://www.dtcenter.org/com-GSI/BUFR/index.php

DTC supports BUFR/PrepBUFR data processing and quality control as part of the GSI community tasks.

GSI can analyze all of the data types in table [t31], but each GSI run (for both operation and case study purposes) only uses a subset of the data. Some data may be outdated and not available, some are in monitoring mode, and some may have quality issues during certain periods. Users are encouraged to check data quality prior to running an analysis. The following NCEP links provide resources that include data quality history:

Because the current regional models do not have ozone as a prognostic variable, ozone data are not assimilated on the regional scale.

GSI can be run without any observations to see how the moisture constraint modifies the first guess (background) field. GSI can also be run in a pseudo single observation mode, which does not require any BUFR observation files. In this mode, users should specify observation information in the namelist section SINGLEOB_TEST (see Section [sec4.2] for details). As more data files are used, additional information will be added through the GSI analysis.

GSI observation file names, content, and examples

GSI Name | Content | Example file names
prepbufr | Conventional observations, including ps, t, q, pw, uv, spd, dw, sst | gdas1.t12z.prepbufr.nr
satwndbufr | Satellite wind observations | gdas1.t12z.satwnd.tm00.bufr_d
amsuabufr | AMSU-A 1b radiance (brightness temperatures) from satellites NOAA-15, 16, 17, 18, 19 and METOP-A/B | gdas1.t12z.1bamua.tm00.bufr_d
amsubbufr | AMSU-B 1b radiance (brightness temperatures) from satellites NOAA-15, 16, 17 | gdas1.t12z.1bamub.tm00.bufr_d
radarbufr | Radar radial velocity Level 2.5 data | ndas.t12z.radwnd.tm12.bufr_d
gpsrobufr | GPS radio occultation and bending angle observations | gdas1.t12z.gpsro.tm00.bufr_d
ssmirrbufr | Precipitation rate observations from SSM/I | gdas1.t12z.spssmi.tm00.bufr_d
tmirrbufr | Precipitation rate observations from TMI | gdas1.t12z.sptrmm.tm00.bufr_d
sbuvbufr | SBUV/2 ozone observations from satellites NOAA-16, 17, 18, 19 | gdas1.t12z.osbuv8.tm00.bufr_d
hirs2bufr | HIRS2 1b radiance from satellite NOAA-14 | gdas1.t12z.1bhrs2.tm00.bufr_d
hirs3bufr | HIRS3 1b radiance observations from satellites NOAA-16, 17 | gdas1.t12z.1bhrs3.tm00.bufr_d
hirs4bufr | HIRS4 1b radiance observations from satellites NOAA-18, 19 and METOP-A/B | gdas1.t12z.1bhrs4.tm00.bufr_d
msubufr | MSU observations from satellite NOAA-14 | gdas1.t12z.1bmsu.tm00.bufr_d
airsbufr | AMSU-A and AIRS radiances from satellite AQUA | gdas1.t12z.airsev.tm00.bufr_d
mhsbufr | Microwave Humidity Sounder observations from NOAA-18, 19 and METOP-A/B | gdas1.t12z.1bmhs.tm00.bufr_d
ssmitbufr | SSMI observations from satellites f13, f14, f15 | gdas1.t12z.ssmit.tm00.bufr_d
amsrebufr | AMSR-E radiance from satellite AQUA | gdas1.t12z.amsre.tm00.bufr_d
ssmisbufr | SSMIS radiances from satellite f16 | gdas1.t12z.ssmis.tm00.bufr_d
gsnd1bufr | GOES sounder radiance (sndrd1, sndrd2, sndrd3, sndrd4) from GOES-11, 12, 13, 14, 15 | gdas1.t12z.goesfv.tm00.bufr_d
l2rwbufr | NEXRAD Level 2 radial velocity | ndas.t12z.nexrad.tm12.bufr_d
gsndrbufr | GOES sounder radiance from GOES-11, 12 | gdas1.t12z.goesnd.tm00.bufr_d
gimgrbufr | GOES imager radiance from GOES-11, 12 |
omibufr | Ozone Monitoring Instrument (OMI) observations from NASA Aura | gdas1.t12z.omi.tm00.bufr_d
iasibufr | Infrared Atmospheric Sounding Interferometer sounder observations from METOP-A/B | gdas1.t12z.mtiasi.tm00.bufr_d
gomebufr | Global Ozone Monitoring Experiment (GOME) ozone observations from METOP-A/B | gdas1.t12z.gome.tm00.bufr_d
mlsbufr | MLS stratospheric ozone data from Aura | gdas1.t12z.mlsbufr.tm00.bufr_d
tcvitl | Synthetic Tropical Cyclone MSLP observations | gdas1.t12z.syndata.tcvitals.tm00
seviribufr | SEVIRI radiance from MET-08, 09, 10 | gdas1.t12z.sevcsr.tm00.bufr_d
atmsbufr | ATMS radiance from Suomi NPP | gdas1.t12z.atms.tm00.bufr_d
crisbufr | CrIS radiance from Suomi NPP | gdas1.t12z.cris.tm00.bufr_d
modisbufr | MODIS aerosol total column AOD observations from AQUA and TERRA |

[t31]
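
The mapping in the table can be applied in a run script with a small helper. The following is only a sketch: the function name link_obs is hypothetical (it is not part of the released scripts), the default OBS_ROOT path is a placeholder, and the source file names are simply the example names from the table. Files that are not available are skipped, so a run uses whatever subset of the data is present.

```shell
# link_obs SRC GSI_NAME
# Link SRC into the current directory under the file name GSI expects.
# Files that are missing or unreadable are skipped with a message.
# (Hypothetical helper, not part of the released run scripts.)
link_obs() {
  if [ -r "$1" ]; then
    ln -sf "$1" "./$2"
    echo "linked $2 -> $1"
  else
    echo "skipped $2 ($1 not found)"
  fi
}

# Example usage; OBS_ROOT is normally set in the case set up block,
# the default below is only a placeholder.
OBS_ROOT=${OBS_ROOT:-/path/to/obs}
link_obs "${OBS_ROOT}/gdas1.t12z.prepbufr.nr"        prepbufr
link_obs "${OBS_ROOT}/gdas1.t12z.1bamua.tm00.bufr_d" amsuabufr
link_obs "${OBS_ROOT}/gdas1.t12z.gpsro.tm00.bufr_d"  gpsrobufr
```

Linking rather than copying is sufficient here because GSI only reads the observation files.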

Fixed Files (Statistics and Control Files)

A GSI analysis also needs to read specific information from statistic files, configuration files, bias correction files, and CRTM coefficient files. We refer to these files as fixed files and they are located in a directory called fix/ in the release package, except for CRTM coefficients.

Table [t32] lists fixed files required for a GSI run, the content of the files, and corresponding example files from the regional and global applications:

Because most of those fixed files have hardwired names inside the GSI, a GSI run script needs to copy or link those files (right column in table [t32]) from the ./fix directory to the GSI run directory with the file name required in GSI (left column in table [t32]). For example, if GSI runs with an ARW background, the following line should be in the run script:

cp ${FIX_ROOT}/anavinfo_arw_netcdf anavinfo
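
The copy above can be wrapped so that a missing fixed file stops the script early instead of surfacing later as a GSI failure. This is a minimal sketch under stated assumptions: the function name stage_fix is hypothetical, and FIX_ROOT is assumed to point at the fix/ directory of the release package.

```shell
# stage_fix SRC DEST
# Copy a fixed file into the current (run) directory under the hardwired
# name GSI expects; return non-zero if the source cannot be read.
# (Hypothetical helper, not part of the released run scripts.)
stage_fix() {
  if [ ! -r "$1" ]; then
    echo "ERROR: fixed file $1 does not exist!" >&2
    return 1
  fi
  cp "$1" "./$2"
}

# Usage inside a run script:
#   stage_fix "${FIX_ROOT}/anavinfo_arw_netcdf" anavinfo || exit 1
```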

Note that in this release, the number of vertical levels given in the anavinfo file for each 3-dimensional variable must match the background file (for example, wrfinput_d01); otherwise GSI will fail. To identify the correct numbers of vertical levels, users can dump the dimensions of the NetCDF background file (use ncdump -h) and find the values of bottom_top and bottom_top_stag. For example, if the dimensions of the background file are:

bottom_top = 50 ;
bottom_top_stag = 51 ;

Then the corresponding anavinfo file should have 51 levels for prse (3-dimensional pressure field) and 50 levels for other three-dimensional variables such as u, v, tv, q, oz, cw, etc. For details, users can dump out the global attributes of the background file and find the number of vertical levels for each variable. The following shows part of the anavinfo file for the above background:

state_derivatives::
!var  level  src
 ps   1      met_guess
 u    50     met_guess
 v    50     met_guess
 tv   50     met_guess
 q    50     met_guess
 oz   50     met_guess
 cw   50     met_guess
 prse 51     met_guess
::
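
The level check described above can be automated before GSI is launched. The sketch below is illustrative only: the function names are hypothetical, it inspects only the first entry it finds for u and prse in the anavinfo file, and it assumes the output of ncdump -h has been saved to a text file first.

```shell
# get_dim NAME: read `ncdump -h` output on stdin and print the size of
# the dimension NAME (lines of the form "NAME = 50 ;").
get_dim() {
  awk -v d="$1" '$1 == d && $2 == "=" { print $3; exit }'
}

# get_level FILE VAR: print the level count (second column) of the first
# entry for VAR in the anavinfo file FILE.
get_level() {
  awk -v v="$2" '$1 == v { print $2; exit }' "$1"
}

# check_levels HEADER ANAVINFO: succeed only if u matches bottom_top and
# prse matches bottom_top_stag.
check_levels() {
  nz=$(get_dim bottom_top < "$1")
  nzs=$(get_dim bottom_top_stag < "$1")
  [ "$(get_level "$2" u)" = "$nz" ] && [ "$(get_level "$2" prse)" = "$nzs" ]
}

# Typical use (assumes ncdump is on PATH):
#   ncdump -h wrfinput_d01 > header.txt
#   check_levels header.txt anavinfo || echo "anavinfo levels do not match"
```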

GSI fixed files, content, and examples

GSI Name | Content | Example file names
anavinfo | Information file to set control and analysis variables | anavinfo_arw_netcdf, anavinfo_ndas_netcdf, global_anavinfo.l64.txt
berror_stats | Background error covariance | nam_nmmstat_na.gcv, nam_glb_berror.f77.gcv, global_berror.l64y386.f77
errtable | Observation error table | nam_errtable.r3dv, prepobs_errtable.global
convinfo | Conventional observation information file | global_convinfo.txt, nam_regional_convinfo.txt
satinfo | Satellite channel information file | global_satinfo.txt
pcpinfo | Precipitation rate observation information file | global_pcpinfo.txt
ozinfo | Ozone observation information file | global_ozinfo.txt
satbias_angle | Satellite scan angle dependent bias correction file | global_satangbias.txt
 | Satellite mass bias correction coefficient file | sample.satbias
 | Combined satellite angle dependent and mass bias correction coefficient file | gdas1.t00z.abias.new
t_rejectlist, w_rejectlist, ... | Rejection list for T, wind, etc. in RTMA | new_rtma_t_rejectlist, new_rtma_w_rejectlist

[t32]

Each operational system, such as GFS, NAM, RAP, and RTMA, has its own set of fixed files. For your specific GSI runs, you need to get the correct set of fixed files. Fixed files for regional applications are included in this GSI/EnKF release under the fix/ directory. Fixed files for global applications are not included in this release in order to save space. Please download comGSIv3.7_EnKFv1.3_fix_global.tar.gz if you need to run global cases. Note that little endian background error covariance files are no longer supported.

Each release version of the GSI calls a certain version of the CRTM library and needs corresponding CRTM coefficients to do radiance data assimilation. This version of GSI uses CRTM 2.2.3. The coefficient files are listed in table [t34].

List of radiance coefficients used by CRTM

File name used in GSI | Content | Example files
Nalli.IRwater.EmisCoeff.bin | IR surface emissivity coefficients | Nalli.IRwater.EmisCoeff.bin
NPOESS.IRice.EmisCoeff.bin | | NPOESS.IRice.EmisCoeff.bin
NPOESS.IRsnow.EmisCoeff.bin | | NPOESS.IRsnow.EmisCoeff.bin
NPOESS.IRland.EmisCoeff.bin | | NPOESS.IRland.EmisCoeff.bin
NPOESS.VISice.EmisCoeff.bin | | NPOESS.VISice.EmisCoeff.bin
NPOESS.VISland.EmisCoeff.bin | | NPOESS.VISland.EmisCoeff.bin
NPOESS.VISsnow.EmisCoeff.bin | | NPOESS.VISsnow.EmisCoeff.bin
NPOESS.VISwater.EmisCoeff.bin | | NPOESS.VISwater.EmisCoeff.bin
FASTEM6.MWwater.EmisCoeff.bin | | FASTEM6.MWwater.EmisCoeff.bin
AerosolCoeff.bin | Aerosol coefficients | AerosolCoeff.bin
CloudCoeff.bin | Cloud scattering and emission coefficients | CloudCoeff.bin
${satsen}.SpcCoeff.bin | Sensor spectral response characteristics | ${satsen}.SpcCoeff.bin
${satsen}.TauCoeff.bin | Transmittance coefficients | ${satsen}.TauCoeff.bin

[t34]

GSI Run Script

In this release version, three sample run scripts are available for different GSI applications:

  • ush/comgsi_run_regional.ksh for regional GSI
  • ush/comgsi_run_global.ksh for global GSI (GFS)
  • ush/comgsi_run_chem.ksh for chemical analysis

These scripts will be called to generate GSI namelists:

  • ush/comgsi_namelist.sh for regional GSI
  • ush/comgsi_namelist_gfs.sh for global GSI (GFS)
  • ush/comgsi_namelist_chem.sh for GSI chemical analysis

We will introduce the regional run script (comgsi_run_regional.ksh) in detail in the following sections and introduce the global run script when we discuss the GSI global application in the Advanced GSI Users Guide.

Note there is also a run script for regional EnKF (comenkf_run_regional.ksh), a run script for global EnKF (comenkf_run_global.ksh) and the EnKF namelist script (comenkf_namelist.sh) in the same directory, which will be introduced in the EnKF Users Guide.

Steps in the GSI Run Script

The GSI run script creates a run time environment necessary to run the GSI executable. A typical GSI run script includes the following steps:

  1. Request computer resources to run GSI.
  2. Set environment variables for the machine architecture.
  3. Set experiment variables (such as experiment name, analysis time, background, and observation).
  4. Set the script that generates the GSI namelist.
  5. Check the definitions of required variables.
  6. Generate a run directory for GSI (sometimes called a working or temporary directory).
  7. Copy the GSI executable to the run directory.
  8. Copy the background file to the run directory and create an index file listing the location and name of ensemble members if running with a hybrid set up.
  9. Link observations to the run directory.
  10. Link fixed files (statistic, control, and coefficient files) to the run directory.
  11. Generate namelist for GSI.
  12. Run the GSI executable.
  13. Post-process: save analysis results, generate diagnostic files, and clean the run directory.
  14. Run GSI as observation operator for EnKF, only for if_observer=Yes.

Typically, users only need to modify specific parts of the run script (steps 1, 2, and 3) to fit their computer environment and point to the correct input/output files and directories. Users may also need to modify step 4 if the namelist script has been changed and is stored under a different name or at a different location. The next section (3.2.2) covers each of these modifications for steps 1 to 3. Section 3.2.3 dissects a sample regional GSI run script piece by piece. Users should start with the run script provided in the same release package as the GSI executable and modify it for their own run environment and case configuration.

Customization of the GSI Run Script

This section focuses on step 1 of the run script: modifying the machine specific entries. Specifically, this consists of setting Unix/Linux environment variables and selecting the correct parallel run time environment (batch system with options).

GSI can be run with the same parallel environments as other MPI programs, for example:

  • IBM supercomputer using LSF (Load Sharing Facility)
  • IBM supercomputer using LoadLeveler
  • Linux clusters using PBS (Portable Batch System)
  • Linux clusters using LSF
  • Linux workstation (no batch system)
  • Intel Mac Darwin workstation with PGI compiler (no batch system)

Two queuing systems are listed below as examples:

Linux Cluster with LSF:

#BSUB -P ????????
#BSUB -W 00:10
#BSUB -n 4
#BSUB -R "span[ptile=16]"
#BSUB -J gsi
#BSUB -o gsi.%J.out
#BSUB -e gsi.%J.err
#BSUB -q small

Linux Cluster with PBS:

#PBS -l procs=4
#PBS -n
#PBS -o gsi.out
#PBS -e gsi.err
#PBS -N GSI
#PBS -l walltime=00:20
#PBS -A ??????

Workstation:

No batch system, skip this step

[t35]

In both of the examples above, environment variables are set specifying system resource management, such as the number of processors, the name/type of queue, maximum wall clock time allocated for the job, options for standard out and standard error, etc. Some platforms need additional definitions to specify Unix environment variables that further define the run environment.

These variable settings can significantly impact the GSI run efficiency and accuracy of the GSI results. Please check with your system administrator for optimal settings for your computer system. Note that while the GSI can be run with any number of processors, it will not scale well with the increase of processor numbers after a certain threshold based on the case configuration and GSI application types.

There are only two options to define in this block.

# GSIPROC = processor number used for GSI analysis
#------------------------------------------------
  GSIPROC=4
  ARCH='LINUX_LSF'
# Supported configurations:
            # IBM_LSF,
            # LINUX, LINUX_LSF, LINUX_PBS,
            # DARWIN_PGI

The option ARCH selects the machine architecture. It is a function of platform type and batch queuing system. The option GSIPROC sets the number of cores used in the run. This option also decides if the job is run as a multiple core job or as a single core run. Several choices of the option ARCH are listed in the sample run script. Please check with your system administrator about running parallel MPI jobs on your system.

Option ARCH | Platform | Compiler | Batch queuing system
IBM_LSF | IBM AIX | xlf, xlc | LSF
LINUX | Linux workstation | Intel/PGI/GNU | mpirun if GSIPROC > 1
LINUX_LSF | Linux cluster | Intel/PGI/GNU | LSF
LINUX_PBS | Linux cluster | Intel/PGI/GNU | PBS
DARWIN_PGI | MAC DARWIN | PGI | mpirun if GSIPROC > 1

[t36]

This section discusses setting up variables specific to a given case, such as analysis time, working directory, background and observation files, location of fixed files and CRTM coefficients, the GSI executable file, and the script generating GSI namelist.

#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time  (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
# PREPBUFR = path of PrepBUFR conventional obs
# BK_FILE  = path and name of background file
# OBS_ROOT = path of observations files
# FIX_ROOT = path of fix files
# GSI_EXE  = path and name of the gsi executable
# ENS_ROOT = path where ensemble background files exist
  ANAL_TIME=2017051318
  JOB_DIR=the_job_directory
     #normally you put run scripts here and submit jobs from here,
     #requires a copy of gsi.x in this directory
  RUN_NAME=a_descriptive_run_name_such_as_case05_3denvar_etc
  OBS_ROOT=the_directory_where_observation_files_are_located
  BK_ROOT=the_directory_where_background_files_are_located
  GSI_ROOT=the_comgsi_main_directory_where_src_ush_fix_etc_are_located
  CRTM_ROOT=the_CRTM_directory
  ENS_ROOT=the_directory_where_ensemble_backgrounds_are_located
      #ENS_ROOT is not required if not running hybrid EnVAR
  HH=`echo $ANAL_TIME | cut -c9-10`
  GSI_EXE=${JOB_DIR}/gsi.x  #assume you have a copy of gsi.x here
  WORK_ROOT=${JOB_DIR}/${RUN_NAME}
  FIX_ROOT=${GSI_ROOT}/fix
  GSI_NAMELIST=${GSI_ROOT}/ush/comgsi_namelist.sh
  PREPBUFR=${OBS_ROOT}/nam.t${HH}z.prepbufr.tm00
  BK_FILE=${BK_ROOT}/wrfinput_d01.${ANAL_TIME}

When picking the observation BUFR files, please be aware of the following:

  • A GSI run will stop if the time in the background file does not match the cycle time in the observation BUFR file used for the run (a namelist option can turn this check off).
  • Even if their contents are identical, PrepBUFR/BUFR files differ when they were created on platforms with different byte order (for example, Linux vs. IBM). Appendix A.1 discusses the conversion tool SSRC used to byte-swap observation files. Since release version 3.2, GSI compiled with PGI or Intel handles byte order in PrepBUFR and BUFR files automatically, so users on those platforms can link BUFR files of either byte order directly.

The next part of this block focuses on additional options that specify important aspects of the GSI configuration.

#------------------------------------------------
# bk_core= which WRF core is used as background (NMM or ARW or NMMB)
# bkcv_option= which background error covariance and parameter will be used
#              (GLOBAL or NAM)
# if_clean = clean  : delete temporary files in working directory (default)
#            no     : leave running directory as is (this is for debug only)
# if_observer = Yes  : only used as observation operator for enkf
# if_hybrid   = Yes  : Run GSI as 3D/4D EnVar
# if_4DEnVar  = Yes  : Run GSI as 4D EnVar
# if_nemsio = Yes    : The GFS background files are in NEMSIO format
# if_oneob  = Yes    : Do single observation test
  if_hybrid=No     # Yes, or, No -- case sensitive !
  if_4DEnVar=No    # Yes, or, No -- case sensitive (set if_hybrid=Yes first)!
  if_observer=No   # Yes, or, No -- case sensitive !
  if_nemsio=No     # Yes, or, No -- case sensitive !
  if_oneob=No      # Yes, or, No -- case sensitive !

  bk_core=ARW
  bkcv_option=NAM
  if_clean=clean
#
# setup whether to do single obs test
  if [ ${if_oneob} = Yes ]; then
    if_oneobtest='.true.'
  else
    if_oneobtest='.false.'
  fi
#
# setup for GSI 3D/4D EnVar hybrid
  if [ ${if_hybrid} = Yes ] ; then
    PDYa=`echo $ANAL_TIME | cut -c1-8`
    cyca=`echo $ANAL_TIME | cut -c9-10`
    gdate=`date -u -d "$PDYa $cyca -6 hour" +%Y%m%d%H` #guess date is 6hr ago
    gHH=`echo $gdate |cut -c9-10`
    datem1=`date -u -d "$PDYa $cyca -1 hour" +%Y-%m-%d_%H:%M:%S` #1hr ago
    datep1=`date -u -d "$PDYa $cyca 1 hour"  +%Y-%m-%d_%H:%M:%S`  #1hr later
    if [ ${if_nemsio} = Yes ]; then
      if_gfs_nemsio='.true.'
      ENSEMBLE_FILE_mem=${ENS_ROOT}/gdas.t${gHH}z.atmf006s.mem
    else
      if_gfs_nemsio='.false.'
      ENSEMBLE_FILE_mem=${ENS_ROOT}/sfg_${gdate}_fhr06s_mem
    fi

    if [ ${if_4DEnVar} = Yes ] ; then
      BK_FILE_P1=${BK_ROOT}/wrfout_d01_${datep1}
      BK_FILE_M1=${BK_ROOT}/wrfout_d01_${datem1}

      if [ ${if_nemsio} = Yes ]; then
        ENSEMBLE_FILE_mem_p1=${ENS_ROOT}/gdas.t${gHH}z.atmf009s.mem
        ENSEMBLE_FILE_mem_m1=${ENS_ROOT}/gdas.t${gHH}z.atmf003s.mem
      else
        ENSEMBLE_FILE_mem_p1=${ENS_ROOT}/sfg_${gdate}_fhr09s_mem
        ENSEMBLE_FILE_mem_m1=${ENS_ROOT}/sfg_${gdate}_fhr03s_mem
      fi
    fi
  fi

# The following two only apply when if_observer = Yes, i.e. run observation operator for EnKF
# no_member     number of ensemble members
# BK_FILE_mem   path and base for ensemble members
  no_member=20
  BK_FILE_mem=${BK_ROOT}/wrfarw.mem
#

Option if_hybrid controls whether to run a hybrid ensemble/variational data analysis. If if_hybrid=Yes, option if_4DEnVar=Yes indicates that a hybrid 4D-EnVar analysis will be run, while if_4DEnVar=No indicates that a hybrid 3D-EnVar analysis will be run. Option if_observer determines whether GSI is run as an observation operator for EnKF.

Option bk_core indicates the specific dynamic core used to create the background files and specifies the core in the namelist. Option bk_core can be ARW or NMMB. Option bkcv_option specifies the background error covariance to be used in the case. Two regional background error covariance matrices are provided with the release, one from NCEP global data assimilation (GDAS), and one from the NAM data assimilation system (NDAS). Please check Section [sec4.8] for more details about GSI background error covariance. Option if_clean tells the script if it needs to delete temporary intermediate files in the working directory after a GSI run is completed.
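
As an illustration of how bkcv_option could select one of the two regional background error files, consider the sketch below. The pairing of option values with file names is an assumption inferred from the example file names in table [t32], the function name select_berror is hypothetical, and the exact location of these files under the fix/ directory may differ in a given release.

```shell
# select_berror OPTION
# Print the background error file name for a bkcv_option value.
# The GLOBAL/NAM pairing is an assumption based on the file names.
select_berror() {
  case "$1" in
    GLOBAL) echo "nam_glb_berror.f77.gcv" ;;
    NAM)    echo "nam_nmmstat_na.gcv" ;;
    *)      echo "ERROR: unknown bkcv_option '$1'" >&2
            return 1 ;;
  esac
}

# Usage inside a run script (the path under FIX_ROOT is an assumption):
#   BERROR=${FIX_ROOT}/$(select_berror "${bkcv_option}") || exit 1
#   cp "${BERROR}" berror_stats
```

Because the option values are case sensitive, a misspelled value such as nam falls through to the error branch rather than silently picking a default.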

In most cases, users should only make minor changes after the following:

#####################################################
# Users should NOT change script after this point
#####################################################
#
BYTE_ORDER=Big_Endian
# BYTE_ORDER=Little_Endian

Description of the Sample Regional Run Script to Run GSI

Listed below is an annotated regional run script with explanations on each function block.

For further details on the first three blocks of the script that users need to change, see sections 3.2.2.1, 3.2.2.2, and 3.2.2.3:

#!/bin/ksh
#####################################################
# machine set up (users should change this part)
#####################################################

set -x
#
# GSIPROC = processor number used for GSI analysis
#------------------------------------------------
  GSIPROC=1
  ARCH='LINUX_LSF'

# Supported configurations:
            # IBM_LSF,
            # LINUX, LINUX_LSF, LINUX_PBS,
            # DARWIN_PGI
#
#####################################################
# case set up (users should change this part)
#####################################################
#
# ANAL_TIME= analysis time  (YYYYMMDDHH)
# WORK_ROOT= working directory, where GSI runs
# PREPBUFR = path of PrepBUFR conventional obs
# BK_FILE  = path and name of background file
# OBS_ROOT = path of observations files
# FIX_ROOT = path of fix files
# GSI_EXE  = path and name of the gsi executable
# ENS_ROOT = path where ensemble background files exist
  ANAL_TIME=2017051318
  JOB_DIR=the_job_directory
     #normally you put run scripts here and submit jobs from here, requires a copy of gsi.x in this directory
  RUN_NAME=a_descriptive_run_name_such_as_case05_3denvar_etc
  OBS_ROOT=the_directory_where_observation_files_are_located
  BK_ROOT=the_directory_where_background_files_are_located
  GSI_ROOT=the_comgsi_main_directory_where_src_ush_fix_etc_are_located
  CRTM_ROOT=the_CRTM_directory
  ENS_ROOT=the_directory_where_ensemble_backgrounds_are_located
      #ENS_ROOT is not required if not running hybrid EnVAR
  HH=`echo $ANAL_TIME | cut -c9-10`
  GSI_EXE=${JOB_DIR}/gsi.x  #assume you have a copy of gsi.x here
  WORK_ROOT=${JOB_DIR}/${RUN_NAME}
  FIX_ROOT=${GSI_ROOT}/fix
  GSI_NAMELIST=${GSI_ROOT}/ush/comgsi_namelist.sh
  PREPBUFR=${OBS_ROOT}/nam.t${HH}z.prepbufr.tm00
  BK_FILE=${BK_ROOT}/wrfinput_d01.${ANAL_TIME}
#
#------------------------------------------------
# bk_core= which WRF core is used as background (NMM or ARW or NMMB)
# bkcv_option= which background error covariance and parameter will be used
#              (GLOBAL or NAM)
# if_clean = clean  : delete temporary files in working directory (default)
#            no     : leave running directory as is (this is for debug only)
# if_observer = Yes  : only used as observation operator for enkf
# if_hybrid   = Yes  : Run GSI as 3D/4D EnVar
# if_4DEnVar  = Yes  : Run GSI as 4D EnVar
# if_nemsio = Yes    : The GFS background files are in NEMSIO format
# if_oneob  = Yes    : Do single observation test
  if_hybrid=No     # Yes, or, No -- case sensitive !
  if_4DEnVar=No    # Yes, or, No -- case sensitive (set if_hybrid=Yes first)!
  if_observer=No   # Yes, or, No -- case sensitive !
  if_nemsio=No     # Yes, or, No -- case sensitive !
  if_oneob=No      # Yes, or, No -- case sensitive !

  bk_core=ARW
  bkcv_option=NAM
  if_clean=clean
#
# setup whether to do single obs test
  if [ ${if_oneob} = Yes ]; then
    if_oneobtest='.true.'
  else
    if_oneobtest='.false.'
  fi
#
# setup for GSI 3D/4D EnVar hybrid
  if [ ${if_hybrid} = Yes ] ; then
    PDYa=`echo $ANAL_TIME | cut -c1-8`
    cyca=`echo $ANAL_TIME | cut -c9-10`
    gdate=`date -u -d "$PDYa $cyca -6 hour" +%Y%m%d%H` #guess date is 6hr ago
    gHH=`echo $gdate |cut -c9-10`
    datem1=`date -u -d "$PDYa $cyca -1 hour" +%Y-%m-%d_%H:%M:%S` #1hr ago
    datep1=`date -u -d "$PDYa $cyca 1 hour"  +%Y-%m-%d_%H:%M:%S`  #1hr later
    if [ ${if_nemsio} = Yes ]; then
      if_gfs_nemsio='.true.'
      ENSEMBLE_FILE_mem=${ENS_ROOT}/gdas.t${gHH}z.atmf006s.mem
    else
      if_gfs_nemsio='.false.'
      ENSEMBLE_FILE_mem=${ENS_ROOT}/sfg_${gdate}_fhr06s_mem
    fi

    if [ ${if_4DEnVar} = Yes ] ; then
      BK_FILE_P1=${BK_ROOT}/wrfout_d01_${datep1}
      BK_FILE_M1=${BK_ROOT}/wrfout_d01_${datem1}

      if [ ${if_nemsio} = Yes ]; then
        ENSEMBLE_FILE_mem_p1=${ENS_ROOT}/gdas.t${gHH}z.atmf009s.mem
        ENSEMBLE_FILE_mem_m1=${ENS_ROOT}/gdas.t${gHH}z.atmf003s.mem
      else
        ENSEMBLE_FILE_mem_p1=${ENS_ROOT}/sfg_${gdate}_fhr09s_mem
        ENSEMBLE_FILE_mem_m1=${ENS_ROOT}/sfg_${gdate}_fhr03s_mem
      fi
    fi
  fi

# The following two only apply when if_observer = Yes, i.e. run observation operator for EnKF
# no_member     number of ensemble members
# BK_FILE_mem   path and base for ensemble members
  no_member=20
  BK_FILE_mem=${BK_ROOT}/wrfarw.mem
#
#

At this point, users should be able to run the GSI for simple cases without changing the scripts. However, some advanced users may need to change some of the following blocks for special applications, such as use of radiance data, cycled runs, specifying certain namelist variables, or running GSI on a platform not tested by the DTC.

#####################################################
# Users should NOT change script after this point
#####################################################

The next block sets the run command for GSI on multiple platforms. The ARCH variable is set at the beginning of the script. Option BYTE_ORDER has been set as Big_Endian because GSI compiled with Intel and PGI can read a Big_Endian background error file, BUFR files, and CRTM coefficient files.

#####################################################
# Users should NOT make changes after this point
#####################################################
#
BYTE_ORDER=Big_Endian
# BYTE_ORDER=Little_Endian

case $ARCH in
   'IBM_LSF')
      ###### IBM LSF (Load Sharing Facility)
      RUN_COMMAND="mpirun.lsf " ;;

   'LINUX')
      if [ $GSIPROC = 1 ]; then
         #### Linux workstation - single processor
         RUN_COMMAND=""
      else
         ###### Linux workstation -  mpi run
        RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
      fi ;;

   'LINUX_LSF')
      ###### LINUX LSF (Load Sharing Facility)
      RUN_COMMAND="mpirun.lsf " ;;

   'LINUX_PBS')
      #### Linux cluster PBS (Portable Batch System)
      RUN_COMMAND="mpirun -np ${GSIPROC} " ;;

   'DARWIN_PGI')
      ### Mac - mpi run
      if [ $GSIPROC = 1 ]; then
         #### Mac workstation - single processor
         RUN_COMMAND=""
      else
         ###### Mac workstation -  mpi run
         RUN_COMMAND="mpirun -np ${GSIPROC} -machinefile ~/mach "
      fi ;;

   * )
     print "error: $ARCH is not a supported platform configuration."
     exit 1 ;;
esac
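
To see which launcher a given ARCH and GSIPROC pair selects without submitting a job, the same case logic can be wrapped in a function and echoed. This is an illustrative sketch only: select_run_command is a hypothetical name, and the final line showing RUN_COMMAND placed in front of the executable is an assumption about how the variable is used later in the script.

```shell
# select_run_command ARCH NPROC
# Print the launcher string chosen for a platform/processor combination,
# mirroring the case statement in the sample run script.
select_run_command() {
  arch=$1
  nproc=$2
  case "$arch" in
    IBM_LSF|LINUX_LSF) echo "mpirun.lsf" ;;
    LINUX_PBS)         echo "mpirun -np ${nproc}" ;;
    LINUX|DARWIN_PGI)
      if [ "$nproc" = 1 ]; then
        echo ""          # single processor: run the executable directly
      else
        echo "mpirun -np ${nproc} -machinefile ~/mach"
      fi ;;
    *) echo "error: $arch is not a supported platform configuration." >&2
       return 1 ;;
  esac
}

# Dry run: print the command line instead of executing it.
RUN_COMMAND=$(select_run_command LINUX_PBS 4)
echo "would run: ${RUN_COMMAND} ./gsi.exe"
```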

The next block checks if all the variables needed for a GSI run are properly defined. These variables should have been defined in the first three parts of this script.

##################################################################################
# Check GSI needed environment variables are defined and exist
#

# Make sure ANAL_TIME is defined and in the correct format
if [ ! "${ANAL_TIME}" ]; then
  echo "ERROR: \$ANAL_TIME is not defined!"
  exit 1
fi

# Make sure WORK_ROOT is defined and exists
if [ ! "${WORK_ROOT}" ]; then
  echo "ERROR: \$WORK_ROOT is not defined!"
  exit 1
fi

# Make sure the background file exists
if [ ! -r "${BK_FILE}" ]; then
  echo "ERROR: ${BK_FILE} does not exist!"
  exit 1
fi

# Make sure OBS_ROOT is defined and exists
if [ ! "${OBS_ROOT}" ]; then
  echo "ERROR: \$OBS_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${OBS_ROOT}" ]; then
  echo "ERROR: OBS_ROOT directory '${OBS_ROOT}' does not exist!"
  exit 1
fi

# Set the path to the GSI static files
if [ ! "${FIX_ROOT}" ]; then
  echo "ERROR: \$FIX_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${FIX_ROOT}" ]; then
  echo "ERROR: fix directory '${FIX_ROOT}' does not exist!"
  exit 1
fi

# Set the path to the CRTM coefficients
if [ ! "${CRTM_ROOT}" ]; then
  echo "ERROR: \$CRTM_ROOT is not defined!"
  exit 1
fi
if [ ! -d "${CRTM_ROOT}" ]; then
  echo "ERROR: CRTM directory '${CRTM_ROOT}' does not exist!"
  exit 1
fi


# Make sure the GSI executable exists
if [ ! -x "${GSI_EXE}" ]; then
  echo "ERROR: ${GSI_EXE} does not exist!"
  exit 1
fi

# Check to make sure the number of processors for running GSI was specified
if [ -z "${GSIPROC}" ]; then
  echo "ERROR: The variable \$GSIPROC must be set to contain the number of processors to run GSI"
  exit 1
fi
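
The repeated "is it defined / does it exist" tests above could be folded into two small helper functions. The following is only a sketch under assumptions: require_var and require_dir are hypothetical names that are not part of the GSI run script, and they return a status instead of calling exit so that they can be tried interactively.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the released GSI run script.

# require_var NAME: succeed only if the variable called NAME is non-empty.
require_var() {
  eval "_rv_value=\${$1:-}"
  if [ -z "${_rv_value}" ]; then
    echo "ERROR: \$$1 is not defined!"
    return 1
  fi
}

# require_dir NAME: succeed only if NAME is non-empty and names a directory.
require_dir() {
  require_var "$1" || return 1
  eval "_rv_value=\${$1}"
  if [ ! -d "${_rv_value}" ]; then
    echo "ERROR: $1 directory '${_rv_value}' does not exist!"
    return 1
  fi
}

# Demo with made-up variables: one valid directory and one empty value.
DEMO_ROOT=$(mktemp -d)
EMPTY_ROOT=""
require_dir DEMO_ROOT  && echo "DEMO_ROOT is usable"
require_var EMPTY_ROOT || echo "EMPTY_ROOT was rejected"
rmdir "${DEMO_ROOT}"
```

In a run script, a call such as "require_dir FIX_ROOT || exit 1" would behave like the corresponding pair of if blocks above.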

The next block creates a working directory (workdir) in which GSI will run. The directory should have enough disk space to hold all the files needed for this run. This directory is cleaned before each run, so save any files you need from the previous run before rerunning GSI.

##################################################################################
# Create the work directory and cd into it

workdir=${WORK_ROOT}
echo " Create working directory:" ${workdir}

if [ -d "${workdir}" ]; then
  rm -rf ${workdir}
fi
mkdir -p ${workdir}
cd ${workdir}

#
##################################################################################

echo " Copy GSI executable, background file, and link observation bufr to working directory"

# Save a copy of the GSI executable in the workdir
cp ${GSI_EXE} gsi.exe

# Bring over background field (it's modified by GSI so we can't link to it)
cp ${BK_FILE} ./wrf_inout
if [ ${if_4DEnVar} = Yes ] ; then
  cp ${BK_FILE_P1} ./wrf_inou3
  cp ${BK_FILE_M1} ./wrf_inou1
fi

Note: You can link observation files to the working directory because GSI will not overwrite these files. The observations that can be analyzed in GSI are listed in the column “dfile” of the GSI namelist section OBS_INPUT, as specified in run/comgsi_namelist.sh. Most of the conventional observations are in one single file named prepbufr, while different radiance data are in separate files based on satellite instruments, such as AMSU-A or HIRS. All these observation files must be linked under the GSI recognized file names given in “dfile.” Please check table [t31] for a detailed explanation of links and the meanings of each file name listed below.

# Link to the prepbufr data
ln -s ${PREPBUFR} ./prepbufr

# ln -s ${OBS_ROOT}/gdas1.t${HH}z.sptrmm.tm00.bufr_d tmirrbufr
# Link to the radiance data
srcobsfile[1]=${OBS_ROOT}/gdas1.t${HH}z.satwnd.tm00.bufr_d
gsiobsfile[1]=satwnd
srcobsfile[2]=${OBS_ROOT}/gdas1.t${HH}z.1bamua.tm00.bufr_d
gsiobsfile[2]=amsuabufr
srcobsfile[3]=${OBS_ROOT}/gdas1.t${HH}z.1bhrs4.tm00.bufr_d
gsiobsfile[3]=hirs4bufr
srcobsfile[4]=${OBS_ROOT}/gdas1.t${HH}z.1bmhs.tm00.bufr_d
gsiobsfile[4]=mhsbufr
srcobsfile[5]=${OBS_ROOT}/gdas1.t${HH}z.1bamub.tm00.bufr_d
gsiobsfile[5]=amsubbufr
srcobsfile[6]=${OBS_ROOT}/gdas1.t${HH}z.ssmisu.tm00.bufr_d
gsiobsfile[6]=ssmirrbufr
# srcobsfile[7]=${OBS_ROOT}/gdas1.t${HH}z.airsev.tm00.bufr_d
gsiobsfile[7]=airsbufr
srcobsfile[8]=${OBS_ROOT}/gdas1.t${HH}z.sevcsr.tm00.bufr_d
gsiobsfile[8]=seviribufr
srcobsfile[9]=${OBS_ROOT}/gdas1.t${HH}z.iasidb.tm00.bufr_d
gsiobsfile[9]=iasibufr
srcobsfile[10]=${OBS_ROOT}/gdas1.t${HH}z.gpsro.tm00.bufr_d
gsiobsfile[10]=gpsrobufr
srcobsfile[11]=${OBS_ROOT}/gdas1.t${HH}z.amsr2.tm00.bufr_d
gsiobsfile[11]=amsrebufr
srcobsfile[12]=${OBS_ROOT}/gdas1.t${HH}z.atms.tm00.bufr_d
gsiobsfile[12]=atmsbufr
srcobsfile[13]=${OBS_ROOT}/gdas1.t${HH}z.geoimr.tm00.bufr_d
gsiobsfile[13]=gimgrbufr
srcobsfile[14]=${OBS_ROOT}/gdas1.t${HH}z.gome.tm00.bufr_d
gsiobsfile[14]=gomebufr
srcobsfile[15]=${OBS_ROOT}/gdas1.t${HH}z.omi.tm00.bufr_d
gsiobsfile[15]=omibufr
srcobsfile[16]=${OBS_ROOT}/gdas1.t${HH}z.osbuv8.tm00.bufr_d
gsiobsfile[16]=sbuvbufr
srcobsfile[17]=${OBS_ROOT}/gdas1.t${HH}z.eshrs3.tm00.bufr_d
gsiobsfile[17]=hirs3bufrears
srcobsfile[18]=${OBS_ROOT}/gdas1.t${HH}z.esamua.tm00.bufr_d
gsiobsfile[18]=amsuabufrears
srcobsfile[19]=${OBS_ROOT}/gdas1.t${HH}z.esmhs.tm00.bufr_d
gsiobsfile[19]=mhsbufrears
srcobsfile[20]=${OBS_ROOT}/rap.t${HH}z.nexrad.tm00.bufr_d
gsiobsfile[20]=l2rwbufr
srcobsfile[21]=${OBS_ROOT}/rap.t${HH}z.lgycld.tm00.bufr_d
gsiobsfile[21]=larcglb
ii=1
while [[ $ii -le 21 ]]; do
   if [ -r "${srcobsfile[$ii]}" ]; then
      ln -s ${srcobsfile[$ii]}  ${gsiobsfile[$ii]}
      echo "link source obs file ${srcobsfile[$ii]}"
   fi
   (( ii = $ii + 1 ))
done
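
Because the loop above silently skips any source file that is not readable, it can be useful to see which GSI file names ended up without a link. The sketch below reproduces the same pattern on two made-up entries in temporary directories; the directory names, the obs_src/obs_dst arrays, and the linked/missing variables are assumptions for illustration only.

```shell
#!/bin/bash
# Illustrative only: temporary directories stand in for ${OBS_ROOT} and the workdir.
demo_obs=$(mktemp -d)
demo_work=$(mktemp -d)
touch "${demo_obs}/gdas1.t12z.1bamua.tm00.bufr_d"   # pretend only AMSU-A data exist

obs_src[1]=${demo_obs}/gdas1.t12z.1bamua.tm00.bufr_d ; obs_dst[1]=amsuabufr
obs_src[2]=${demo_obs}/gdas1.t12z.1bmhs.tm00.bufr_d  ; obs_dst[2]=mhsbufr

linked=""
missing=""
ii=1
while [ ${ii} -le 2 ]; do
  if [ -r "${obs_src[$ii]}" ]; then
    ln -s "${obs_src[$ii]}" "${demo_work}/${obs_dst[$ii]}"
    linked="${linked} ${obs_dst[$ii]}"
  else
    missing="${missing} ${obs_dst[$ii]}"     # remember what was skipped
  fi
  ii=$(( ii + 1 ))
done
echo "linked:   ${linked}"
echo "not found:${missing}"
# The two temporary directories can be removed with rm -rf when no longer needed.
```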

The following block copies constant fixed files from the fix/ directory and links CRTM coefficients. Please check Section 3.1 for the meanings of each fixed file.

##################################################################################

echo " Copy fixed files and link CRTM coefficient files to working directory"

# Set fixed files
#   berror   = forecast model background error statistics
#   specoef  = CRTM spectral coefficients
#   trncoef  = CRTM transmittance coefficients
#   emiscoef = CRTM coefficients for IR sea surface emissivity model
#   aerocoef = CRTM coefficients for aerosol effects
#   cldcoef  = CRTM coefficients for cloud effects
#   satinfo  = text file with information about assimilation of brightness temperatures
#   satangl  = angle dependent bias correction file (fixed in time)
#   pcpinfo  = text file with information about assimilation of precipitation rates
#   ozinfo   = text file with information about assimilation of ozone data
#   errtable = text file with obs error for conventional data (regional only)
#   convinfo = text file with information about assimilation of conventional data
#   bufrtable= text file ONLY needed for single obs test (oneobstest=.true.)
#   bftab_sst= bufr table for sst ONLY needed for sst retrieval (retrieval=.true.)

Note: For background error covariances, observation errors, and analysis variable information, we provide two sets of fixed files. One set is based on GFS statistics and another is based on NAM statistics. For this release there is an additional setting in the ANAVINFO file for “bk_core” for both GFS and NAM statistics.

if [ ${bkcv_option} = GLOBAL ] ; then
  echo ' Use global background error covariance'
  BERROR=${FIX_ROOT}/${BYTE_ORDER}/nam_glb_berror.f77.gcv
  OBERROR=${FIX_ROOT}/prepobs_errtable.global
  if [ ${bk_core} = NMM ] ; then
     ANAVINFO=${FIX_ROOT}/anavinfo_ndas_netcdf_glbe
  fi
  if [ ${bk_core} = ARW ] ; then
    ANAVINFO=${FIX_ROOT}/anavinfo_arw_netcdf_glbe
  fi
  if [ ${bk_core} = NMMB ] ; then
    ANAVINFO=${FIX_ROOT}/anavinfo_nems_nmmb_glb
  fi
else
  echo ' Use NAM background error covariance'
  BERROR=${FIX_ROOT}/${BYTE_ORDER}/nam_nmmstat_na.gcv
  OBERROR=${FIX_ROOT}/nam_errtable.r3dv
  if [ ${bk_core} = NMM ] ; then
     ANAVINFO=${FIX_ROOT}/anavinfo_ndas_netcdf
  fi
  if [ ${bk_core} = ARW ] ; then
     ANAVINFO=${FIX_ROOT}/anavinfo_arw_netcdf
  fi
  if [ ${bk_core} = NMMB ] ; then
     ANAVINFO=${FIX_ROOT}/anavinfo_nems_nmmb
  fi
fi

SATINFO=${FIX_ROOT}/global_satinfo.txt
CONVINFO=${FIX_ROOT}/global_convinfo.txt
OZINFO=${FIX_ROOT}/global_ozinfo.txt
PCPINFO=${FIX_ROOT}/global_pcpinfo.txt

#  copy Fixed fields to working directory
 cp $ANAVINFO anavinfo
 cp $BERROR   berror_stats
 cp $SATINFO  satinfo
 cp $CONVINFO convinfo
 cp $OZINFO   ozinfo
 cp $PCPINFO  pcpinfo
 cp $OBERROR  errtable
#
#    # CRTM Spectral and Transmittance coefficients
CRTM_ROOT_ORDER=${CRTM_ROOT}/${BYTE_ORDER}
emiscoef_IRwater=${CRTM_ROOT_ORDER}/Nalli.IRwater.EmisCoeff.bin
emiscoef_IRice=${CRTM_ROOT_ORDER}/NPOESS.IRice.EmisCoeff.bin
emiscoef_IRland=${CRTM_ROOT_ORDER}/NPOESS.IRland.EmisCoeff.bin
emiscoef_IRsnow=${CRTM_ROOT_ORDER}/NPOESS.IRsnow.EmisCoeff.bin
emiscoef_VISice=${CRTM_ROOT_ORDER}/NPOESS.VISice.EmisCoeff.bin
emiscoef_VISland=${CRTM_ROOT_ORDER}/NPOESS.VISland.EmisCoeff.bin
emiscoef_VISsnow=${CRTM_ROOT_ORDER}/NPOESS.VISsnow.EmisCoeff.bin
emiscoef_VISwater=${CRTM_ROOT_ORDER}/NPOESS.VISwater.EmisCoeff.bin
emiscoef_MWwater=${CRTM_ROOT_ORDER}/FASTEM6.MWwater.EmisCoeff.bin
aercoef=${CRTM_ROOT_ORDER}/AerosolCoeff.bin
cldcoef=${CRTM_ROOT_ORDER}/CloudCoeff.bin

ln -s $emiscoef_IRwater ./Nalli.IRwater.EmisCoeff.bin
ln -s $emiscoef_IRice ./NPOESS.IRice.EmisCoeff.bin
ln -s $emiscoef_IRsnow ./NPOESS.IRsnow.EmisCoeff.bin
ln -s $emiscoef_IRland ./NPOESS.IRland.EmisCoeff.bin
ln -s $emiscoef_VISice ./NPOESS.VISice.EmisCoeff.bin
ln -s $emiscoef_VISland ./NPOESS.VISland.EmisCoeff.bin
ln -s $emiscoef_VISsnow ./NPOESS.VISsnow.EmisCoeff.bin
ln -s $emiscoef_VISwater ./NPOESS.VISwater.EmisCoeff.bin
ln -s $emiscoef_MWwater ./FASTEM6.MWwater.EmisCoeff.bin
ln -s $aercoef  ./AerosolCoeff.bin
ln -s $cldcoef  ./CloudCoeff.bin
# Copy CRTM coefficient files based on entries in satinfo file
for file in `awk '{if($1!~"!"){print $1}}' ./satinfo | sort | uniq` ;do
   ln -s ${CRTM_ROOT_ORDER}/${file}.SpcCoeff.bin ./
   ln -s ${CRTM_ROOT_ORDER}/${file}.TauCoeff.bin ./
done

# Only need this file for single obs test
 bufrtable=${FIX_ROOT}/prepobs_prep.bufrtable
 cp $bufrtable ./prepobs_prep.bufrtable

# for satellite bias correction
# Users may need to use their own satbias files for correct bias correction
cp ${GSI_ROOT}/fix/comgsi_satbias_in ./satbias_in
cp ${GSI_ROOT}/fix/comgsi_satbias_pc_in ./satbias_pc_in
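
The loop over satinfo entries in the script above can be tried in isolation. The sketch below runs the same awk filter on a small made-up satinfo file and only prints the coefficient file names that would be linked; the sample file contents and column layout are assumptions for illustration, not a copy of the released satinfo file.

```shell
#!/bin/bash
# Illustrative only: a tiny stand-in for the satinfo file.
demo_dir=$(mktemp -d)
cat > "${demo_dir}/satinfo" << 'EOF'
!sensor/instr/sat   chan iuse  error
 amsua_n15             1    1   3.000
 amsua_n15             2    1   2.200
 hirs4_metop-a         1   -1   2.000
 mhs_n18               1    1   2.500
EOF

# Same filter as the run script: skip lines whose first field contains "!",
# keep the first column, and reduce it to unique sensor names.
sensors=$(awk '{if($1!~"!"){print $1}}' "${demo_dir}/satinfo" | sort | uniq)

for file in ${sensors}; do
  echo "would link ${file}.SpcCoeff.bin and ${file}.TauCoeff.bin"
done
```

Each sensor appears once regardless of how many channels it has in the file, so each pair of coefficient files is linked only once.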

Please note that in the above sample script, two files related to radiance bias correction are copied to the work directory:

cp ${GSI_ROOT}/fix/comgsi_satbias_in ./satbias_in
cp ${GSI_ROOT}/fix/comgsi_satbias_pc_in ./satbias_pc_in

There are two ways to perform the radiance bias correction. The first method does the angle dependent bias correction offline and the mass bias correction inside the GSI analysis, and therefore requires two input files: satbias_angle, the angle dependent bias correction file, and satbias_in, the input file for mass bias correction. The second method combines the angle dependent and mass bias corrections and does both within the GSI analysis, requiring one combined input file: satbias_in. Note that the input bias correction coefficients file, satbias_in, differs between the two methods, so it is important to use the appropriate input file for each. Sample input files for the first method are provided with this release: global_satangbias.txt and sample.satbias. For the second method (combined angle dependent and mass bias correction), a sample file, gdas1.t00z.abias_pc.20150617, is also provided. As a starting point, users may also download a GDAS satbias coefficient file from the NOMADS ftp site as the input file (starting in spring 2015, the GDAS satbias files have adopted the following format):

ftp://nomads.ncdc.noaa.gov/GDAS/YYYYMM/YYYYMMDD/gdas1.tHHz.abias

In order to use the combined angle dependent and mass bias correction, users also need to set adp_anglebc=.true. in the &SETUP section of the GSI namelist (comgsi_namelist.sh). For more details about the namelist, please see Appendix C in this document.

The next block sets up some constants used in the GSI namelist. Please note that bkcv_option is set for background error tuning, and the constants should be chosen based on the specific application. Here we provide three sample sets of constants for different background error covariance options: one used in NAM operations, one in GFS operations, and one in NMMB operations. In this release, the capability of NMMB application is included, and therefore namelist settings for NMMB are provided in addition to those for NMM and ARW applications.

##################################################################################
# Set some parameters for use by the GSI executable and to build the namelist
echo " Build the namelist "

# default is NAM
#   as_op='1.0,1.0,0.5 ,0.7,0.7,0.5,1.0,1.0,'
vs_op='1.0,'
hzscl_op='0.373,0.746,1.50,'
if [ ${bkcv_option} = GLOBAL ] ; then
#   as_op='0.6,0.6,0.75,0.75,0.75,0.75,1.0,1.0'
   vs_op='0.7,'
   hzscl_op='1.7,0.8,0.5,'
fi
if [ ${bk_core} = NMMB ] ; then
   vs_op='0.6,'
fi

# default is NMM
   bk_core_arw='.false.'
   bk_core_nmm='.true.'
   bk_core_nmmb='.false.'
   bk_if_netcdf='.true.'
if [ ${bk_core} = ARW ] ; then
   bk_core_arw='.true.'
   bk_core_nmm='.false.'
   bk_core_nmmb='.false.'
   bk_if_netcdf='.true.'
fi
if [ ${bk_core} = NMMB ] ; then
   bk_core_arw='.false.'
   bk_core_nmm='.false.'
   bk_core_nmmb='.true.'
   bk_if_netcdf='.false.'
fi

The following section specifies the number of outer loops and whether to save GSI read observations based on the setting of “if_observer”.

if [ ${if_observer} = Yes ] ; then
  nummiter=0
  if_read_obs_save='.true.'
  if_read_obs_skip='.false.'
else
  nummiter=2
  if_read_obs_save='.false.'
  if_read_obs_skip='.false.'
fi

The following section of the script is used to generate the GSI namelist called gsiparm.anl in the working directory. A detailed explanation of each variable can be found in Section 3.4 and Appendix C.

# Build the GSI namelist on-the-fly
. $GSI_NAMELIST

The following block modifies the anavinfo file so that its vertical levels are consistent with the wrf_inout file for WRF ARW or NMM. Users no longer need to manually modify the anavinfo file.

# modify the anavinfo vertical levels based on wrf_inout for WRF ARW and NMM
if [ ${bk_core} = ARW ] || [ ${bk_core} = NMM ] ; then
bklevels=`ncdump -h wrf_inout | grep "bottom_top =" | awk '{print $3}' `
bklevels_stag=`ncdump -h wrf_inout | grep "bottom_top_stag =" | awk '{print $3}' `
anavlevels=`cat anavinfo | grep ' sf ' | tail -1 | awk '{print $2}' `  # levels of sf, vp, u, v, t, etc
anavlevels_stag=`cat anavinfo | grep ' prse ' | tail -1 | awk '{print $2}' `  # levels of prse
sed -i 's/ '$anavlevels'/ '$bklevels'/g' anavinfo
sed -i 's/ '$anavlevels_stag'/ '$bklevels_stag'/g' anavinfo
fi
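
To see what the level adjustment does without a real background file, the sketch below applies the same grep/awk/sed steps to two small made-up text files: one imitating the dimension lines of "ncdump -h" output and one imitating a few anavinfo lines. The sample contents are assumptions for illustration, and the result is written to a new file instead of being edited in place.

```shell
#!/bin/bash
# Illustrative only: stand-ins for "ncdump -h wrf_inout" output and for anavinfo.
demo_dir=$(mktemp -d)
cat > "${demo_dir}/header.txt" << 'EOF'
dimensions:
   bottom_top = 50 ;
   bottom_top_stag = 51 ;
EOF
cat > "${demo_dir}/anavinfo" << 'EOF'
 sf        30      0
 vp        30      0
 prse      31      0
EOF

# Levels found in the (pretend) background file
bklevels=$(grep "bottom_top =" "${demo_dir}/header.txt" | awk '{print $3}')
bklevels_stag=$(grep "bottom_top_stag =" "${demo_dir}/header.txt" | awk '{print $3}')
# Levels currently written in the (pretend) anavinfo file
anavlevels=$(grep ' sf ' "${demo_dir}/anavinfo" | tail -1 | awk '{print $2}')
anavlevels_stag=$(grep ' prse ' "${demo_dir}/anavinfo" | tail -1 | awk '{print $2}')

sed -e "s/ ${anavlevels}/ ${bklevels}/g" \
    -e "s/ ${anavlevels_stag}/ ${bklevels_stag}/g" \
    "${demo_dir}/anavinfo" > "${demo_dir}/anavinfo.new"
cat "${demo_dir}/anavinfo.new"
```

The substitution is purely textual, so any other number in the file that begins with the same digits after a space would also be rewritten; comparing the old and new files after the first run is a reasonable precaution.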

The following block runs GSI and checks if GSI has successfully completed.

###################################################
#  run  GSI
###################################################
echo ' Run GSI with' ${bk_core} 'background'

case $ARCH in
   'IBM_LSF')
      ${RUN_COMMAND} ./gsi.exe < gsiparm.anl > stdout 2>&1  ;;

   * )
      ${RUN_COMMAND} ./gsi.exe > stdout 2>&1  ;;
esac

##################################################################
#  run time error check
##################################################################
error=$?

if [ ${error} -ne 0 ]; then
  echo "ERROR: ${GSI} crashed  Exit status=${error}"
  exit ${error}
fi

The following block saves the analysis results with an understandable name and adds the analysis time to some output file names. Among them, “stdout” contains runtime output of GSI and wrf_inout is the resulting analysis file.

##################################################################
#
#   GSI updating satbias_in
#
# GSI updating satbias_in (only for cycling assimilation)

# Copy the output to more understandable names
ln -s stdout      stdout.anl.${ANAL_TIME}
ln -s wrf_inout   wrfanl.${ANAL_TIME}
ln -s fort.201    fit_p1.${ANAL_TIME}
ln -s fort.202    fit_w1.${ANAL_TIME}
ln -s fort.203    fit_t1.${ANAL_TIME}
ln -s fort.204    fit_q1.${ANAL_TIME}
ln -s fort.207    fit_rad1.${ANAL_TIME}

The following block collects the diagnostic files. The diagnostic files are merged and categorized based on outer loop and data type. Setting “write_diag” to true in the namelist directs GSI to write out diagnostic information for each observation. This information is very useful to check analysis details. Please check Appendix A.2 for the tool to read and analyze these diagnostic files.

# Loop over first and last outer loops to generate innovation
# diagnostic files for indicated observation types (groups)
#
# NOTE:  Since we set miter=2 in GSI namelist SETUP, outer
#        loop 03 will contain innovations with respect to
#        the analysis.  Creation of o-a innovation files
#        is triggered by write_diag(3)=.true.  The setting
#        write_diag(1)=.true. turns on creation of o-g
#        innovation files.
#

loops="01 03"
for loop in $loops; do

case $loop in
  01) string=ges;;
  03) string=anl;;
   *) string=$loop;;
esac

#  Collect diagnostic files for obs types (groups) below
#   listall="conv amsua_metop-a mhs_metop-a hirs4_metop-a hirs2_n14 msu_n14 \
#          sndr_g08 sndr_g10 sndr_g12 sndr_g08_prep sndr_g10_prep sndr_g12_prep \
#          sndrd1_g08 sndrd2_g08 sndrd3_g08 sndrd4_g08 sndrd1_g10 sndrd2_g10 \
#          sndrd3_g10 sndrd4_g10 sndrd1_g12 sndrd2_g12 sndrd3_g12 sndrd4_g12 \
#          hirs3_n15 hirs3_n16 hirs3_n17 amsua_n15 amsua_n16 amsua_n17 \
#          amsub_n15 amsub_n16 amsub_n17 hsb_aqua airs_aqua amsua_aqua \
#          goes_img_g08 goes_img_g10 goes_img_g11 goes_img_g12 \
#          pcp_ssmi_dmsp pcp_tmi_trmm sbuv2_n16 sbuv2_n17 sbuv2_n18 \
#          omi_aura ssmi_f13 ssmi_f14 ssmi_f15 hirs4_n18 amsua_n18 mhs_n18 \
#          amsre_low_aqua amsre_mid_aqua amsre_hig_aqua ssmis_las_f16 \
#          ssmis_uas_f16 ssmis_img_f16 ssmis_env_f16 mhs_metop_b \
#          hirs4_metop_b hirs4_n19 amusa_n19 mhs_n19"
listall=`ls pe* | cut -f2 -d"." | awk '{print substr($0, 1, length($0)-3)}' | sort | uniq`

   for type in $listall; do
      count=`ls pe*${type}_${loop}* | wc -l`
      if [[ $count -gt 0 ]]; then
         cat pe*${type}_${loop}* > diag_${type}_${string}.${ANAL_TIME}
      fi
   done
done
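
The merging step can be rehearsed on dummy files. In the sketch below, small text files stand in for the per-processor pe* pieces (the real ones are binary), and the two-digit loop suffix is stripped with sed rather than with awk's substr; both choices are assumptions of this example, not the run script's method.

```shell
#!/bin/bash
# Illustrative only: text files stand in for the binary per-processor diag pieces.
demo_dir=$(mktemp -d)
(
  cd "${demo_dir}" || exit 1
  # Two "processors" each wrote conventional and AMSU-A pieces for loops 01 and 03.
  for pe in 0000 0001; do
    for piece in conv_01 conv_03 amsua_n15_01 amsua_n15_03; do
      echo "${pe} ${piece}" > "pe${pe}.${piece}"
    done
  done

  ANAL_TIME=2017051312
  # Observation types present: drop the "peNNNN." prefix and the "_NN" loop suffix.
  listall=$(ls pe* | cut -f2 -d"." | sed 's/_[0-9][0-9]$//' | sort | uniq)

  for loop in 01 03; do
    case ${loop} in
      01) string=ges ;;
      03) string=anl ;;
    esac
    for type in ${listall}; do
      # Concatenate the pieces from all processors into one file per type and loop.
      cat pe*"${type}_${loop}"* > "diag_${type}_${string}.${ANAL_TIME}"
    done
  done
  ls diag_*
)
```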

The following block cleans the temporary intermediate files:

#  Clean working directory to save only important files
ls -l * > list_run_directory
if [[ ${if_clean} = clean  &&  ${if_observer} != Yes ]]; then
  echo ' Clean working directory after GSI run'
  rm -f *Coeff.bin     # all CRTM coefficient files
  rm -f pe0*           # diag files on each processor
  rm -f obs_input.*    # observation middle files
  rm -f siganl sigf03  # background middle files
  rm -f fsize_*        # delete temporary file for bufr size
fi

The following block of the script runs only for if_observer=Yes, which runs GSI as an observation operator for EnKF without doing minimization. The script first renames the previous diagnostics files and GSI analysis file by appending .ensmean to the filenames to avoid these files being overwritten by the new GSI run.

#################################################
# start to calculate diag files for each member
#################################################
#
if [ ${if_observer} = Yes ] ; then
  string=ges
  for type in $listall; do
    count=0
    if [[ -f diag_${type}_${string}.${ANAL_TIME} ]]; then
       mv diag_${type}_${string}.${ANAL_TIME} diag_${type}_${string}.ensmean
    fi
  done
  mv wrf_inout wrf_inout_ensmean

Next, the script generates the namelist for each ensemble member.

# Build the GSI namelist on-the-fly for each member
  nummiter=0
  if_read_obs_save='.false.'
  if_read_obs_skip='.true.'
. $GSI_NAMELIST

The rest of the script loops through the ensemble members to get the background ready, run GSI, and check the run status:

# Loop through each member
  loop="01"
  ensmem=1
  while [[ $ensmem -le $no_member ]];do

     rm pe0*

     print "\$ensmem is $ensmem"
     ensmemid=`printf %3.3i $ensmem`

# get new background for each member
     if [[ -f wrf_inout ]]; then
       rm wrf_inout
     fi

     BK_FILE=${BK_FILE_mem}${ensmemid}
     echo $BK_FILE
     ln -s $BK_FILE wrf_inout

#  run  GSI
     echo ' Run GSI with' ${bk_core} 'for member ', ${ensmemid}

     case $ARCH in
        'IBM_LSF')
           ${RUN_COMMAND} ./gsi.exe < gsiparm.anl > stdout_mem${ensmemid} 2>&1  ;;

        * )
           ${RUN_COMMAND} ./gsi.exe > stdout_mem${ensmemid} 2>&1 ;;
     esac

#  run time error check and save run time file status
     error=$?

     if [ ${error} -ne 0 ]; then
       echo "ERROR: ${GSI} crashed for member ${ensmemid} Exit status=${error}"
       exit ${error}
     fi

     ls -l * > list_run_directory_mem${ensmemid}
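
The member loop relies on a three-digit, zero-padded member ID to build both the background file name and the per-member output names. A minimal sketch of that formatting, using arbitrary example member numbers:

```shell
#!/bin/bash
# Show how the printf format used in the run script pads the member number.
for ensmem in 1 12 123; do
  ensmemid=$(printf %3.3i "${ensmem}")
  echo "member ${ensmem}: background suffix ${ensmemid}, output stdout_mem${ensmemid}"
done
```

The counter should be a plain decimal number without leading zeros; in bash, a value such as 08 passed to printf would be read as an invalid octal number.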

The following lines generate the diagnostics files for each member.

# generate diag files

     for type in $listall; do
           count=`ls pe*${type}_${loop}* | wc -l`
        if [[ $count -gt 0 ]]; then
           cat pe*${type}_${loop}* > diag_${type}_${string}.mem${ensmemid}
        fi
     done

The following lines advance the counter to the next ensemble member and close the loop.

# next member
     (( ensmem += 1 ))

  done

fi

If this point is reached, GSI has finished successfully and the script exits with status “0”:

exit 0

GSI Analysis Result Files in Run Directory

Once the GSI run script is set up, it is ready to be submitted like any other batch job. When completed, GSI will create a number of files in the run directory. Below is an example of the files generated in the run directory from one of the GSI test case runs. This case was run to perform a regional GSI analysis with a WRF-ARW NetCDF background using conventional (prepbufr), radiance (AMSU-A, HIRS4, and MHS), and GPSRO data. The analysis time is 1200Z on 13 May 2017. Four processors were used. To make the run directory more readable, we turned on the clean option in the run script, which deleted all temporary intermediate files.

amsuabufr                      fort.206     hirs3bufrears
amsuabufrears                  fort.207     hirs4bufr
anavinfo                       fort.208     l2rwbufr
atmsbufr                       fort.209     larcglb
berror_stats                   fort.210     list_run_directory
convinfo                       fort.211     mhsbufr
diag_amsua_n15_anl.2017051312  fort.212     mhsbufrears
diag_amsua_n15_ges.2017051312  fort.213     omibufr
diag_amsua_n18_anl.2017051312  fort.214     ozinfo
diag_amsua_n18_ges.2017051312  fort.215     pcpbias_out
diag_amsua_n19_anl.2017051312  fort.217     pcpinfo
diag_amsua_n19_ges.2017051312  fort.218     prepbufr
diag_conv_anl.2017051312       fort.219     prepobs_prep.bufrtable
diag_conv_ges.2017051312       fort.220     radar_supobs_from_level2
diag_hirs4_n19_anl.2017051312  fort.221     satbias_angle
diag_hirs4_n19_ges.2017051312  fort.223     satbias_ang.out
diag_mhs_n18_anl.2017051312    fort.224     satbias_in
diag_mhs_n18_ges.2017051312    fort.225     satbias_out
diag_mhs_n19_anl.2017051312    fort.226     satbias_out.int
diag_mhs_n19_ges.2017051312    fort.227     satbias_pc_in
errtable                       fort.228     satbias_pc.out
fit_p1.2017051312              fort.229     satinfo
fit_q1.2017051312              fort.230     satwnd
fit_rad1.2017051312            fort.232     sbuvbufr
fit_t1.2017051312              fort.233     seviribufr
fit_w1.2017051312              fort.234     ssmirrbufr
fort.201                       gimgrbufr    stdout
fort.202                       gomebufr     stdout.anl.2017051312
fort.203                       gpsrobufr    wrfanl.2017051312
fort.204                       gsi.exe      wrf_inout
fort.205                       gsiparm.anl

It is important to know which files hold the GSI analysis results, standard output, and diagnostic information. We will introduce these files and their contents in detail in the following chapter. The following is a brief list of what these files contain:

  • stdout or stdout.anl.(time): standard text output file. stdout.anl.(time) is a link to stdout with the analysis time appended. This is the most commonly used file to check the GSI analysis processes and contains basic and important information about the analyses. We will explain the contents of the stdout file in Section 4.1 and users are encouraged to read this file in detail to become familiar with the order of GSI analysis processing.
  • wrf_inout or wrfanl.(time): analysis results if GSI completes successfully. It exists only if using WRF for the background. The wrfanl.(time) file is a link to wrf_inout with the analysis time appended. The format is the same as the background file.
  • diag_conv_anl.(time): binary diagnostic files for conventional and GPS RO observations at the final analysis step (analysis departure for each observation).
  • diag_conv_ges.(time): binary diagnostic files for conventional and GPS RO observations before the initial analysis step (background departure for each observation).
  • diag_(instrument_satellite)_anl: diagnostic files for satellite radiance observations at the final analysis step.
  • diag_(instrument_satellite)_ges: diagnostic files for satellite radiance observations before the initial analysis step.
  • gsiparm.anl: GSI namelist, generated by the run script.
  • fit_(variable).(time): links to fort.2?? with meaningful names (variable name plus analysis time). They contain statistics of observation departures from the background and from the analysis, organized by observation variable. Please see Section 4.5 for more details.
  • fort.220: output from the inner loop minimization (in pcgsoi.f90). Please see Section 4.6 for details.
  • anavinfo: info file to set up control, state, and background variables. Please see the Advanced GSI Users Guide for details.
  • *info (convinfo, satinfo, …): info files that control data usage. Please see Section 4.3 for details.
  • berror_stats and errtable: background error file (binary) and observation error file (text).
  • *bufr: observation BUFR files linked to the run directory. Please see Section 3.1 for details.
  • satbias_in: the input coefficients of bias correction for satellite radiance observations.
  • satbias_out: the output coefficients of bias correction for satellite radiance observations after the GSI run.
  • satbias_pc: the input coefficients of bias correction for passive satellite radiance observations.
  • list_run_directory : the complete list of files in the run directory before cleaning takes place. This is generated by the GSI run script.

The diag files, such as diag_(instrument_satellite)_anl.(time) and diag_conv_anl.(time), contain important information about the data used in the GSI, including observation departure from analysis results for each observation (O-A). Similarly, diag_conv_ges.(time) and diag_(instrument_satellite)_ges.(time) include the observation innovation for each observation (O-B). These files can be very helpful in understanding the detailed impact of data on the analysis. A tool is provided to process these files, which is introduced in Appendix A.2.

There are many intermediate files in this directory while GSI is running or if the run crashes. The complete list of files in the directory (prior to cleaning) is saved in file list_run_directory. Some knowledge about the content of these files is very helpful for debugging if the GSI run crashes. Please check table [t37] for the meaning of these files. (Note: you may not see all the files in the list because different observational data are used. Also, the fixed files prepared for a GSI run, such as CRTM coefficient files, are not included.)

List of GSI intermediate files

File name                                           Content
sigf03                                              This is a temporary file, holding binary format background files (typically sigf03, sigf06 and sigf09 if FGAT used). When you see this file, at the minimum, a background file was successfully read in.
siganl                                              Analysis results in binary format. When this file exists, the analysis has finished.
pe????.(conv or instrument_satellite)_(outer loop)  Diagnostic files for conventional and satellite radiance observations at each outer loop and each sub-domain (????=subdomain id).
obs_input.????                                      Observation scratch files (each file contains observations for one observation type within the whole analysis domain and time window. ????=observation type id in namelist).
pcpbias_out                                         Output precipitation bias correction file.

[t37]

Introduction to Frequently Used GSI Namelist Options

The complete namelist options and their explanations are listed in Appendix A of the Advanced GSI Users Guide. For most GSI analysis applications, only a few namelist variables need to be changed. Here we introduce frequently used variables for regional analyses:

Set Up the Number of Outer and Inner Loops

To change the number of outer loops and the number of inner iterations in each outer loop, the following three variables in the namelist need to be modified:

  • miter: number of outer analysis loops.
  • niter(1): maximum number of inner loop iterations for the 1st outer loop. The inner loop will stop when it reaches this maximum number, when it reaches the convergence threshold, or when it fails to converge.
  • niter(2): maximum number of inner loop iterations for the 2nd outer loop.
  • If miter is larger than two, repeat niter with larger index.
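
Since each outer loop needs its own niter entry, the corresponding namelist text can be generated rather than typed by hand. The sketch below is a hedged illustration: the variable names inner_max, niter_list, and setup_fragment are hypothetical, the value 50 is only an example, and the same cap is assumed for every outer loop.

```shell
#!/bin/bash
# Build "miter=N,niter(1)=...,niter(2)=...," for an arbitrary number of outer loops.
miter=3          # number of outer loops (example value)
inner_max=50     # assumed maximum inner iterations, identical for every outer loop

niter_list=""
ii=1
while [ ${ii} -le ${miter} ]; do
  niter_list="${niter_list}niter(${ii})=${inner_max},"
  ii=$(( ii + 1 ))
done

setup_fragment="miter=${miter},${niter_list}"
echo "${setup_fragment}"
```

The resulting string could then be substituted into the namelist template in the same way the run script substitutes its other shell variables.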

Set Up the Analysis Variable for Moisture

There are two options for the moisture analysis variable, selected by the following namelist variable:

qoption = 1 or 2:

  • If qoption=1, the moisture analysis variable is pseudo-relative humidity. The saturation specific humidity, qsatg, is computed from the guess and held constant during the inner loop. Thus, the relative humidity control variable can only change via changes in specific humidity, q.
  • If qoption=2, the moisture analysis variable is normalized relative humidity. This formulation allows relative humidity to change in the inner loop via changes to surface pressure, temperature, or specific humidity.

Set Up the Background File

The following five variables define which background field will be used in the GSI analyses:

  • regional: if true, perform a regional GSI run using either ARW or NMM inputs as the background. If false, perform a global GSI analysis. If either wrf_nmm_regional or wrf_mass_regional is true, it will be set to true.
  • wrf_nmm_regional: if true, the background comes from WRF-NMM. When using other background fields, set it to false.
  • wrf_mass_regional: if true, the background comes from WRF-ARW. When using other background fields, set it to false.
  • nems_nmmb_regional: if true, the background comes from NMMB. When using other background fields, set it to false.
  • netcdf: if true, WRF files are in NetCDF format, otherwise WRF files are in binary format. This option only works for a regional GSI analysis.

Set Up the Output of Diagnostic Files

The following variables tell the GSI to write out diagnostic results in certain loops:

  • write_diag(1): if true, write out diagnostic data in the beginning of the analysis, so that we can have information on observation minus background (O-B) differences.
  • write_diag(2): if true, write out diagnostic data at the end of the 1st outer loop (before the 2nd outer loop starts).
  • write_diag(3): if true, write out diagnostic data at the end of the 2nd outer loop (after the analysis finishes if the outer loop number is two), so that we can have information on observation minus analysis (O-A) differences.

Please check Appendix A.2 for the tools to read the diagnostic files.

Set Up the GSI Recognized Observation Files

The following sets up the GSI recognized observation files for GSI observation ingest:

OBS_INPUT::
!  dfile          dtype       dplat     dsis                 dval    dthin dsfcalc
   prepbufr       ps          null      ps                   1.0     0     0
   prepbufr       t           null      t                    1.0     0     0
   prepbufr       q           null      q                    1.0     0     0
   prepbufr       pw          null      pw                   1.0     0     0
   satwndbufr     uv          null      uv                   1.0     0     0
   prepbufr       uv          null      uv                   1.0     0     0
   prepbufr       spd         null      spd                  1.0     0     0
   prepbufr       dw          null      dw                   1.0     0     0
   radarbufr      rw          null      rw                   1.0     0     0
   prepbufr       sst         null      sst                  1.0     0     0
   gpsrobufr      gps_ref     null      gps                  1.0     0     0
   ssmirrbufr     pcp_ssmi    dmsp      pcp_ssmi             1.0    -1     0
  • dfile: GSI recognized observation file name. The observation file contains observations used for a GSI analysis. This file can include several observation variables from different observation types. The file name listed by this parameter will be read in by GSI. This name can be changed as long as the name in the link from the BUFR/PrepBUFR file in the run scripts also changes correspondingly.
  • dtype: analysis variable name that GSI can read in. Please note this name should be consistent with that used in the GSI code.
  • dplat: sets up the observation platform for a certain observation, which will be read in from the file dfile.
  • dsis: sets up the data name (including both data type and platform name) used inside GSI.

Please see Section 4.3 for examples and explanations of these variables.
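As an illustration of the column layout shown above, the following Python sketch splits the rows of an OBS_INPUT table into named fields. It is not part of GSI: the names ObsInputRow and parse_obs_input are hypothetical, and the sketch assumes that every data row has exactly the seven whitespace-separated columns listed in the header comment.

```python
from typing import List, NamedTuple


class ObsInputRow(NamedTuple):
    """One row of the OBS_INPUT table (hypothetical helper type, not GSI code)."""
    dfile: str
    dtype: str
    dplat: str
    dsis: str
    dval: float
    dthin: int
    dsfcalc: int


def parse_obs_input(text: str) -> List[ObsInputRow]:
    """Parse OBS_INPUT table text into rows, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, "!" comment lines, and delimiter lines ending in "::".
        if not fields or fields[0].startswith("!") or fields[0].endswith("::"):
            continue
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        dfile, dtype, dplat, dsis, dval, dthin, dsfcalc = fields
        rows.append(ObsInputRow(dfile, dtype, dplat, dsis,
                                float(dval), int(dthin), int(dsfcalc)))
    return rows


sample = """OBS_INPUT::
!  dfile          dtype       dplat     dsis                 dval    dthin dsfcalc
   prepbufr       ps          null      ps                   1.0     0     0
   ssmirrbufr     pcp_ssmi    dmsp      pcp_ssmi             1.0    -1     0
"""
rows = parse_obs_input(sample)
for row in rows:
    print(row.dfile, row.dtype, row.dplat, row.dsis, row.dthin)
```

A parser of this kind can help check that each dfile listed in the table matches a file name linked in the run script.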

Set Up Observation Time Window

In the namelist section OBS_INPUT, use time_window_max to set the maximum half time window (in hours) for all data types. In the convinfo file, the column “twindow” sets the half time window (in hours) for a particular data type. For conventional observations, only observations within the smaller of these two windows are kept for further processing. For all other observation types, observations within time_window_max are kept.
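The window rule described above can be sketched in Python as follows. This is an illustration only, not GSI code: the function names are hypothetical, and treating an observation exactly on the window boundary as inside the window is an assumption.

```python
from typing import Optional


def effective_half_window(time_window_max: float,
                          twindow: Optional[float],
                          conventional: bool) -> float:
    """Return the half time window (hours) applied to one data type.

    Conventional data use the smaller of time_window_max (namelist) and
    twindow (convinfo); all other data use time_window_max.
    """
    if conventional and twindow is not None:
        return min(time_window_max, twindow)
    return time_window_max


def within_window(obs_offset_hours: float, half_window: float) -> bool:
    """True if the observation time offset from the analysis time is kept.

    Assumption: the boundary itself counts as inside the window.
    """
    return abs(obs_offset_hours) <= half_window


# With time_window_max=1.5 as in the sample namelist:
print(effective_half_window(1.5, 3.0, conventional=True))   # namelist value wins
print(effective_half_window(1.5, 1.0, conventional=True))   # convinfo value wins
print(effective_half_window(1.5, None, conventional=False)) # non-conventional data
```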

Set Up Data Thinning

  1. Radiance data thinning

Radiance data thinning is controlled through two GSI namelist variables in the section &OBS_INPUT. Below is an example:

&OBS_INPUT
   dmesh(1)=120.0,dmesh(2)=60.0,dmesh(3)=30,time_window_max=1.5,ext_sonde=.true.,
 /
OBS_INPUT::
!  dfile          dtype       dplat     dsis                 dval    dthin dsfcalc
   prepbufr       ps          null      ps                   1.0     0     0

   gpsrobufr      gps_ref     null      gps                  1.0     0     0
   ssmirrbufr     pcp_ssmi    dmsp      pcp_ssmi             1.0    -1     0
   tmirrbufr      pcp_tmi     trmm      pcp_tmi              1.0    -1     0

   hirs3bufr      hirs3       n17       hirs3_n17            6.0     1     0
   hirs4bufr      hirs4       metop-a   hirs4_metop-a        6.0     2     0

The two namelist variables that control the radiance data thinning are the real array “dmesh” in the 1st line and the “dthin” values in the 6th column. The “dmesh” array sets the mesh sizes of the radiance thinning grids in kilometers, while “dthin” determines whether the data type it represents is thinned and which thinning grid (mesh size) to use. If the value of dthin is:

  • an integer less than or equal to zero, no thinning is applied;
  • an integer larger than zero, this kind of radiance data will be thinned using the mesh size defined by dmesh(dthin).

The following thinning examples are defined by the sample &OBS_INPUT section above:

  • Data type ps from prepbufr: no thinning because dthin=0
  • Data type gps_ref from gpsrobufr: no thinning because dthin=0
  • Data type pcp_ssmi from dmsp: no thinning because dthin=-1
  • Data type hirs3 from NOAA-17: thinning in a 120 km grid because dthin=1 and dmesh(1)=120
  • Data type hirs4 from metop-a: thinning in a 60 km grid because dthin=2 and dmesh(2)=60
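The lookup used in the examples above can be written as a short Python sketch. It is an illustration of the rule, not GSI code: the function name radiance_thinning_mesh is hypothetical, and the only adjustment made is converting the 1-based Fortran index dmesh(dthin) to a 0-based Python list index.

```python
from typing import Optional, Sequence


def radiance_thinning_mesh(dthin: int, dmesh: Sequence[float]) -> Optional[float]:
    """Return the thinning mesh size in km, or None when no thinning applies."""
    if dthin <= 0:
        # Zero or negative dthin: the data type is not thinned.
        return None
    if dthin > len(dmesh):
        raise IndexError(f"dthin={dthin} has no matching dmesh entry")
    # dmesh(dthin) in Fortran is 1-based; Python lists are 0-based.
    return dmesh[dthin - 1]


# Values from the sample &OBS_INPUT section: dmesh(1)=120.0, dmesh(2)=60.0, dmesh(3)=30
dmesh = [120.0, 60.0, 30.0]

for name, dthin in [("ps", 0), ("pcp_ssmi", -1), ("hirs3_n17", 1), ("hirs4_metop-a", 2)]:
    print(name, radiance_thinning_mesh(dthin, dmesh))
```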

  2. Conventional data thinning

Conventional data can also be thinned, but this thinning is not set up in the namelist. To give users a complete picture of data thinning, conventional data thinning is briefly introduced here. Three columns in the convinfo file (ithin, rmesh, and pmesh; more details on this file are in Section 4.3) configure conventional data thinning:

  • ithin: 0 = no thinning; 1 = thinning with grid mesh decided by rmesh and pmesh
  • rmesh: horizontal thinning grid size in km
  • pmesh: vertical thinning grid size in mb; if 0, then use background vertical grid.
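How the three columns combine can be summarized with the following Python sketch. It is an illustration only, not GSI code: the function name describe_conv_thinning is hypothetical, and because only ithin values of 0 and 1 are described here, the sketch rejects any other value rather than guessing its meaning.

```python
def describe_conv_thinning(ithin: int, rmesh: float, pmesh: float) -> str:
    """Describe the thinning implied by one convinfo entry."""
    if ithin == 0:
        return "no thinning"
    if ithin != 1:
        # Assumption: values other than 0 and 1 are outside this sketch.
        raise ValueError(f"ithin={ithin} is not covered by this sketch")
    # pmesh = 0 means the background vertical grid is used instead of a fixed mesh.
    if pmesh == 0:
        vertical = "background vertical grid"
    else:
        vertical = f"{pmesh:g} mb vertical mesh"
    return f"thinning: {rmesh:g} km horizontal mesh, {vertical}"


print(describe_conv_thinning(0, 0.0, 0.0))
print(describe_conv_thinning(1, 60.0, 0.0))
print(describe_conv_thinning(1, 60.0, 50.0))
```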

Set Up Background Error Factor

In the namelist section BKGERR, vs sets the scale factor for the vertical correlation length and hzscl sets the scale factors for horizontal smoothing. The scale factors for the variance of each analysis variable are set in the anavinfo file. Typical operational values for regional and global background error covariance are provided in the run scripts and sample anavinfo files, and are selected according to the choice of background error covariance.

Single Observation Test

To do a single observation test, the following namelist option has to be set to true:

oneobtest=.true.

Then go to the namelist section SINGLEOB_TEST to set up the location of the single observation and the variable to be tested. Please see Section 4.2 for an example and details on the single observation test.