5. Applications for Regional and Global EnKF

In this chaper, the elements from the previous chapters will be applied to demonstrate how to run a regional and global case using the GSI observer and EnKF. These examples are intended to give users a clear idea of how to set up the GSI observer and EnKF for a particular application and properly check the run status and analysis results in order to determine if the run was successful. Note that the regional example focuses on WRF ARW, however WRF NMM and NMMB runs are similar, but require different background ensemble and namelist options. Similarly, the global example features a single global configuration (T254), however users may wish to use a different configuration, again requiring different background ensemble and namelist options.

It is assumed that the reader has successfully compiled GSI and EnKF on a local machine. For regional case studies, users should have the following data available:

  1. Ensemble for background

    • Ensemble files from WRF-ARW, WRF-NMM, NMM-B forecast files may be used. For this case, WRF-ARW ensemble members will be used, which are generated from the GFS ensemble, following the naming convention: wrfarw.mem00nn. Where nn corresponds to each ensemble member (“ensmean” appended for ensemble mean).
  2. Conventional, GPSRO and radiance data

  3. Fixed files

    • Fixed files are located in the comGSIv3.5_EnKFv1.1/fix directory

For global case studies, users should have the following data available:

  1. Ensemble for background

    • Ensemble files from GFS forecast files may be used. GFS ensemble members corresponding to various spectral resolutions may be used, following the naming convention: sfg_yyyymmddhh_fhrff_mem0nn. The “yyyymmddhh” corresponds to the initialization date, “ff” corresponds to the forecast hour, and “nn” corresponds to each ensemble member (ensmean appended for ensemble mean).
  2. Conventional data

  3. Fixed files

    • Fixed files are located in the comGSIv3.5_EnKFv1.1/fix/global directory

The following cases will give users an example of how to run the EnKF for a regional case with various data sources, as well as a simple global application case. Users are welcome to download these examples from the EnKF Users webpage (online case for release v1.1) or create a new background ensemble and obtain observation data from the above server.

The background ensemble files and observations used in the regional case studies are as follows:

  1. Background files: wrfarw.mem0001 - wrfarw.mem0020, wrfarw.ensmean

    • The horizontal grid spacing is 45-km with 51 vertical sigma levels.

      Background used for regional case study [ch5_fig1]

  2. Conventional, GPSRO and radiance data from 00 UTC 13 February 2014

    • Conventional file:
      • gdas1.t00z.prepbufr.nr
    • GPSRO file:
      • gdas1.t00z.gpsro.tm00.bufr_d
    • Radiance files:
      • gdas1.t00z.1bamua.tm00.bufr_d
      • gdas1.t00z.1bhrs4.tm00.bufr_d
      • gdas1.t00z.1bmhs.tm00.bufr_d
  3. Fixed files:

    • Fixed files are located in the GSI/EnKF community code package, under /comGSIv3.5_EnKFv1.1/fix
    • For observation control:
      • convinfo: conventional data (PrepBUFR, GPSRO) info file
      • ozinfo: ozone retrival info file
      • satinfo: satellite radiance info file
    • Adaptive radiance bias correction (although example files are provided in the /fix directory, it is better to obtain bias correction files valid for the proper date. For this case, GFS bias correction coefficients are provided with the case data):
      • satbias_in: satellite bias correction coefficient file
      • satbias_pc: satellite bias correction coefficient file for passive channels.

This case study was run on a linux cluster. GSI no longer (from GSI v3.2) requires byte- swapping to little endian format. BUFRLIB can automatically handle byte order issues.

For the regional ARW case, assume the following locations:

Path to the background ensemble files:

/scratch/casedata/arw_2014021300/bk/

the path to the observations:

/scratch/casedata/arw_2014021300/obs

and the GSI release version 3.5/EnKF version 1.1 is located at:

/scratch/comGSIv3.5_EnKFv1.1

For the global GFS case, assume locations as follows: Path to background ensemble files:

/scratch/casedata/enkf_glb_t254/bk

the path to the observations:

/scratch/casedata/enkf_glb_t254/obs

and the GSI release version 3.5/EnKF version 1.1 is located at:

/scratch/comGSIv3.5_EnKFv1.1

Running GSI Observer for Regional Applications

Run Script

With both GSI and EnKF compiled and the background ensemble files and observations acquired, the next step is to work with the gsi observer run script, run_gsi_regional.ksh. The location of this script is under comGSIv3.5_EnKFv1.1/run. This run script is the same as the one used for a GSI analysis run, with a few specific options selected in order to loop through all the ensemble members and generate the ensemble observation priors for each member and the ensemble mean. In addition to the GSI observer specific options, other user-specific modifications need to be made:

  • Set up batch queueing system
    To run GSI with multiple processors, a job queuing head must be added to the run_gsi_regional.ksh script. The set up of the job queue is dependent on the machine and the job control system. Refer to the GSI Users Guide, section 3.2.2, for more examples of the setup section of this script. The following example is setup to run on a Linux cluster with LSF:
    #BSUB -P ??????             # project code
    #BSUB -W 00:30                 # wall-clock time (hrs:mins)
    #BSUB -n 4                  # number of tasks in job
    #BSUB -R "span[ptile=16]"      # run 16 MPI tasks per node
    #BSUB -J gsi                   # job name
    #BSUB -o gsi.%J.out          # output file name in which %J is replaced by the job ID
    #BSUB -e gsi.%J.err          # error file name in which %J is replaced by the job ID
    #BSUB -q small               # queue
    
  • Set up number of processors and the job queue system used For this example, LINUX_LSF and 4 processors are used:

    GSIPROC=4
    ARCH='LINUX_LSF'
    
  • Set up the case data, analysis time, fix files, GSI executable, and CRTM coefficients:

    Set up analysis time:

    ANAL_TIME=2014021300
    

    Set up working directory, which will hold the analysis results (all ensemble members will be run in this directory). This directory must have the proper write permissions as well as enough space to hold the output.

    WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_2014021300
    

    Set path to the observation directory and the PrepBUFR file within the observation directory. All observations to be assimilated should reside in this directory.

    OBS_ROOT=/scratch/${user}/casedata/arw_2014021300/obs
    PREPBUFR=${OBS_ROOT}/gdas1.t00z.prepbufr.nr
    

    Set path to background ensemble files:

    BK_ROOT=/scratch/${user}/casedata/arw_2014021300/bk
    

    Set file for background ensemble mean:

    BK_FILE=${BK_ROOT}/wrfarw.ensmean
    

    Set the GSI system used for this case, including the paths of the fix files and the CRTM coefficents as well as the location of the GSI executable:

    CRTM_ROOT=/scratch/${user}/CRTM_REL-2.2.3
    GSI_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/
    FIX_ROOT=${GSI_ROOT}/fix
    GSI_EXE=${GSI_ROOT}/run/gsi.exe
    GSI_NAMELIST=${GSI_ROOT}/run/comgsi_namelist.sh
    
  • Set which background and background error file to use

    bk_core=ARW
    bkcv_option=NAM
    if_clean=clean
    
    This example uses the ARW NetCDF background; therefore bk_core is set to ARW. The regional background error covariance file is used in this case, as set by bkcv_option=NAM. Finally, the run scripts are set to clean the run directory to delete all temporary intermediate files.
  • Choose to run GSI observer, set up background ensemble information

    if_observer=Yes
    no_member=20
    BK_FILE_mem=${BK_ROOT}/wrfarw.mem
    
    The option if_observer=Yes is the switch that enables run_gsi_regional.ksh to run the GSI observer (rather than GSI analysis). In this example, 20 ensemble members are selected with the naming convention: wrfarw.memnnnn. Note that memnnnn which is associated with each ensemble member (nnnn), is not included and will be appended later in the script.
  • Link observations

    # Link to the prepbufr data
    ln -s ${PREPBUFR} ./prepbufr
    # Link to the radiance data
    ln -s ${OBS_ROOT}/gdas1.t00z.1bamua.tm00.bufr_d amsuabufr
    ln -s ${OBS_ROOT}/gdas1.t00z.1bhrs4.tm00.bufr_d hirs4bufr
    ln -s ${OBS_ROOT}/gdas1.t00z.1bmhs.tm00.bufr_d mhsbufr
    ln -s ${OBS_ROOT}/gdas1.t00z.gpsro.tm00.bufr_d gpsrobufr
    

Past the arch selection, environment variable checks, and creation of working directory, users will find the location where the observations are linked. For this case, we can see that the conventional PrepBUFR observations have been linked, as well as three different satellite radiance BUFR files (AMSU-A, HIRS4, and MHS) and a GPS RO BUFR file. These files will be linked to the working directory and separate observation innovation (diag) files will be generated for each observation.

In the run script, the proper anavinfo file is selected based on the core and background error covariance used for the case:

echo ' Use NAM background error covariance'
BERROR=${FIX_ROOT}/${BYTE_ORDER}/nam_nmmstat_na.gcv
OBERROR=${FIX_ROOT}/nam_errtable.r3dv
if [ ${bk_core} = NMM ] ; then
   ANAVINFO=${FIX_ROOT}/anavinfo_ndas_netcdf
fi
if [ ${bk_core} = ARW ] ; then
   ANAVINFO=${FIX_ROOT}/anavinfo_arw_netcdf
fi
if [ ${bk_core} = NMMB ] ; then
   ANAVINFO=${FIX_ROOT}/anavinfo_nems_nmmb
fi

The anavinfo file used for this case is anavinfo_arw_netcdf, because the background is NetCDF ARW and the regional background error covariance has been selected. It is important to check this file, located in the ./fix directory. The number of vertical levels in anavinfo_arw_netcdf must match those for the background. In this example, the case has 50 vertical levels with the default value in anavinfo_arw_netcdf is 40. Change these values (level column) accordingly:

met_guess::
!var     level    crtm_use    desc              orig_name
  ps        1      -1         surface_pressure     ps
  z         1      -1         geopotential_height  phis
  u        50       2         zonal_wind           u
  v        50       2         meridional_wind      v
  div      50      -1         zonal_wind           div
  vor      50      -1         meridional_wind      vor
  tv       50       2         virtual_temperature  tv
  q        50       2         specific_humidity    sphu
  oz       50       2         ozone                ozone
  cw       50      10         cloud_condensate     cw
... ...
control_vector::
!var     level  itracer as/tsfc_sdv  an_amp0   source  funcof
 sf       50      0       1.00        -1.0     state    u,v
 vp       50      0       1.00        -1.0     state    u,v
 ps        1      0       0.50        -1.0     state    prse
 t        50      0       0.70        -1.0     state    tv
 q        50      1       0.70        -1.0     state    q
 oz       50      1       0.50        -1.0     state    oz
 sst       1      0       1.00        -1.0     state    sst
 cw       50      1       1.00        -1.0     state    cw
 stl       1      0       1.00        -1.0     motley   sst
 sti       1      0       1.00        -1.0     motley   sst
::

Run GSI Observer and Check Run Status

Once the run script is set up properly for the case and the machine, GSI can be run through the run script. The following command will submit the job:

$ bsub < run_gsi_regional.ksh

While the job is running, move to the working directory and check the details. Given the following working directory setup:

WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_2014021300

Go to directory WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/ and check the run directory. A directory named gsidiag_2014021300 should have been created. This directory is the run directory for this GSI observer case study. The directory will be populated with many files:

amsua_n18.TauCoeff.bin
ssmi_f15.SpcCoeff.bin
amsua_n18.SpcCoeff.bin
ssmi_f15.TauCoeff.bin
imgr_g11.SpcCoeff.bin
imgr_g11.TauCoeff.bin

These files are CRTM coefficients that have been linked to this run directory through the GSI run script. Similar to running the GSI analysis, many other files are linked or copied to this run directory, such as:

gsiparm.anl: GSI namelist
prepbufr: PrepBUFR file for conventional observation
convinfo: data usage control for conventional data
satbias_in: satellite bias correction coefficient file
satinfo: data usage and channel control for satellite radiance data
berror_stats: background error file
errtable: observation error file

Additionally, for the GSI observer many files are generated as a result of the lread_obs_save and lread_obs_skip options in the namelist for writing/reading collective observation selection information:

obs_input.nnnn: During ensemble mean (save=True, skip=False), save all observation preprocessing to this file. Each file (nnnn) is for a different variable type
pennnn.obs_setup: When looping through ensemble members (save=False, skip=True), all observation preprocessing is skipped, obs_input.nnnn files are read and pennnn.obs_setup is written for each processor (nnnn) for all observations.

As the GSI observer is running for the ensemble mean as well as looping through each member, files are are generated for each ensemble member, as well as the ensemble mean.

pennnn.conv_01:
Files generated after O-B, generated for conventional
observations as well as one file per sensor (e.g. AMSU-A n18) These files are trimmed for resulting diag files and cleaned.
list_run_directory:
Directory listing after ensemble mean is run, before directory
is cleaned for running the ensemble members.
list_run_directory_memnnn: As as above, for each ensemble member (nnn).
stdout: Standard output file for ensemble mean.
stdout_memnnn: Standard output file for each ensemble member (nnn).

The presence of the standard output files in the directory suggest the GSI observer run scripts have successfully set up a run environment for the GSI observer, properly looping through the ensemble members, and the GSI executable is running on each ensemble member.

Once GSI has finished running, diag files should be generated for each observation type for each member as well as the ensemble mean:

diag_conv_ges.ensmean    diag_conv_ges.mem001
diag_conv_ges.mem002     diag_conv_ges.mem003
diag_conv_ges.mem004     diag_conv_ges.mem005
... ...                  diag_conv_ges.mem020

diag_amsua_metop-a_ges.ensmean diag_amsua_metop-a_ges.mem001
diag_amsua_metop-a_ges.mem002  diag_amsua_metop-a_ges.mem003
diag_amsua_metop-a_ges.mem004  diag_amsua_metop-a_ges.mem005
 ... ...                       diag_amsua_metop-a_ges.mem020

diag_amsua_n18_ges.ensmean      diag_amsua_n18_ges.mem001
diag_amsua_n18_ges.mem002       diag_amsua_n18_ges.mem003
diag_amsua_n18_ges.mem004       diag_amsua_n18_ges.mem005
......
diag_hirs4_n19_ges.ensmean          diag_amsua_n15_ges.mem001
diag_hirs4_n19_ges.mem002           diag_amsua_n15_ges.mem003
diag_hirs4_n19_ges.mem004           diag_amsua_n15_ges.mem005
... ...                             diag_amsua_n15_ges.mem020

Check for Successful GSI Completion

The presence of the diag files and standard out files for the ensemble mean and each member indicates that the GSI observer has run without crashing, but does not necessary indicate a successful analysis. It is important to check the stdout files in the run directory to make sure the GSI observer completed each step without any obvious problems. The following are several important areas of the standard out file to check:

  1. Read in the anavinfo and namelist

     gsi_metguess_mod*init_:  2D-MET STATE VARIABLES:
     ps
     z
     gsi_metguess_mod*init_:  3D-MET STATE VARIABLES:
     u
     v
     div
     vor
     tv
     q
     oz
     cw
    .....
    
     &SETUP
     GENCODE =   78.0000000000000     ,
     FACTQMIN        =  0.000000000000000E+000,
     FACTQMAX        =  0.000000000000000E+000,
    .....
    
  2. Read in the background field

    The following lines in stdout immediately following the namelist section, indicate that GSI is reading the background fields. Checking that the range of the max and min values will indicate if particular background fields are normal.

      dh1  =            3
      iy,m,d,h,m,s=        2014           2          13           0           0
               0
      dh1  =            3
     rmse_var = SMOIS
     ndim1 =            3
     ordering = XYZ
     staggering =  N/A
     start_index =            1           1           1           0
     end_index =          129          70           4           0
     WrfType =          104
     ierr  =            0
    ... ...
      rmse_var = T ndim1=           3
      WrfType =          104  WRF_REAL=         104 ierr  =            0
      ordering = XYZ staggering =  N/A
      start_index =            1           1           1           0  end_index =
             129          70          50           0
      k,max,min,mid T=           1   313.3391       237.9947       273.2565
      k,max,min,mid T=           2   313.4587       238.4884       273.5590
      k,max,min,mid T=           3   313.6850       239.3906       274.1599
      k,max,min,mid T=           4   314.0594       241.0577       275.0899
      k,max,min,mid T=           5   314.6113       243.6186       276.3394
      k,max,min,mid T=           6   315.3213       247.0504       277.7711
    ... ...
    
  3. Read in observational data

    The stdout (ensemble mean) and stdout_memnnn (ensemble members) contains distinct differences for the reading in the observational data controlled by the lread_obs_save and lread_obs_skip options in the namelist
    a. Ensemble mean
     read_obs_check: bufr file date is   2014021300 prepbufr q
     read_obs_check: bufr file date is   2014021300 prepbufr ps
     read_obs_check: bufr file uv                   not available satwndbufr
     read_obs_check: bufr file rw                   not available radarbufr
     read_obs_check: bufr file pcp_tmi   trmm       not available tmirrbufr
     read_obs_check: bufr file hirs3     n17        not available hirs3bufr
    ... ...
      number of extra processors            1
     READ_OBS:  read   33 mhs        mhs_n18              using ntasks=   2   0   1795      0
     READ_OBS:  read   34 mhs        mhs_n19              using ntasks=   2   2    584      0
     READ_OBS:  read   35 mhs        mhs_metop-a          using ntasks=   2   0    174      0
     READ_OBS:  read   36 mhs        mhs_metop-b          using ntasks=   2   2      2      0
    ... ...
     READ_BUFRTOVS         : file=mhsbufr         type=mhs        sis=mhs_n19              nread=         0
     ithin= 2 rmesh=  60.000000 isfcalc= 0 ndata=         0 ntask=  2
    ... ...
     READ_PREPBUFR         : file=prepbufr        type=uv         sis=uv                   nread=     52122
     ithin= 0 rmesh= 120.000000 isfcalc= 0 ndata=     50772 ntask=  1
    ... ...
     READ_OBS:  write collective obs selection info to obs_input.common
    
    For the ensemble mean, the same procedure for observation processing as the GSI analysis is followed (read_obs_check, READ_OBS, READ_BUFRTOVS, READ_GPS, READ_PREPBUFR …). Finally, it is indicated that the collective observations selection information is written out for use with the ensemble members.
    b. Ensemble members

    For the stdout_memnnn, the observation processing steps are skipped and observation information is read in from the ensemble mean:

    OBSERVER_SET: read collective obs selection info from obs_input.common
    

    Finally, for both the ensemble mean and members, the following lines will appear:

    OBS_PARA: ps                        2038      2538      5499      6600
    OBS_PARA: t                         3271      4431      7459      8753
    OBS_PARA: q                         2823      3917      6110      7187
    OBS_PARA: pw                         261       217       226       180
    OBS_PARA: uv                        3333      5207      9038      8902
    OBS_PARA: gps_ref                    597      1001       210      1085
    OBS_PARA: hirs4     metop-a            0         0         0       174
    OBS_PARA: amsua     n15              542       540         0         0
    OBS_PARA: amsua     n18              606       672         0         0
    OBS_PARA: amsua     metop-a            0         0         0       151
    OBS_PARA: mhs       n18              730       828         0         0
    OBS_PARA: mhs       metop-a            0         0         0       201
    
    This table is important to check if the observations have been read in, which types of observations have been read in, and the distribution of observations in each subdomain.
  4. Indication that the GSI observer has successfully run:

     observer_final: successfully finalized
     glbsoi: complete
    [000]gsisub(): : complete.
    
    
         ENDING DATE-TIME    JUL 14,2016  23:29:19.825  196  THU   2457584
         PROGRAM GSI_ANL HAS ENDED.
    * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
    

After looking over each section of the standard output files, it can be concluded that the GSI observer ran without issues. Note that because the outer loop was set to 0 (miter=0) for the GSI observer, no minimization occurred and therefore no analysis results were produced.

Running EnKF for Regional Applications

Run Script

Once the GSI observer has been run successfully, the next step is to setup the EnKF run script, run_enkf_wrf.ksh. This script is located under comGSIv3.5_EnKFv1.1/run. This run script uses the diag files generated by the GSI observer script as observation input, and generates the EnKF analysis. Similar to the GSI observer script, several user-specific modifications need to be made:

  • Set up batch queueing system

    #BSUB -P ????             # project code
    #BSUB -W 00:30                 # wall-clock time (hrs:mins)
    #BSUB -n 32                  # number of tasks in job
    #BSUB -R "span[ptile=16]"      # run 16 MPI tasks per node
    #BSUB -J gsi                   # job name
    #BSUB -o gsi.%J.out          # output file name in which %J is replaced by the job ID
    #BSUB -e gsi.%J.err          # error file name in which %J is replaced by the job ID
    #BSUB -q small               # queue
    
  • Set up number of processors and the job queue system used:

    GSIPROC=32
    ARCH='LINUX_LSF'
    
    One difference from the GSI observer script is that the number of processors used should be greater than at least the number of ensemble members for running the EnKF. In this case we have 20 ensemble members and have requested 32 cores.
  • Set up the analysis time, fixed files, EnKF executable

    ANAL_TIME=2014021300
    WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/enkf_2014021300
    diag_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_2014021300
    BK_ROOT=/scratch/${user}/casedata/arw_2014021300/bk
    BK_FILE=${BK_ROOT}/wrfarw.ensmean
    GSI_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1
    FIX_ROOT=${GSI_ROOT}/fix
    ENKF_EXE=${GSI_ROOT}/src/main/enkf/wrf_enkf
    CRTM_ROOT=/scratch/${user}/CRTM_2.2.3/
    ENKF_NAMELIST=${GSI_ROOT}/run/enkf_wrf_namelist.sh
    
  • Set background file and location of the diag files (from GSI observer)

    diag_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_2014021300
    BK_FILE=${BK_ROOT}/wrfarw.ensmean
    
    Two modifications to note for the EnKF are the diag_ROOT and BK_FILE. The diag_ROOT points to the working directory where the GSI observer was run. This directory contains diag* files which will be linked to the EnKF working directory. BK_FILE points to the background ensemble mean.
  • Select ensemble parameters

    NMEM_ENKF=20
    BK_FILE_mem=${BK_ROOT}/wrfarw
    NLONS=129
    NLATS=70
    NLEVS=50
    IF_ARW=.true.
    IF_NMM=.false.
    
    The next section sets information about the ensemble, including the number of members, background ensemble members, domain specifications, dynamical core, and list of observations to be used in the EnKF. Note that the ensemble members (BK_FILE_mem) do not include the 3-digit member number. This will be appended to the name specified for this field later in the script.
  • Select observations

    list="conv amsua_n18 amsua_metop-a"
    

The previous line is contained with the ensemble parameter section. For the observations to be assimilated, be aware that only observations/platforms that have been run through the GSI observer (and therefore a diag* file exists) are valid.

Run EnKF and Check Run Status

Once the run script is set up properly for the case and the machine, EnKF can be run through the run script. The following command will submit the job:

$ bsub < run_enkf_wrf.ksh

While the job is running, move to the working directory and check the details. Given the following working directory setup:

WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/enkf_2014021300

Go to directory /scratch/${user}/comGSIv3.5_EnKFv1.1/run and check the run directory. A directory named enkf_2014021300 should have been created. This directory is the run directory for this EnKF case study. The directory will be populated with many links:

diag_conv_ges.ensmean  diag_conv_ges.mem007  diag_conv_ges.mem014
diag_conv_ges.mem001   diag_conv_ges.mem008  diag_conv_ges.mem015
diag_conv_ges.mem002   diag_conv_ges.mem009  diag_conv_ges.mem016
diag_conv_ges.mem003   diag_conv_ges.mem010  diag_conv_ges.mem017
diag_conv_ges.mem004   diag_conv_ges.mem011  diag_conv_ges.mem018
diag_conv_ges.mem005   diag_conv_ges.mem012  diag_conv_ges.mem019
diag_conv_ges.mem006   diag_conv_ges.mem013  diag_conv_ges.mem020

diag_amsua_n18_ges.ensmean  diag_amsua_n18_ges.mem007  diag_amsua_n18_ges.mem014
diag_amsua_n18_ges.mem001   diag_amsua_n18_ges.mem008  diag_amsua_n18_ges.mem015
diag_amsua_n18_ges.mem002   diag_amsua_n18_ges.mem009  diag_amsua_n18_ges.mem016
diag_amsua_n18_ges.mem003   diag_amsua_n18_ges.mem010  diag_amsua_n18_ges.mem017
diag_amsua_n18_ges.mem004   diag_amsua_n18_ges.mem011  diag_amsua_n18_ges.mem018
diag_amsua_n18_ges.mem005   diag_amsua_n18_ges.mem012  diag_amsua_n18_ges.mem019
diag_amsua_n18_ges.mem006   diag_amsua_n18_ges.mem013  diag_amsua_n18_ges.mem020

diag_hirs4_metop-a_ges.ensmean  diag_hirs4_metop-a_ges.mem007  diag_hirs4_metop-a_ges.mem014
diag_hirs4_metop-a_ges.mem001   diag_hirs4_metop-a_ges.mem008  diag_hirs4_metop-a_ges.mem015
diag_hirs4_metop-a_ges.mem002   diag_hirs4_metop-a_ges.mem009  diag_hirs4_metop-a_ges.mem016
diag_hirs4_metop-a_ges.mem003   diag_hirs4_metop-a_ges.mem010  diag_hirs4_metop-a_ges.mem017
diag_hirs4_metop-a_ges.mem004   diag_hirs4_metop-a_ges.mem011  diag_hirs4_metop-a_ges.mem018
diag_hirs4_metop-a_ges.mem005   diag_hirs4_metop-a_ges.mem012  diag_hirs4_metop-a_ges.mem019
diag_hirs4_metop-a_ges.mem006   diag_hirs4_metop-a_ges.mem013  diag_hirs4_metop-a_ges.mem020

The diag files are linked from the GSI observer working directory. These links are specified in the run script. Note that ozone diag files (gome, omi, sbuv) will appear in the directory without links (zero length files) if ozone diags are not present from the GSI observer.

The first guess files are also linked into the directory from the run script, pointing to the path of the ensemble mean and ensemble members designated in the setup section.

firstguess.ensmean  firstguess.mem007  firstguess.mem014
firstguess.mem001   firstguess.mem008  firstguess.mem015
firstguess.mem002   firstguess.mem009  firstguess.mem016
firstguess.mem003   firstguess.mem010  firstguess.mem017
firstguess.mem004   firstguess.mem011  firstguess.mem018
firstguess.mem005   firstguess.mem012  firstguess.mem019
firstguess.mem006   firstguess.mem013  firstguess.mem020

Similar to running the GSI observer, many other static files are linked or copied to this run directory, such as:

enkf.nml: EnKF namelist
convinfo: data usage control for conventional data
satbias_in: satellite bias correction coefficient file
satbias_pc: satellite bias correction file for passive channels
satinfo: data usage and channel control for satellite radiance data
ozinfo: data usage control from ozone data
covinflate.dat: Three-dimensional multiplicative inflation factor fields
stdout: EnKF standard output file

The presence of the standard output file in the directory suggest the EnKF run script has successfully set up a run environment for the EnKF, properly linking the first guess and diag files from the GSI observer, and the EnKF executable is running. Once EnKF has finished running, analysis files should be generated for each member as well as the ensemble mean:

analysis.ensmean  analysis.mem006  analysis.mem012  analysis.mem018
analysis.mem001   analysis.mem007  analysis.mem013  analysis.mem019
analysis.mem002   analysis.mem008  analysis.mem014  analysis.mem020
analysis.mem003   analysis.mem009  analysis.mem015
analysis.mem004   analysis.mem010  analysis.mem016
analysis.mem005   analysis.mem011  analysis.mem017

Check for Successful EnKF Completion

The presence of the analysis files for the ensemble mean and each member as well as the standard out file indicates that the EnKF has run without crashing, but does not necessary indicate a successful analysis. It is important to check the stdout file in the run directory to make sure the EnKF completed each step without any obvious problems. The following are several important areas of the standard out file to check:

  1. Check namelist has been properly read in and configuration is correct:

     namelist parameters:
     --------------------
     &NAM_ENKF
     DATEIN  =  2014021300,
     DATAPATH        = ./
     IASSIM_ORDER    =           0,
     COVINFLATEMAX   =   100.0000    ,
     COVINFLATEMIN   =   1.000000    ,
     DETERMINISTIC   = T,
    ... ...
    
  2. Check number of ensemble members, as well as the actual analysis variables and the background type. The maximum and minimum values for surface pressure are printed for a sanity check:

             20  members
    number of background forecast times to be updated =            1
    first-guess forecast hours for analysis = 06
              5 3d vars to update
    total of         251  2d grids will be updated (including ps)
    using multiplicative inflation based on Pa/Pb
    Vars in Rad-Jacobian (dims)
    --------------------------
    sst                            0
    enkf_main:  number of threads            1
    Updating U, V, T, QVAPOR, PH, and MU for WRF-ARW...
    Surface pressure (spressmn) min/max range:   678.673339843750
      1032.22473144531
    
  3. Statistics of the ensemble prior
    Following many lines describing the bias correction coefficients, the inventory of the observation number, observation types from the input diag* files, and the time spent reading in each observation type, statistics of the ensemble priors fit to all observations are provided for each region (NH, SH, TR):
     innovation statistics for prior:
     conventional obs
     region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
    NH    all ps  14083 -0.591E-02  0.860E+00  0.814E+00  0.601E+00  0.550E+00
    TR    all ps     50 -0.629E-01  0.422E+00  0.661E+00  0.367E+00  0.550E+00
    NH     all t   7766 -0.228E+00  0.163E+01  0.128E+01  0.480E+00  0.119E+01
    TR     all t    138 -0.130E+00  0.118E+01  0.119E+01  0.234E+00  0.117E+01
    NH    all uv  21680  0.136E-01  0.295E+01  0.276E+01  0.140E+01  0.237E+01
    TR    all uv    302 -0.379E+00  0.251E+01  0.268E+01  0.124E+01  0.238E+01
    NH     all q   2553 -0.323E-01  0.146E+00  0.157E+00  0.868E-01  0.131E+00
    TR     all q     68 -0.721E-01  0.202E+00  0.164E+00  0.582E-01  0.154E+00
     satellite brightness temp
     instrument, channel #, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
               amsua_n18   1   241 -0.199E+00  0.194E+01  0.306E+01  0.176E+01  0.250E+01
               amsua_n18   2   248 -0.194E+00  0.145E+01  0.234E+01  0.806E+00  0.220E+01
               amsua_n18   3   248 -0.620E+00  0.119E+01  0.206E+01  0.497E+00  0.200E+01
               amsua_n18   4   248 -0.189E+00  0.398E+00  0.568E+00  0.140E+00  0.550E+00
               amsua_n18   5   248  0.188E-01  0.337E+00  0.315E+00  0.966E-01  0.300E+00
               amsua_n18   6  1203 -0.396E-01  0.198E+00  0.238E+00  0.610E-01  0.230E+00
               amsua_n18   7  1225 -0.215E+00  0.324E+00  0.235E+00  0.480E-01  0.230E+00
               amsua_n18   8  1155 -0.235E+00  0.369E+00  0.259E+00  0.675E-01  0.250E+00
               amsua_n18  10  1211  0.408E+00  0.491E+00  0.353E+00  0.429E-01  0.350E+00
               amsua_n18  11    12  0.720E+00  0.723E+00  0.404E+00  0.567E-01  0.400E+00
               amsua_n18  15   245 -0.311E+00  0.198E+01  0.380E+01  0.148E+01  0.350E+01
           hirs4_metop-a   3     1  0.115E+01  0.115E+01  0.534E+00  0.631E-01  0.530E+00
           hirs4_metop-a   4    42  0.260E+00  0.521E+00  0.403E+00  0.488E-01  0.400E+00
           hirs4_metop-a   5     7 -0.534E+00  0.558E+00  0.367E+00  0.696E-01  0.360E+00
           hirs4_metop-a   6     7 -0.748E+00  0.782E+00  0.473E+00  0.109E+00  0.460E+00
           hirs4_metop-a   7     9 -0.557E+00  0.815E+00  0.580E+00  0.108E+00  0.570E+00
           hirs4_metop-a   8     9  0.374E+00  0.675E+00  0.100E+01  0.208E-01  0.100E+01
           hirs4_metop-a  10     9  0.199E+00  0.581E+00  0.612E+00  0.120E+00  0.600E+00
           hirs4_metop-a  11     9 -0.380E-01  0.718E+00  0.143E+01  0.770E+00  0.120E+01
           hirs4_metop-a  12    34 -0.988E+00  0.179E+01  0.213E+01  0.141E+01  0.160E+01
           hirs4_metop-a  13     9  0.694E-01  0.330E+00  0.382E+00  0.116E+00  0.364E+00
           hirs4_metop-a  14     9 -0.369E-01  0.257E+00  0.294E+00  0.137E+00  0.260E+00
           hirs4_metop-a  15     9  0.352E-01  0.289E+00  0.281E+00  0.107E+00  0.260E+00
    
    This table should be checked in order to determine if the inflation is appropriate. The goal is to make the total ensemble spreads of priors (\(\sqrt{S+R}\) match the innovations (innov std) as much as possible. We can see in certain regions, particularly in the NH the differences are noteable. These differences are associated with higher observation error (\(\sqrt{R}\)). Users should consider tuning of inflation and localization, which is typically determined using cases with multiple assimilation cycles. Refer to section 4.2 of this Users Guide for more information on tuning.
  4. Domain and observation partition:

    npts =         9030
    min/max number of points per proc =          243         317
    time to do model space decomp =   1.529277069494128E-003
    nobstot =        53078
    min/max number of obs per proc =         1658        1659
    time to do ob space decomp =   3.935033455491066E-004
    sending out observation prior ensemble perts from root ...
    nobstot*nanals               1061560
    npts*ndim               2266530
    ... took   1.730861375108361E-002  secs
    time in load_balance =  7.929413160309196E-002 on proc           0
    

    The analysis variables and the observations are distributed to different processors. We can see in this case that the min/max number of points per processor are 243 and 317, respectively. Similarly, we can see that the min/max number of observations per processor are 1658 and 1659, respectively, indicating that the observations are well dispersed among the processors.

     READGRIDDATA_ARW: U           1  -20.55222       20.72632
     READGRIDDATA_ARW: U           2  -20.87878       21.20316
     READGRIDDATA_ARW: U           3  -21.61046       22.36177
     READGRIDDATA_ARW: U           4  -22.62561       24.08465
     READGRIDDATA_ARW: U           5  -24.07165       26.47827
     READGRIDDATA_ARW: U           6  -25.89237       32.81841
     READGRIDDATA_ARW: U           7  -27.10455       35.81213
    ... ...
    

    Additionally, check the minimum and maximum values of the fields at each vertical level as a quick sanity check.

Diagnose EnKF Analysis Results

At the bottom of the standard output file, there are several output statistics and tables that are helpful for users to diagnose the quality of the EnKF analysis.
  1. Spread inflation of the analysis ensemble

    global ps prior std. dev min/max =    21.52301       172.3270
    NH mean ps prior standard deviation =    57.82042
    NH mean ps posterior standard deviation (before inflation)=    29.20955
    NH mean ps posterior standard deviation (after inflation) =    54.58403
    NH mean ps inflation =    2.421974
    TR mean ps prior standard deviation =    31.09247
    TR mean ps posterior standard deviation (before inflation)=    28.31872
    TR mean ps posterior standard deviation (after inflation) =    30.79383
    TR mean ps inflation =    1.106122
    time in inflate_ens =  1.366839744150639E-002 on proc           0
    
    This section is important for checking whether the values of the inflation are reasonable by looking at a summary of the maximum and minimum values. Also, viewing the regional averaged statistics before and after inflation. For this case we can see both the NH and TR show larger standard deviation after the inflation, which is more consistent with the prior standard deviation.
  2. Spread inflation of the analysis ensemble

     innovation statistics for posterior:
     conventional obs
     region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
    NH    all ps  14083  0.503E-03  0.757E+00  0.569E+00  0.146E+00  0.550E+00
    TR    all ps     50 -0.562E-01  0.398E+00  0.585E+00  0.198E+00  0.550E+00
    NH     all t   7766 -0.780E-01  0.145E+01  0.122E+01  0.263E+00  0.119E+01
    TR     all t    138 -0.107E+00  0.115E+01  0.118E+01  0.199E+00  0.117E+01
    NH    all uv  21680  0.406E-01  0.267E+01  0.246E+01  0.655E+00  0.237E+01
    TR    all uv    302 -0.259E+00  0.236E+01  0.246E+01  0.636E+00  0.238E+01
    NH     all q   2553 -0.174E-01  0.114E+00  0.138E+00  0.444E-01  0.131E+00
    TR     all q     68 -0.279E-01  0.168E+00  0.159E+00  0.413E-01  0.154E+00
     satellite brightness temp
     instrument, channel #, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
               amsua_n18   1   241  0.614E+02  0.915E+02  0.263E+01  0.823E+00  0.250E+01
               amsua_n18   2   248 -0.132E+03  0.171E+03  0.223E+01  0.376E+00  0.220E+01
               amsua_n18   3   248 -0.303E+02  0.398E+02  0.201E+01  0.244E+00  0.200E+01
               amsua_n18   4   248 -0.231E+00  0.580E+01  0.555E+00  0.776E-01  0.550E+00
               amsua_n18   5   248 -0.788E+00  0.199E+01  0.305E+00  0.575E-01  0.300E+00
               amsua_n18   6  1203 -0.138E+01  0.158E+01  0.234E+00  0.423E-01  0.230E+00
               amsua_n18   7  1225 -0.273E+01  0.287E+01  0.232E+00  0.322E-01  0.230E+00
               amsua_n18   8  1155 -0.131E+01  0.186E+01  0.254E+00  0.437E-01  0.250E+00
               amsua_n18  10  1211 -0.179E+01  0.196E+01  0.352E+00  0.374E-01  0.350E+00
               amsua_n18  11    12 -0.130E+01  0.133E+01  0.404E+00  0.537E-01  0.400E+00
               amsua_n18  15   245 -0.163E+03  0.198E+03  0.357E+01  0.693E+00  0.350E+01
           hirs4_metop-a   3     1  0.118E+01  0.118E+01  0.533E+00  0.558E-01  0.530E+00
           hirs4_metop-a   4    42  0.286E+00  0.542E+00  0.403E+00  0.455E-01  0.400E+00
           hirs4_metop-a   5     7 -0.451E+00  0.487E+00  0.364E+00  0.562E-01  0.360E+00
           hirs4_metop-a   6     7 -0.724E+00  0.755E+00  0.468E+00  0.852E-01  0.460E+00
           hirs4_metop-a   7     9 -0.572E+00  0.817E+00  0.577E+00  0.871E-01  0.570E+00
           hirs4_metop-a   8     9  0.122E+01  0.136E+01  0.100E+01  0.205E-01  0.100E+01
           hirs4_metop-a  10     9  0.855E+00  0.100E+01  0.609E+00  0.104E+00  0.600E+00
           hirs4_metop-a  11     9 -0.236E-01  0.543E+00  0.130E+01  0.507E+00  0.120E+01
           hirs4_metop-a  12    34 -0.738E+00  0.153E+01  0.178E+01  0.790E+00  0.160E+01
           hirs4_metop-a  13     9  0.121E+01  0.128E+01  0.375E+00  0.905E-01  0.364E+00
           hirs4_metop-a  14     9  0.776E-01  0.256E+00  0.281E+00  0.106E+00  0.260E+00
           hirs4_metop-a  15     9  0.281E+00  0.384E+00  0.273E+00  0.838E-01  0.260E+00
    

    After the EnKF analysis, it is important to check the innovation statistics, much like the prior fit to all observation.

Running GSI Observer for Global Applications

Run Script

With both GSI and EnKF compiled (note that a separate global executable must be generated for EnKF) and the background ensemble files and observations acquired, the next step is to work with the gsi observer run script, run_gsi_global.ksh. The location of this script is under comGSIv3.5_EnKFv1.1/run. This run script is the same as the one used for a GSI global analysis run, with a few specific options selected in order to loop through all the ensemble members and generate the ensemble observation priors for each member and the ensemble mean. For the global examples, users must choose a case that corresponds to the spectral resolution of the background ensemble. For this example, we will be using T254. In addition to the GSI observer specific options, other user-specific modifications need to be made:

  • Set up batch queueing system
    To run GSI with multiple processors, a job queuing head must be added to the run_gsi_global.ksh script. The set up of the job queue is dependent on the machine and the job control system. Refer to the GSI Users Guide, section 3.2.2, for more examples of the setup section of this script. The following example is setup to run on a Linux cluster with LSF:
    # LSF batch script to run an MPI application
    #BSUB -P ???????             # project code
    #BSUB -W 00:30                 # wall-clock time (hrs:mins)
    #BSUB -n 24                   # number of tasks in job
    #BSUB -R "span[ptile=16]"      # run 16 MPI tasks per node
    #BSUB -J gsi                   # job name
    #BSUB -o gsi.%J.out          # output file name in which %J is replaced by the job ID
    #BSUB -e gsi.%J.err          # error file name in which %J is replaced by the job ID
    #BSUB -q small               # queue
    
  • Set up number of processors and the job queue system used. For this example, LINUX_LSF and 24 processors are used:

    GSIPROC=24
    ARCH='LINUX_LSF'
    
  • Set up the case data, analysis time, fix files, GSI executable, and CRTM coefficients:
    Set up analysis time and select global case:
    ANAL_TIME=2014040512
    GFSCASE=enkf_glb_t254
    

    Set up working directory, which will hold the analysis results (all ensemble members will be run in this directory). This directory must have the proper write permissions as well as enough space to hold the output.

    WORK_ROOT=/scratch/${user}/comGSIv3.5-EnKFv1.1/run/gsidiag_${GFS_CASE}
    

    Set path to background ensemble files:

    BK_ROOT=/scratch/${user}/casedata/enkf_glb_t254/bk
    

    Set path to the observation directory and the PrepBUFR file within the observation directory. All observations to be assimilated should reside in this directory.

    OBS_ROOT=/scratch/${user}/casedata/enkf_glb_t254/obs
    PREPBUFR=${OBS_ROOT}/gdas1.t12z.prepbufr.nr
    

    Set the GSI system used for this case, including the paths of the fix files and the CRTM coefficients as well as the location of the GSI executable:

    CRTM_ROOT=/scratch/${user}/CRTM_REL-2.2.3
    GSI_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/
    FIX_ROOT=${GSI_ROOT}/fix
    GSI_EXE=${GSI_ROOT}/run/gsi.exe
    GSI_NAMELIST=${GSI_ROOT}/run/comgsi_namelist_gfs.sh
    
    Note that the GSI namelist generation script used for the global case (comgsi_namelist_gfs.sh) is different from the one used for the regional example.
  • Choose to run GSI observer, set up background ensemble information

    if_observer=Yes  # Yes ,or, No -- case sensitive!!!
    no_member=10
    BK_FILE_mem=${BK_ROOT}/sfg_2014040506
    
    The option if_observer=Yes is the switch that enables run_gsi_global.ksh to run the GSI observer (rather than GSI analysis). In this example, 10 ensemble members are selected with the naming convention: sfg_${GUESS_DATE}_memnnnn . Note that _memnnnn which is assocated with each ensemble member (nnnn), is not included and will be appended later in the script.
  • Set the JCAP resolution for the case:

    if [[ "$GFSCASE" = "T62" ]]; then
      JCAP=62
      JCAP_B=62
    elif [[ "$GFSCASE" = "T126" ]]; then
      JCAP=126
      JCAP_B=126
    elif [[ "$GFSCASE" = "enkf_glb_t254" ]]; then
      JCAP=254
      JCAP_B=254
    elif [[ "$GFSCASE" = "T254" ]]; then
      JCAP=254
      JCAP_B=574
    elif [[ "$GFSCASE" = "T574" ]]; then
      JCAP=574
      JCAP_B=1534
    else
       echo "INVALID case = $GFSCASE"
       exit
    fi
      LEVS=64
    
    Note that this selection is filled based on selection of the GFSCASE. This example used the enkf_glb_t254 case. All cases use 64 levels.
    Further in the run script run_gsi_global.ksh, the resolution parameters are set based on the requested case (resolution). For this case, we are using JCAP=254:
    elif [[ "$JCAP" = "254" ]]; then
       LONA=512
       LATA=256
       DELTIM=1200
       resol=2
    
  • Link satellite bias correction coefficients, background ensemble files, and observations:

    if [[ "$GFSCASE" = "enkf_glb_t254" ]]; then
      cp $OBS_ROOT/gdas1.t12z.abias      ./satbias_in
      cp $OBS_ROOT/gdas1.t12z.satang     ./satbias_angle
    
      cp $BK_ROOT/sfcanl_2014040506_fhr03_ensmean  ./sfcf03
      cp $BK_ROOT/sfcanl_2014040506_fhr06_ensmean  ./sfcf06
      cp $BK_ROOT/sfcanl_2014040506_fhr06_ensmean  ./sfcf09
    
      cp $BK_ROOT/sfg_2014040506_fhr03_mem001    ./sigf03
      cp $BK_ROOT/sfg_2014040506_fhr06_mem001    ./sigf06
      cp $BK_ROOT/sfg_2014040506_fhr09_mem001    ./sigf09
    ... ...
    # Link to the prepbufr data
    ln -s ${PREPBUFR} ./prepbufr
    
    # Link to the other observation data
    
    if [[ "$GFSCASE" = "enkf_glb_t254" ]]; then
      obsfile_amua=gdas1.t12z.1bamua.tm00.bufr_d
      obsfile_amub=gdas1.t12z.1bamub.tm00.bufr_d
    else
    

    Past the arch selection, environment variable checks, and creation of working directory, users will find the location where the observations are linked. For this case, we can see that the conventional PrepBUFR observations have been linked, as well as AMSU-A and AMSU-B satellite radiance BUFR files. These files will be linked to the working directory and separate observation innovation (diag) files will be generated for each observation.

Run GSI Observer and Check Run Status

Once the run script is set up properly for the case and the machine, GSI can be run through the run script. The following command will submit the job:

$ bsub < run_gsi_global.ksh

While the job is running, move to the working directory and check the details. Given the following working directory setup:

WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_${GFSCASE}

Go to directory /scratch/${user}/comGSIv3.5_EnKFv1.1/ and check the run directory. A directory named gsidiag_enkf_glb_t254 should have been created. This directory is the run directory for this GSI observer case study. The directory will be populated with many files such as:

AerosolCoeff.bin                 hirs4_n19.TauCoeff.bin         sndrD1_g14.TauCoeff.bin
ahi_himawari8.SpcCoeff.bin       iasi616_metop-a.SpcCoeff.bin   sndrD1_g15.SpcCoeff.bin
ahi_himawari8.TauCoeff.bin       iasi616_metop-a.TauCoeff.bin   sndrD1_g15.TauCoeff.bin
airs281SUBSET_aqua.SpcCoeff.bin  iasi616_metop-b.SpcCoeff.bin   sndrD2_g11.SpcCoeff.bin
... ...

These files are CRTM coefficients that have been linked to this run directory through the GSI run script. Similar to running the GSI analysis, many other files are linked or copied to this run directory, such as:

gsiparm.anl : GSI namelist
prepbufr : PrepBUFR file for conventional observation
convinfo : data usage control for conventional data
satbias_in : satellite bias correction coefficient file
satinfo : data usage and channel control for satellite radiance data
berror_stats : background error file
errtable : observation error file

Additionally, for the GSI observer many files are generated as a result of the lread_obs_save and lread_obs_skip options in the namelist for writing/reading collective observation selection information:

obs_input.nnnn: During ensemble mean (save=True, skip=False), save all observation preprocessing to this file. Each file (nnnn) is for a different variable type
pennnn.obs_setup: When looping through ensemble members (save=False, skip=True), all observation preprocessing is skipped, obs_input.nnnn files are read and pennnn.obs_setup is written for each processor (nnn) for all observations.
As the GSI observer is running for the ensemble mean as well as looping through each member, files are for each ensemble member, as well as the ensemble mean:
pennnn.conv_01 : Files generated after O-B, generated for conventional observations as well as one file per sensor (e.g. AMSU-A n18). These files are trimmed for resulting diag files and cleaned.
list_run_directory : Directory listing after ensemble mean is run, before directory is cleaned for running the ensemble members.
list_run_directory_memnnn: As as above, for each ensemble member (nnn).
stdout : Standard output file for ensemble mean.
stdout_memnnn : Standard output file for each ensemble member (nnn).

The presence of the standard output files in the directory suggest the GSI observer run scripts have successfully set up a run environment for the GSI observer, properly looping through the ensemble members, and the GSI executable is running on each ensemble member. Once GSI has finished running, diag files should be generated for each observation type for each member as well as the ensemble mean:

diag_conv_ges.ensmean  diag_conv_ges.mem004  diag_conv_ges.mem008
diag_conv_ges.mem001   diag_conv_ges.mem005  diag_conv_ges.mem009
diag_conv_ges.mem002   diag_conv_ges.mem006  diag_conv_ges.mem010
diag_conv_ges.mem003   diag_conv_ges.mem007
diag_amsua_metop-a_ges.ensmean  diag_amsua_metop-a_ges.mem006
diag_amsua_metop-a_ges.mem001   diag_amsua_metop-a_ges.mem007
diag_amsua_metop-a_ges.mem002   diag_amsua_metop-a_ges.mem008
diag_amsua_metop-a_ges.mem003   diag_amsua_metop-a_ges.mem009
diag_amsua_metop-a_ges.mem004   diag_amsua_metop-a_ges.mem010
diag_amsua_metop-a_ges.mem005
diag_amsua_n18_ges.ensmean  diag_amsua_n18_ges.mem004  diag_amsua_n18_ges.mem008
diag_amsua_n18_ges.mem001   diag_amsua_n18_ges.mem005  diag_amsua_n18_ges.mem009
diag_amsua_n18_ges.mem002   diag_amsua_n18_ges.mem006  diag_amsua_n18_ges.mem010
diag_amsua_n18_ges.mem003   diag_amsua_n18_ges.mem007
diag_amsua_n19_ges.ensmean  diag_amsua_n19_ges.mem004  diag_amsua_n19_ges.mem008
diag_amsua_n19_ges.mem001   diag_amsua_n19_ges.mem005  diag_amsua_n19_ges.mem009
diag_amsua_n19_ges.mem002   diag_amsua_n19_ges.mem006  diag_amsua_n19_ges.mem010
diag_amsua_n19_ges.mem003   diag_amsua_n19_ges.mem007

Check for Successful GSI Completion

The presence of the diag files and standard out files for the ensemble mean and each member indicates that the GSI observer has run without crashing, but does not necessary indicate a successful analysis. It is important to check the stdout files in the run directory to make sure the GSI observer completed each step without any obvious problems. The following are several important areas of the standard out file to check:

  1. Read in the anavinfo and namelist

     READ_FILES:  analysis date,minutes         2014           4           5
              12           0    19070640
     gsi_metguess_mod*init_:  2D-MET STATE VARIABLES:
     ps
     z
     gsi_metguess_mod*init_:  3D-MET STATE VARIABLES:
     u
     v
    ... ...
    
     &SETUP
     GENCODE =   78.0000000000000     ,
     FACTQMIN        =  0.000000000000000E+000,
     FACTQMAX        =  0.000000000000000E+000,
    
  2. Read in the background field
    The following lines in the stdout immediately following the namelist section, indicate that GSI is reading the background fields. Checking that the range of the max and min values will indicate if particular background fields are normal.
    GESINFO:  jcap_b= 254, levs= 64, latb=  256, lonb=  512, ntrac=  3, ncldt=  1, idvc=  2, nvcoord=  2,
    idvm=  0, idsl=  0, idpsfc=  0, idthrm=  0
       k,ak,bk,ck,tref=  1   0.00000000000       1.00000000000       0.00000000000       300.000000000
       k,ak,bk,ck,tref=  2   0.00000000000      0.994671165943       0.00000000000       300.000000000
       k,ak,bk,ck,tref=  3  0.574999988079E-03  0.988626599312       0.00000000000       300.000000000
       k,ak,bk,ck,tref=  4  0.574100017548E-02  0.981742262840       0.00000000000       300.000000000
       ... ...
    
  3. Read in observational data
    The stdout (ensemble mean) and stdout_memnnn (ensemble members) contains distinct differences for the reading in the observational data controlled by the lread_obs_save and lread_obs_skip options in the namelist.
    1. Ensemble mean

       read_obs_check: bufr file date is   2014040512 prepbufr t
       read_obs_check: bufr file date is   2014040512 prepbufr uv
       read_obs_check: bufr file date is   2014040512 prepbufr q
       read_obs_check: bufr file date is   2014040512 prepbufr ps
      ... ...
        number of extra processors           13
       READ_OBS:  read    1 ps         ps                   using ntasks=   1   0      0      0
       READ_OBS:  read    2 t          t                    using ntasks=   1   1      0      0
       READ_OBS:  read    3 q          q                    using ntasks=   1   2      0      0
       READ_OBS:  read    4 pw         pw                   using ntasks=   1   3   2738      0
      ... ...
       READ_PREPBUFR         : file=prepbufr        type=pw         sis=pw                   nread=       345
       ithin= 0 rmesh=1450.000000 isfcalc= 0 ndata=       345 ntask=  1
       READ_BUFRTOVS         : file=amsuabufr       type=amsua      sis=amsua_n18            nread=    202575
       ithin= 2 rmesh=1500.000000 isfcalc= 0 ndata=       675 ntask=  1
       ... ...
       READ_OBS:  write collective obs selection info to obs_input.common
      
      For the ensemble mean, the same procedure for observation processing as the GSI analysis is followed (read_obs_check, READ_OBS, READ_BUFRTOVS, READ_GPS, READ_PREPBUFR, …). Finally, it is indicated that the collective observations selection information is written out for use with the ensemble members.
    1. Ensemble members
      For the stdout_memnnn, the observation processing sets are skipped and observation information is read in from the ensemble mean:
      OBSERVER_SET:  read collective obs selection info from obs_input.common
      

      Finally, for both the ensemble mean and members, the following lines will appear:

      OBS_PARA: ps                          24       400      1649      2577        58       245      1496       506
                        82       402       616       251        36        67       328       864        44       224
                      5210      1397        69       414       506       629
      OBS_PARA: t                           48       403      3406      4875       174       488      4128      1825
                        84       724      2702      1146         4       215       882      2218       151       913
                     11864      3167       160      1622      1308      2381
      OBS_PARA: q                           16       370      3025      4749       154       383      3211      1737
                        51       594      2126      1102         0       161       697      1951       140       850
                     10310      2974       155      1525      1104      1506
      OBS_PARA: pw                           0         0         0         0         0         0         0         0
                         0         0         3         0         0         0        42        25         0         0
                       251        32         0         0         0         4
      OBS_PARA: uv                          52       410      4459      5664       239      1023      4994      1835
                        86      1553      3792      1316         3       184       877      2242       180      1576
                     11854      3595       264      3186      2231      2895
      OBS_PARA: amsua     n15                0         0         0         0         7         9         0         0
                         1         0         0         0         5         0         0         0         3        10
                         7         0         0         0         0         0
      OBS_PARA: amsua     n18                3         1         0         0         5         2         0         0
                         1         0         0         0         5         6         2         0         2         5
                         6         8         0         0         0         0
      OBS_PARA: amsua     n19                0         0         3         7         0         0         0         0
                         3         1         0         0         2         9        10         6         0         0
                         0         3         0         0         0         3
      OBS_PARA: amsua     metop-a            0         0         0         4         0         0         0         2
                         0         4         9         5         0         0         0         0         0         0
                         0         0         0         6         9         6
      OBS_PARA: amsua     metop-b            0         0         0         0         1         0         0         0
                         6         9         0         0         3         0         0         0         4         0
                         0         0         4        11         7         0
      
      This table is important to check if the observations have been read in, which types of observations have been read in, and the distribution of observations in each subdomain.
  4. Indication that the GSI observer has successfully run:
     glbsoi: complete
    [000]gsisub(): : complete.
    
    
         ENDING DATE-TIME    JUL 15,2016  15:48:52.880  197  FRI   2457585
         PROGRAM GSI_ANL HAS ENDED.
    * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
    

After looking over each section of the standard output files, it can be concluded that the GSI global observer ran without issues. Note that because the outer loop was set to 0 (miter=0) for the GSI observer, no minimization occurred and therefore no analysis results were produced.

Running EnKF for Global Applications

Run Script

Once the GSI global observer has been run successfully, the next step is to setup the EnKF run script, run_enkf_global.ksh. This script is located under comGSIv3.5_EnKFv1.1/run. This run script uses the diag files generated by the GSI global observer script as observation input, and generates the global EnKF analysis. Similar to the GSI observer script, several user-specific modifications need to be made:

  • Set up batch queueing system

    #BSUB -P ??????             # project code
    #BSUB -W 00:30                 # wall-clock time (hrs:mins)
    #BSUB -n 80                   # number of tasks in job
    #BSUB -R "span[ptile=16]"      # run 16 MPI tasks per node
    #BSUB -J gsi                   # job name
    #BSUB -o gsi.%J.out          # output file name in which %J is replaced by the job ID
    #BSUB -e gsi.%J.err          # error file name in which %J is replaced by the job ID
    #BSUB -q small               # queue
    
    One difference from the GSI observer script is that the number of processors used should be greater than at least the number of ensemble members for running the EnKF. In this case we have 10 ensemble members and have requested 80 cores due to the large global domain.
  • Set up the analysis time, global case, fixed files, EnKF executable

    ANAL_TIME=2014040512
    GUESS_TIME=2014040506
    GFSCASE=enkf_glb_t254
    WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/enkf_${GFSCASE}
    BK_ROOT=/scratch/${user}/casedata/enkf_glb_t254/bk
    GSI_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1
    FIX_ROOT=${GSI_ROOT}/fix
    ENKF_EXE=${GSI_ROOT}/src/main/enkf/global_enkf
    CRTM_ROOT=/scratch/${user}/CRTM_2.2.3
    ENKF_NAMELIST=${GSI_ROOT}/run/enkf_gfs_namelist.sh
    
  • Set location of the diag files (from GSI observer)

    DIAG_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/gsidiag_${GFSCASE}
    
    One modifications to note for the EnKF is the DIAG_ROOT. The DIAG_ROOT points to the working directory where the GSI observer runs. This directory contains diag* files which will be linked to the EnKF working directory.
  • Select ensemble and case parameters

    NMEM_ENKF=10
    LEVS=64
    NVARS=5
    ... ...
    elif [[ "$GFSCASE" = "enkf_glb_t254" ]]; then
      JCAP=254
      JCAP_B=254
    ... ...
      elif [[ "$JCAP" = "254" ]]; then
       LONA=512
       LATA=256
       DELTIM=1200
       resol=2
    
    This section sets information about the number of background ensemble members, domain specifications. This information mirrors that of the run_gsi_global.ksh script.
  • Read in namelist

    #Build EnKF namelist on-the-fly
    . $ENKF_NAMELIST
    cat << EOF > enkf.nml
    
     $enkf_namelist
    
    EOF
    
    Similar to the regional enkf (run_enkf_wrf.ksh), the global EnKF builds the namelist from an outside script for namelist generation (enkf_gfs_namelist.sh).
  • Select observations

    list="conv amsua_metop-a amsua_n18 amsua_n15"
    

    The previous line is contained within the ensemble looping section. For the observations to be assimilated, be aware that only observations/platforms that have been run through the GSI observer (and therefore a diag* file exists) are valid. For this example case, we will only run EnKF for conventional observations.

Run EnKF and Check Run Status

Once the run script is set up properly for the case and the machine, EnKF can be run through the run script. The following command will submit the job:

$ bsub < run_enkf_global.ksh

While the job is running, move to the working directory and check the details. Given the following working directory setup:

WORK_ROOT=/scratch/${user}/comGSIv3.5_EnKFv1.1/run/enkf_${GFSCASE}

Go to directory /scratch/${user}/comGSIv3.5_EnKFv1.1/ and check the run directory. A directory named enkf_enkf_glb_t264 should have been created. This directory is the run directory for this EnKF case study. The directory will be populated with many links:

diag_conv_ges.ensmean  diag_conv_ges.mem004  diag_conv_ges.mem008
diag_conv_ges.mem001   diag_conv_ges.mem005  diag_conv_ges.mem009
diag_conv_ges.mem002   diag_conv_ges.mem006  diag_conv_ges.mem010
diag_conv_ges.mem003   diag_conv_ges.mem007
diag_amsua_metop-a_ges.ensmean  diag_amsua_metop-a_ges.mem006
diag_amsua_metop-a_ges.mem001   diag_amsua_metop-a_ges.mem007
diag_amsua_metop-a_ges.mem002   diag_amsua_metop-a_ges.mem008
diag_amsua_metop-a_ges.mem003   diag_amsua_metop-a_ges.mem009
diag_amsua_metop-a_ges.mem004   diag_amsua_metop-a_ges.mem010
diag_amsua_metop-a_ges.mem005
diag_amsua_n18_ges.ensmean  diag_amsua_n18_ges.mem004  diag_amsua_n18_ges.mem008
diag_amsua_n18_ges.mem001   diag_amsua_n18_ges.mem005  diag_amsua_n18_ges.mem009
diag_amsua_n18_ges.mem002   diag_amsua_n18_ges.mem006  diag_amsua_n18_ges.mem010
diag_amsua_n18_ges.mem003   diag_amsua_n18_ges.mem007
diag_amsua_n19_ges.ensmean  diag_amsua_n19_ges.mem004  diag_amsua_n19_ges.mem008
diag_amsua_n19_ges.mem001   diag_amsua_n19_ges.mem005  diag_amsua_n19_ges.mem009
diag_amsua_n19_ges.mem002   diag_amsua_n19_ges.mem006  diag_amsua_n19_ges.mem010
diag_amsua_n19_ges.mem003   diag_amsua_n19_ges.mem007

The diag files are linked from the GSI global observer working directory. These links are specified in the run script. Note that ozone diag files (gome, omi, sbuv) will appear in the directory without links (zero length files) if ozone diags are not present from the GSI observer.

sfg_2014040512_fhr06_ensmean  sfg_2014040512_fhr06_mem004  sfg_2014040512_fhr06_mem008
sfg_2014040512_fhr06_mem001   sfg_2014040512_fhr06_mem005  sfg_2014040512_fhr06_mem009
sfg_2014040512_fhr06_mem002   sfg_2014040512_fhr06_mem006  sfg_2014040512_fhr06_mem010
sfg_2014040512_fhr06_mem003   sfg_2014040512_fhr06_mem007

The first guess files are also linked into the directory from the run script, pointing to the path of the ensemble mean and ensemble members designated in the setup section.

Similar to running the GSI observer, many other static files are linked or copied to this run directory, such as:

enkf.nml: EnKF namelist
convinfo: data usage control for conventional data
satbias_in: satellite bias correction coefficient file
satbias_angle: satellite bias correction angle file
satinfo: data usage and channel control for satellite radiance data
ozinfo: data usage control from ozone data
covinflate.dat: Three-dimensional multiplicative inflation factor fields
stdout: EnKF standard output file

The presence of the standard output file in the directory suggests the EnKF run script has successfully set up a run environment for the EnKF, properly linking the first guess and diag files from the GSI observer, and the EnKF executable is running. Once EnKF has finished running, analysis files should be generated for each member:

sanl_2014040512_mem001  sanl_2014040512_mem005  sanl_2014040512_mem009
sanl_2014040512_mem002  sanl_2014040512_mem006  sanl_2014040512_mem010
sanl_2014040512_mem003  sanl_2014040512_mem007
sanl_2014040512_mem004  sanl_2014040512_mem008

Check for Successful EnKF Completion

The presence of the EnKF analysis files for each member as well as the standard out file indicates that the EnKF has run without crashing, but does not necessary indicate a successful analysis. It is important to check the stdout file in the run directory to make sure the EnKF completed each step without any obvious problems. The following are several important areas of the standard out file to check:

  1. Check namelist has been properly read in and configuration is correct:

     namelist parameters:
     --------------------
     &NAM_ENKF
     DATEIN  =  2014040512,
     DATAPATH        = ./
     IASSIM_ORDER    =           0,
     COVINFLATEMAX   =   100.0000    ,
     COVINFLATEMIN   =   1.000000    ,
     DETERMINISTIC   = T,
     SORTINC = T,
     CORRLENGTHNH    =   2000.000    ,
     CORRLENGTHTR    =   2000.000    ,
     CORRLENGTHSH    =   2000.000    ,
    ... ...
    
  2. Check analysis time, number of ensemble members, as well as the actual analysis variables and the background type. The maximum and minimum values for surface pressure are printed for a sanity check:

    analysis time 2014040512
             10  members
    number of background forecast times to be updated =            1
    first-guess forecast hours for analysis = 06
              5 3d vars to update
    total of         321  2d grids will be updated (including ps)
    using multiplicative inflation based on Pa/Pb
    Vars in Rad-Jacobian (dims)
    --------------------------
    sst                            0
    enkf_main:  number of threads            1
    ensemble mean first guess surface pressure:
      516.109537854119        1051.19857562343
    
  3. Statistics of the ensemble prior
    After many lines describing the bias correction coefficents, the inventory of the observation number, observation types from the input diag* files, and the time spent reading in each observation type, statistics of the ensemble priors fit to all observations are provided for each region (NH, SH, TR):
     innovation statistics for prior:
     conventional obs
     region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
    NH    all ps  12059 -0.700E-01  0.122E+01  0.151E+01  0.499E+00  0.143E+01
    TR    all ps   2184  0.402E+00  0.118E+01  0.141E+01  0.476E+00  0.133E+01
    SH    all ps    849 -0.102E+00  0.125E+01  0.148E+01  0.563E+00  0.137E+01
    NH     all t  20598  0.929E-01  0.165E+01  0.121E+01  0.387E+00  0.115E+01
    TR     all t   6126  0.218E+00  0.153E+01  0.113E+01  0.418E+00  0.105E+01
    SH     all t   1169  0.141E+00  0.172E+01  0.113E+01  0.417E+00  0.105E+01
    NH    all uv  49010  0.199E+00  0.312E+01  0.250E+01  0.951E+00  0.231E+01
    TR    all uv  20980  0.651E-01  0.335E+01  0.265E+01  0.127E+01  0.232E+01
    SH    all uv   4682 -0.153E+00  0.375E+01  0.247E+01  0.114E+01  0.219E+01
    NH     all q   7936 -0.335E-01  0.181E+00  0.207E+00  0.534E-01  0.200E+00
    TR     all q   2951 -0.119E-01  0.180E+00  0.214E+00  0.770E-01  0.200E+00
    SH     all q    543 -0.163E-01  0.212E+00  0.213E+00  0.730E-01  0.200E+00
    
    This table should be checked in order to determine if the inflation is appropriate. As mentioned in the regional EnKF example, the goal is to make the total ensemble spreads of priors (\(\sqrt{(S+R)}\) match the innovations (innov std) as much as possible. Because this is a global run using only conventional observation data, we can see that all regions (NH,TR,SH) are listed and there are no statistics for radiances. We can see certain obtypes, particularly in the uv has fairly close values. On the other hand, there are still regions/obtypes that have noteable differences, mainly associated with larger observation errors (\(\sqrt{(R)}\)). Users should consider tuning of inflation and localization, which is typically determined using cases with multiple assimilation cycles. Refer to section 4.2 of this Users Guide for more information on tuning.
  4. Domain and observation partition:

    npts =       131072
    min/max number of points per proc =         1598        1710
    time to do model space decomp =   1.697483193129301E-002
    nobstot =       129087
    min/max number of obs per proc =         1613        1614
    time to do ob space decomp =   5.538591649383307E-004
    sending out observation prior ensemble perts from root ...
    nobstot*nanals               1290870
    npts*ndim              42074112
    ... took   6.416106596589088E-003  secs
    

    The analysis variables and the observations are distributed to different processors. We can see in this case that the min/max number of points per processor are 1598 and 1710, respectively. Similarly, we can see that the min/max number of observations per processor are 1613 and 1614, respectively, indicating that the observations are well dispersed among the processors.

     min/max pressi           1   516.1096       1051.199
     min/max pressi           2   513.3593       1045.597
     min/max pressi           3   510.2454       1039.249
     min/max pressi           4   506.7440       1032.063
     min/max pressi           5   502.8375       1023.943
    ... ...
    

Additionally, check the minimum and maximum values of the fields at each vertical level as a quick sanity check.

Diagnose EnKF Analysis Results

At the bottom of the standard output file, there are several output statistics and tables that are helpful for users to diagnose the quality of the EnKF analysis.

  1. Statistics of the ensemble analysis

    global ps prior std. dev min/max =   7.2904140E-02   3.055338
    NH mean ps prior standard deviation =   0.5657563
    NH mean ps posterior standard deviation (before inflation)=   0.3681831
    NH mean ps posterior standard deviation (after inflation) =   0.5323756
    NH mean ps inflation =    1.653412
    SH mean ps prior standard deviation =   0.6475344
    SH mean ps posterior standard deviation (before inflation)=   0.6009755
    SH mean ps posterior standard deviation (after inflation) =   0.6399752
    SH mean ps inflation =    1.086266
    TR mean ps prior standard deviation =   0.4878409
    TR mean ps posterior standard deviation (before inflation)=   0.4240963
    TR mean ps posterior standard deviation (after inflation) =   0.4770048
    TR mean ps inflation =    1.170121
    time in inflate_ens =  5.236316868104041E-002 on proc           0
    
    This section is important for checking whether the values of the inflation are reasonable by looking at a summary of the maximum and minimum values. Also, viewing the regional averaged statistics before and after inflation. For this case we can see for all regions, larger standard deviation after the inflation are reported, which is more consistent with the prior standard deviation.
  2. Spread inflation of the analysis ensemble

     innovation statistics for posterior:
     conventional obs
     region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
    NH    all ps  12059  0.538E-01  0.912E+00  0.144E+01  0.200E+00  0.143E+01
    TR    all ps   2184  0.243E+00  0.102E+01  0.136E+01  0.285E+00  0.133E+01
    SH    all ps    849 -0.755E-01  0.978E+00  0.142E+01  0.388E+00  0.137E+01
    NH     all t  20598  0.590E-01  0.147E+01  0.117E+01  0.196E+00  0.115E+01
    TR     all t   6126  0.177E+00  0.135E+01  0.107E+01  0.222E+00  0.105E+01
    SH     all t   1169  0.100E+00  0.148E+01  0.108E+01  0.246E+00  0.105E+01
    NH    all uv  49010  0.765E-01  0.269E+01  0.236E+01  0.478E+00  0.231E+01
    TR    all uv  20980  0.116E-02  0.274E+01  0.240E+01  0.605E+00  0.232E+01
    SH    all uv   4682 -0.981E-01  0.303E+01  0.227E+01  0.591E+00  0.219E+01
    NH     all q   7936 -0.333E-01  0.166E+00  0.202E+00  0.268E-01  0.200E+00
    TR     all q   2951  0.369E-02  0.148E+00  0.203E+00  0.370E-01  0.200E+00
    SH     all q    543 -0.188E-01  0.187E+00  0.204E+00  0.393E-01  0.200E+00
    

    After the EnKF analysis, it is important to check the innovation statistics, as previously discussed in the regional test case.