.. _enkf_diag:

4. EnKF Diagnostics and Tuning
===========================

This chapter will discuss how to assess whether an EnKF was successful
based on the contents of the standard output (stdout). Properly checking
the EnKF output will also provide useful information to diagnose
potential errors in the system. The chapter begins with an introduction
to the content and structure of the EnKF stdout, followed by detailed
discussion of tuning options in the namelist. This chapter follows a
case at 00z on February 13th, 2014 (case 2014021300). This case uses
WRF-ARW NetCDF ensemble files as the background and analyzes several
observations typical for operations, including most conventional
observation data and select radiance data (AMSU-A , HIRS4). The case was
run on a Linux cluster supercomputer, using 32 cores. Users can follow
this test to reproduce the following results by visiting:

http://www.dtcenter.org/EnKF/users/tutorial/index.php

Understanding the Standard Output from EnKF (stdout)
----------------------------------------------------

Upon completion of an EnKF run, it is always useful to do a quick check
of the standard output (stdout), to assess the performance of the EnKF.
The stdout file has information about whether the EnKF analysis has
successfully completed, if the ensemble spread inflation looks good, and
if the background and analysis fields are reasonable. Understanding the
contents of this file can also be very helpful for users to find clues
in the event of a crash.

The EnKF stdout has following information:

 
   a. namelist configuration

 
   b. background ensemble members of observations and analysis variables

 
   c. statistics of the ensemble prior

 
   d. domain and observation partition

 
   e. statistics of the ensemble analysis

 
   f. spread inflation of the analysis ensemble

The following section contains a detailed description of the content of
stdout, explained using the WRF-ARW case: 2014021300. The analysis
domain consists of 129 x 70 x 50 grid points. To keep the output concise
and make it more readable, redundant content are omitted.

The following indicates the start of the EnKF analysis. It shows how
many processors are used to run the analysis and the beginning time of
this run:

::

   Execute poe command line: poe  ./enkf.x
    running on           32  processors ...


   * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
        PROGRAM ENKF_ANL HAS BEGUN. COMPILED 2011319.55     ORG: NP25
        STARTING DATE-TIME  JUL 14,2016  00:01:03.956  196  THU   2457584

The following lines show how many satellite observation types are set to
be read in and if the radiance bias correction is updated before the
analysis process:

::

    number of satellite radiance files used          59
    number of satellite ozone files used           7

The following lines show the namelist used in the analysis:

::

    namelist parameters:
    --------------------
    &NAM_ENKF
    DATEIN  =  2014021300,
    DATAPATH        = ./                                                                                                    ,
    IASSIM_ORDER    =           0,
    COVINFLATEMAX   =   100.0000    ,
    COVINFLATEMIN   =   1.000000    ,
    DETERMINISTIC   = T, SORTINC = T,
    CORRLENGTHNH    =   500.0000    ,
    CORRLENGTHTR    =   500.0000    ,
    CORRLENGTHSH    =   500.0000    ,
   ... ...
    SAVE_INFLATION  = F,
    LETKF_FLAG      = F,
    MASSBAL_ADJUST  = F,
    USE_EDGES       = F,
    EMISS_BC        = T,
    ISEED_PERTURBED_OBS     =           0
    /
    --------------------

The following lines show ensemble size, background information, the
number of 3D analysis variables, and total number of 2D fields of the 3D
variables, plus surface pressure (Ps).

::

             20  members
    number of background forecast times to be updated =            1
    first-guess forecast hours for analysis = 06
              5 3d vars to update
    total of         251  2d grids will be updated (including ps)

The actual analysis variables and the background type are printed, as
well as the maximum and minimum of the surface pressure as a sanity
check:

::

    Updating U, V, T, QVAPOR, PH, and MU for WRF-ARW...
    Surface pressure (spressmn) min/max range:   678.673339843750
      1032.22473144531

Next, the contents of convinfo are displayed:

::

   READ_CONVINFO: tcp     112    0  1   3.000      0   0   0   75.00     5.000     1.000     75.00     0.000        0  0.000    0.000      0  0.000    0.000     2
   READ_CONVINFO: ps      120    0  1   3.000      0   0   0   4.000     3.000     1.000     4.000    0.3000E-03    0  0.000    0.000      0  0.000    0.000     2
    line ignored in convinfo due to use flag  ps              132           0
             -1
   ......

    line ignored in convinfo due to use flag  gps              44           0
             -1

The next 272 lines show the content of ozinfo:

::

   OZINFO_READ:  jpch_oz=    272
     1 sbuv6_n14      lev =    1 use = -1 pob =     0.240 gross =   1.000 error =   1.000 b_oz =  10.000 pg_oz =   0.000
     2 sbuv6_n14      lev =    2 use = -1 pob =     0.490 gross =   1.000 error =   1.000 b_oz =  10.000 pg_oz =   0.000
     3 sbuv6_n14      lev =    3 use = -1 pob =     0.980 gross =   1.000 error =   1.000 b_oz =  10.000 pg_oz =   0.000
   ... ...
   271 mls30_aura     lev =   54 use = -1 pob =   999.999 gross =   9.999 error =   9.999 b_oz =  10.000 pg_oz =   0.000
   272 mls30_aura     lev =   55 use = -1 pob =   999.999 gross =   9.999 error =   9.999 b_oz =  10.000 pg_oz =   0.000

The next 2680 lines show the content of satinfo and starts reading in
the radiance bias correction coefficients:

::

   RADINFO_READ:  jpch_rad=   2723
      1 amsua_n15      chan= 1 var= 3.000 varch_cld= 20.000 use= 1 ermax=   4.500 b_rad= 10.00 pg_rad=  0.00 icld_det=-2
      2 amsua_n15      chan= 2 var= 2.200 varch_cld= 18.000 use= 1 ermax=   4.500 b_rad= 10.00 pg_rad=  0.00 icld_det=-2
      3 amsua_n15      chan= 3 var= 2.000 varch_cld= 12.000 use= 1 ermax=   4.500 b_rad= 10.00 pg_rad=  0.00 icld_det=-2
      4 amsua_n15      chan= 4 var= 0.600 varch_cld=  3.000 use= 1 ermax=   2.500 b_rad= 10.00 pg_rad=  0.00 icld_det=-2
   ... ...

   2722 ahi_himawari8  chan=15 var= 2.200 varch_cld=  0.000 use= -1 ermax=   2.000 b_rad= 10.00 pg_rad=  0.00 icld_det=-2
   2723 ahi_himawari8  chan=16 var= 2.200 varch_cld=  0.000 use= -1 ermax=   2.000 b_rad= 10.00 pg_rad=  0.00 icld_det=-2

The majority of next near 3000 lines show the content of radiance bias
correction coefficients:

::

    RADINFO_READ:  guess air mass bias correction coefficients below
      1            amsua_n15   -1.96518   0.00000  39.06817  -0.92319   0.40846   0.00000   0.00000   0.00173   3.68970   2.36747   8.86294  -1.84829
      2            amsua_n15   -7.41540   0.00000  80.13801  -0.81847  -1.01321   0.00000   0.00000   0.01427   2.63880   8.31509  22.04145  -1.56477
     ... ...
   2679       avhrr3_metop-b    0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000
   2680       avhrr3_metop-b    0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000

Among the lines describing the radiance bias correction coefficients,
the EnKF also starts to inventory the observation number and types for
both conventional and radiance data. Observations of various types are
read in (from the ``diag**`` files) and the number of the observations
and time spent reading in the observations are shown:

Start to check the radiance and ozone observations:

::

       1             sbuv2_n16  nread=         0  nkeep=         0  num_obs_tot=         0
       1             amsua_n15  nkeep=         0  num_obs_tot=         0
       2             sbuv2_n17  nread=         0  nkeep=         0  num_obs_tot=         0
       3             sbuv2_n18  nread=         0  nkeep=         0  num_obs_tot=         0
       4             sbuv2_n19  nread=         0  nkeep=         0  num_obs_tot=         0
       5              omi_aura  nread=         0  nkeep=         0  num_obs_tot=         0
       6          gome_metop-a  nread=         0  nkeep=         0  num_obs_tot=         0
       7          gome_metop-b  nread=         0  nkeep=         0  num_obs_tot=         0
              0  ozone obs
       2             amsua_n18  nkeep=      6284  num_obs_tot=      6284
       3             amsua_n19  nkeep=         0  num_obs_tot=      6284
       6            amsua_aqua  nkeep=         0  num_obs_tot=      6284
       7         amsua_metop-a  nkeep=         0  num_obs_tot=      6284
       8             airs_aqua  nkeep=         0  num_obs_tot=      6284
      11         hirs4_metop-a  nkeep=       154  num_obs_tot=      6438
      12               mhs_n18  nkeep=         0  num_obs_tot=      6438
      13               mhs_n19  nkeep=         0  num_obs_tot=      6438
      14           mhs_metop-a  nkeep=         0  num_obs_tot=      6438
      15          goes_img_g11  nkeep=         0  num_obs_tot=      6438
      25             ssmis_f17  nkeep=         0  num_obs_tot=      6438
      45            sndrd1_g15  nkeep=         0  num_obs_tot=      6438
      46            sndrd2_g15  nkeep=         0  num_obs_tot=      6438
      47            sndrd3_g15  nkeep=         0  num_obs_tot=      6438
      48            sndrd4_g15  nkeep=         0  num_obs_tot=      6438
      49          iasi_metop-a  nkeep=         0  num_obs_tot=      6438
      52            seviri_m10  nkeep=         0  num_obs_tot=      6438
      53         amsua_metop-b  nkeep=         0  num_obs_tot=      6438

Start to check the conventional observations:

::

    columns below obtype,nread, nkeep
       t       8043       7904
       q       2625       2621
      ps      14133      14133
      uv      21982      21982
     sst          0          0
     gps          0          0
      pw          0          0
      dw          0          0
     srw          0          0
      rw          0          0
     tcp          0          0

A summary of the total number of conventional (:math:`1^{st}` number
46640), ozone (:math:`2^{nd}` number 0), and radiance observations
(:math:`3^{rd}` number 6438) in diag files is given:

::

    nobs_conv, nobs_oz, nobs_sat =        46640           0        6438

Also, the following normalization factors for the radiance bias
predictors could be inside the bias correction coefficient lines:

::

    npred  =           12
              1 biasprednorm =    1.000000
              2 biasprednorm =    1.000000
   ... ...
             11 biasprednorm =    1.000000
             12 biasprednorm =    1.000000

Then, the time spent reading in the observations and the total number of
observations kept and rejected are shown:

::

    max time in mpireadobs  =   0.6105460
    total number of obs        53078
          53078 obs kept
              0 total obs rejected
    time in read_obs =   1.25072427513078      on proc           0

In the stdout, regional averaged (northern hemisphere - NH, southern
hemisphere - SH, and tropics - TR) statistics of the ensemble ‘priors‘
fit to all observations are provided, partly for checking if the
inflation is appropriate. The details of this part are discussed in next
Section `1.2 <#ch4_tuningandinflat>`__.

::

    innovation statistics for prior:
    conventional obs
    region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
   NH    all ps  14083 -0.591E-02  0.860E+00  0.814E+00  0.601E+00  0.550E+00
   TR    all ps     50 -0.629E-01  0.422E+00  0.661E+00  0.367E+00  0.550E+00
   NH     all t   7766 -0.228E+00  0.163E+01  0.128E+01  0.480E+00  0.119E+01
   TR     all t    138 -0.130E+00  0.118E+01  0.119E+01  0.234E+00  0.117E+01
   NH    all uv  21680  0.136E-01  0.295E+01  0.276E+01  0.140E+01  0.237E+01
   TR    all uv    302 -0.379E+00  0.251E+01  0.268E+01  0.124E+01  0.238E+01
   NH     all q   2553 -0.323E-01  0.146E+00  0.157E+00  0.868E-01  0.131E+00
   TR     all q     68 -0.721E-01  0.202E+00  0.164E+00  0.582E-01  0.154E+00
    satellite brightness temp
    instrument, channel #, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
              amsua_n18   1   241 -0.199E+00  0.194E+01  0.306E+01  0.176E+01  0.250E+01
              amsua_n18   2   248 -0.194E+00  0.145E+01  0.234E+01  0.806E+00  0.220E+01
              amsua_n18   3   248 -0.620E+00  0.119E+01  0.206E+01  0.497E+00  0.200E+01
              amsua_n18   4   248 -0.189E+00  0.398E+00  0.568E+00  0.140E+00  0.550E+00
              amsua_n18   5   248  0.188E-01  0.337E+00  0.315E+00  0.966E-01  0.300E+00
              amsua_n18   6  1203 -0.396E-01  0.198E+00  0.238E+00  0.610E-01  0.230E+00
              amsua_n18   7  1225 -0.215E+00  0.324E+00  0.235E+00  0.480E-01  0.230E+00
              amsua_n18   8  1155 -0.235E+00  0.369E+00  0.259E+00  0.675E-01  0.250E+00
              amsua_n18  10  1211  0.408E+00  0.491E+00  0.353E+00  0.429E-01  0.350E+00
              amsua_n18  11    12  0.720E+00  0.723E+00  0.404E+00  0.567E-01  0.400E+00
              amsua_n18  15   245 -0.311E+00  0.198E+01  0.380E+01  0.148E+01  0.350E+01
          hirs4_metop-a   3     1  0.115E+01  0.115E+01  0.534E+00  0.631E-01  0.530E+00
          hirs4_metop-a   4    42  0.260E+00  0.521E+00  0.403E+00  0.488E-01  0.400E+00
          hirs4_metop-a   5     7 -0.534E+00  0.558E+00  0.367E+00  0.696E-01  0.360E+00
          hirs4_metop-a   6     7 -0.748E+00  0.782E+00  0.473E+00  0.109E+00  0.460E+00
          hirs4_metop-a   7     9 -0.557E+00  0.815E+00  0.580E+00  0.108E+00  0.570E+00
          hirs4_metop-a   8     9  0.374E+00  0.675E+00  0.100E+01  0.208E-01  0.100E+01
          hirs4_metop-a  10     9  0.199E+00  0.581E+00  0.612E+00  0.120E+00  0.600E+00
          hirs4_metop-a  11     9 -0.380E-01  0.718E+00  0.143E+01  0.770E+00  0.120E+01
          hirs4_metop-a  12    34 -0.988E+00  0.179E+01  0.213E+01  0.141E+01  0.160E+01
          hirs4_metop-a  13     9  0.694E-01  0.330E+00  0.382E+00  0.116E+00  0.364E+00
          hirs4_metop-a  14     9 -0.369E-01  0.257E+00  0.294E+00  0.137E+00  0.260E+00
          hirs4_metop-a  15     9  0.352E-01  0.289E+00  0.281E+00  0.107E+00  0.260E+00
    time in estimate_work_enkf1 =   5.723375361412764E-002  secs

Next, the analysis variable fields and the observations are distributed
to the different processors. The following lines show the maximum and
minimum number of observations and grid on subdomain and the time used
to setup those decompositions:

::

    time in estimate_work_enkf1 =   5.723375361412764E-002  secs
    min/max numobs          32        7663
    min/max estimated work       637809      638478
    npts =         9030
    min/max number of points per proc =          243         317
    time to do model space decomp =   1.529277069494128E-003
    nobstot =        53078
    min/max number of obs per proc =         1658        1659
    time to do ob space decomp =   3.935033455491066E-004
    sending out observation prior ensemble perts from root ...
    nobstot*nanals               1061560
    npts*ndim               2266530
    ... took   1.730861375108361E-002  secs
    time in load_balance =  7.929413160309196E-002 on proc           0

Then, the ensemble background members of the analysis variables are read
in and the maximum and minimum values of the fields at each vertical
level are displayed. The maximum and minimum values are useful for a
quick confirmation that the background fields have been read
successfully. The size of the real array of ensemble perturbations
updated on each processor and the time spent to read in and distribute
the background ensemble are also shown:

::

    anal_chunk size =      1591340
    READGRIDDATA_ARW: U           1  -20.55222       20.72632
     ...
    READGRIDDATA_ARW: U          50  -12.60845       52.03727
    READGRIDDATA_ARW: V          51  -11.67894       21.03603
    ...
    READGRIDDATA_ARW: V
    ...
    READGRIDDATA_ARW: V         100  -17.20951       21.87867
    READGRIDDATA_ARW: T         101  -62.00526       13.33912
    ...
    READGRIDDATA_ARW: T         150   469.5381       509.8757
    READGRIDDATA_ARW: QVAPOR         151  8.4173851E-05  1.7215064E-02
    ...
    READGRIDDATA_ARW: QVAPOR         200  1.5460877E-06  1.0506745E-05
    READGRIDDATA_ARW: PH         201  -14.64355       4.688477
    ...
    READGRIDDATA_ARW: PH         250   7195.078       14206.67
    READGRIDDATA_ARW: MU         251  -1326.610       3290.569
    time in readgridata on root  0.326851664343849      secs
    time to scatter state on root  0.163762195268646      secs
    time in read_ensemble =  0.494254242163152      on proc           0

In EnSRF, observations can be skipped/not assimilated due to the
adaptive observation thinning. The following lines show how many
observations are skipped or not assimilated by this thinning. In this
case, paoverpb_thresh = 0.99, which lead to 423 of the observations
being skipped.

::

    assimilate obs in order they were read in
    1 timing on proc     0  =    0.51    0.24    0.01    0.10    0.00    0.15    0
          22152  out of       53078 obs skipped,       30926  used
          22736  out of       30926  same lat/long
    time to broadcast obfit_post =   5.432746838778257E-004  secs, niter =
              1

Next, the update to the analysis variables is performed and the time for
the updating is shown:

::

   time in enkf_update =   4.79503449611366      on proc           0

The regional averaged statistics of the inflation values are shown for
surface pressure:

::

    global ps prior std. dev min/max =    21.52301       172.3270
    NH mean ps prior standard deviation =    57.82042
    NH mean ps posterior standard deviation (before inflation)=    29.20955
    NH mean ps posterior standard deviation (after inflation) =    54.58403
    NH mean ps inflation =    2.421974
    TR mean ps prior standard deviation =    31.09247
    TR mean ps posterior standard deviation (before inflation)=    28.31872
    TR mean ps posterior standard deviation (after inflation) =    30.79383
    TR mean ps inflation =    1.106122
    time in inflate_ens =  1.366839744150639E-002 on proc           0

After the EnKF analysis, innovation statistics of the ensemble analyses
are shown, similar to the ensemble priorsfit to all observations:

::

    innovation statistics for posterior:
    conventional obs
    region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
   NH    all ps  14083  0.503E-03  0.757E+00  0.569E+00  0.146E+00  0.550E+00
   TR    all ps     50 -0.562E-01  0.398E+00  0.585E+00  0.198E+00  0.550E+00
   NH     all t   7766 -0.780E-01  0.145E+01  0.122E+01  0.263E+00  0.119E+01
   TR     all t    138 -0.107E+00  0.115E+01  0.118E+01  0.199E+00  0.117E+01
   NH    all uv  21680  0.406E-01  0.267E+01  0.246E+01  0.655E+00  0.237E+01
   TR    all uv    302 -0.259E+00  0.236E+01  0.246E+01  0.636E+00  0.238E+01
   NH     all q   2553 -0.174E-01  0.114E+00  0.138E+00  0.444E-01  0.131E+00
   TR     all q     68 -0.279E-01  0.168E+00  0.159E+00  0.413E-01  0.154E+00
     satellite brightness temp
    instrument, channel #, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
              amsua_n18   1   241  0.614E+02  0.915E+02  0.263E+01  0.823E+00  0.250E+01
              amsua_n18   2   248 -0.132E+03  0.171E+03  0.223E+01  0.376E+00  0.220E+01
              amsua_n18   3   248 -0.303E+02  0.398E+02  0.201E+01  0.244E+00  0.200E+01
              amsua_n18   4   248 -0.231E+00  0.580E+01  0.555E+00  0.776E-01  0.550E+00
              amsua_n18   5   248 -0.788E+00  0.199E+01  0.305E+00  0.575E-01  0.300E+00
              amsua_n18   6  1203 -0.138E+01  0.158E+01  0.234E+00  0.423E-01  0.230E+00
              amsua_n18   7  1225 -0.273E+01  0.287E+01  0.232E+00  0.322E-01  0.230E+00
              amsua_n18   8  1155 -0.131E+01  0.186E+01  0.254E+00  0.437E-01  0.250E+00
              amsua_n18  10  1211 -0.179E+01  0.196E+01  0.352E+00  0.374E-01  0.350E+00
              amsua_n18  11    12 -0.130E+01  0.133E+01  0.404E+00  0.537E-01  0.400E+00
              amsua_n18  15   245 -0.163E+03  0.198E+03  0.357E+01  0.693E+00  0.350E+01
          hirs4_metop-a   3     1  0.118E+01  0.118E+01  0.533E+00  0.558E-01  0.530E+00
          hirs4_metop-a   4    42  0.286E+00  0.542E+00  0.403E+00  0.455E-01  0.400E+00
          hirs4_metop-a   5     7 -0.451E+00  0.487E+00  0.364E+00  0.562E-01  0.360E+00
          hirs4_metop-a   6     7 -0.724E+00  0.755E+00  0.468E+00  0.852E-01  0.460E+00
          hirs4_metop-a   7     9 -0.572E+00  0.817E+00  0.577E+00  0.871E-01  0.570E+00
          hirs4_metop-a   8     9  0.122E+01  0.136E+01  0.100E+01  0.205E-01  0.100E+01
          hirs4_metop-a  10     9  0.855E+00  0.100E+01  0.609E+00  0.104E+00  0.600E+00
          hirs4_metop-a  11     9 -0.236E-01  0.543E+00  0.130E+01  0.507E+00  0.120E+01
          hirs4_metop-a  12    34 -0.738E+00  0.153E+01  0.178E+01  0.790E+00  0.160E+01
          hirs4_metop-a  13     9  0.121E+01  0.128E+01  0.375E+00  0.905E-01  0.364E+00
          hirs4_metop-a  14     9  0.776E-01  0.256E+00  0.281E+00  0.106E+00  0.260E+00
          hirs4_metop-a  15     9  0.281E+00  0.384E+00  0.273E+00  0.838E-01  0.260E+00

Finally, the minimum and maximum of the analysis increments are shown as
below. The analysis increments should be within a reasonable range. The
computational time of these steps are also shown:

::

    time to gather state on root  0.101591360988095      secs
    time level            1
    --------------
    ens. mean anal. increment min/max ps  -123.0895       264.4738
    ens. mean anal. increment min/max var           1  -7.882635       7.015289
    ens. mean anal. increment min/max var           2  -7.460623       6.953684
    ens. mean anal. increment min/max var           3  -4.598419       1.846811
    ens. mean anal. increment min/max var           4 -0.2952938      0.4161232
    ens. mean anal. increment min/max var           5  0.0000000E+00  0.0000000E+00
    time to gather ens mean increment on root  2.756185410544276E-002 secs
    time in writegriddata on root  0.255841280566528      secs
    time in write_ensemble =  0.385106393136084      on proc           0

The next line indicates the time of the analysis finish:

::

        ENDING DATE-TIME    JUL 14,2016  00:01:11.329  196  THU   2457584
        PROGRAM ENKF_ANL HAS ENDED.
   * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .

The next lines show the computer resources used in the analysis and a
sign that EnKF has finished, "all done":

::

   * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
   *****************RESOURCE STATISTICS*******************************
   The total amount of wall time                        = 7.431095
   The total amount of time in user mode                = 6.480014
   The total amount of time in sys mode                 = 0.713891
   The maximum resident set size (KB)                   = 122848
   Number of page faults without I/O activity           = 46688
   Number of page faults with I/O activity              = 0
   Number of times filesystem performed INPUT           = 0
   Number of times filesystem performed OUTPUT          = 0
   Number of Voluntary Context Switches                 = 24121
   Number of InVoluntary Context Switches               = 138
   *****************END OF RESOURCE STATISTICS*************************

    all done!

.. _ch4_tuningandinflat:

Tuning of Inflation and Localization
------------------------------------

Proper inflation and localization values need to be determined by
experimenting with different inflation values. The goal is to make the
total ensemble spreads of priors match the innovations as much as
possible. In the tuning process, the vertical and horizontal structure
of the match should be carefully examined. The 3D distribution of the
inflation values can be plotted offline from the output file
``"covinflate.dat"``.

The tuning processes can vary for different resolutions and physics of
the models as well as the observation types assimilated. The performance
of the tuning may be better examined after multiple assimilation cycles
allowing their effects to be accumulated and converged in cycling
assimilation. Please also note that proper settings of observation error
estimations are also critical for the inflation tuning.

Next, the test case is taken as an example to check if the inflation is
appropriate. In the following lines of the standard output (``stdout``),
the number (column ``nobs``), mean (column ``bias``), and standard
deviation (column ``innov stdev``) of the ensemble background fits to
Ps, wind, temperature, and water vapor observations in the NH and TR are
shown:

::

    innovation statistics for prior:
    conventional obs
    region, obtype, nobs, bias, innov stdev, sqrt(S+R), sqrt(S), sqrt(R):
   NH    all ps  14083 -0.591E-02  0.860E+00  0.814E+00  0.601E+00  0.550E+00
   TR    all ps     50 -0.629E-01  0.422E+00  0.661E+00  0.367E+00  0.550E+00
   NH     all t   7766 -0.228E+00  0.163E+01  0.128E+01  0.480E+00  0.119E+01
   TR     all t    138 -0.130E+00  0.118E+01  0.119E+01  0.234E+00  0.117E+01
   NH    all uv  21680  0.136E-01  0.295E+01  0.276E+01  0.140E+01  0.237E+01
   TR    all uv    302 -0.379E+00  0.251E+01  0.268E+01  0.124E+01  0.238E+01
   NH     all q   2553 -0.323E-01  0.146E+00  0.157E+00  0.868E-01  0.131E+00
   TR     all q     68 -0.721E-01  0.202E+00  0.164E+00  0.582E-01  0.154E+00

It would be desirable that the total ensemble spreads of
prior/background match the innovation standard deviations as close as
possible (Equation 15). In practice, one should check if there is any
substantial ensemble spread deficiency. In the above sample standard
output lines, the ensemble prior spreads are shown in the column
:math:`\sqrt(S)`, and the total spreads in the column
:math:`\sqrt(S+R)`. Here, :math:`\sqrt(R)` is the observation error.

The above sample statistics shows that in NH, the total spread of
temperature observation priors (0.128E+01 ) is close to the innovation
standard deviation (0.163E+01) of the priors fit to the observations.

Tuning EnKF through Key Namelist Options
----------------------------------------

The namelist parameters controlling the EnSRF analysis are set in the
namelist sections: /``enkf_nml/``, ``/satobs_enkf/``, and ``/nam_wrf/``
(all in the file ``enkf.nml``). To obtain a successful analysis, it is
very important to setup/tune properly the namelist variables,
particularly those related to inflation and localization, for each
specific application configuration of various models and observation
types.

Set Up the Analyses Time and Data Location
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The analysis time and working location are setup by the following
parameters:

.. table:: Parameter Options

+------------+-----------------------------------------------------------+
| *datein*   | analysis time (YYYYMMDDHH)                                |
+------------+-----------------------------------------------------------+
| *datapath* | path to data directory (include trailing slash). In most  |
| \          | cases, this should point to the current directory because |
| \          | the run scripts have setup the run environment in the     |
| \          | working directory including copying the EnKF executable   |
| \          | and data into the working directory.                      |
+------------+-----------------------------------------------------------+

Set Up Analysis Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~

In the current implementation, a few variations/flavors of the EnKF are
available, including the ensemble square root Kalman filter (EnSRF,
*Whitaker et al.*, 2010, etc), the perturbed observations EnKF, and the
local transformed Kalman filter (LETKF). These options are set in the
namelists below:

.. table:: Namelist Options

+-------------------+--------------------------------------------------------+
| deterministic     | = true, use EnSRF without perturbed observations;      |
+-------------------+--------------------------------------------------------+
| \                 | = false, use perturbed observations version of EnKF.   |
+-------------------+--------------------------------------------------------+
| sortinc           | if deterministic = false, re-order observations to     |
| \                 | minimize regression errors as described in Anderson    |
| \                 | (2003).                                                |
+-------------------+--------------------------------------------------------+
| letkf_flag        | if true, use the LETKF scheme.                         |
+-------------------+--------------------------------------------------------+
| boxsize           | observation box size for LETKF (deg)                   |
+-------------------+--------------------------------------------------------+
| iassim_order      |  = 0, assimilation of observations in order read       |
+-------------------+--------------------------------------------------------+
| \                 |  = 1 in random order                                   |
+-------------------+--------------------------------------------------------+
| \                 |  = 2 in order of predicted posterior variance reduction|
+-------------------+--------------------------------------------------------+
| paoverpb_thresh   | the threshold of the ratio of observation space        |
| \                 | posterior variance divided by prior variance (<= 1.0), | 
| \                 | that is, the expected reduction of prior variance by   |
| \                 | the observations.                                      |
| \                 | Smaller values mean only those observations with       |
| \                 | stronger impact will be assimilated.                   |
| \                 | If =1, no thinning of observations is done. Typical    |
| \                 | values: 0.99 or 0.98.                                  |
+-------------------+--------------------------------------------------------+

Set Up the Analyses Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The analyses variables for both global and regional models are hardcoded
in the EnSRF. There are several options using the following parameters:

::

          nvars: number of 3d model variables to update.

For hydrostatic global models, there are typically 5, including U, V, T,
QVAPOR, Ozone. For non-hydrostatic regional models, like WRF, different
combinations of the 3D and 2D analyses variables are set as below:

::

   For ARW            (arw =true)
         nvars           =3: U, V, T, and MU
                         =4: U, V, T, QVAPOR, and MU
                         =5: U, V, T, QVAPOR, PH, and MU 
                         =6: U, V, W, T, QVAPOR, PH, and MU

   For NMM           (nmm =true)
          nvars         =3: U, V, T, and PD
                        =4: U, V, T, QVAPOR, and PD
                        =5: U, V, T, QVAPOR, CWM, and PD

The analyses variable list can be adjusted or additional variables, such
as moisture related variables, can be added to the list by modifying the
subroutines ``"gridinfo.F90"`` and ``"gridio.F90"`` accordingly.

+----------------+--------------------------------------------------------+
| *pseudo_rh*:   | use or not ’pseudo-rh’ analysis variable, as in GSI.   |
+----------------+--------------------------------------------------------+
| *cliptracers*: | if true, tracers are clipped to zero when read in,     |
| \              | and just before they are written out.                  |
+----------------+--------------------------------------------------------+

Set Up the Ensemble Backgrounds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following parameters define which background fields will be used in
the analyses and their dimensions:

========== ==========================================================
*regional* = true, perform regional EnSRF analyses using either ARW
\          or NMM inputs as the background.
\          =false, perform a global analysis. If either the parameter
\          nmm or wrf is set to true, it will be set to true.
*nmm*      if true, background comes from WRF NMM. When using
\          other background fields, set it to false.
*wrf*      if true, background comes from WRF ARW. When using
\          other background fields, set it to false.
*nlons*    number of grid points in longitude of the model background
*nlats*    number of grid points in latitude of the model background
*nlevs*    number of vertical levels of the model background
*nanls*    number of ensemble members
========== ==========================================================

Set Up Localization Distances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the current EnKF implementation, distances for localization can be
set separately in the northern hemisphere, tropics and southern
hemisphere, and in the horizontal, vertical and time dimensions, and for
different observation types using namelist parameters. The length scales
should be given in km for the horizontal, hours for time, and scale
heights (units of -:math:`\log(P/P_{ref})`) in the vertical.

There are two options to setup the localization distances in horizontal
and vertical. These are decided by the namelist variable
``"readin_localization"``:

======================
========================================================================
*readin_localization*: =.true., customized horizontal and vertical localization
\                      values varying with model levels are read in from the
\                      external text file ``"hybens_locinfo"``.
\                      = .false., the horizontal and vertical localization
\                      distances are set by the following parameters
*corrlengthnh*:        length for horizontal localization in km in the northern
\                      hemisphere (25N-90N, NH)
*corrlengthtr*:        length for horizontal localization in km in the tropics
\                      (25S-25N, TR)
*corrlengthsh*:        length for horizontal localization in km in the southern
\                      hemisphere (25S-90S, SH)
*lnsigcutoffnh*:       scale height for vertical localization in -:math:`\log(P/P_ref)` in NH.
*lnsigcutofftr*:       scale height for vertical localization in -:math:`\log(P/ P_ref)` in TR.
*lnsigcutoffsh*:       scale height for vertical localization in -:math:`\log(P/P_ref)` in SH.
======================
========================================================================

The text file ``"hybens_locinfo"`` contains vertical profile of
horizontal and vertical localization length scales (in e-folding scale).
These scales are converted to the full distance width (*1/0.388 in km)
of Gaspari-Cohn function, where it goes to zero. The horizontal and
vertical scales can be adjusted for each observation types easily in the
subroutine ``"read_locinfo.f90"``.

For satellite radiance and surface pressure observations, the vertical
localization distances are set separately using the following
parameters:

::

              lnsigcutoffsatnh, lnsigcutoffsattr, lnsigcutoffsatsh; and 
              lnsigcutoffpsnh, lnsigcutoffpstr, lnsigcutoffpssh

| There is also one option to setup the time localization window, which
  is the time away from the analyses time. This is decided by the
  following namelist variables:

============ ======================================================
*obtimelnh*: time in hours away from the analyses time to move 2800
\            km at 30 ms-1 in the northern hemisphere
*obtimeltr*: same as obtimelnh but for the tropics
*obtimelsh*: same as obtimelnh but for the southern hemisphere
============ ======================================================

The empirical determination/tuning of proper localization distances is
important for successful analyses with ensemble data assimilation. In
general, large horizontal scale observations, like GPS radio occultation
and radiosonde observations, require larger horizontal localization
distances to fully take advantage of the observations. On the other
hand, high horizontal resolution observations, like radar observations,
may need shorter horizontal localization distances. The same is true for
the localization distance in vertical.

Set Up Adaptive Posterior Inflation Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The inflation can be set up by the following parameters:

============== =====================================================
*anapertwtnh*: inflation parameter in NH.
*anapertwttr*: inflation parameter in TR.
*anapertwtsh*: inflation parameter in SH.
The parameters = 0 means no inflation.
\              = 1 means inflation all the way back to prior spread.
============== =====================================================

These are the key tuning parameters for the performance of EnSRF.
Typical values:  0.95.

The inflation factor fields can be smoothed out using the following
parameter:

.. table::

+-------------+-------------------------------------------------------------------+
| smoothparm: | parameter for smoothing inflation factor,                         |
+-------------+-------------------------------------------------------------------+
| \           | = -1 for no smoothing.                                            |
+-------------+-------------------------------------------------------------------+
| \           | > 0, the estimated inflation factor is smoothed using a           |
| \           | Gaussian spectral filter with an e-folding scale of the parameter |
+-------------+-------------------------------------------------------------------+
| latbound:   | where the transition latitude starts (=25N or 25S)                |
+-------------+-------------------------------------------------------------------+
| delat:      | latitude width of transition zone where the inflation             |
| \           | parameter is smoothed.                                            |
+-------------+-------------------------------------------------------------------+

The minimum and maximum inflation values allowed can be controlled by
the following parameters:

::

          covinflatemin: minimum inflation factor 
          covinflatemax: maximum inflation factor

Satellite Observations Related Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following parameters are used to setup the adaptive satellite bias
correction:

.. table::

+--------------+----------------------------------------------------------------+
| lupd_satbias:| = .true., update the satellite bias correction coefficients    |
| \            | using the adaptive bias correction scheme (when                |
| \            | the iterations of numiter > 1).                                |
+--------------+----------------------------------------------------------------+
| \            |= .false., the bias correction coefficients are not updated.    |
+--------------+----------------------------------------------------------------+
| numiter:     | number of iterations for state/bias correction                 |
|              | coefficients update. (only relevant when satellite             |
|              | radiances assimilated, i.e. nobs_sat>0)                        |
+--------------+----------------------------------------------------------------+
| biasvar:     | background error variance for radiance bias coefficients       | 
|              | (used in radbias.f90). The default is GSI value.               |
+--------------+----------------------------------------------------------------+
| saterrfact:  | factor to multiply satellite radiance errors.                  |
+--------------+----------------------------------------------------------------+
| sattypes_rad:| list of satellite observation types to be assimilated.         |
+--------------+----------------------------------------------------------------+
| sattypes_oz: | list of ozone observation types to be assimilated.             |
+--------------+----------------------------------------------------------------+

Observation QC Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the observation quality flags decided in the GSI
observer, the EnSRF itself also conducts a gross check on the
observations.

The outlier test computes the difference between the observation value
and the prior ensemble mean. It then computes a standard deviation by
taking the square root of the sum of the observation error variance and
the prior ensemble variance for the observation. If the difference
between the ensemble mean and the observation value is more than the
specified number of standard deviations, then the observation is tossed.
The threshold of the check is set as following:

.. table::

+-----------+--------------------------------------------------------------------+
| sprd_tol:| tolerance for gross check of observations.                         |
| \         | Reject a observation if the prior mean is more than this many      |
| \         | :math:`\sqrt{(S+R)}` from the observation, where S is ensemble     |
| \         | prior variance and R is observation error variance.                |
| \         | Typical value:  3.0                                                |
+-----------+--------------------------------------------------------------------+

The following parameters are used to control if the GSI quality control
procedure is performed in EnSRF:

============== ==================================================================
 varqc:        = .false., no variational quality control performed (default)
 huber:        = .true., use huber norm instead of "flat-tail" (default: .false.)
 zhuberleft:   departure parameter of huber norm.
 zhuberright:  departure parameter of huber norm.
============== ==================================================================