Reproducibility for the HWRF_for_V16 Branch

Submitted by peter.marinescu on Wed, 06/16/2021 - 10:06
Forum: Users | General

Hello again, 

I have run into a reproducibility issue with the HWRF_for_V16 branch -- namely, if I run the model forecast twice, I do not get the same result. I am judging reproducibility by comparing the *atcfunix files, although I also performed some "diff" commands on the output GRIB files, and they were consistent with the reproducibility errors. 
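Roughly, the comparisons were of this form (the file names here are placeholders, not the exact product names):

$ diff <run A>/<cycle>.atcfunix  <run B>/<cycle>.atcfunix    # track comparison
$ cmp  <run A>/<grib file>       <run B>/<grib file>         # byte-level check on the GRIB output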

Interestingly, the results are reproducible for the first two cycles of the storm (Humberto 2019, AL09), but the third cycle I am running is where the differences occur. 

Also interestingly, I have run these first three cycles 8 times, and the results for the third cycle oscillate between two outcomes -- that is, 4 of the 8 forecasts are identical to each other, and the other 4 are identical to each other but different from the first group.

I have reproduced the issue with scrubbing turned off:

The *atcfunix files are saved here for two of the different experiments:

/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/09L16_3c
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/09L16_3d

The scrub directories are here:

/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d

I also have the scrub directory data for one additional experiment that is identical to experiment 09L16_3c; it is experiment 09L16_3b, in case that is helpful as well.

The code base I am using is saved here: /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/H_V16

Also, note that I ran shorter forecasts (24 hours) to save computational time for these experiments, but since the reproducibility issues start at time 0, I hope this will not be a problem.

Also, note that I ran 5 cycles of an Eastern Pacific storm (Kiko 2019, EP13) 3 separate times, and the results were identical across those three runs. 

I know some others have also had reproducibility issues with various versions of HWRF on different machines, so hopefully this output can help figure out part of the problem.

Would you be able to help me resolve/understand this reproducibility issue?

Thank you for your time.

Best,

Peter


Hi, Peter,

Adjusting the time step, or using high-level optimization during compilation, can break the b4b match. In addition, if the system is unstable/chaotic (e.g., Lorenz 1963), small differences grow, so the b4b match can also be lost after a longer time integration.

I checked the third cycle run and the differences are small, for example,

AL, 09, 2019091306, 03, HWRF, 024, 264N,  782W,  38, 1004, XX,  34, NEQ, 0070, 0000, 0000, 0068, 1012,   53,  46,   0,   0,    ,   0,    ,   0,   0,           ,  ,   ,    ,   0,   0,   0,   0,       THERMO PARAMS,     -85,    1045,    -524, Y, 10, DT, -999

vs

AL, 09, 2019091306, 03, HWRF, 024, 264N,  780W,  36, 1003, XX,  34, NEQ, 0124, 0010, 0000, 0059, 1011,   33,  57,   0,   0,    ,   0,    ,   0,   0,           ,  ,   ,    ,   0,   0,   0,   0,       THERMO PARAMS,     -80,    1362,    -195, Y, 10, DT, -999

Thanks,

Linlin

Hello Linlin, 

Thank you very much for the reply.

From my understanding, when compiling HWRF on the Hera machine, the results should be perfectly reproducible. But based on your response, it seems this is not always the case. Can you confirm whether HWRF on Hera should or should not be reproducible?

Also, is your explanation consistent with the fact that the Eastern Pacific storm I tested with the exact same code was perfectly reproducible across the 4 times I ran it? If the issue were associated with the compilation of the code, shouldn't the Eastern Pacific storm, which was run with the same code, have the same reproducibility issues?

Also, can you clarify what you mean by "b4b match"? I am not familiar with that term.

Lastly, while the changes are small initially, they lead to larger differences at later times in that one cycle and subsequent cycles. We are conducting OSE experiments for NOAA whereby we run the same forecast several times with the addition or removal of certain assimilated observations. Therefore, we need the forecasts to be exactly reproducible; otherwise, we cannot be certain whether differences in the forecasts are due to the changes in assimilated observations.

Can you help me understand whether this branch of HWRF should be reproducible when run on Hera?

Thank you again for your help and time.

Peter 

Hi, Peter,

"b4b match" means bit-for-bit match.

According to your runs, the first two cycles are reproducible, and the third cycle is not reproducible.

In my understanding, minor differences can be generated once the integration runs long enough. In other words, even if the results for the third cycle were identical, the results from the fourth cycle could still be different.

For your experiment, you may try using lower optimization in the compilation to see whether it makes the third cycle reproducible.

On the other hand, if the differences between your different observation runs are bigger than the differences seen in the reproducibility test for the third cycle, the results still make sense. In that case, the third-cycle reproducibility issue would not be a big problem for your experiments.

Thanks,

Linlin

Hi Peter,

While Linlin is right that b4b results are never a guarantee, we have typically been able to get b4b results with HWRF in the past. If this has changed recently, we should try to understand where the differences are coming from, and see if we can make it b4b. I understand why this would be a priority for your OSE experiments.

I've been looking at your directories. The first instance I can find of non-b4b results is when the EnKF runs in the second cycle. The resulting analysis for each member differs, e.g., for mem032:                                                 

$ ~/diffwrf /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/intercom/enkf/analysis.mem032 /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/intercom/enkf/analysis.mem032

 Just plot  F

Diffing /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/intercom/enkf/analysis.mem032 /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/intercom/enkf/analysis.mem032

 Next Time 2019-09-13_00:00:00

     Field   Ndifs    Dims       RMS (1)            RMS (2)     DIGITS    RMSE     pntwise max
        PD    244288    2   0.8645125386E+05   0.8644140245E+05   3   0.9675E+02   0.6478E-01
         Q  18078120    3   0.8860288486E-02   0.8913885690E-02   2   0.6077E-03   0.5945E+00
         T  18077688    3   0.2599912060E+03   0.2599668442E+03   4   0.5512E+00   0.2042E-01
         U  18078050    3   0.9296688317E+01   0.9740591950E+01   1   0.1208E+01   0.2627E+00
         V  18078052    3   0.3862753240E+01   0.4059661530E+01   1   0.1148E+01   0.3351E+00

But the first guess for mem032 is identical:                                            

$ ~/diffwrf /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/enkf/firstguess.mem032 /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/enkf/firstguess.mem032

 Just plot  F

Diffing /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/enkf/firstguess.mem032 /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/enkf/firstguess.mem032

 Next Time 2019-09-13_00:00:00

     Field   Ndifs    Dims       RMS (1)            RMS (2)     DIGITS    RMSE     pntwise max

This suggests to me that the EnKF is the cause of the differences, which is what Jason demonstrated in a different email thread.
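For reference, a quick way to confirm this across all of the members (rather than just mem032) would be a loop along these lines; I'm assuming a 40-member ensemble here, so adjust the count to your actual ensemble size:

$ C=/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/intercom/enkf
$ D=/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/intercom/enkf
$ for m in $(seq -w 1 40); do echo "=== mem0$m ==="; ~/diffwrf $D/analysis.mem0$m $C/analysis.mem0$m; done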

Jason and Zhan told me that they think the version of nco is responsible for these differences – i.e., that it is changing the results slightly when it compresses and uncompresses the member files as they move to and from the com/ directory. But the analysis.mem* files in the intercom/enkf directory haven't been touched by nco, and yet they differ between experiments c and d. I could be missing something, but otherwise I somewhat doubt that nco is the cause of the differences.

Have you determined anything else about this problem since you last replied to this thread? Did you try using different versions of nco?

Thanks,

evan

Hello Evan, 

Thank you for the detailed response and taking a look at this issue.

I had tried switching the version of nco, but the results for that third cycle continued to oscillate between the two outcomes. So, as you deduced above, it seems like the version of nco is not causing the difference. 

I also tried running a different storm (Dorian, 05L) for its first four cycles. I ran it four times and produced identical results for all four cycles (2019082406-2019082500). I ran it out for 4 cycles since the EnKF picks up in the third cycle. I did not save all the output for these tests, but the atcfunix files are saved here: 

/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/05L16
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/05L16_a
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/05L16_b
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/noscrub/05L16_c

I will note that for the 09L case with the oscillating differences, the ensda_relocate tasks did not start for any of the runs. I thought that was odd, so I just wanted to point it out. 

Other than that, I have not determined anything else about the problem.

Thank you again for taking a look at this, and hope you are well.

Best,

Peter

Hello Evan, 

I just wanted to note that I have been speaking to Dr. Zhan Zhang, who has also been looking into this problem.

He noted that for my Humberto (09L) tests, the "wrfinput_d02" files at the following paths are identical:

/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091218/09L/intercom/ensda/001/fcst/wrfinput_d02
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091218/09L/intercom/ensda/001/fcst/wrfinput_d02

but the "wrfinput_d02" files at the following paths were different:

/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/com/2019091218/09L/nine09l.2019091218.ensda_001.wrfinput_d02
/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/com/2019091218/09L/nine09l.2019091218.ensda_001.wrfinput_d02

These files should have the same content, except that one is compressed. Therefore, he suggested that the "ncks" compression may be the culprit. 
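One rough way to check that would be to dump both copies from the same experiment and diff the dumps. ncdump does not print the compression settings themselves, so (apart from bookkeeping attributes such as the history entry that nco appends) any difference that shows up should be a real content change. The variables selected are just a few representative fields from Evan's diffwrf listing, assuming the same field names are present in these files:

$ ncdump -v U,V,T,Q /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091218/09L/intercom/ensda/001/fcst/wrfinput_d02 > intercom.cdl
$ ncdump -v U,V,T,Q /scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/com/2019091218/09L/nine09l.2019091218.ensda_001.wrfinput_d02 > com.cdl
$ diff intercom.cdl com.cdl    # note: the first "netcdf <name> {" line will differ simply because the file names differ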

I wanted to give you a quick update.

Best and thank you for your help,

Peter

Hello again, 

Another quick update. I think we have ruled out ncks/nco as the issue. The differences introduced by the ncks compression only changed the metadata, which led us to believe that the actual variable values were being changed elsewhere.

Therefore, we believe the issue in my Humberto runs for the HWRF_for_V16 code on Hera is a result of differing initial conditions of the ensemble members at cycle 2 (2019091300).

  • In checking the pre-GSI and post-GSI files for D02 & D03 (wrfghost_d02 and wrfghost_d03 for each FGAT time, gsi_out_storm1ghost_parent, and gsi_out_storm1ghost), the files were identical. This implies that the issue isn’t on the GSI side (i.e., the GSI analysis that goes into initializing the ensemble members isn’t causing the differences we found).

  • On the EnKF side, the following 5 files in the 2019091300/09L/intercom/ensda/001/ directory were different in the two experiments that produced different results (/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/ , .../09L16_3d/) but were the same in the two experiments that produced identical results (../09L16_3c, ../09L16_3b).

    • /fcst/wrfinput_d01

    • /fcst/wrfinput_d02

    • /fcst/wrfout_d02_2019-09-13_00_00_00

    • /fcst/wrfout_d02_2019-09-13_06_00_00

    • /merge/wrfanl_d02

  • Also on the EnKF side, the initial-forecast-time post output for the ensemble members is different in the experiments that differ, but the same in the experiments that are identical (i.e., .../2019091300/09L/intercom/ensda.001.post-f00h00m/ensda.001.post-f00h00m/ensda.001.post-f00h00m-ensdadom.egrb). This suggests that the issue is happening before the ensda_init task kicks off.

  • Therefore, we think we have isolated the issue as potentially happening in one (or more) of the following tasks: meanhx, enshx_init, or enkf. (The kind of file-by-file comparison involved is sketched below.)
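For reference, the file comparisons above were done one at a time; a sweep over the intercom trees of the two experiments, something along these lines, would flag every file that differs (cmp is a byte-level check, so benign header differences would also show up):

$ C=/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3c/2019091300/09L/intercom
$ D=/scratch1/BMC/qosap/Peter.Marinescu/HWRF_AeoTest/scrub/09L16_3d/2019091300/09L/intercom
$ cd $C && find . -type f | while read f; do cmp -s "$f" "$D/$f" || echo "differs: $f"; done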


We aren’t sure where to dig next, so we would appreciate any guidance from your end.

Thank you for your help,

Peter

Hi, Peter,

Based on comments and suggestions from Evan and others, I got a bit-for-bit match for the experiments after the following modifications (a sketch of where these flags go is shown after the list):

1) using -O2 and -fp-model consistent in HWRF and WPS, 

2) using -O2 and -fp-model consistent in ensemble and gsi data assimilation, 

3) using -O0 and -fp-model consistent in hwrf_ensemble, hwrf_enkf, and hwrf_interpolate, and

4) using -O2 and -fp-model consistent in UPP.
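As a rough illustration of where these settings go (the macro names below follow the WRF-style configure files, e.g., FCOPTIM in configure.wrf; the exact variables and files differ between components, so treat this as a sketch rather than the literal change):

# in configure.wrf (and the analogous configure files for WPS, GSI, and UPP):
FCOPTIM  =  -O2 -fp-model consistent
# for the hwrf_ensemble, hwrf_enkf, and hwrf_interpolate builds, drop optimization entirely,
# e.g. (the variable name depends on the component's makefile):
# FFLAGS  =  -O0 -fp-model consistent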

Here is the running directory.

/scratch2/BMC/det/lpan/test/hwf/08262021/HWRF_AeoTest

Thanks,

Linlin