GSI runtime error on Orion: "/work/noaa/aoml-hafs1/lgramer/HB20_orion/exec/hwrf_gsi: error while loading shared libraries: libnetcdf.so.15: cannot open shared object file: No such file or directory"

Submitted by lew.gramer on Fri, 10/16/2020 - 16:05
Forum: Developers | HPC

We are attempting a test run of HWRF-B on Orion, using build configuration and data staged by Mrinal Biswas. Source (and rocoto subdirectory) here:

/work/noaa/aoml-hafs1/lgramer/src/HB20_orion/rocoto

Model outputs, including logs, here:

/work/noaa/aoml-hafs1/lgramer/pytmp/HB20_orion/2018100812/00L/hwrf_gsi_d02_storm3.log

We encountered an identical error from our *init* tasks in a prior attempt, but after I cleaned and remade all source, the init tasks were able to complete. However, despite my having cleaned and remade the contents of sorc/ProdGSI multiple times, including deleting the object files and executables by hand before remaking, the issue persists with our gsi tasks.

I am guessing there is something missing in the LD path (and perhaps a module version issue?), but suggestions are very welcome!

 

--Lew

 

Hi Lew,

Thanks for letting us know about this issue on Orion. As we discussed on the call this morning, there have been some changes to Orion's software stack recently, and that is probably why you cannot run there currently. Fixing it will likely require some adjustments to the modules that HWRF loads at compile and runtime. We will try to reproduce your error using the current trunk and come up with a fix. I'll get back to you when we know more.

Cheers,

evan

 

I just pushed a fix for this issue to the HWRF trunk. The fix is to load the netcdf and hdf5 modules last on Orion. In modulefiles/orion/HWRF/build:

 module load intel/2018.4
 module load impi/2018.4
 module load pnetcdf/1.12.0
-module load netcdf/4.7.2
-module load hdf5/1.10.5
 module load wgrib/2.0.8
 module load contrib
 module load rocoto/1.3.2
 module load nco
+module load netcdf/4.7.2
+module load hdf5/1.10.5

This may seem like a trivial change, but loading the netcdf and hdf5 modules after nco stops nco from loading its own preferred versions of netcdf and hdf5, which is what caused the library problem that you reported. In my test, the GSI jobs completed successfully after making this change. However, I did not perform a full run.
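One way to confirm the ordering in a modulefile is sketched below. It assumes the file contains plain "module load name/version" lines, as in the excerpt above. The function name loads_after_nco is hypothetical and is not part of HWRF.

```shell
# Hypothetical helper (not part of HWRF): succeed only if the named module
# is loaded on a later line than nco in the given modulefile.
loads_after_nco() {
    modulefile="$1"
    name="$2"
    awk -v name="$name" '
        $1 == "module" && $2 == "load" {
            split($3, parts, "/")            # drop any /version suffix
            if (parts[1] == "nco") nco = NR  # line where nco is loaded
            if (parts[1] == name)  mod = NR  # line where our module is loaded
        }
        END { exit (nco && mod && mod > nco) ? 0 : 1 }
    ' "$modulefile"
}

# Example use, run from the top-level HWRF code directory:
#   loads_after_nco modulefiles/orion/HWRF/build netcdf && echo "netcdf follows nco"
#   loads_after_nco modulefiles/orion/HWRF/build hdf5   && echo "hdf5 follows nco"
```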

 

After making this change, cd to your top-level HWRF code directory and load the new modules:

module purge

module use modulefiles/orion

module load HWRF/build

Clean and rebuild the GSI code:

cd sorc/ProdGSI

./clean -a

cd ush/

./build_all_cmake_hwrf.sh

Install the new GSI executable:

cd to your top-level HWRF code directory

cd sorc/

make install

Then rewind and rerun the GSI jobs. Please let us know if there are any lingering issues.
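The steps above can be collected into one shell function. This is a minimal sketch, not a script shipped with HWRF: the name rebuild_gsi, its argument (the absolute path of your top-level HWRF code directory), and the DRY_RUN switch are hypothetical, while the commands themselves are the ones listed above. It assumes a shell in which the module command is available, as in a login shell on Orion.

```shell
# Hypothetical wrapper around the rebuild steps listed above.
# Usage: rebuild_gsi /absolute/path/to/HWRF/top
# Set DRY_RUN=1 to print the commands instead of running them.
# The body runs in a subshell, so the caller's directory and modules are untouched.
rebuild_gsi() (
    top="$1"
    run=${DRY_RUN:+echo}          # "echo" when DRY_RUN is set, empty otherwise

    $run cd "$top" &&
    $run module purge &&
    $run module use modulefiles/orion &&
    $run module load HWRF/build &&
    # Clean and rebuild the GSI code
    $run cd sorc/ProdGSI &&
    $run ./clean -a &&
    $run cd ush &&
    $run ./build_all_cmake_hwrf.sh &&
    # Install the new GSI executable
    $run cd "$top/sorc" &&
    $run make install
)
```

Running it first with DRY_RUN=1 prints the command sequence so it can be checked before anything is cleaned. Because the steps are chained with &&, a failure at any step stops the rest.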

Lew, when I tried to run subsequent jobs after the GSI, I found that I had to rebuild the entire system after making the above changes to the orion modulefile, or there would be library errors in the other jobs.
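A quick way to see which executables still point at a missing library is to scan them with ldd. The sketch below is an assumption about how one might do that, not part of HWRF: the function names are hypothetical, and the directory argument should be your own exec/ directory. The result depends on the environment, so load the same modules the jobs use before running it.

```shell
# Read `ldd` output on stdin and print the names of unresolved libraries.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: list each executable in a directory that has
# unresolved shared libraries, followed by the library names.
scan_exec_dir() {
    for exe in "$1"/*; do
        # Skip anything that is not a regular, executable file
        if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
            continue
        fi
        missing=$(ldd "$exe" 2>/dev/null | missing_libs | sort -u | tr '\n' ' ')
        if [ -n "$missing" ]; then
            echo "$exe: $missing"
        fi
    done
    return 0
}

# Example use (path is a placeholder for your own build):
#   scan_exec_dir /path/to/your/HWRF/exec
```

An empty result means ldd resolved every library for every executable in that directory under the currently loaded modules.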

Thanks, Evan. I tried the suggested tweak (so simple!), and was able to get our GSI tasks to complete on Orion! Unfortunately, a very similar issue caused all succeeding "merge" tasks to fail, so I did a make clean_dist (just as you suggest in your comment above - which I don't think I got an email about??) and have now succeeded in starting my first forecast there. So issue resolved!

I also wanted to suggest a potentially important addition to the forum interface: CCs. For example, I wanted to add a CC to this forum issue, so that my colleague Gus would also receive these updates, but I cannot see how to do that. (That's why I'm replying via email, by the way.)

Lew

In reply to lew.gramer

Glad to hear it. I will ask about whether we can include a CC field. I think you did not receive an email about my previous message because I submitted it as a new comment instead of replying to your old message. It is good to know that I have to hit the Reply button to ensure that you receive an email. I think you can adjust this setting on a per-topic basis on your end by clicking "Notify me when new comments are posted...all comments" instead of "Notify me when new comments are posted...Replies to my comment." However, I'll make sure to adjust what I'm doing on my end.