The stack size on Orion must be set to unlimited for HWRF (and HWRF-B) components to get the memory they need; the default stack size is too small for large files and complex jobs. I mentioned the recurrence of this "stack size" problem on Orion in the HWRF Developer's call today (11/16). Even though ~/.bashrc explicitly sets the stack size with "ulimit -s unlimited", the HWRF system does not seem to recognize that the shell should be Bash. This setup worked until recently, so the breakage is likely due to some change on the Orion side. While investigating the log files, I noticed this potential error in the log header, during the environment/module loads:
++ __ms_ksh_test=
/work/noaa/aoml-hafs1/galaka/HB20_orion/ush/hwrf_pre_job.sh.inc: line 54: __ms_function_name: unbound variable
+++ cat
++ __ms_bash_test=
++ [[ ! -z '' ]]
++ [[ ! -z '' ]]
++ __ms_shell=sh
Because of this unbound variable, the job never detects that the shell should be Bash, so it falls back to sh. The simple fix was to also set the stack size to unlimited in ~/.profile, so that both "sh" and "bash" shells get an unlimited stack. However, it seems that hwrf_pre_job.sh.inc should be updated so that the correct shell is used on Orion. I checked the HWRF trunk version of hwrf_pre_job.sh.inc in case I was missing an update, but my version is identical.
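For reference, a minimal sketch of that workaround, placed in both ~/.bashrc and ~/.profile. The fallback to the hard limit is an assumption added here for systems that do not permit an unlimited stack; it is not part of the fix described above.

```shell
# Sketch for ~/.profile and ~/.bashrc.
# Request an unlimited stack. If the system refuses (hard limit is lower),
# fall back to the highest value the hard limit allows (assumed fallback).
ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"
```

To confirm which limit a job actually sees, printing the output of "ulimit -s" near the top of the job script should be enough.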
I am open to ideas on how to proceed. The fix I mentioned allows the job to succeed, but I am guessing many more users will run into this issue on Orion.
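One possible direction, shown only as a generic illustration and not as the actual contents of hwrf_pre_job.sh.inc: the "unbound variable" message suggests the script runs with nounset (set -u) active, and under that option an unset variable can be expanded safely with the ${name:-} form. The variable name is taken from the log above; the functions demo_unguarded and demo_guarded are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration; this is not code from hwrf_pre_job.sh.inc.

# With nounset active, expanding an unset variable is an error,
# so the subshell exits with a non-zero status.
demo_unguarded() {
    ( set -u; unset __ms_function_name; : "$__ms_function_name" ) 2>/dev/null
}

# The ${name:-} form substitutes an empty string instead,
# so the subshell carries on and exits with status 0.
demo_guarded() {
    ( set -u; unset __ms_function_name; : "${__ms_function_name:-}" ) 2>/dev/null
}

if demo_unguarded; then echo "unguarded: ok"; else echo "unguarded: failed"; fi
if demo_guarded; then echo "guarded: ok"; else echo "guarded: failed"; fi
```

An alternative along the same lines would be to relax the option with "set +u" around the environment/module setup and restore it with "set -u" afterwards. Whether either change belongs in hwrf_pre_job.sh.inc itself or on the Orion side is an open question.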
Best,
Gus