HWRF, gsi_d02_wrapper, returncode=174

Submitted by lliu on Mon, 08/02/2021 - 09:45
We are installing and running HWRF package version 4.0a. It compiled without problems.
 
In running HWRF, we completed Steps 1, 2, and 3 with no obvious errors in the information file. Result data are also present in the output directory, so everything seems fine.
 
However, when we ran Step 4, the "gsi_d02_wrapper" file, the program stopped with the following error message in the log file:

07/30 09:57:55.421 hwrf.gsi_d02 (gsi.py:942) CRITICAL: [MainThread] GSI failed for <WRFDomain name=storm1ghost_parent> domain: exe('/usr/bin/srun')['--export=ALL','--cpu_bind=core','--distribution=block:block','/projects/ees/dhs-crc/aniya/HWRFmodel/hwrfrun/sorc/GSI/src/gsi.exe'].in('gsiparm.anl',string=False).out('stdout',append=False).env(OMP_NUM_THREADS=1, KMP_AFFINITY=scatter, KMP_NUM_THREADS=1, OMP_STACKSIZE=128M): non-zero exit status (returncode=174)

What is the problem here? Please help.

In reply to by linlin.pan

I believe the thread count is 1: I see GSI_THREADS=1 in the out_gsi_d02 file. I assume GSI_THREADS is passed on as the threads value, and gsi.py then sets OMP_NUM_THREADS to that value.

In the job submit file I put "#SBATCH -n 240", and in the wrapper file we also have "TOTAL_TASKS=240".

I saw the thread count in the log file but not the processor count, so I am wondering whether that information is passed correctly. One way to test this is to run the gsi_d02 part manually and see whether it runs.
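A minimal sketch of one way to check what the job actually sees, assuming bash and a Slurm batch job. SLURM_NTASKS is the variable Slurm sets when -n is given; TOTAL_TASKS, OMP_NUM_THREADS and GSI_THREADS are the names used in the wrapper. The function name report_parallel_env is made up for this example, and whether HWRF needs SLURM_NTASKS and TOTAL_TASKS to agree is an assumption, so the note it prints is only a hint.

```shell
#!/bin/bash
# Print the task and thread settings visible to the current shell.
# A variable that is not defined (or is empty) is shown as "unset".
# report_parallel_env is a hypothetical helper, not part of HWRF.
report_parallel_env() {
    local name
    for name in SLURM_NTASKS TOTAL_TASKS OMP_NUM_THREADS GSI_THREADS; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in $name.
        printf '%s=%s\n' "$name" "${!name:-unset}"
    done

    # Flag a difference between what Slurm allocated and what the wrapper
    # exports (assumption: the two are expected to match).
    if [ -n "${SLURM_NTASKS:-}" ] && [ -n "${TOTAL_TASKS:-}" ] \
        && [ "$SLURM_NTASKS" != "$TOTAL_TASKS" ]; then
        echo "NOTE: SLURM_NTASKS and TOTAL_TASKS differ"
    fi
}

report_parallel_env
```

Placing the function and the call in gsi_d02_wrapper just before the rocoto_pre_job.sh line would write these values into out_gsi_d02 on the next submission.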

Thanks!

I don't know how to run the gsi_d02 part manually. Do you mean running gsi.exe with srun, running gsi.py, or running the gsi_d02_wrapper directly?

We greatly appreciate your help! We have been stuck here for days.

Here is the job submit file:

#!/bin/bash
#
#SBATCH --job-name=gsi_d02
#SBATCH --output=out_gsi_d02
#SBATCH --error=log_gsi_d02
#SBATCH -p batch
#SBATCH -n 240

# Run your executable
./gsi_d02_wrapper

 

Here is the wrapper file:

. ./global_vars.sh

# Source the start file and holdvars
. "$startfile"
. "$holdvars"

if [ -z "$PYTHONPATH" ] ; then
        export PYTHONPATH=${USHhwrf}
else
        export PYTHONPATH=${PYTHONPATH}:${USHhwrf}
fi
export TOTAL_TASKS=64
export OMP_NUM_THREADS=2
cd $WORKhwrf

######################################################################
#   Main
######################################################################

export GSI_THREADS=2
export GSI_DOMAIN=D02
$USHhwrf/rocoto_pre_job.sh $EXhwrf/exhwrf_gsi.py

 

I mean going to the run directory and submitting the job directly, without the wrapper, to test that.
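As a rough sketch of such a direct run, the script below rebuilds the srun command exactly as it appears in the CRITICAL log line: the same options, the same environment variables, gsiparm.anl on standard input and output written to a file named stdout. GSI_RUNDIR and DRY_RUN are hypothetical names introduced only for this example. GSI_RUNDIR should point to the directory that holds gsiparm.anl, which the thread does not give. By default the script only prints the command.

```shell
#!/bin/bash
# Sketch: rerun gsi.exe by hand with the launcher options shown in the log.
# GSI_RUNDIR and DRY_RUN are hypothetical names used only in this example.
GSI_EXE=/projects/ees/dhs-crc/aniya/HWRFmodel/hwrfrun/sorc/GSI/src/gsi.exe

run_gsi_direct() {
    local rundir=${GSI_RUNDIR:-.}
    local cmd=(srun --export=ALL --cpu_bind=core --distribution=block:block "$GSI_EXE")

    # Same environment the log reports for the failed run.
    export OMP_NUM_THREADS=1 KMP_AFFINITY=scatter KMP_NUM_THREADS=1 OMP_STACKSIZE=128M

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Only show what would be executed.
        echo "cd $rundir && ${cmd[*]} < gsiparm.anl > stdout"
        return 0
    fi

    # gsiparm.anl is read on stdin and output goes to the file "stdout",
    # mirroring .in('gsiparm.anl') and .out('stdout') in the log line.
    cd "$rundir" || return 1
    "${cmd[@]}" < gsiparm.anl > stdout
    local rc=$?
    echo "gsi.exe exit status: $rc"
    return "$rc"
}

run_gsi_direct
```

To actually launch it, the script would go inside a batch job with the same "#SBATCH -n 240" header and DRY_RUN=0 set. The file stdout in the run directory should then show how far gsi.exe got before exiting.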