error message with "error opening wrfinput for writing"

Submitted by lwj on Mon, 07/12/2021 - 01:08
Forum: Users | Forecast

Hello,

 

I compiled HWRF v3.9a with Intel 18, netCDF 4.4.4, pnetcdf 1.10.0, and MVAPICH2 2.3, and successfully built wrf_nmm.exe and wrf.exe.

geogrid.exe, ungrib.exe, and metgrid.exe all run successfully when I run init_gfs_wrapper, but then I hit the error message below.

--------------------------------

FATAL CALLED FROM FILE: <stdin> LINE: 778

real: error opening wrfinput for writing

---------------------------

[cli_0]: aborting job:

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

 

I don't know what is wrong.

Please help me out.

 

Thanks in advance.

Woojeong

 

For your reference, my module list is below.

sge

intel18/icc18

intel18/mkl18

intel18/impi18

intel18/mvapich2-2.3

intel18/hdf5-1.10.4

intel18/netcdf-4.4.4

intel18/cdo-1.9.5

intel18/pnetcdf-1.10.0

intel18/ncview-2.1.8

gnu48/ncl-6.5.0


In reply to linlin.pan

Dear Linlin,

 

Thank you for your reply.

The available disk space is 76 TB when I check the capacity.

Is there another possible reason for the error?

========================================

Tried to open wrfinput_d01 writing: no valid io_form (    11)

FATAL CALLED FROM FILE: <stdin> LINE: 778

real: error opening wrfinput for writing.

========================================
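For reference, a minimal way to check which I/O format real is being asked to use (a sketch only, assuming namelist.input sits in the run directory; in WRF, io_form 2 selects classic netCDF and 11 selects parallel netCDF):

----------------------------------------------
# Sketch only: inspect the requested I/O format in the run directory.
grep io_form namelist.input
# io_form = 2  -> classic netCDF
# io_form = 11 -> parallel netCDF (pnetcdf)
# If the executable was built without pnetcdf support, io_form 11 is rejected;
# switching the io_form_* entries to 2, or rebuilding with pnetcdf enabled,
# are the usual ways to clear a "no valid io_form" message.
----------------------------------------------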

 

If the error were caused by a full disk or an over-quota problem, I think geogrid, ungrib, and metgrid.exe would not run either.

However, I did get those output files (geo_nmm.d01.nc, met_nmm_d01.$date.nc).

Please let me know if you see anything suspicious.

Thanks,

Woojeong

Hi, Woojeong,

Are you running the model on a NOAA machine (Hera or Jet), an NCAR machine (e.g., Cheyenne), or Orion? I have access to those machines and can check that for you.

Thanks,

Linlin

 

 


In reply to linlin.pan

Dear Linlin,

 

I use a Linux 3.10.0-862.9.1.el7_lustre.x86_64 machine.

However, I am afraid that outside users are not allowed on that machine.

Do you have any idea how to solve the problem?

 

Thank you for your consideration.

Woojeong

Hi, Woojeong,

The problem seems related to a system/disk setting. One way to test it is to copy the met*.nc and namelist files to another machine/place and run real manually. That's why I mentioned the NOAA/NCAR machines.
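A rough sketch of such a manual test, purely for illustration (directory names, file paths, and MPI task count are assumptions and will differ on your system):

----------------------------------------------
# Sketch only; paths, file names, and task count are assumptions.
mkdir real_test && cd real_test
cp /path/to/metgrid/output/met_nmm.d01.*.nc .
cp /path/to/run/namelist.input .
cp /path/to/WRFV3/main/real_nmm.exe .
mpirun -np 4 ./real_nmm.exe
tail rsl.error.0000    # the end of this log usually shows where real failed
----------------------------------------------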

Thanks,

Linlin


In reply to linlin.pan

Dear Linlin,

 

I turned off quilting in namelist.input

----------------------------------------------
&namelist_quilt
  poll_servers = F,
  nio_tasks_per_group = 0,
  nio_groups = 2,
----------------------------------------------

and reran it manually, but I got another error message, shown below. :(

-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[62013,1],9]
  Exit code:    1
--------------------------------------------------------------------------
taskid: 0 hostname: node10
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
real_nmm.exe       0000000001AC2F0E  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002ADE59AA86D0  Unknown               Unknown  Unknown
libc-2.17.so       00002ADE59CEB547  kill                  Unknown  Unknown
real_nmm.exe       0000000001B01CF8  for_dealloc_alloc     Unknown  Unknown
real_nmm.exe       0000000000B8CA2F  Unknown               Unknown  Unknown
real_nmm.exe       0000000000A65FF1  Unknown               Unknown  Unknown
real_nmm.exe       0000000000414879  Unknown               Unknown  Unknown
real_nmm.exe       0000000000411FEF  Unknown               Unknown  Unknown
real_nmm.exe       0000000000411C5E  Unknown               Unknown  Unknown
libc-2.17.so       00002ADE59CD7445  __libc_start_main     Unknown  Unknown
real_nmm.exe       0000000000411B69  Unknown               Unknown  Unknown
 

-------------------------------------------------------------------------

Please let me know if you have any comments.

Thanks,

Woojeong

 

 


In reply to linlin.pan

Thank you for your reply.

 

Actually, I solved the problem! The version of MPI I was running with conflicted with the one netCDF was built against.
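For anyone hitting the same issue, one quick way to spot such a mismatch is to compare the MPI that is active at run time with the libraries the executables were linked against (commands are generic; the output is system-specific):

----------------------------------------------
# Sketch only; compare the run-time MPI with the build-time libraries.
which mpirun                                   # MPI launcher first in PATH
module list                                    # MPI/netCDF modules currently loaded
ldd real_nmm.exe | grep -iE 'mpi|netcdf|hdf5'  # libraries the binary actually links
----------------------------------------------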

Thank you for your concern.

 

Meanwhile, I wonder why the loop in the post process iterates for a while, as shown below.

I can see the correct output files after the post process despite the messages below, but the loop takes time.

I would like to reduce the time to run HWRF.

Can you advise, please?

===============================================================

08/11 18:06:11.553 exhwrf_post (exhwrf_post.py:114) INFO: Post loop iteration took only 0 seconds, which is less than the threshold of 5 seconds.  Will sleep 20 seconds.
08/11 18:06:31.562 exhwrf_post (exhwrf_post.py:121) INFO: Done sleeping.
08/11 18:06:31.641 exhwrf_post (exhwrf_post.py:114) INFO: Post loop iteration took only 0 seconds, which is less than the threshold of 5 seconds.  Will sleep 20 seconds.
08/11 18:06:51.661 exhwrf_post (exhwrf_post.py:121) INFO: Done sleeping.
08/11 18:06:52.116 exhwrf_post (exhwrf_post.py:114) INFO: Post loop iteration took only 1 seconds, which is less than the threshold of 5 seconds.  Will sleep 20 seconds.