
Running WRF on multiple nodes with Singularity

One of the main advantages of Singularity is its broad support for HPC applications: it does not require root privileges and it supports scalable MPI across multiple nodes. This page walks through the procedure for running this tutorial's WPS/WRF Singularity container on multiple nodes of the NCAR Cheyenne supercomputer. The specifics may differ on your machine of interest, but you should be able to apply the lessons from this example to any HPC platform where Singularity is installed.

 

Step-by-step instructions

Load the singularity, gnu, and openmpi modules

module load singularity

module load gnu

module load openmpi
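
If the modules load without errors, module list should show singularity, gnu, and openmpi in your environment before you continue (exact module names and default versions vary from system to system):

module list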

Set up the experiment as usual (using the snow case in this example)

export PROJ_DIR=`pwd`
export PROJ_VERSION="4.1.0"

git clone git@github.com:NCAR/container-dtc-nwp -b v${PROJ_VERSION}

mkdir data/ && cd data/

tcsh:

foreach f (/glade/p/ral/jntp/NWP_containers/*.tar.gz)
  tar -xf "$f"
end

bash:

for f in /glade/p/ral/jntp/NWP_containers/*.tar.gz; do tar -xf "$f"; done
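
Once the extraction finishes, the data/ directory should contain at least the WPS_GEOG directory that is bind-mounted later in this example; the exact set of subdirectories depends on which tarballs are staged in /glade/p/ral/jntp/NWP_containers:

ls ${PROJ_DIR}/data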
export CASE_DIR=${PROJ_DIR}/snow

mkdir -p ${CASE_DIR} && cd ${CASE_DIR}

mkdir -p wpsprd wrfprd gsiprd postprd pythonprd metprd metviewer/mysql

export TMPDIR=${CASE_DIR}/tmp

mkdir -p ${TMPDIR}

Pull the Singularity image for wps_wrf from DockerHub

The Singularity containers used in this tutorial take advantage of Singularity's ability to build containers from existing Docker images hosted on DockerHub. This allows the DTC team to support both technologies without the additional effort of maintaining a separate set of Singularity recipe files. However, as mentioned on the WRF NWP Container page, the Docker containers in this tutorial include an entrypoint script to mitigate permissions issues seen with Docker on some platforms. Singularity on multi-node platforms does not work well with this entrypoint script, and because Singularity does not suffer from the same permissions issues as Docker, we provide an alternate Docker container for use with Singularity that avoids these issues across multiple nodes:

singularity pull docker://dtcenter/wps_wrf:${PROJ_VERSION}_for_singularity
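
By default, singularity pull writes the image to a .sif file in the current directory, named after the image and tag; a quick listing should confirm the download completed before building the sandbox:

ls -lh ${CASE_DIR}/wps_wrf_${PROJ_VERSION}_for_singularity.sif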

Create a sandbox so the container is stored on disk rather than in memory or temporary disk space

In the main tutorial, we create Singularity containers directly from the Singularity Image File (.sif). For multi-node Singularity, we will take advantage of an option known as "Sandbox" mode:

singularity build --sandbox ${CASE_DIR}/wps_wrf ${CASE_DIR}/wps_wrf_${PROJ_VERSION}_for_singularity.sif

This creates a directory named "wps_wrf" that contains the entire directory structure of the Singularity image; it is a way to interact with the Singularity container space from outside the container rather than having it locked away in the .sif file. If you use the ls command to view the contents of this directory, you will see that it looks identical to the top-level directory structure of a typical Linux installation:

ls -al wps_wrf
total 75
drwxr-xr-x 18 kavulich ral  4096 Feb  8 13:49 .
drwxrwxr-x 11 kavulich ral  4096 Feb  8 13:49 ..
-rw-r--r--  1 kavulich ral 12114 Nov 12  2020 anaconda-post.log
lrwxrwxrwx  1 kavulich ral     7 Nov 12  2020 bin -> usr/bin
drwxr-xr-x  4 kavulich ral  4096 Feb  8 12:33 comsoftware
drwxr-xr-x  2 kavulich ral  4096 Feb  8 13:49 dev
lrwxrwxrwx  1 kavulich ral    36 Feb  8 13:42 environment -> .singularity.d/env/90-environment.sh
drwxr-xr-x 57 kavulich ral 16384 Feb  8 13:42 etc
lrwxrwxrwx  1 kavulich ral    27 Feb  8 13:42 .exec -> .singularity.d/actions/exec
drwxr-xr-x  4 kavulich ral  4096 Feb  8 12:52 home
lrwxrwxrwx  1 kavulich ral     7 Nov 12  2020 lib -> usr/lib
lrwxrwxrwx  1 kavulich ral     9 Nov 12  2020 lib64 -> usr/lib64
drwxr-xr-x  2 kavulich ral  4096 Apr 10  2018 media
drwxr-xr-x  2 kavulich ral  4096 Apr 10  2018 mnt
drwxr-xr-x  3 kavulich ral  4096 Dec 27 15:32 opt
drwxr-xr-x  2 kavulich ral  4096 Nov 12  2020 proc
dr-xr-x---  5 kavulich ral  4096 Dec 27 16:00 root
drwxr-xr-x 13 kavulich ral  4096 Dec 27 16:20 run
lrwxrwxrwx  1 kavulich ral    26 Feb  8 13:42 .run -> .singularity.d/actions/run
lrwxrwxrwx  1 kavulich ral     8 Nov 12  2020 sbin -> usr/sbin
lrwxrwxrwx  1 kavulich ral    28 Feb  8 13:42 .shell -> .singularity.d/actions/shell
lrwxrwxrwx  1 kavulich ral    24 Feb  8 13:42 singularity -> .singularity.d/runscript
drwxr-xr-x  5 kavulich ral  4096 Feb  8 13:42 .singularity.d
drwxr-xr-x  2 kavulich ral  4096 Apr 10  2018 srv
drwxr-xr-x  2 kavulich ral  4096 Nov 12  2020 sys
lrwxrwxrwx  1 kavulich ral    27 Feb  8 13:42 .test -> .singularity.d/actions/test
drwxrwxrwt  7 kavulich ral  4096 Feb  8 12:53 tmp
drwxr-xr-x 13 kavulich ral  4096 Nov 12  2020 usr
drwxr-xr-x 18 kavulich ral  4096 Nov 12  2020 var

You can explore this directory to examine the contents of the container, but be careful not to make any modifications that could cause problems down the road!
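
If you would rather look around from inside the container, you can also open an interactive shell in the sandbox; without the --writable flag the sandbox is mounted read-only, which helps guard against accidental changes:

singularity shell ${CASE_DIR}/wps_wrf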

Run WPS as usual

The command for running WPS is similar to the one used in the main tutorial. However, because we are using a sandbox rather than creating a container directly from the Singularity image file, the run command changes: it points to the sandbox directory ${CASE_DIR}/wps_wrf instead of the .sif file:

singularity exec -B${PROJ_DIR}/data/WPS_GEOG:/data/WPS_GEOG -B${PROJ_DIR}/data:/data -B${PROJ_DIR}/container-dtc-nwp/components/scripts/common:/home/scripts/common -B${PROJ_DIR}/container-dtc-nwp/components/scripts/snow_20160123:/home/scripts/case -B${CASE_DIR}/wpsprd:/home/wpsprd -B ${CASE_DIR}/wrfprd:/home/wrfprd  ${CASE_DIR}/wps_wrf /home/scripts/common/run_wps.ksh
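
Before moving on, it is worth confirming that WPS produced met_em files in the wpsprd directory (the exact file names depend on the case dates and domains):

ls ${CASE_DIR}/wpsprd/met_em.*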

Prepare the wrfprd directory

This part is still a little hacky, but it will be cleaned up in future versions. Enter the wrfprd directory and link in the contents of the WRF run directory (the static input files and compiled executables) from the sandbox container we created, then replace the default namelist with our case's custom namelist. The met_em output files from WPS will be linked in and renamed to the proper "nocolons" convention after the batch job is requested below:

 

cd ${CASE_DIR}/wrfprd/

ln -sf ${CASE_DIR}/wps_wrf/comsoftware/wrf/WRF-4.3/run/* .

rm namelist.input

cp $PROJ_DIR/container-dtc-nwp/components/scripts/snow_20160123/namelist.input .
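
At this point the wrfprd directory should contain the linked WRF executables alongside the case namelist; a quick check (the executable names assume the WRF-4.3 build inside this container version):

ls -l real.exe wrf.exe namelist.input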

Finally, request as many cores/nodes as you want, reload the environment on the compute nodes, link and rename the met_em files from WPS, and run!

qsub -V -I -l select=2:ncpus=36:mpiprocs=36 -q regular -l walltime=02:00:00 -A P48503002
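
Note that an interactive PBS job typically starts in your home directory, so change back into the run directory before linking the met_em files (CASE_DIR is carried into the job by the -V flag above):

cd ${CASE_DIR}/wrfprd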

ln -sf ${CASE_DIR}/wpsprd/met_em.* .

tcsh:

foreach f ( met_em.* )
  setenv j `echo $f | sed s/\:/\_/g`
  mv $f $j
end

bash:

for f in met_em.*; do mv "$f" "$(echo "$f" | sed s/\:/\_/g)"; done
mpiexec -np 4 singularity run -u -B/glade:/glade ${CASE_DIR}/wps_wrf ./real.exe

mpiexec -np 72 singularity run -u -B/glade:/glade ${CASE_DIR}/wps_wrf ./wrf.exe
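
Each MPI task writes rsl.out.* and rsl.error.* log files in the run directory; a common way to confirm that real.exe and wrf.exe finished cleanly is to check the end of the rank-0 log for a SUCCESS message (the exact wording varies with the WRF version):

tail rsl.out.0000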

Note: If you see "MPT ERROR:" or something similar, you may need to re-run the module load commands from the top of this page (module load singularity; module load gnu; module load openmpi).

The rest of the tutorial can be completed as normal.
