Removal of Content Advisory - April 2024

Advisory to Numerical Weather Prediction (NWP) containers users: As of the beginning of April 2024, all support assets for Numerical Weather Prediction (NWP) containers will be removed from the DTC website. Users should download all reference materials of interest prior to April 2024.

NWP Containers Online Tutorial: Container Software Commands and Tips

This page contains common commands and tips for Docker and Singularity.

Docker

Common docker commands

Listed below are common Docker commands, some of which will be run during this tutorial. For more information on command-line interfaces and options, please visit: https://docs.docker.com/engine/reference/commandline/docker/

  • Getting help:
    • docker --help : lists all Docker commands
    • docker run --help : lists all options for the run command

 

  • Building or loading images:
    • docker build -t my-name . : builds a Docker image named my-name from the Dockerfile in the current directory
    • docker save my-name > my-name.tar : saves a Docker image to a tar file
    • docker load < my-name.tar : loads a Docker image from a tar file

 

  • Listing images and containers:
    • docker images : lists the images that currently exist, including the image names and IDs
    • docker ps -a : lists the containers that currently exist (running or stopped), including the container names and IDs

 

  • Deleting images and containers:
    • docker rmi my-image-name : removes an image by name or ID
    • docker rmi $(docker images -q) or docker rmi `docker images -q` : removes all existing images
    • docker rm my-container-name : removes a container by name or ID
    • docker rm $(docker ps -a -q) or docker rm `docker ps -aq` : removes all existing containers
    • These commands can be forced by adding -f

 

  • Creating and running containers:
    • docker run --rm -it \
      --volumes-from container-name \
      -v local-path:container-path \
      --name my-container-name my-image-name
       : creates a container from an image, where:

      • --rm : automatically removes the container when it exits
      • -it : runs the container interactively with a terminal attached (-i keeps standard input open, -t allocates a pseudo-terminal)
      • --volumes-from : mounts volumes from the specified container
      • -v : defines a mapping between a local directory and a container directory
      • --name : assigns a name to the container
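    • docker exec my-container-name my-command : executes a command (here the placeholder my-command) in a running container
    • docker-compose up -d : creates and starts, in the background, the containers of a multi-container Docker application defined in a docker-compose.yml file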
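
One caveat with the "remove all containers" command listed above: when no containers exist, the command substitution expands to nothing and docker rm exits with a usage error. The following is a minimal sketch of a guard against that case; the function name remove_all_containers is hypothetical and not part of Docker or this tutorial.

```shell
# Hypothetical helper: remove every container, but only if any exist.
remove_all_containers() {
    ids=$(docker ps -aq)    # IDs of all containers, running or stopped
    if [ -z "$ids" ]; then
        echo "No containers to remove"
        return 0
    fi
    # Left unquoted on purpose so that each ID becomes a separate argument.
    docker rm $ids
}
```

Calling remove_all_containers then behaves like docker rm $(docker ps -aq), except that it prints a short message instead of an error when there is nothing to remove.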

Docker run commands for this tutorial

This tutorial makes use of docker run commands with several different arguments. Here is one example:

docker run --rm -it -e LOCAL_USER_ID=`id -u $USER` \
 -v ${PROJ_DIR}/container-dtc-nwp/components/scripts/common:/home/scripts/common \
 -v ${PROJ_DIR}/container-dtc-nwp/components/scripts/sandy_20121027:/home/scripts/case \
 -v ${CASE_DIR}/wpsprd:/home/wpsprd -v ${CASE_DIR}/gsiprd:/home/gsiprd -v ${CASE_DIR}/wrfprd:/home/wrfprd \
 --name run-sandy-wrf dtcenter/wps_wrf:${PROJ_VERSION} /home/scripts/common/run_wrf.ksh

The different parts of this command are detailed below:

  • docker run --rm -it
    • As described above, this portion creates and runs a container from an image, creates an interactive shell, and automatically removes the container when finished
  • -e LOCAL_USER_ID=`id -u $USER`
    • The `-e` flag sets an environment variable inside the container. Our containers have been built with a so-called "entrypoint", a script that runs automatically when the container starts and sets the UID within the container to the value of the LOCAL_USER_ID variable. Here, the command `id -u $USER` outputs the UID of the user outside the container, so the UID inside the container matches the UID outside it. This ensures that any files created inside the container can be read outside the container, and vice versa
  • -v ${PROJ_DIR}/container-dtc-nwp/components/scripts/common:/home/scripts/common
    • As described above, the `-v` flag mounts the directory ${PROJ_DIR}/container-dtc-nwp/components/scripts/common outside of the container to the location /home/scripts/common within the container. The other -v options do the same for their respective directories
  • --name run-sandy-wrf
    • Assigns the name "run-sandy-wrf" to the running container
  • dtcenter/wps_wrf:${PROJ_VERSION} 
    • The "${PROJ_VERSION}" tag of image "dtcenter/wps_wrf" will be used to create the container. This should be something like "4.0.0", and will have been set in the "Set Up Environment" step for each case
  • /home/scripts/common/run_wrf.ksh
    • Finally, once the container is created (and the entrypoint script is run), the script "/home/scripts/common/run_wrf.ksh" (in the container's directory structure) will be run, which will set up the local environment and input files and run the WRF executable.
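
The command above relies on three shell variables (PROJ_DIR, CASE_DIR and PROJ_VERSION) being set. If one of them is empty, the -v paths and the image tag expand to something unintended. As a minimal sketch, a check like the following could be run first; the function name check_tutorial_env is hypothetical and is not part of the tutorial scripts.

```shell
# Hypothetical helper: fail early if a variable used by the docker run
# command is unset or empty.
check_tutorial_env() {
    missing=0
    for name in PROJ_DIR CASE_DIR PROJ_VERSION; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "ERROR: $name is not set" >&2
            missing=1
        fi
    done
    return $missing
}
```

A typical use would be check_tutorial_env && docker run ... so that the container is only started when all three variables have values.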

Common Docker problems

Docker is a complicated system, and will occasionally give you unintuitive or even confusing error messages. Here are a few that we have run into and the solutions that have worked for us:

  • Problem: When mounting a local directory, inside the container it is owned by root, and we cannot read or write any files in that directory

  • Solution: Always specify the full path when mounting directories into your container.

    • When "bind mounting" a local directory into your container, you must specify a full path (e.g. /home/user/my_data_directory) rather than a relative path (e.g. my_data_directory). A likely explanation is that Docker interprets a -v source that is not an absolute path as the name of a volume rather than as a host directory, so a new volume is mounted instead of your directory. We have reliably reproduced and solved this error many times in this way.
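
    As a minimal sketch of the fix, a relative directory name can be converted to a full path before it is passed to -v. The function name abs_dir, the directory my_data_directory and the container path /home/data are placeholders chosen for illustration.

```shell
# Hypothetical helper: print the absolute path of an existing directory.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd) || {
        echo "ERROR: directory not found: $1" >&2
        return 1
    }
}

# Example use (placeholder names):
#   docker run --rm -it -v "$(abs_dir my_data_directory)":/home/data my-image-name
```

    Writing "$(pwd)/my_data_directory" directly in the -v option achieves the same thing when the directory is below the current working directory.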

  • Problem: Log files for executables run with MPI contain large numbers of similar error messages:

    • Read -1, expected 682428, errno = 1
      Read -1, expected 283272, errno = 1
      Read -1, expected 20808, errno = 1
      Read -1, expected 8160, errno = 1
      Read -1, expected 390504, errno = 1
      Read -1, expected 162096, errno = 1
    • etc.

    • These errors are harmless, but they can bury the useful information in the log files and, in long WRF runs, can cause the log files to swell to unmanageable sizes.

  • Solution: Add the flag "--cap-add=SYS_PTRACE" to your docker run command

    • For example:

      docker run --cap-add=SYS_PTRACE --rm -it -e LOCAL_USER_ID=`id -u $USER` \
       -v ${PROJ_DIR}/container-dtc-nwp/components/scripts/common:/home/scripts/common \
       -v ${CASE_DIR}/wrfprd:/home/wrfprd -v ${CASE_DIR}/wpsprd:/home/wpsprd \
       -v ${PROJ_DIR}/container-dtc-nwp/components/scripts/derecho_20120629:/home/scripts/case \
       --name run-dtc-nwp-derecho dtcenter/wps_wrf:${PROJ_VERSION} /home/scripts/common/run_wrf.ksh -np 8
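
For log files that were already written without the --cap-add=SYS_PTRACE flag, the harmless "Read -1" lines can be filtered out after the fact. This is a minimal sketch and not part of the tutorial scripts; the function name filter_mpi_noise is hypothetical, and the log file name in the usage note is an assumption.

```shell
# Hypothetical helper: print a log file without the harmless MPI
# "Read -1, expected N, errno = 1" lines. The original file is not modified.
filter_mpi_noise() {
    grep -v -E '^[[:space:]]*Read -1, expected [0-9]+, errno = 1[[:space:]]*$' "$1"
}
```

For example, filter_mpi_noise rsl.error.0000 > rsl.error.0000.filtered writes a cleaned copy, assuming the log is named rsl.error.0000.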

Singularity

Common Singularity commands

Listed below are common Singularity commands, some of which will be run during this tutorial. For more information on command-line interfaces and options, please visit: https://sylabs.io/guides/3.7/user-guide/quick_start.html#overview-of-the-singularity-interface

If a problem occurs while building a Singularity image from Docker Hub (for example, running out of disk space), it can leave a corrupted cache for that image, causing it to not run properly even if you re-pull the container. You can clear the cache using the command:

singularity cache clean --name image_file_name.sif
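
One way to reduce the risk of running out of disk space in the first place is to point Singularity's cache and temporary build directories at a file system with more room, using the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables. The following is a minimal sketch; the function name use_singularity_dirs and the scratch path in the example are hypothetical.

```shell
# Hypothetical helper: point Singularity's cache and temporary build
# directories at subdirectories of the given base directory.
use_singularity_dirs() {
    base=$1
    mkdir -p "$base/cache" "$base/tmp" || return 1
    export SINGULARITY_CACHEDIR="$base/cache"
    export SINGULARITY_TMPDIR="$base/tmp"
}

# Example use (the scratch path is a placeholder):
#   use_singularity_dirs /path/to/scratch/singularity
```

The variables only affect Singularity commands started from the same shell session after the function has been called.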

On the Cheyenne supercomputer, downloading and converting a Singularity image file from Docker Hub may use too many resources and result in you being kicked off a login node. To avoid this, you can run an interactive job on the Casper machine using the `execdav` command (this example requests a 4-core job for 90 minutes):

execdav --time=01:30:00 --ntasks=4

Common Singularity problems

As with Docker, Singularity is a complicated system, and will occasionally give you unintuitive or even confusing error messages. Here are a few that we have run into and the solutions that have worked for us:

  • Problem: When running a Singularity container, the following error appears at the end:

    • INFO:    Cleaning up image...
      ERROR:   failed to delete container image tempDir /some_directory/tmp/rootfs-979715833: unlinkat /some_directory/tmp/rootfs-979715833/root/root/.tcshrc: permission denied

  • Solution: This error is caused by temporary files created on disk when running Singularity outside of "sandbox" mode. It can be safely ignored.