Estimated time: 25 min
Set the environment variable $EXPTDIR to your experiment directory to make navigation easier:
EXPTDIR=${SR_WX_APP_TOP_DIR}/../expt_dirs/${EXPT_SUBDIR}
The workflow can be run using the./launch_FV3LAM_wflow.sh script which contains the rocotorun and rocotostat commands needed to launch the tasks and monitor the progress. There are two ways to run the ./launch_FV3LAM_wflow.sh script: 1) manually from the command line or 2) by adding a command to the user’s crontab. Both options are described below.
To run the workflow manually:
./launch_FV3LAM_wflow.sh
Once the workflow is launched with the launch_FV3LAM_wflow.sh script, a log file named log.launch_FV3LAM_wflow will be created in $EXPTDIR.
To see what jobs are running for a given user at any given time, use the following command:
This will show any of the workflow jobs/tasks submitted by rocoto (in addition to any unrelated jobs the user may have running). Error messages for each task can be found in the task log files located in the $EXPTDIR/log directory. In order to launch more tasks in the workflow, you just need to call the launch script again in the $EXPTDIR directory as follows:
until all tasks have completed successfully. You can also look at the end of the $EXPT_DIR/log.launch_FV3LAM_wflow file to see the status of the workflow. When the workflow is complete, you no longer need to issue the ./launch_FV3LAM_wflow.sh command.
To run the workflow via crontab:
For automatic resubmission of the workflow (e.g., every 3 minutes), the following line can be added to the user’s crontab:
and insert the line:
The state of the workflow can be monitored using the rocotostat command, from the $EXPTDIR directory:
The workflow run is completed when all tasks have “SUCCEEDED”, and the rocotostat command will output the following:
=================================================================================
201906150000 make_grid 4953154 SUCCEEDED 0 1 5.0
201906150000 make_orog 4953176 SUCCEEDED 0 1 26.0
201906150000 make_sfc_climo 4953179 SUCCEEDED 0 1 33.0
201906150000 get_extrn_ics 4953155 SUCCEEDED 0 1 2.0
201906150000 get_extrn_lbcs. 4953156 SUCCEEDED 0 1 2.0
201906150000 make_ics 4953184 SUCCEEDED 0 1 16.0
201906150000 make_lbcs 4953185 SUCCEEDED 0 1 71.0
201906150000 run_fcst 4953196 SUCCEEDED 0 1 1035.0
201906150000 run_post_f000 4953244 SUCCEEDED 0 1 5.0
201906150000 run_post_f001 4953245 SUCCEEDED 0 1 4.0
...
201906150000 run_post_f048 4953381 SUCCEEDED 0 1 5.0
If something goes wrong with a workflow task, it may end up in the DEAD state:
=================================================================================
201906150000 make_grid 20754069 DEAD 256 1 11.0
This means that the DEAD task has not completed successfully, so the workflow has stopped. Go to the $EXPTDIR/log directory and look at the make_grid.log file, in this case, to identify the issue. Once the issue has been fixed, the failed task can be re-run using the rocotowind command:
rocotorewind -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10 -c 201906150000 -t make_grid
where -c specifies the cycle date (first column of rocotostat output) and -t represents the task name (second column of rocotostat output). After using rocotorewind, the next time rocotorun is used to advance the workflow, the job will be resubmitted.