MET Tool: PB2NC

IMPORTANT NOTE: If you are returning to the tutorial, you must source the tutorial setup script before running the following instructions. If you are unsure whether you have done this step, please navigate to the Verify Environment is Set Correctly page.

PB2NC Tool: General

PB2NC Functionality

The PB2NC tool is used to stratify (i.e. subset) the contents of an input PrepBufr point observation file and reformat it into NetCDF format for use by the Point-Stat or Ensemble-Stat tool. In this session, we will run PB2NC on a PrepBufr point observation file prior to running Point-Stat. Observations may be stratified by variable type, PrepBufr message type, station identifier, a masking region, elevation, report type, vertical level category, quality mark threshold, and level of PrepBufr processing. Stratification is controlled by a configuration file, as discussed in the Configure section.

The PB2NC tool may be run on both PrepBufr and Bufr observation files. As of MET version 6.1, support for Bufr is limited to files containing embedded tables. Support for Bufr files using external tables will be added in a future release.

For more information about the PrepBufr format, visit:

https://emc.ncep.noaa.gov/emc/pages/infrastructure/bufrlib.php

For information on where to download PrepBufr files, visit:

https://dtcenter.org/community-code/model-evaluation-tools-met/input-data

PB2NC Usage

View the usage statement for PB2NC by simply typing the following:
pb2nc
Usage: pb2nc  
  prepbufr_file input prepbufr path/filename
  netcdf_file output netcdf path/filename
  config_file configuration path/filename
  [-pbfile prepbufr_file] additional input files
  [-valid_beg time] Beginning of valid time window [YYYYMMDD_[HH[MMSS]]]
  [-valid_end time] End of valid time window [YYYYMMDD_[HH[MMSS]]]
  [-nmsg n] Number of PrepBufr messages to process
  [-index] List available observation variables by message type (no output file)
  [-dump path] Dump entire contents of PrepBufr file to directory
  [-obs_var var] Sets the variable list to be saved from input BUFR files
  [-log file] Outputs log messages to the specified file
  [-v level] Level of logging
  [-compression level] NetCDF file compression
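
The -valid_beg and -valid_end options take timestamps in the YYYYMMDD_[HH[MMSS]] form shown above. As a rough illustration of which strings that pattern admits, here is a minimal Python sketch that parses them. The helper name parse_met_time is hypothetical, and the sketch only mirrors one reading of the pattern printed in the usage statement (YYYYMMDD, YYYYMMDD_HH, or YYYYMMDD_HHMMSS); it is not MET's own time parser.

```python
import re
from datetime import datetime

# One reading of the usage pattern YYYYMMDD_[HH[MMSS]]:
# a date, optionally followed by "_HH", optionally followed by "MMSS".
_TIME_RE = re.compile(r"(\d{8})(?:_(\d{2})(\d{4})?)?")


def parse_met_time(text: str) -> datetime:
    """Hypothetical helper: parse a -valid_beg/-valid_end style string."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYYMMDD_[HH[MMSS]] form: {text!r}")
    date_part, hour, minsec = match.groups()
    day = datetime.strptime(date_part, "%Y%m%d")
    minsec = minsec or "0000"
    # replace() raises ValueError for out-of-range values such as hour 25.
    return day.replace(hour=int(hour or 0),
                       minute=int(minsec[:2]),
                       second=int(minsec[2:]))


if __name__ == "__main__":
    # A hypothetical two-hour valid time window.
    beg = parse_met_time("20070331_11")
    end = parse_met_time("20070331_130000")
    print(beg, end, end - beg)
```

Omitting the hour entirely yields midnight of that date in this sketch.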

At a minimum, the input prepbufr_file, the output netcdf_file, and the configuration config_file must be passed in on the command line. You may also use the -pbfile command line argument to run PB2NC on multiple input PrepBufr files, typically ones that are adjacent in time.

When running PB2NC on a new dataset, users are advised to run with the -index option to list the observation variables that are present in that file.

cindyhg Tue, 06/25/2019 - 09:25

Configure

PB2NC Tool: Configure

Start by making an output directory for PB2NC and changing directories:
mkdir -p ${METPLUS_TUTORIAL_DIR}/output/met_output/pb2nc
cd ${METPLUS_TUTORIAL_DIR}/output/met_output/pb2nc

The behavior of PB2NC is controlled by the contents of the configuration file passed to it on the command line. The default PB2NC configuration file, PB2NCConfig_default, is located in the share/met/config directory of the MET installation (data/config in the MET source tree).

Prior to modifying the configuration file, users are advised to make a copy of the default:
cp ${MET_BUILD_BASE}/share/met/config/PB2NCConfig_default PB2NCConfig_tutorial_run1
Open up the PB2NCConfig_tutorial_run1 file for editing with your preferred text editor.
vi PB2NCConfig_tutorial_run1

The configurable items for PB2NC control which PrepBufr observations are retained or derived. You may find a complete description of the configurable items in the pb2nc configuration file section of the MET User's Guide or in the Configuration File Overview.

For this tutorial, edit the PB2NCConfig_tutorial_run1 file as follows:

  • Set:
    message_type = [ "ADPUPA", "ADPSFC" ];

    to retain only those 2 message types. Message types are described in:
    http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_1.htm

  • Set:
    obs_window = {
       beg = -1800;
       end =  1800;
    }

    so that only observations within 1800 seconds (30 minutes) of the file time will be retained.

  • Set:
    mask = {
       grid = "G212";
       poly = "";
    }

    to retain only those observations residing within NCEP Grid 212, on which the forecast data resides.

  • Set:
    obs_bufr_var = [ "QOB", "TOB", "UOB", "VOB", "D_WIND", "D_RH" ];

    to retain observations for specific humidity, temperature, the u-component of wind, and the v-component of wind and to derive observation values for wind speed and relative humidity.

    While we request these observation variable names from the input file, the following corresponding names will be written to the output file: SPFH, TMP, UGRD, VGRD, WIND, RH. This mapping of input PrepBufr variable names to output variable names is specified by the obs_prepbufr_map config file entry. It keeps the output variable names backward compatible with earlier versions of MET.
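
To make the effect of these settings concrete, the following minimal Python sketch applies them to a few made-up observations. It is an illustration under stated assumptions, not PB2NC's implementation: the Obs record, the sample observations, and the keep helper are hypothetical; the mask (grid) check is omitted; the obs_window bounds are treated as inclusive; and the name map contains only the six pairs listed above rather than the full obs_prepbufr_map entry. An empty message-type list is treated as "keep all", matching the message_type = [] setting used in the Reconfigure and Rerun section.

```python
from dataclasses import dataclass
from typing import Optional

# Settings copied from PB2NCConfig_tutorial_run1 (mask omitted in this sketch).
MESSAGE_TYPES = ["ADPUPA", "ADPSFC"]
OBS_WINDOW_BEG = -1800  # seconds relative to the file time
OBS_WINDOW_END = 1800
OBS_BUFR_VAR = ["QOB", "TOB", "UOB", "VOB", "D_WIND", "D_RH"]

# Input-to-output names as listed in this tutorial; the real
# obs_prepbufr_map entry in the config file contains more pairs.
NAME_MAP = {"QOB": "SPFH", "TOB": "TMP", "UOB": "UGRD",
            "VOB": "VGRD", "D_WIND": "WIND", "D_RH": "RH"}


@dataclass
class Obs:
    """Hypothetical, simplified observation record."""
    message_type: str
    offset_sec: int  # valid time minus file time, in seconds
    var: str


def keep(obs: Obs) -> Optional[str]:
    """Return the output variable name if obs passes the filters, else None."""
    if MESSAGE_TYPES and obs.message_type not in MESSAGE_TYPES:
        return None  # an empty list would mean "keep every message type"
    if not OBS_WINDOW_BEG <= obs.offset_sec <= OBS_WINDOW_END:
        return None
    if obs.var not in OBS_BUFR_VAR:
        return None
    return NAME_MAP.get(obs.var, obs.var)


if __name__ == "__main__":
    samples = [
        Obs("ADPSFC", -600, "TOB"),   # kept, written as TMP
        Obs("AIRCFT", 0, "TOB"),      # rejected: message type not listed
        Obs("ADPUPA", 2400, "UOB"),   # rejected: outside the time window
        Obs("ADPUPA", 900, "POB"),    # rejected: variable not requested
    ]
    for obs in samples:
        print(obs.message_type, obs.var, "->", keep(obs))
```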

Next, save the PB2NCConfig_tutorial_run1 file and exit the text editor.

Run

PB2NC Tool: Run

Next, run PB2NC on the command line using the following command:
pb2nc \
${METPLUS_DATA}/met_test/data/sample_obs/prepbufr/ndas.t00z.prepbufr.tm12.20070401.nr \
tutorial_pb_run1.nc \
PB2NCConfig_tutorial_run1 \
-v 2
If this run fails due to runtime issues, please download a copy of the output file here and manually save it as tutorial_pb_run1.nc.

PB2NC is now filtering the observations from the PrepBufr file using the configuration settings we specified and writing the output to the NetCDF file name we chose. This should take a few minutes to run. As it runs, you should see several status messages printed to the screen to indicate progress. You may use the -v command line option to turn off (-v 0) or change the amount of log information printed to the screen.

Inspect the PB2NC status messages.

If you'd like to filter down the observations further, you may want to narrow the time window or modify other filtering criteria. We will do that after inspecting the resultant NetCDF file.

Output

PB2NC Tool: Output

When PB2NC is finished, you may view the output NetCDF file it wrote using the ncdump utility.

Run the following command to view the header of the NetCDF output file:
ncdump -h tutorial_pb_run1.nc

In the NetCDF header, you'll see that the file contains nine dimensions and nine variables. The obs_arr variable contains the actual observation values. The obs_qty variable contains the corresponding quality flags. The four header variables (hdr_typ, hdr_sid, hdr_vld, hdr_arr) contain information about the observing locations.

The obs_var, obs_unit, and obs_desc variables describe the observation variables contained in the output. The second entry of each obs_arr row (i.e. var_id) is the index into these arrays for that observation. For example, for observations of temperature, you'd see TMP in obs_var, KELVIN in obs_unit, and TEMPERATURE OBSERVATION in obs_desc. For each temperature observation in obs_arr, the second entry (var_id) would hold the index of that temperature entry in those arrays.
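
The var_id lookup described above can be sketched in a few lines of Python. The arrays below are small made-up stand-ins for what ncdump would show, not real file contents. The only detail taken from this tutorial is that the second entry of an obs_arr row is var_id; the full row layout [hdr_id, var_id, level, height, value] and the 0-based indexing are assumptions. Reading a real file would require a NetCDF library, which is left out to keep the sketch self-contained.

```python
# Made-up stand-ins for three variables in the PB2NC output file.
obs_var = ["SPFH", "TMP", "UGRD"]
obs_unit = ["KG/KG", "KELVIN", "M/S"]
obs_desc = ["SPECIFIC HUMIDITY OBSERVATION",
            "TEMPERATURE OBSERVATION",
            "U-COMPONENT OF WIND OBSERVATION"]

# Assumed row layout: [hdr_id, var_id, level, height, value];
# var_id is treated as a 0-based index into the three lists above.
obs_arr = [
    [0, 1, 1000.0, 110.0, 288.4],
    [0, 2, 1000.0, 110.0, 3.5],
    [1, 1, 850.0, 1457.0, 279.9],
]


def describe(row):
    """Return (name, unit, description, value) for one obs_arr row."""
    var_id = int(row[1])  # second entry of the row
    return obs_var[var_id], obs_unit[var_id], obs_desc[var_id], row[4]


# Select the temperature observations by looking up each row's var_id.
temperature_rows = [row for row in obs_arr if obs_var[int(row[1])] == "TMP"]
for row in temperature_rows:
    print(describe(row))
```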

Inspect the output of ncdump before continuing.

Plot-Point-Obs

The plot_point_obs tool plots the location of these NetCDF point observations. Just as plot_data_plane is useful for visualizing gridded data, plot_point_obs is useful for confirming that you have point observations where you expect them.

Run the following command:
plot_point_obs \
tutorial_pb_run1.nc \
tutorial_pb_run1.ps
Display the output PostScript file by running the following command:
gv tutorial_pb_run1.ps &

Each red dot in the plot represents the location of at least one observation value. The plot_point_obs tool has additional command line options for filtering which observations get plotted and the area to be plotted.

View its usage statement by running the following command:
plot_point_obs

By default, the points are plotted on the full globe.

Next, try rerunning plot_point_obs using the -data_file option to specify the grid over which the points should be plotted:
plot_point_obs \
tutorial_pb_run1.nc \
tutorial_pb_run1_zoom.ps \
-data_file ${METPLUS_DATA}/met_test/data/sample_fcst/2007033000/nam.t00z.awip1236.tm00.20070330.grb

MET extracts the grid information from the first record of that GRIB file and plots the points on that domain.

Display the output PostScript file by running the following command:
gv tutorial_pb_run1_zoom.ps &

The plot_point_obs tool can be run on the NetCDF output of any of the MET point observation pre-processing tools (pb2nc, ascii2nc, madis2nc, and lidar2nc).

Reconfigure and Rerun

PB2NC Tool: Reconfigure and Rerun

Now we'll rerun PB2NC, but this time we'll tighten the observation acceptance criteria.

Start by making a copy of the configuration file we just used:
cp PB2NCConfig_tutorial_run1 PB2NCConfig_tutorial_run2
Open up the PB2NCConfig_tutorial_run2 file and edit it as follows:
vi PB2NCConfig_tutorial_run2
  • Set:
    message_type = [];

    to retain all message types.

  • Set:
    obs_window = {
       beg = -25*60;
       end =  25*60;
    }

    so that only observations 25 minutes before and 25 minutes after the top of the hour are retained.

  • Set:
    quality_mark_thresh = 1;

    to retain only the observations marked "Good" by the NCEP quality control system.
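
The two stricter settings can be illustrated with a short Python sketch. Note that 25 minutes is 25*60 = 1500 seconds, so a window of 25 minutes on either side is narrower than the 1800 seconds used in the first run. The sketch assumes that observations with a quality mark less than or equal to quality_mark_thresh are kept and those with larger values are discarded. The (offset, quality mark) pairs are made-up sample data, passes_run2 is a hypothetical helper, and, for simplicity, the run 1 count applies only the time window.

```python
RUN1_WINDOW = (-1800, 1800)        # seconds, from PB2NCConfig_tutorial_run1
RUN2_WINDOW = (-25 * 60, 25 * 60)  # 25 minutes either side = 1500 seconds
QUALITY_MARK_THRESH = 1


def in_window(offset_sec, window):
    """True if the offset from the file time falls inside [beg, end]."""
    beg, end = window
    return beg <= offset_sec <= end


def passes_run2(offset_sec, quality_mark):
    """Hypothetical check combining the run 2 window and quality-mark filter."""
    return (in_window(offset_sec, RUN2_WINDOW)
            and quality_mark <= QUALITY_MARK_THRESH)


# Made-up (offset in seconds, quality mark) pairs.
samples = [(-1700, 1), (-1200, 1), (0, 2), (600, 0), (1600, 1)]

run1_kept = [s for s in samples if in_window(s[0], RUN1_WINDOW)]
run2_kept = [s for s in samples if passes_run2(*s)]
print(len(run1_kept), len(run2_kept))
```

With these samples, two observations fall outside the tighter window and one more fails the quality-mark check.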

Next, run PB2NC again but change the output name using the following command:
pb2nc \
${METPLUS_DATA}/met_test/data/sample_obs/prepbufr/ndas.t00z.prepbufr.tm12.20070401.nr \
tutorial_pb_run2.nc \
PB2NCConfig_tutorial_run2 \
-v 2
If this run fails due to runtime issues, please download a copy of the output file here and manually save it as tutorial_pb_run2.nc.
Inspect the PB2NC status messages and note that fewer observations were retained than in the previous run.

The majority of the observations were rejected because their valid time no longer fell inside the tighter obs_window setting.

When configuring PB2NC for your own projects, you should err on the side of keeping more data rather than less. As you'll see, the grid-to-point verification tools (Point-Stat and Ensemble-Stat) allow you to further refine which point observations are actually used in the verification. However, keeping a lot of point observations that you'll never actually use will make the data files larger and slightly slow down the verification. For example, if you're using a Global Data Assimilation System (GDAS) PrepBufr file to verify a model over Europe, it would make sense to keep only those point observations that fall within your model domain.
