Session 9: Python Embedding
METplus Practical Session 9
In this session you will learn:
- What Python Embedding is
- How Python Embedding works for both point and gridded data with MET tools
- How to construct a Python script for Python Embedding
- How to use your Python script with MET tools
- How to use your Python script with METplus Wrappers
What is Python Embedding?
Put simply, Python Embedding lets users write their own Python scripts and integrate them into workflows that use MET tools and METplus Wrappers. MET users most often use Python Embedding for two things:
- reading data in a format that MET tools do not support;
- deriving intermediate fields within a workflow that MET tools cannot calculate.

A simplified workflow using the MET Grid-Stat tool with Python Embedding is shown below:
In this example, the user has a gridded analysis dataset in HDF-5 format ("HDF5 ANX") and a gridded forecast dataset in GRIB format ("GRIB FCST"). Grid-Stat can read the GRIB forecast data, but it cannot read the HDF-5 data. The user therefore writes a Python script that opens the HDF-5 file and prepares it for Grid-Stat. When Grid-Stat runs, it calls the Python script. The data are then handed from the user's Python script directly to Grid-Stat in memory, without writing a physical output file. In some cases MET does write a temporary file. These cases are covered in the next chapter of this session, along with other details of Python Embedding.
Python Embedding Overview
General MET Python Embedding Elements
The MET tools support Python Embedding for both 2D planes of gridded data and point data. Both gridded data and point data have some specific requirements which are covered below. More generally, Python Embedding can be broken down into three key elements that are required by MET tools.
The first element is a Python installation, which the MET tools use when Python Embedding is requested. MET must be compiled with the --enable-python flag, and that flag requires a local Python installation. By default, the Python supplied through --enable-python at compile time is the one MET uses whenever a user requests Python Embedding. Any Python packages installed in that Python are available to your Python Embedding scripts.
The second element of Python embedding is a Python Embedding Keyword. These keywords are used to instruct the MET tools that Python Embedding is being requested:
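PYTHON_NUMPY is used when passing 2D planes of data in a NumPy N-dimensional array object. It is also used when passing point data to tools such as plot_point_obs.
PYTHON_XARRAY is used only when passing 2D planes of data in an Xarray DataArray object.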
The third element of Python embedding is the absolute path to your Python script along with any command line arguments that the script requires. These three elements enable the user to invoke Python Embedding within the MET tools.
Details for Python Embedding Scripts with 2D Gridded Data
In your Python Embedding script, be sure to adhere to the following requirements:
- Your variable containing the 2D dataplane of gridded data must be named met_data. This applies to both NumPy N-dimensional array objects (for PYTHON_NUMPY), and Xarray DataArray objects (for PYTHON_XARRAY).
- For PYTHON_NUMPY, your script must define a dictionary named attrs that contains the following keys and their respective values:
- valid
- init
- lead
- accum
- name
- long_name
- level
- units
- grid
- For PYTHON_XARRAY, your Xarray DataArray must have a dictionary of attributes attached to it (accessible via the .attrs attribute of the DataArray object). Its keys must match the keys listed above for PYTHON_NUMPY.
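The requirements above can be illustrated with a minimal PYTHON_NUMPY sketch. The field, its name, and the grid string "latlon 360 181 -90 0 1 1" are illustrative assumptions. The grid string is assumed to follow MET's "latlon Nx Ny lat_ll lon_ll delta_lat delta_lon" grid specification. The missing_attrs() helper is hypothetical and only checks that the required keys are present.

```python
import numpy as np

# Keys MET requires in attrs (taken from the list above).
REQUIRED_KEYS = ("valid", "init", "lead", "accum", "name",
                 "long_name", "level", "units", "grid")


def missing_attrs(attrs):
    """Hypothetical helper: return any required keys absent from attrs."""
    return [key for key in REQUIRED_KEYS if key not in attrs]


# 2D plane shaped (ny, nx): 181 latitudes by 360 longitudes of placeholder values.
met_data = np.full((181, 360), 273.15, dtype=float)

attrs = {
    "valid": "20230101_000000",   # YYYYMMDD_HHMMSS, as in the practice script
    "init": "20230101_000000",
    "lead": "000000",             # HHMMSS
    "accum": "000000",
    "name": "TMP",
    "long_name": "Hypothetical temperature field",
    "level": "Surface",
    "units": "K",
    # Assumed MET grid specification string: latlon Nx Ny lat_ll lon_ll dlat dlon
    "grid": "latlon 360 181 -90 0 1 1",
}

missing = missing_attrs(attrs)
if missing:
    raise KeyError(f"attrs is missing required keys: {missing}")
```

For PYTHON_XARRAY, the same dictionary would instead be attached to the DataArray, as in met_data.attrs = attrs.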
Details for Python Embedding Scripts with Point Data
In your Python Embedding script, be sure to adhere to the following requirements:
- If you are using Python Embedding with ascii2nc, your data must be a nested list (i.e. a "list of lists") in the MET 11-column point data format. Each inner list is one observation containing the 11 column values. A fast way to build this is to read the data into a Pandas DataFrame and call .values.tolist() on it. The nested list variable must be named point_data.
- If you are using Python Embedding with other MET tools that read point data, such as plot_point_obs, point_stat, ensemble_stat, or point2grid, you must provide the point data in a special format built from the MET 11-column format. To do this, create a nested list (i.e. a "list of lists") in the MET 11-column point data format. Then pass it to the helper Python function convert_point_data(), which is in the met_point_obs class in ${MET_BUILD_BASE}/scripts/python/met_point_obs.py. The variable returned by convert_point_data() must be named met_point_data, unlike the point_data name used with ascii2nc.
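The nested-list structure described above can be sketched as follows. This sketch parses plain whitespace-delimited text instead of using Pandas, so it runs without extra packages. It produces the same shape as DataFrame.values.tolist(): one 11-element list per observation. The sample stations and values are hypothetical. The read_point_data() helper and its choice of numeric columns are illustrative assumptions, not part of MET.

```python
import io

# Hypothetical text in the MET 11-column point format:
# Message_Type Station_ID Valid_Time Lat Lon Elevation Var_Name Level Height QC_String Observation_Value
SAMPLE = """\
ADPSFC KDEN 20230101_120000 39.85 -104.66 1655.0 TMP 1013.25 2.0 NA 270.4
ADPSFC KBOS 20230101_120000 42.36 -71.01 6.0 TMP 1013.25 2.0 NA 275.1
"""

# Lat, Lon, Elevation, Level, Height, Observation_Value are stored as floats.
NUMERIC_COLUMNS = (3, 4, 5, 7, 8, 10)


def read_point_data(text_stream):
    """Return a list of observations, each an 11-element list."""
    rows = []
    for line in text_stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 11:
            raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
        for i in NUMERIC_COLUMNS:
            fields[i] = float(fields[i])
        rows.append(fields)
    return rows


# ascii2nc expects the nested list under exactly this name.
point_data = read_point_data(io.StringIO(SAMPLE))
```

For the other point-data tools, this point_data list would then be the input to convert_point_data().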
Advanced Python Requirements
In some cases, a user may need one or more Python packages that are not installed in the Python used to compile the MET tools. The user can then set a special environment variable called MET_PYTHON_EXE. Its value is the full path to a Python executable (usually in that installation's "bin" directory) whose environment contains the required packages.
NOTE: Using MET_PYTHON_EXE forces MET to write the data to a temporary area and then read it back in, instead of receiving the data directly in memory. This may negatively affect (increase) workflow run time. Sometimes this cannot be avoided (e.g. multiple users sharing a single MET installation), and it gives users maximum access to the Python ecosystem. Users should still be aware that it can increase run time.
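A quick way to choose a value for MET_PYTHON_EXE is to run a short check inside the Python environment you intend to use. The sketch below prints that interpreter's full path (sys.executable) and lists which packages it cannot import. The package list is a hypothetical example; replace it with whatever your script imports.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the package names that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The printed path is a candidate value for MET_PYTHON_EXE.
    print("Interpreter:", sys.executable)
    # Hypothetical package list: adjust to match your Python Embedding script.
    print("Missing packages:", missing_packages(["numpy", "xarray", "netCDF4"]) or "none")
```

If the missing-package list is empty, that interpreter's path is a reasonable value to export as MET_PYTHON_EXE.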
Setup for Python Embedding Practice
In the next two sections, you will practice using Python Embedding for both gridded and point data using MET tools directly and also via METplus Wrappers. To prepare for those sections, please follow the setup instructions below:
csh:
bash:
Python Embedding for Gridded Data
A Simple Gridded Data Example with MET Tools
To demonstrate how to use Python Embedding for gridded data, you will use some test data included with the MET installation, the MET plot_data_plane tool, and the sample Python Embedding script named my_gridded_pyembed.py that you created earlier in this session. There are four required elements to the command for using Python Embedding with plot_data_plane:
- The path to the plot_data_plane MET executable
- The Python Embedding keyword
- The plot_data_plane output_filename argument
- The plot_data_plane field_string argument, modified for Python Embedding
Let's build the command!
Element 1: Use your tutorial environment variable MET_BUILD_BASE to access plot_data_plane:
Element 2: plot_data_plane reads gridded data, and in this exercise the data are provided as a NumPy array rather than an Xarray object. The keyword will therefore be PYTHON_NUMPY, not PYTHON_XARRAY.
Element 3: Choose your own output filename. The plot_data_plane tool outputs an image in PostScript format so typically the filename will end with ".ps":
Element 4: The field_string argument for plot_data_plane. This argument normally contains the name of the field being plotted and, optionally, its level. For Python Embedding, the field name is replaced with your Python Embedding script, including any arguments the script uses. The script used here takes two arguments: the path to the input data file, and any string you wish to use as the name of the data in the input file. No level information is used for Python Embedding:
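The tokens that follow the script path in the field string reach the script as command-line arguments, so a script like this one would read them from sys.argv. The sketch below is hypothetical: the actual my_gridded_pyembed.py is not reproduced here. It shows only the argument handling for a script that expects an input file path and a data name.

```python
import os


def parse_embedding_args(argv):
    """Return (input_file, data_name) from an argv-style list."""
    if len(argv) != 3:
        raise SystemExit(f"usage: {argv[0]} <input_file> <data_name>")
    # Expand environment variables such as $MET_BUILD_BASE inside the path.
    input_file = os.path.expandvars(argv[1])
    data_name = argv[2]
    return input_file, data_name
```

Inside an embedding script this would be called as parse_embedding_args(sys.argv).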
Let's run the command!
Verify you are in the Python Embedding practice directory:
Copy each of the four elements from above to construct the full Python Embedding command for plot_data_plane:
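To make the mapping from the four elements to the command line explicit, here is a sketch that assembles them into an argument list. It makes two assumptions. First, the executable is at $MET_BUILD_BASE/bin/plot_data_plane. Second, the field string uses the MET form name="<script> <args>";. The function names are hypothetical.

```python
import os
import shlex
import subprocess


def plot_data_plane_cmd(met_build_base, keyword, output_ps, script, *script_args):
    """Assemble the four Python Embedding elements into an argument list."""
    exe = os.path.join(met_build_base, "bin", "plot_data_plane")           # Element 1 (assumed bin/ layout)
    field_string = 'name="{}";'.format(" ".join((script,) + script_args))  # Element 4
    return [exe, keyword, output_ps, field_string]                          # Elements 2 and 3 in between


def run(cmd):
    """Print the equivalent shell command, then run it and raise on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with the embedded double quotes. shlex.join shows the equivalent quoted shell command.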
View the output image
The output file is a PostScript graphic file that typically can only be viewed with certain software. If you do not have a display tool that can view PostScript files, you can use the ImageMagick convert command to convert to a PNG file type which may be easier to view:
A Simple Gridded Data Example with METplus Wrappers
Now instead of using plot_data_plane directly, you will practice the above example using the METplus Wrappers. For each of the four elements shown above, the equivalent configuration items for METplus Wrappers will be described. But first, you will need to set up a basic METplus Wrappers configuration file:
Verify you are in the Python Embedding practice directory:
Open a new file named my_gridded_pyembed.conf with a text editor:
Copy the following METplus configuration items into the file and save it when complete. Some basic elements are required to get the METplus wrappers to run. For this example, you need to provide some time looping information despite not actually looping over those times. You also need to provide a top-level output directory (OUTPUT_BASE):
[config]
LOOP_BY=VALID
VALID_BEG=20230101
VALID_END=20230101
VALID_TIME_FMT=%Y%m%d
OUTPUT_BASE={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/wrappers_gridded_output
For Element 1 above, the METplus Wrappers will find plot_data_plane through the MET_INSTALL_DIR and PROCESS_LIST configuration items. The wrappers also need to find the data, which they do through the METPLUS_DATA configuration item:
MET_INSTALL_DIR={ENV[MET_BUILD_BASE]}
PROCESS_LIST=PlotDataPlane
METPLUS_DATA={ENV[METPLUS_DATA]}
For Element 2 above, we will use the PLOT_DATA_PLANE_INPUT_TEMPLATE configuration item:
PLOT_DATA_PLANE_INPUT_TEMPLATE=PYTHON_NUMPY
For Element 3 above, we will use the PLOT_DATA_PLANE_OUTPUT_TEMPLATE configuration item, which includes the OUTPUT_BASE configuration item:
PLOT_DATA_PLANE_OUTPUT_TEMPLATE={OUTPUT_BASE}/my_gridded_pyembed_wrappers_plot.ps
For Element 4 above, we will use the PLOT_DATA_PLANE_FIELD_NAME configuration item:
Now save the file and run the METplus wrappers:
You can verify that the output image my_gridded_pyembed_wrappers_plot.ps, found in your METplus OUTPUT_BASE directory, exactly matches the image created above by running plot_data_plane directly.
Python Embedding for Point Data
A Simple Point Data Example with MET Tools
To demonstrate how to use Python Embedding for point data, you will use some test data included with the MET installation, the MET plot_point_obs tool, and the sample Python Embedding script named my_point_pyembed.py that you created earlier in this session. There are three required elements to the command for using Python Embedding with plot_point_obs:
- The path to the plot_point_obs MET executable
- The plot_point_obs nc_file argument, modified for Python Embedding
- The plot_point_obs ps_file argument
Let's build the command!
Element 1: Use your tutorial environment variable MET_BUILD_BASE to access plot_point_obs:
Element 2: The nc_file argument for plot_point_obs. The nc_file argument is typically the path to a netCDF data file containing point observations in a specific MET format created by a MET tool like pb2nc. However, you can also replace this argument with the path to a Python Embedding script. Unlike the gridded data example above, the Python Embedding keyword is included with the Python Embedding script rather than as a separate command line element. Since you are using point data, you will use PYTHON_NUMPY as the Python Embedding keyword. The Python Embedding script being used here only takes a single argument that is the path to the input data file:
Element 3: Choose your own output filename. The plot_point_obs tool outputs an image in PostScript format so typically the filename will end with ".ps":
Let's run the command!
Verify you are in the Python Embedding practice directory:
Combine the three elements from above to construct the full Python Embedding command for plot_point_obs:
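The main difference from the gridded example is Element 2: the keyword is joined to the script with "=" inside a single argument. The hypothetical sketch below assembles the three elements, assuming the executable is at $MET_BUILD_BASE/bin/plot_point_obs. In a shell, the second argument must be quoted because it contains a space. A Python argument list needs no quoting.

```python
import os


def plot_point_obs_cmd(met_build_base, output_ps, script, *script_args, keyword="PYTHON_NUMPY"):
    """Assemble the three Python Embedding elements for plot_point_obs."""
    exe = os.path.join(met_build_base, "bin", "plot_point_obs")  # Element 1 (assumed bin/ layout)
    # Element 2: keyword and script (plus its arguments) form one nc_file argument.
    nc_file = "{}={}".format(keyword, " ".join((script,) + script_args))
    return [exe, nc_file, output_ps]                             # Element 3 last
```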
View the output image
The output file is a PostScript graphic file that typically can only be viewed with certain software. If you do not have a display tool that can view PostScript files, you can use the ImageMagick convert command to convert to a PNG file type which may be easier to view:
A Simple Point Data Example with METplus Wrappers
Now instead of using plot_point_obs directly, you will practice the above example using the METplus Wrappers. For each of the three elements shown above, the equivalent configuration items for METplus Wrappers will be described. But first, you will need to set up a basic METplus Wrappers configuration file:
Verify you are in the Python Embedding practice directory:
Open a new file named my_point_pyembed.conf with a text editor:
Copy the following METplus configuration items into the file and save it when complete. Some basic elements are required to get the METplus wrappers to run. For this example, you need to provide some time looping information even though you are not actually looping over those times. You also need to provide a top-level output directory (OUTPUT_BASE):
[config]
LOOP_BY=VALID
VALID_BEG=20230101
VALID_END=20230101
VALID_TIME_FMT=%Y%m%d
OUTPUT_BASE={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/wrappers_point_output
For Element 1 above, the METplus Wrappers will find plot_point_obs through the MET_INSTALL_DIR and PROCESS_LIST configuration items:
MET_INSTALL_DIR={ENV[MET_BUILD_BASE]}
PROCESS_LIST=PlotPointObs
For Element 2 above, we will use the PLOT_POINT_OBS_INPUT_TEMPLATE configuration item:
For Element 3 above, we will use the PLOT_POINT_OBS_OUTPUT_TEMPLATE configuration item:
Now save the file and run the METplus wrappers:
You can verify that the output image my_point_pyembed_wrappers_plot.ps, found in your METplus OUTPUT_BASE directory, exactly matches the image created above by running plot_point_obs directly.
Writing a Python Script for Python Embedding
In this section, you will use Python Embedding to open a NetCDF file containing weather satellite brightness temperature data, and use plot_data_plane to create an image of this field. The brightness temperature data have units of Kelvin, but you need an image with the data plotted in degrees Celsius. The Python Embedding script will therefore perform the following functions:
- Open the NetCDF data file
- Convert the units of the gridded data from Kelvin to degrees Celsius
- Construct the required data attributes
- Prepare the data for plot_data_plane
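The unit conversion in the second step is a single subtraction of 273.15. The minimal NumPy sketch below uses kelvin_to_celsius, a hypothetical helper that is not part of the practice script. It shows that the subtraction applies element-wise to a whole 2D plane and leaves NaN missing values as NaN.

```python
import numpy as np

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in Kelvin


def kelvin_to_celsius(values_k):
    """Convert an array-like of Kelvin values to a float NumPy array in Celsius."""
    return np.asarray(values_k, dtype=float) - KELVIN_OFFSET
```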
Verify you are in the Python Embedding practice directory:
Open an empty text file named practice_gridded_pyembed.py with a text editor of your choice:
Now copy and paste each of the following code snippets into the file, then save it:
First, we will add the necessary modules:
import os
import sys
import xarray as xr
Obtain the command line argument that controls whether a NumPy or Xarray object is used for Python Embedding:
numpy_or_xarray = sys.argv[1]
Next, add the path to the input file:
Open the file with Xarray:
Construct a Python dictionary named attrs that contains the attributes required for Python Embedding:
attrs = {}
attrs['valid'] = '20190521_010000'
attrs['init'] = '20190521_010000'
attrs['lead'] = '000000'
attrs['accum'] = '000000'
attrs['name'] = 'Temperature'
attrs['long_name'] = 'Brightness Temperature in Celsius'
attrs['level'] = 'Surface'
attrs['units'] = 'Celsius'
Construct a Python dictionary named grid_info that contains the grid information required for Python Embedding. Some of these attributes are included in the sample file being read, so we re-use them where possible:
grid_info = {}
grid_info['type'] = satellite_temperature.attrs['Projection']
grid_info['hemisphere'] = satellite_temperature.attrs['hemisphere']
grid_info['name'] = 'GOES-16'
grid_info['scale_lat_1'] = float(satellite_temperature.attrs['scale_lat_1'])
grid_info['scale_lat_2'] = float(satellite_temperature.attrs['scale_lat_2'])
grid_info['lat_pin'] = float(satellite_temperature.attrs['lat_pin'])
grid_info['lon_pin'] = float(satellite_temperature.attrs['lon_pin'])
grid_info['x_pin'] = float(satellite_temperature.attrs['x_pin'])
grid_info['y_pin'] = float(satellite_temperature.attrs['y_pin'])
grid_info['lon_orient'] = float(satellite_temperature.attrs['lon_orient'])
grid_info['d_km'] = float(satellite_temperature.attrs['d_km'])
grid_info['r_km'] = float(satellite_temperature.attrs['r_km'])
grid_info['nx'] = float(1620)
grid_info['ny'] = float(1120)
Add the grid_info to the attrs dictionary:
attrs['grid'] = grid_info
Add an if/elif/else block to switch between PYTHON_NUMPY and PYTHON_XARRAY.

If PYTHON_XARRAY is requested, a single variable is selected from the Xarray dataset as a DataArray. Those data are converted to Celsius and stored as met_data, and the attrs are then attached to the DataArray.

If PYTHON_NUMPY is requested, the same variable is selected, and a NumPy N-dimensional array representation is obtained using .values on the DataArray. That array is converted to Celsius and stored as met_data. For PYTHON_NUMPY, attrs is not attached to met_data in any way; it only needs to exist as a variable in your Python script:
if numpy_or_xarray=='xarray':
    met_data = satellite_temperature['channel_14_brightness_temperature']-273.15
    met_data.attrs = attrs
elif numpy_or_xarray=='numpy':
    met_data = satellite_temperature['channel_14_brightness_temperature'].values-273.15
else:
    print("FATAL! MUST PROVIDE EITHER xarray OR numpy AS AN ARGUMENT TO practice_gridded_pyembed.py")
    sys.exit(1)
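Before calling the script from MET, you may find it useful to run it standalone and inspect the variables it defines. The sketch below uses the standard-library runpy module and temporarily replaces sys.argv, so the script sees the same argument it would get from the field string. dry_run is a hypothetical helper, and running the script this way only approximates how MET invokes it.

```python
import runpy
import sys


def dry_run(script_path, *script_args):
    """Execute a Python Embedding script outside MET and return its globals."""
    saved_argv = sys.argv
    sys.argv = [script_path, *script_args]
    try:
        namespace = runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = saved_argv  # always restore the caller's argv
    if "met_data" not in namespace:
        raise RuntimeError(f"{script_path} did not define met_data")
    return namespace
```

For example, ns = dry_run("practice_gridded_pyembed.py", "numpy") followed by print(ns["met_data"].shape, ns["attrs"]["units"]) would show the array shape and units before MET ever sees them.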
Congratulations, You Just Wrote a Python Embedding Script!
Use Your Python Embedding Script with MET Tools
Using Your Python Embedding Script with MET Tools
This section largely follows the instructions in the section titled "Python Embedding for Gridded Data", except that you will be using your own Python Embedding script.
Recall the four required elements for Python Embedding with gridded data:
- The path to the MET tool executable you will be using
- The Python Embedding keyword
- The required argument for the MET tool you are using that must be modified for Python Embedding
- Any additional required arguments for the MET tool you are using
In this section, you will use plot_data_plane to run your Python Embedding script. The script you wrote in the "Writing A Python Script For Python Embedding" section supports both PYTHON_NUMPY and PYTHON_XARRAY.
First, verify you are in the Python Embedding practice directory:
Now try both of the sample commands below to run your Python Embedding script:
PYTHON_NUMPY
PYTHON_XARRAY
Regardless of which approach you take, the resulting image should look like the following (only the string annotation will differ):
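The two commands differ only in the keyword and in the single argument passed to practice_gridded_pyembed.py. The sketch below keeps each keyword paired with its matching script argument in one table and builds both argument lists. It assumes the executable is at $MET_BUILD_BASE/bin/plot_data_plane and that the field string uses the MET form name="<script> <args>";. The output filenames and the practice_commands name are hypothetical.

```python
import os

# Keyword -> argument understood by the if/elif block in practice_gridded_pyembed.py
VARIANTS = {"PYTHON_NUMPY": "numpy", "PYTHON_XARRAY": "xarray"}


def practice_commands(met_build_base, tutorial_dir):
    """Return one plot_data_plane argument list per keyword."""
    exe = os.path.join(met_build_base, "bin", "plot_data_plane")
    script = os.path.join(tutorial_dir, "python_embed", "practice_gridded_pyembed.py")
    commands = []
    for keyword, arg in VARIANTS.items():
        output_ps = f"practice_pyembed_{arg}_plot.ps"  # hypothetical output name
        commands.append([exe, keyword, output_ps, f'name="{script} {arg}";'])
    return commands
```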
Use Your Python Embedding Script with METplus Wrappers
Using Your Python Embedding Script With METplus Wrappers
This section largely follows the instructions in the section titled "Python Embedding for Gridded Data", except that you will be using your own Python Embedding script.
In this section, you will use METplus Wrappers to run plot_data_plane with your Python Embedding script. The script you wrote in the "Writing A Python Script For Python Embedding" section supports both PYTHON_NUMPY and PYTHON_XARRAY, so sample METplus wrappers configuration files are shown for both approaches below.
First, verify you are in the Python Embedding practice directory:
PYTHON_NUMPY
Open a new file named practice_gridded_pyembed_NUMPY.conf with a text editor.
Copy the following METplus Wrappers configuration items into the file and save it when complete. Recall that the METplus wrappers need some basic elements to run, including time looping information (not used in this example) and the top-level output directory:
[config]
LOOP_BY=VALID
VALID_BEG=20230101
VALID_END=20230101
VALID_TIME_FMT=%Y%m%d
OUTPUT_BASE={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_pyembed_wrappers_output
MET_INSTALL_DIR={ENV[MET_BUILD_BASE]}
PROCESS_LIST=PlotDataPlane
PLOT_DATA_PLANE_INPUT_TEMPLATE=PYTHON_NUMPY
PLOT_DATA_PLANE_OUTPUT_TEMPLATE={OUTPUT_BASE}/practice_pyembed_wrappers_numpy_plot.ps
PLOT_DATA_PLANE_FIELD_NAME={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_gridded_pyembed.py numpy
PYTHON_XARRAY
Open a new file named practice_gridded_pyembed_XARRAY.conf with a text editor.
Copy the following METplus Wrappers configuration items into the file and save it when complete. Recall that the METplus wrappers need some basic elements to run, including time looping information (not used in this example) and the top-level output directory:
[config]
LOOP_BY=VALID
VALID_BEG=20230101
VALID_END=20230101
VALID_TIME_FMT=%Y%m%d
OUTPUT_BASE={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_pyembed_wrappers_output
MET_INSTALL_DIR={ENV[MET_BUILD_BASE]}
PROCESS_LIST=PlotDataPlane
PLOT_DATA_PLANE_INPUT_TEMPLATE=PYTHON_XARRAY
PLOT_DATA_PLANE_OUTPUT_TEMPLATE={OUTPUT_BASE}/practice_pyembed_wrappers_xarray_plot.ps
PLOT_DATA_PLANE_FIELD_NAME={ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_gridded_pyembed.py xarray
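The two configuration files differ in only three items. Both could be generated from one template with the standard-library configparser, as sketched below. The sketch writes the items under a [config] section header, which METplus configuration files use. Two settings matter:
- interpolation=None keeps "%Y%m%d" literal. Otherwise configparser's default interpolation would reject the "%" characters.
- optionxform = str preserves the upper-case item names. Otherwise configparser would lowercase them.

write_conf is a hypothetical helper.

```python
import configparser

# Items shared by both configuration files (copied from the listings above).
COMMON = {
    "LOOP_BY": "VALID",
    "VALID_BEG": "20230101",
    "VALID_END": "20230101",
    "VALID_TIME_FMT": "%Y%m%d",
    "OUTPUT_BASE": "{ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_pyembed_wrappers_output",
    "MET_INSTALL_DIR": "{ENV[MET_BUILD_BASE]}",
    "PROCESS_LIST": "PlotDataPlane",
}
SCRIPT = "{ENV[METPLUS_TUTORIAL_DIR]}/python_embed/practice_gridded_pyembed.py"


def write_conf(path, keyword, arg):
    """Write a METplus config for one Python Embedding keyword/argument pair."""
    parser = configparser.ConfigParser(interpolation=None)  # keep "%" literal
    parser.optionxform = str                                 # keep upper-case keys
    items = dict(COMMON)
    items["PLOT_DATA_PLANE_INPUT_TEMPLATE"] = keyword
    items["PLOT_DATA_PLANE_OUTPUT_TEMPLATE"] = f"{{OUTPUT_BASE}}/practice_pyembed_wrappers_{arg}_plot.ps"
    items["PLOT_DATA_PLANE_FIELD_NAME"] = f"{SCRIPT} {arg}"
    parser["config"] = items
    with open(path, "w") as fh:
        parser.write(fh, space_around_delimiters=False)
```

For example, write_conf("practice_gridded_pyembed_NUMPY.conf", "PYTHON_NUMPY", "numpy") and write_conf("practice_gridded_pyembed_XARRAY.conf", "PYTHON_XARRAY", "xarray") would produce the two files.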
If you completed the previous section by running your Python Embedding script directly with plot_data_plane, the image from either of the approaches above should exactly match that image and look like this:
End of Session 9
Congratulations! You have completed Session 9!