Using Threshold-Weighted Spatial Verification Methods to Compare a Data-Driven Model to a Physical NWP Model

Date: -
Location: Virtual
Speaker:
Nicholas Loveday
Description:

Understanding the performance of the latest data-driven (pure AI) models is a pressing scientific question. While much past evaluation has focused on point-based verification, spatial verification techniques have received less attention. In addition, few evaluations have used threshold-weighted scoring rules to assess how well these models predict extremes.

In this talk, we demonstrate how combining threshold-weighted scoring rules with spatial verification techniques allows us to compare how well the HRRR and GraphCast models predict extreme events. This verification approach has several advantages: a) it does not suffer from the double penalty issue within a specified radius, b) it can emphasize performance in predicting extremes, c) it discourages hedging and does not reward biased forecasts, d) it can account for climatological differences when calculating the mean score across the domain, and e) it can be used to compare models with different grid resolutions. The approach demonstrated has the potential to be a useful complement to other evaluation methods in a testing and evaluation framework.