Poster

Model evaluation method affects the interpretation of machine learning models for identifying compound drivers of maize variability

Sweet, Lily-Belle1, Zscheischler, J.1
  1. Helmholtz Centre for Environmental Research

Extreme impacts can be caused by the compounding effects of multiple drivers, such as weather events that might not individually be considered extreme. An example is the phenomenon of ‘false spring’, in which anomalously warm conditions in late winter or early spring, followed by a frost once plants have entered a vulnerable stage of development, result in severe crop damage. The relationships between growing-season climate conditions and end-of-season crop yield are complex and nonlinear, and improving our understanding of these interactions could help narrow the uncertainty in estimates of climate risk to food security. Data-driven methods capable of identifying such compounding effects could also prove useful for studying impacts in other sectors.

Machine learning is well suited to capturing such complex, nonlinear relationships for yield prediction, and explainable or interpretable machine learning has been identified as a potential tool for extracting them. However, the usefulness of the extracted interpretations rests on the assumption that the model has learned the expected relationships, and one prerequisite for this assumption is that the model has sufficient predictive skill. The method chosen to measure model performance is therefore an important methodological decision, yet best practice for handling spatiotemporal climate data is not clearly defined.
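To make the evaluation question concrete, the following minimal sketch contrasts a naive random K-fold with a leave-one-year-out split using scikit-learn. The synthetic data, variable names and random-forest model are illustrative assumptions, not the study's actual pipeline. Because samples from the same year share a common signal, random folds let information leak between training and test sets and can inflate apparent skill.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_sites, n_years = 50, 20
years = np.repeat(np.arange(n_years), n_sites)

# Synthetic predictors with a shared year-level signal, so samples from
# the same year are correlated across sites (as in reanalysis data).
X = rng.normal(size=(n_sites * n_years, 5))
X[:, 0] += rng.normal(size=n_years)[years]
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=len(years))

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Random K-fold: every test fold contains years also seen in training,
# so the shared year signal is interpolated rather than predicted.
r2_random = cross_val_score(model, X, y, scoring="r2",
                            cv=KFold(5, shuffle=True, random_state=0))

# Leave-one-year-out: entire years are held out, which is closer to the
# real task of predicting an unseen growing season.
r2_loyo = cross_val_score(model, X, y, groups=years, scoring="r2",
                          cv=LeaveOneGroupOut())

print(f"random K-fold mean R^2:      {r2_random.mean():.2f}")
print(f"leave-one-year-out mean R^2: {r2_loyo.mean():.2f}")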

In this study, we train machine learning models to predict maize yield variability from growing-season climate data, using global climate reanalysis data and the output of a process-based crop model driven by that reanalysis. We assess the impact of the cross-validation procedure used to measure model skill on each step of the modelling process: hyperparameter tuning, feature selection, performance evaluation and model interpretation. We show that the choice of model evaluation method substantially affects the results obtained with interpretable machine learning methods. Our results suggest that the design of the cross-validation procedure should reflect the purpose of the study and the properties of the data, which in our case are highly correlated spatiotemporal climate and crop yield data.
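As a hedged illustration of how the evaluation design can propagate into the interpretation step, the sketch below compares permutation importances computed on a random holdout with those computed on fully held-out years. The synthetic data and names are again our assumptions, not the study's pipeline; only the general technique (permutation importance on different splits) is shown.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n_sites, n_years = 50, 20
years = np.repeat(np.arange(n_years), n_sites)

# Same kind of synthetic setup: one feature carries a year-level signal.
X = rng.normal(size=(n_sites * n_years, 5))
X[:, 0] += rng.normal(size=n_years)[years]
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=len(years))

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Interpretation under a random holdout: test years overlap training years.
shuffled = rng.permutation(len(y))
tr, te = shuffled[:800], shuffled[800:]
imp_random = permutation_importance(model.fit(X[tr], y[tr]), X[te], y[te],
                                    n_repeats=10, random_state=0)

# Interpretation under a year-wise holdout: the last four years are unseen.
held_out = years >= n_years - 4
imp_yearwise = permutation_importance(model.fit(X[~held_out], y[~held_out]),
                                      X[held_out], y[held_out],
                                      n_repeats=10, random_state=0)

print("importances (random holdout):   ", imp_random.importances_mean.round(2))
print("importances (year-wise holdout):", imp_yearwise.importances_mean.round(2))

Because the importances are estimated on different held-out data, the two splits can yield different rankings of the same features, which is one way the evaluation choice shapes the scientific conclusions drawn from an interpretable model.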