Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming
Iulia Ilie1, Peter Dittrich2,3, Nuno Carvalhais1,4, Martin Jung1, Andreas Heinemeyer5, Mirco Migliavacca1, James I. L. Morison8, Sebastian Sippel1, Jens-Arne Subke6, Matthew Wilkinson8, and Miguel D. Mahecha1,3,7
1Max Planck Institute for Biogeochemistry, Department Biogeochemical Integration, Hans-Knoell-Str. 10, 07745 Jena, Germany
2Bio Systems Analysis Group, Institute of Computer Science, Jena Centre for Bioinformatics and Friedrich Schiller University, 07745 Jena, Germany
3Michael Stifel Center Jena for Data-Driven and Simulation Science, 07745 Jena, Germany
4CENSE, Departamento de Ciências e Engenharia do Ambiente, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal
5Department of Environment, Stockholm Environment Institute, University of York, York YO105NG, UK
6Biological and Environmental Sciences, School of Natural Sciences, University of Stirling, Stirling, UK
7German Centre for Integrative Biodiversity Research (iDiv), Deutscher Platz 5e, 04103 Leipzig, Germany
8Forest Research, Alice Holt Lodge, Farnham, Surrey, GU10 4LH, UK
Received: 29 Sep 2016 – Accepted for review: 04 Nov 2016 – Discussion started: 07 Nov 2016
Abstract. Accurate modelling of land-atmosphere carbon fluxes is essential for future climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented with a steadily evolving body of mechanistic theory, provides the main basis for developing the respective models. The strongly increasing availability of measurements may complicate the traditional hypothesis-driven path to developing mechanistic models, but it may also facilitate new ways of identifying suitable model structures using machine learning. Here we explore the potential to derive model formulations automatically from data based on gene expression programming (GEP). GEP automatically (re)combines various mathematical operators into model formulations that are further evolved, eventually identifying the most suitable structures. In contrast to most other machine learning regression techniques, the GEP approach generates models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (Random Forests, Support Vector Machines, Artificial Neural Networks, and Kernel Ridge Regressions). The case of real observations explores different components of terrestrial respiration at an oak forest in south-east England. We find that GEP-retrieved models are often better in prediction than established respiration models. Furthermore, the structure of the GEP models offers new insights into driver selection and interactions. We find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics.
However, we also noticed that the GEP models are only partly portable across respiration components, with equifinality issues possibly preventing the identification of a "general" terrestrial respiration model. Overall, GEP is a promising tool to uncover new model structures for terrestrial ecology in the data-rich era, complementing the traditional approach of model building.
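To make the idea of (re)combining mathematical operators into evolvable model formulations more concrete, the following is a minimal sketch of how a GEP-style candidate model can be encoded, decoded, and mutated. It is not the implementation used in this study. It assumes the commonly described GEP encoding: a fixed-length linear gene, split into a head (functions or terminals) and a tail (terminals only), that is read breadth-first into an expression tree. The operator set, the driver names `T` and `W` (standing in for, say, temperature and soil moisture), and the function names `decode`, `evaluate`, and `mutate` are all illustrative assumptions.

```python
import math
import random

# Illustrative operator set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "E": (1, lambda a: math.exp(min(a, 50.0))),  # exp, clipped to avoid overflow
}
TERMINALS = ["T", "W"]  # hypothetical drivers, not taken from the study


def decode(gene):
    """Read a linear gene breadth-first into a nested (symbol, children) tree."""
    root = (gene[0], [])
    queue = [root]
    pos = 1
    while queue:
        symbol, children = queue.pop(0)
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = (gene[pos], [])
            pos += 1
            children.append(child)
            queue.append(child)
    return root  # symbols beyond `pos` are unused (non-coding) for this gene


def evaluate(node, env):
    """Evaluate a decoded tree for one set of driver values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]


def mutate(gene, head_len, rng, rate=0.1):
    """Point mutation that keeps the tail terminal-only, so genes stay decodable."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = TERMINALS if i >= head_len else list(FUNCTIONS) + TERMINALS
            out[i] = rng.choice(pool)
    return out


# Head of length 4; with a maximum arity of 2 the tail needs 4 * (2 - 1) + 1 = 5 symbols.
HEAD_LEN = 4
gene = ["*", "T", "E", "+"] + ["T", "W", "T", "T", "T"]
tree = decode(gene)  # encodes T * exp(T + W)
value = evaluate(tree, {"T": 0.5, "W": -0.5})
child = mutate(gene, HEAD_LEN, random.Random(0), rate=1.0)
```

In a full evolutionary run, many such genes would be scored against observations (for example by mean squared error), and the better ones would be selected, recombined, and mutated over generations. Because the decoded tree is an explicit formula, a selected model can be read and compared with established respiration functions, which is the interpretability property the abstract refers to.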
Ilie, I., Dittrich, P., Carvalhais, N., Jung, M., Heinemeyer, A., Migliavacca, M., Morison, J. I. L., Sippel, S., Subke, J.-A., Wilkinson, M., and Mahecha, M. D.: Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming, Geosci. Model Dev. Discuss., doi:10.5194/gmd-2016-242, in review, 2016.