Geoscientific Model Development An interactive open-access journal of the European Geosciences Union
https://doi.org/10.5194/gmd-2017-242
© Author(s) 2017. This work is distributed under
the Creative Commons Attribution 4.0 License.
Development and technical paper
27 Oct 2017
Review status
This discussion paper is a preprint. It is a manuscript under review for the journal Geoscientific Model Development (GMD).
Best practice regarding the three P's: profiling, portability and provenance when running HPC geoscientific applications
Wendy Sharples1,2,3, Ilya Zhukov1, Markus Geimer1, Klaus Goergen2,4, Stefan Kollet2,4, Sebastian Luehrs1, Thomas Breuer1, Bibi Naz2,4, Ketan Kulkarni1,4, and Slavko Brdar1,4
1Jülich Supercomputing Centre, Research Centre Jülich, Jülich, Germany
2Institute of Bio- and Geosciences, Agrosphere (IBG-3), Research Centre Jülich, Jülich, Germany
3Meteorological Institute, University of Bonn, Bonn, Germany
4Centre for High-Performance Scientific Computing in Terrestrial Systems, Geoverbund ABC/J, Jülich, Germany
Abstract. Geoscientific modeling is constantly evolving, with next generation geoscientific models and applications placing high demands on high performance computing (HPC) resources. These demands are being met by new developments in HPC architectures, software libraries, and infrastructures. New HPC developments require new programming paradigms leading to substantial investment in model porting, tuning, and refactoring of complicated legacy code in order to use these resources effectively. In addition to the challenge of new massively parallel HPC systems, reproducibility of simulation and analysis results is of great concern, as the next generation geoscientific models are based on complex model implementations and profiling, modeling and data processing workflows.

Thus, in order to reduce both the duration and the cost of code migration and to aid in the development of new models or model components, while ensuring reproducibility and sustainability over the complete data life cycle, a streamlined approach to profiling, porting, and provenance tracking is necessary. To address these issues, we propose a run control framework (RCF) integrated with a workflow engine, which encompasses all stages of the modeling chain: (1) input preprocessing, (2) code compilation (including code instrumentation with performance analysis tools), (3) the simulation run, and (4) postprocessing and analysis. Within this RCF, the workflow engine creates and manages benchmark or simulation parameter combinations and performs the documentation and data organization needed for reproducibility. This approach automates the process of porting and tuning, profiling, testing, and running a geoscientific model. We show that using our run control framework makes testing, benchmarking, profiling, and running models less time consuming and more robust, resulting in more efficient use of HPC resources, more strategic code development, and enhanced data integrity and reproducibility.
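
As a rough illustration of the idea described above, and not the authors' implementation, the following Python sketch expands a parameter space into individual benchmark configurations, runs the four stages in order for each one, and keeps a small provenance record per run. All names here (STAGES, expand_parameters, run_id, execute) and the parameter values are hypothetical and chosen only for this example; the stage functions are placeholders that do nothing.

```python
import hashlib
import itertools
import json

# Hypothetical stage names mirroring the four stages listed in the abstract.
STAGES = ("preprocess", "compile", "run", "postprocess")


def expand_parameters(space):
    """Return one dict per combination of the parameter values in `space`."""
    keys = sorted(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def run_id(params):
    """Derive a stable identifier from the parameters, for provenance records."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def execute(params, stage_functions):
    """Run every stage in order and record which stages completed."""
    record = {"id": run_id(params), "params": params, "stages": []}
    for stage in STAGES:
        stage_functions[stage](params)
        record["stages"].append(stage)
    return record


if __name__ == "__main__":
    # Placeholder parameter space; names and values are illustrative only.
    space = {"nodes": [1, 2, 4], "instrumented": [False, True]}
    noop = {stage: (lambda params: None) for stage in STAGES}
    records = [execute(p, noop) for p in expand_parameters(space)]
    print(json.dumps(records, indent=2))
```

Deriving the run identifier from the sorted parameters means that repeating a configuration yields the same identifier, which is one simple way to tie output data back to the exact settings that produced it.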


Citation: Sharples, W., Zhukov, I., Geimer, M., Goergen, K., Kollet, S., Luehrs, S., Breuer, T., Naz, B., Kulkarni, K., and Brdar, S.: Best practice regarding the three P's: profiling, portability and provenance when running HPC geoscientific applications, Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2017-242, in review, 2017.
Wendy Sharples et al.

Latest update: 20 Nov 2017
Short summary
Next generation geoscientific models are based on complex model implementations and workflows. Next generation HPC systems require new programming paradigms and code optimization. In order to meet the challenge of running complex simulations on new massively parallel HPC systems, we developed a run control framework which facilitates code portability, code profiling and provenance tracking to reduce both the duration and the cost of code migration and development, while ensuring reproducibility.