Coupling Earth system model components that work on different grids into an Earth System Model (ESM) makes it necessary to transfer data from one grid to another. Additionally, each of these model components might require data to be imported onto its specific grid. Usually, one of two approaches is used: either all input data is preprocessed to the employed grid, or the imported data is interpolated on-line, i.e., during model integration, to the required grid. For the former, each change in the model resolution requires all data to be preprocessed again. The latter option implies that computing time is required for the grid mapping in each model integration. If all components of an ESM use a single point of import and the same mapping software, only one software package needs to be changed for code optimisation, the inclusion of additional interpolation methods or the implementation of new data formats.
As the Modular Earth Submodel System (MESSy) is mainly used for research
purposes which require frequent changes of the model setup including
the model resolution or the application of different sets of
input data (e.g., different emission scenarios), the idea of a common
procedure for data import was implemented in MESSy in the form of the
infrastructure submodel IMPORT.
Currently, IMPORT consists of two subsubmodels: IMPORT_TS for reading
and processing
abstract time series data, and IMPORT_GRID, which utilises the
infrastructure submodel GRID. GRID
provides procedures for grid transformations using the remapping
software packages NREGRID and SCRIP.
This article describes the main functionalities of the two MESSy infrastructure submodels GRID and IMPORT. The Supplement of this article contains stand-alone tools of both IMPORT subsubmodels, IMPORT_TS and IMPORT_GRID. Their handling is explained in detail in the IMPORT User Manual which is also part of the Supplement.
An important part of Earth System Model (ESM) infrastructure is input
and output of data (I/O). While the output is centralised in most
models, input is often performed directly where it is needed, i.e.,
corresponding routines are spread throughout the model. Usually ESMs
require lots of input, e.g., the land-sea
mask, land types, leaf area indices or – for atmospheric chemistry
models – emission maps, and so forth.
For many models (e.g., ECHAM, COSMO, CESM1)
these external data need to be preprocessed for each
model resolution and (in the case of a regional model) domain, because the model
requires the input data to be on its own grid. This is the fastest
method with respect to model run-time performance and
therefore might be the best solution for the operational application
of models, e.g., in weather prediction.
In contrast to an operational model,
the setup of a research model changes frequently,
e.g., the horizontal resolution or the emission
inventory. With respect to the flexibility of the system,
a more desirable design for a research model is to store
all possible input data only in the finest available
resolution, while the model
itself transforms the data to the respective model grid.
This approach requires less storage space and is more flexible
than the repeated preprocessing of all required input data. On the other hand,
more computing time is required during the model integration for the
on-line remapping. This was
already implemented for the EMAC model.

Note: The ECHAM/MESSy Atmospheric Chemistry (EMAC) model is a numerical chemistry and climate
simulation system that includes sub-models describing tropospheric
and middle atmosphere processes and their interaction with oceans,
land and human influences.

Note: The infrastructure submodel
previously used in EMAC is named NCREGRID, while the remapping
algorithm itself is called NREGRID.

Importing all data at a single point has several advantages. It is much easier to keep track of the imported data, as all
imported data is listed in one namelist. All data is handled consistently, and the usage of additional or new import
data is less error-prone. The outline of the model code is much clearer, as not every model part
that depends on input data needs to include importing and regridding routines. If the import is optimised, only one source code needs to be
changed. For code extensions, i.e., the introduction of new file formats or
new mapping routines, the corresponding routines have to be added at only one
point in the model code.
Figure
Note: Gridded here implies geo-referenced.
In order to unify the usage of the different mapping software packages, the
so-called “geo-hybrid grid” structure, as already defined in
NCREGRID, is extended and
conversion routines between the
grid definitions required by NREGRID and SCRIP are provided.
The mapping routines are not only used during data import, but also
for grid transformations
within the model, e.g., for mapping between the ocean and the
atmospheric grid when MPIOM is used as a MESSy submodel in EMAC.
In the following, the functionality of the generic MESSy submodels GRID
and IMPORT is described.
Information about the general usage of these submodels is provided
here. Further details and more technical information, required for
model developers to implement the remapping routines into their own
code, are supplied in the user manuals in the Supplement of this
article.
Section
The generic MESSy submodel GRID builds the basis for all required grid
transformations.
Most of its internal data types follow the netCDF data format definitions.
The hierarchical data structures mostly follow those of NCREGRID.
The submodel core layer (SMCL) of GRID contains the definition of the geo-hybrid grid structure, i.e., a grid defined horizontally by geographical longitude and latitude and vertically by hybrid pressure coefficients. The structure provides all information required for the grid conversion. Different containers for the definition of the horizontal grid are specified for different types of grids. The remapping algorithm automatically applies the correct conversion routines, depending on the containers filled. The details are explained in the “GRID User Manual”, which is part of the Supplement. The GRID SMCL routines also comprise subroutines for the handling of the grid structures, i.e., routines for initialising, copying, importing, exporting and printing a variable of the grid structure type. Beyond that, routines necessary for defining a grid, storing it in a concatenated list, locating an already defined grid within this list, and comparing grids are part of the GRID SMCL.
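The grid handling described above (define a grid, store it in a list, locate an already defined grid, compare grids) can be illustrated with a minimal Python sketch. It is not the GRID Fortran code: the names `GeoHybridGrid`, `GridRegistry`, `hyai` and `hybi` are hypothetical, the rule that an already known grid is not stored twice is an assumption, and the interface pressure formula p = A + B * ps is the common hybrid-coordinate convention, assumed here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GeoHybridGrid:
    """Hypothetical stand-in for a geo-hybrid grid structure."""
    name: str
    lon: Tuple[float, ...]   # longitudes of the grid midpoints (degrees east)
    lat: Tuple[float, ...]   # latitudes of the grid midpoints (degrees north)
    hyai: Tuple[float, ...]  # hybrid A coefficients at level interfaces (Pa)
    hybi: Tuple[float, ...]  # hybrid B coefficients at level interfaces

    def same_geometry(self, other: "GeoHybridGrid") -> bool:
        """Compare two grids by their coordinates, ignoring the name."""
        return (self.lon, self.lat, self.hyai, self.hybi) == (
            other.lon, other.lat, other.hyai, other.hybi)

    def interface_pressure(self, surface_pressure: float) -> List[float]:
        """Interface pressures p = A + B * ps (assumed convention)."""
        return [a + b * surface_pressure
                for a, b in zip(self.hyai, self.hybi)]


@dataclass
class GridRegistry:
    """List of defined grids; an equal grid is stored only once (assumption)."""
    grids: List[GeoHybridGrid] = field(default_factory=list)

    def locate(self, grid: GeoHybridGrid) -> Optional[int]:
        """Return the index of an already defined equal grid, or None."""
        for index, known in enumerate(self.grids):
            if known.same_geometry(grid):
                return index
        return None

    def define(self, grid: GeoHybridGrid) -> int:
        """Store the grid if it is new and return its index in the list."""
        index = self.locate(grid)
        if index is None:
            self.grids.append(grid)
            index = len(self.grids) - 1
        return index
```

Returning an index from `define` lets a caller refer to a grid without holding the full structure; this is a design choice of the sketch, not a statement about the GRID interface.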
The main target of the GRID submodel is to provide routines for the transformation of gridded geo-located data. So far, two different transformation algorithms are part of GRID_TRAFO: NREGRID and SCRIP. While the core mapping algorithms differ, GRID_TRAFO provides unified interfaces for the conversion between different grids.
NREGRID, the mapping algorithm and the core of NCREGRID, is
a recursive algorithm, which is applicable to arbitrary orthogonal
(including curvi-linear) grids of any dimension. The algorithm does not apply
a point-to-point interpolation, but a transformation based on overlaps
between the different grid volumes. Details about the algorithm
applied have been published by
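To illustrate the difference between a point-to-point interpolation and a transformation based on overlaps, the following Python sketch remaps cell averages along a single axis by weighting each source cell with its overlap with the destination cell. It is a one-dimensional illustration under simplifying assumptions (strictly increasing cell edges, values interpreted as cell means), not the recursive NREGRID algorithm; the function name `overlap_remap_1d` is hypothetical.

```python
from typing import List, Sequence


def overlap_remap_1d(src_edges: Sequence[float],
                     src_values: Sequence[float],
                     dst_edges: Sequence[float]) -> List[float]:
    """Remap cell means from a source axis to a destination axis.

    Each destination cell receives the overlap-weighted mean of all source
    cells it intersects. Destination cells without any overlap get NaN.
    Both edge sequences are assumed to be strictly increasing.
    """
    if len(src_values) != len(src_edges) - 1:
        raise ValueError("need exactly one value per source cell")
    result = []
    for j in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[j], dst_edges[j + 1]
        weighted_sum = 0.0
        covered = 0.0
        for i, value in enumerate(src_values):
            s_lo, s_hi = src_edges[i], src_edges[i + 1]
            # Length of the intersection of source cell i and destination cell j.
            overlap = min(d_hi, s_hi) - max(d_lo, s_lo)
            if overlap > 0.0:
                weighted_sum += overlap * value
                covered += overlap
        result.append(weighted_sum / covered if covered > 0.0 else float("nan"))
    return result
```

If the destination axis covers the same range as the source axis, the integral of the field (sum of cell mean times cell width) is preserved by construction, which is the main reason to prefer overlap weights for quantities such as emission fluxes.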
As NREGRID is limited to the remapping between orthogonal grids, the
implementation of an algorithm able to interpolate between different
curvi-linear or even unstructured grids became necessary. To reach
this aim the SCRIP
software
SCRIP, however, only provides horizontal grid transformations. The simplest way to add vertical remapping is to apply NREGRID for the vertical grid transformation after the horizontal remapping with SCRIP has been carried out. Additional vertical interpolation schemes can easily be added in the future.
The backbone of each model is its grid. For an atmospheric model, for example, the horizontal space is given by the longitudes and latitudes of the model's grid midpoints and grid corners. The vertical space is defined by a height or pressure coordinate. As this grid is the reference for most submodels and processes, it is defined in the basemodel interface layer (BMIL) for usage in all MESSy submodels. Most importantly, it is used by IMPORT as the default target grid for data import. Additionally, the BMIL of GRID allows the geo-hybrid grid structure to be broadcast. This is required if a geo-hybrid grid is initially defined on one parallel task, but is used by all parallel tasks later in the simulation.
IMPORT supplies MESSy with a standardised interface for data
import. So far, IMPORT includes subsubmodels
for import of abstract time series data (IMPORT_TS) and for gridded
(time-dependent or static) data (IMPORT_GRID).
If required, IMPORT can be easily expanded by additional
subsubmodels to import other data representations.
In this way, all data traffic into the model is managed by IMPORT,
while the generic MESSy submodel
CHANNEL
Both IMPORT_TS and IMPORT_GRID are namelist-controlled. The following sections give an overview of the subsubmodels and explain basic setups of the IMPORT namelists. Further details about the namelist settings and additional information for model developers are provided in the “IMPORT User Manual”, which is part of the Supplement of this article.
Currently, two horizontal mapping algorithms (NREGRID and SCRIP) are available in IMPORT_GRID. The default scheme depends on the basemodel, e.g., for the regional COSMO/MESSy model SCRIP is automatically chosen, as the COSMO model domain is usually defined on a rotated grid and thus NREGRID is not applicable. NREGRID is the default for the global EMAC model.
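The selection rule stated above can be written down as a small Python sketch. The mapping of basemodels to default algorithms is taken from the text; the function name `select_algorithm`, its arguments and the explicit `rotated_grid` flag are hypothetical and only serve to make the rule testable.

```python
from typing import Optional

# Defaults as stated in the text: NREGRID for EMAC, SCRIP for COSMO/MESSy.
DEFAULT_ALGORITHM = {"EMAC": "NREGRID", "COSMO": "SCRIP"}


def select_algorithm(basemodel: str, rotated_grid: bool,
                     requested: Optional[str] = None) -> str:
    """Return the horizontal remapping algorithm to use.

    An explicitly requested algorithm overrides the basemodel default.
    NREGRID is rejected for rotated grids, where it is not applicable.
    """
    algorithm = requested if requested is not None else DEFAULT_ALGORITHM[basemodel]
    if algorithm not in ("NREGRID", "SCRIP"):
        raise ValueError(f"unknown remapping algorithm: {algorithm}")
    if rotated_grid and algorithm == "NREGRID":
        raise ValueError("NREGRID is not applicable to rotated grids")
    return algorithm
```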
The imported data is made available to the other MESSy submodels as
CHANNEL objects using the data infrastructure submodel
CHANNEL For more information see
By default, IMPORT_GRID assumes the basemodel grid to be the target
grid for the imported data. This grid is defined in the BMIL of GRID
(Sect.
The mechanism driving IMPORT_GRID is the same as described for OFFLEM
Each namelist entry consists of four different parts: the TIMER
information, the name, the counter and the action string.

The TIMER information directly relates to the definition of
an

The name, here

The counter provides the information which time steps from the data
file are to be read. In this example, the second time step would be
read at model start; subsequently, the time step is
increased by 1 until it reaches 24. Afterwards, the program
continues with step 13, and so forth.

The action string contains the information for
the remapping process. In the example, it only contains the name of
the namelist file to be processed by IMPORT_GRID (see below).
Likewise, the name of the regridding algorithm and the name of the
target grid are given here, if the defaults should not be used.
The remapping algorithm can be changed by naming the
interpolation method (

Furthermore, adding the identifier
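The counter behaviour described above (start at a given time step, increase by a fixed increment up to a last step, then restart from another step) can be sketched in Python as a generator. The function name `cyclic_counter` and the argument names `restart`, `step`, `last` and `start` are hypothetical and do not reproduce the namelist syntax; the numbers in the usage line are those of the example in the text.

```python
from itertools import islice
from typing import Iterator


def cyclic_counter(restart: int, step: int, last: int, start: int) -> Iterator[int]:
    """Yield the time step of the data file to read at each import event.

    The sequence begins at `start`, is increased by `step` until `last`
    has been reached, and then continues from `restart`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    current = start
    while True:
        yield current
        current += step
        if current > last:
            current = restart


# Example from the text: read step 2 at model start, count up to 24,
# then continue with step 13.
first_steps = list(islice(cyclic_counter(restart=13, step=1, last=24, start=2), 26))
```

With monthly data, such a setting would read a spin-up period once and then cycle through a later twelve-step period repeatedly.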
The regridding namelist for the example is:
The individual regridding namelists contain
the name and the path of the input file,
the name of the time variable,
the names of the longitude and latitude axes,
the respective ranges of the longitude and latitude axes, and
the quantity to be remapped from the file, including an optional
scaling factor and a new name. Here, the original field
Figure
For regional rotated grids only SCRIP is applicable.
Figure
In order to reduce the memory consumption, we currently deviate from our strategy of one single point of import for the initialisation of tracers. It is advantageous to initialise the tracers in the BMIL of the tracer submodel. Here, the full tracer structure is straightforwardly accessible. Additionally, tracer initialisation is only required at model start and therefore the time event control of IMPORT_GRID is not required here.
As illustrated in Fig.
In principle, the processing of the data (import and remapping) can proceed in parallel. Depending on the calling model, different methods are applicable. In the case of the stand-alone tools, parallelisation is possible but not necessarily required. For 3-D models, parallel domain decomposition can be used, i.e., each parallel task processes the data required for its respective part of the model domain. For IMPORT_GRID this is the case for the COSMO model. In models with a more complex domain decomposition (e.g., ECHAM5) this is not straightforwardly applicable. Therefore, no parallelisation is applied unless the number of variables contained in one file is large enough. In this case, parallelisation takes place over the number of variables. This is the case for the tracer initialisation in EMAC. If possible, parallelisation over the domain is preferred over parallelisation over the number of variables. Therefore, the tracer initialisation in COSMO/MESSy is parallelised over the domain.
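The decision logic of this paragraph can be summarised in a short Python sketch. The names `choose_strategy` and `distribute_variables` are hypothetical. The text does not quantify when the number of variables is "large enough", so the threshold `min_variables` is an assumption, as is the round-robin assignment of variables to tasks.

```python
from typing import Dict, List, Sequence


def choose_strategy(domain_decomposition_usable: bool,
                    n_variables: int, min_variables: int) -> str:
    """Prefer the domain; fall back to variables; otherwise stay serial."""
    if domain_decomposition_usable:
        return "domain"
    if n_variables >= min_variables:  # threshold is an assumption
        return "variables"
    return "serial"


def distribute_variables(variables: Sequence[str],
                         n_tasks: int) -> Dict[int, List[str]]:
    """Assign the variables of one file to parallel tasks (round-robin)."""
    if n_tasks < 1:
        raise ValueError("at least one task is required")
    assignment: Dict[int, List[str]] = {task: [] for task in range(n_tasks)}
    for index, name in enumerate(variables):
        assignment[index % n_tasks].append(name)
    return assignment
```

For example, `distribute_variables(["O3", "NO", "NO2", "CO", "CH4"], 2)` assigns three tracers to task 0 and two to task 1.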
The IMPORT submodel IMPORT_TS reads standardised abstract time series data from
ASCII or netCDF files.
Time series data generally consist of an equidistant time axis and
a parameter axis.
The time axis covers data defined annually, monthly, daily, hourly,
every minute or every second. The parameter axis can be freely chosen.
It may consist of a number of vertical levels
or be just a collection of different data. For example, the parameter axis of
a radiosonde measurement could be
At the beginning of a simulation, the complete file is read (i.e., all
time steps and parameters).
During the simulation, the data is processed according to the
namelist entries: for simulation dates that do not exactly match the
times provided by the input data, the available data is mapped
to the current
date by using the previous or the next point in time, or by interpolating
linearly between the two nearest points in time. For more details see
Sect.
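The three ways of mapping time series data to the current simulation date can be sketched as follows. This is an illustrative Python function, not the IMPORT_TS code: the name `map_in_time`, the string identifiers for the methods and the use of plain numbers for points in time are assumptions, and the date is required to lie within the time span covered by the data.

```python
from bisect import bisect_right
from typing import Sequence


def map_in_time(times: Sequence[float], values: Sequence[float],
                now: float, method: str) -> float:
    """Map a time series to the date `now`.

    `times` must be strictly increasing and `now` must lie within
    [times[0], times[-1]]. `method` is 'previous', 'next' or 'linear'.
    """
    if method not in ("previous", "next", "linear"):
        raise ValueError(f"unknown method: {method}")
    if not times[0] <= now <= times[-1]:
        raise ValueError("date lies outside the covered time span")
    right = bisect_right(times, now)  # index of the first time after `now`
    left = right - 1                  # index of the last time not after `now`
    if times[left] == now:
        return values[left]           # exact match, no mapping needed
    if method == "previous":
        return values[left]
    if method == "next":
        return values[right]
    weight = (now - times[left]) / (times[right] - times[left])
    return values[left] + weight * (values[right] - values[left])
```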
If the data is available as an ASCII file
(see example in Fig. the flag for the time interval used, the start year, the end year of the data and the number of parameters (columns of the table except time information).
In the example in Fig.
In a netCDF file, the information about the data origin is stored in
attributes. Additionally, the composition of the parameter axis should be
contained in an attribute describing the parameter axis.
For a netCDF file, the interval of the time axis is detected by the
analysis of the time unit and the time coordinate variable. The length
of the parameter axis is determined automatically from its dimension.
Afterwards, the data set (time
IMPORT_TS is driven by the

The first string defines the name of the time series data set
and thus the name of the CHANNEL object containing the finally
processed data. By means of this name the data can be
accessed in other parts of the model.

The second string comprises the name, including the full path, of the
data file. For netCDF files only, the string also contains the name
of the variable to be read. Its name has to be given at the beginning
of the string and is separated from the filename by an

The next two float entries determine the valid range of the data.
In case of Fortran intrinsic

The next two integer variables set the policy for the valid time range of
the time series data, i.e., they determine
whether data is provided in cases where the simulation date lies outside of the
time span covered by the data file.
If set to “0” the model execution is stopped, whereas “1” allows for
the continuation of the simulation. In the second case, the data of the
nearest point in time present in the file is used.
As the desired policy may differ for dates before and after the covered
time span, the first integer determines the method used for dates prior to
the time span comprised in the file, and the second integer the method
used after the provided time span. In the example
(Fig.

The third integer defines the mapping method for time steps in
between the points in time defined by the time series data.
Three methods are available: the previous point in time is used; a linear interpolation between the two nearest points in time is performed; or the next point in time is used.

The following six integers allow for the selection of a specific date
or a specific time span of the data file. The order of entries is

By default, i.e., if none of the six variables is set, the data is selected
according to the actual simulation date.

The last float variable defines an offset. The unit of this offset is
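The policy for simulation dates outside the time span covered by the data file can be sketched in Python as follows. The values 0 (stop) and 1 (continue with the nearest point in time) are taken from the text; the function name `value_outside_range`, its arguments and the use of an exception to represent the model stop are assumptions made for this illustration.

```python
from typing import Sequence

STOP = 0      # model execution is stopped
CONTINUE = 1  # simulation continues with the nearest point in time


def value_outside_range(times: Sequence[float], values: Sequence[float],
                        now: float, policy_before: int,
                        policy_after: int) -> float:
    """Handle a date before or after the time span covered by the data.

    `policy_before` applies to dates prior to the covered time span,
    `policy_after` to dates after it.
    """
    if now < times[0]:
        policy, nearest = policy_before, values[0]
    elif now > times[-1]:
        policy, nearest = policy_after, values[-1]
    else:
        raise ValueError("date lies inside the covered time span")
    if policy == CONTINUE:
        return nearest
    if policy == STOP:
        # Stands in for stopping the model execution.
        raise RuntimeError("simulation date outside the time span of the data")
    raise ValueError(f"unknown policy: {policy}")
```

Keeping separate policies for both ends allows, for instance, a spin-up with the first available value while still stopping a run that extends beyond the end of the data.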
Stand-alone tools of IMPORT_GRID and IMPORT_TS
are part of the electronic
Supplement (import_grid.zip and import_ts.zip).
Historically, each submodel in MESSy performed its own data import. At the
beginning, when MESSy was only connected to ECHAM5
The implementation of MESSy into the regional weather prediction and climate model COSMO
MECO(n) = MESSyfied ECHAM and COSMO models
nested n-times
This article gives a short overview of the generic MESSy submodels
GRID and IMPORT. GRID provides a standard interface for transformations between
different grids. Currently, two regridding software packages are available:
the, in EMAC well established, NREGRID algorithm
The generic MESSy submodel IMPORT establishes a single point for data
import into a MESSy model.
Currently, IMPORT consists of two subsubmodels:
IMPORT_GRID to read and remap gridded data from
netCDF files and IMPORT_TS to read and process abstract time series data.
If the import of an additional data representation or the implementation of a new
data format is required, IMPORT can easily be expanded by further subsubmodels.
The code described here is part of the Modular Earth Submodel System (MESSy),
which is continuously developed further and applied by a consortium of
institutions. The usage of MESSy and access to the source code are licenced to
all affiliates of institutions which
are members of the MESSy Consortium. Institutions can become members of the MESSy
Consortium by signing the MESSy Memorandum of Understanding. More information
can be found on the MESSy Consortium Website (
The work was financed by the German Ministry of Education and Research (BMBF)
in the framework of the MiKlip
(Mittelfristige Klimaprognose/Decadal Prediction) subproject FLAGSHIP
(Feedback of a Limited-Area model to the Global-Scale implemented for HIndcasts
and Projections, funding ID 01LP1127A).
We are grateful to Mariano Mertens (DLR) for testing and improving
IMPORT_GRID for application in COSMO/MESSy. We thank Bastian Kern (DLR)
and Andrea Pozzer (MPIC) for fruitful discussions concerning the
application of SCRIP
for remapping between the EMAC and the MPIOM grid.
The authors acknowledge use of the Ferret program for the graphics in
this paper. Ferret is a product
of NOAA's Pacific Marine Environmental Laboratory (information
is available at
Structure of the generic MESSy submodels GRID (blue) and IMPORT (yellow). The orange boxes indicate the connections between IMPORT and the generic MESSy submodels TIMER and CHANNEL. Each of the boxes stands for one or more subsubmodels. IMPORT comprises the subsubmodels IMPORT_GRID and IMPORT_TS. Additional import subsubmodels (IMPORT_…) can easily be added in the future. IMPORT_GRID utilises GRID_TRAFO. This subsubmodel of GRID depends on the grid definition and handling routines of GRID and provides access to different remapping algorithms (GRID_TRAFO_NRGT, GRID_TRAFO_SCRP). In the future, additional algorithms can easily be added (GRID_TRAFO_…).
Road traffic
Structure of the generic MESSy subsubmodel IMPORT_GRID and its
connections to other generic submodels based
on Fig.
Example for an ASCII data file for IMPORT_TS.
Example for the CTRL_TS namelist of IMPORT_TS.