Data assimilation has become a popular method to integrate observations from multiple sources with land surface models to improve predictions of the water and energy cycles of the soil-vegetation-atmosphere continuum. Multivariate data assimilation refers to the simultaneous assimilation of observation data from multiple model state variables into a simulation model. In recent years, several land data assimilation systems have been developed in different research agencies. Because of the software availability or adaptability, these systems are not easy to apply for the purpose of multivariate land data assimilation research. We developed an open source multivariate land data assimilation framework (DasPy) which is implemented using the Python script language mixed with the C++ and Fortran programming languages. LETKF (Local Ensemble Transform Kalman Filter) is implemented as the main data assimilation algorithm, and uncertainties in the data assimilation can be introduced by perturbed atmospheric forcing data, and represented by perturbed soil and vegetation parameters and model initial conditions. The Community Land Model (CLM) was integrated as the model operator. The implementation allows also parameter estimation (soil properties and/or leaf area index) on the basis of the joint state and parameter estimation approach. The Community Microwave Emission Modelling platform (CMEM), COsmic-ray Soil Moisture Interaction Code (COSMIC) and the Two-Source Formulation (TSF) were integrated as observation operators for the assimilation of L-band passive microwave, cosmic-ray soil moisture probe and land surface temperature measurements, respectively. DasPy has been evaluated in several assimilation studies of neutron count intensity (soil moisture), L-band brightness temperature and land surface temperature. DasPy is parallelized using the hybrid Message Passing Interface and Open Multi-Processing techniques. All the input and output data flows are organized efficiently using the commonly used NetCDF file format. Online 1-D and 2-D visualization of data assimilation results is also implemented to facilitate the post simulation analysis. In summary, DasPy is a ready to use open source parallel multivariate land data assimilation framework.
Hydrological and land surface processes can be studied on the basis of measurement data or numerical models. For example, remote sensing provides valuable measurements of soil moisture (SM) (Entekhabi et al., 2010; Kerr et al., 2010), land surface temperature (LST) (Wan and Li, 2008) and snow (Hall et al., 2002). In addition to remote sensing measurements, large amounts of in situ data from monitoring networks measure land surface processes locally (Li et al., 2013; Simmer et al., 2014; Vereecken et al., 2010; Zreda et al., 2012). These measurements are representative of the real world, but have a limited spatio-temporal coverage and resolution. This is a main limitation for the assessment of climate change or water resources research purely on the basis of data (Anderson et al., 2008; Li et al., 2009). Therefore, numerical modeling approaches (e.g. land surface and hydrologic models) have become very important tools for earth science research (Liang et al., 1994; Niu et al., 2011; Sellers et al., 1996; Shrestha et al., 2014; Wood et al., 2011). With numerical models, the spatio-temporal variables of land surface states can be estimated at high spatial and temporal resolution and in a cheap and efficient way. However, because of uncertainties in the atmospheric forcings, soil and vegetation parameters and the biophysical processes themselves, usually there are large differences as well as biases in the model simulation results (Bosilovich et al., 2007; De Lannoy et al., 2007; Dee, 2005; Liu and Gupta, 2007).
Land data assimilation can integrate many observation types into land surface or hydrological models to improve the model performance. The uncertainties associated with measurement data and numerical model results are optimally weighted in the land data assimilation approaches (McLaughlin, 2002; Montzka et al., 2012; Moradkhani, 2008; Reichle, 2008). In recent years, many studies have shown the important role of data assimilation for the characterization of water and energy cycles, and multisource observations have been used to improve the characterization of soil moisture (Crow et al., 2008; Kumar et al., 2012), streamflow (Brocca et al., 2010b), land surface fluxes (Bateni and Entekhabi, 2012; Xu et al., 2014) and snow (Che et al., 2014; De Lannoy et al., 2012), amongst others.
For example, soil moisture profile characterization can be improved by assimilating multisource soil moisture data (Crow et al., 2008; X. Han et al., 2012, 2015a; Pauwels et al., 2007; Reichle et al., 2007; Yang et al., 2009); which was also proven to be beneficial for the characterization of land surface fluxes (Han et al., 2015a; Jin and Li, 2009; Xu et al., 2011). Remotely sensed land surface temperature products have been assimilated into land surface models to improve the estimation of land surface fluxes (Ghent et al., 2010; Han et al., 2015a; Huang et al., 2008; Reichle et al., 2010; Xu et al., 2014). Streamflow is sensitive to soil moisture and its prediction can be improved by soil moisture assimilation (Brocca et al., 2010b; E. J. Han et al., 2012; Lee et al., 2011; Mascaro and Vivoni, 2012; Meier et al., 2011; Pauwels et al., 2001) or through streamflow assimilation (Clark et al., 2008; Lee et al., 2011; Li et al., 2011; Warrach-Sagi and Wulfmeyer, 2010). Assimilation of snow cover fraction (SCF) (Andreadis and Lettenmaier, 2006; Clark et al., 2006), snow water equivalent (SWE) (De Lannoy et al., 2012; Dong et al., 2007) and passive microwave brightness temperature data (Che et al., 2014) are all helpful for snow characterization. Terrestrial water storage characterization can be improved by assimilating the Gravity Recovery and Climate Experiment (GRACE) data (Lo et al., 2010; Zaitchik et al., 2008).
Data assimilation is also used in the context of joint state and parameter (or bias) estimation (Ruiz et al., 2013). One possible approach in this context is the state augmentation method (Bateni and Entekhabi, 2012; Franssen and Kinzelbach, 2008; Han et al., 2014). Another approach is dual state and parameter estimation in which the models need to be run twice compared with the state augmentation method (Cammalleri and Ciraolo, 2012; Moradkhani et al., 2005). Further approaches are the combined data assimilation and optimization method (Vrugt et al., 2005; Xu et al., 2011; Yang et al., 2009), and joint state and bias estimation methods (Bosilovich et al., 2007; De Lannoy et al., 2007; Dee, 2005; Kumar et al., 2012).
Usually the development of the necessary land data assimilation system for conducting the above-mentioned research is laborious. Examples of land data assimilation systems recently developed are the Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004), the European Land Data Assimilation System (ELDAS) (Wilker et al., 2006), the Chinese Land Data Assimilation System (CLDAS) (Li et al., 2007; Tian et al., 2010), the Land Information System (LIS) (Kumar et al., 2008), the Canadian Land Data Assimilation System (CaLDAS) (Balsamo et al., 2007), the GEOS-5 Data Assimilation System (Rienecker et al., 2008), the Data Assimilation Research Testbed (DART) (Anderson et al., 2009), the Korea Land Data Assimilation System (KLDAS) (Lim et al., 2012), the Earth Observation Land Data Assimilation System (EO-LDAS) (Lewis et al., 2012), the Parallel Data Assimilation Framework (PDAF) (Nerger and Hiller, 2013), the Open Data Assimilation library (OpenDA) (Ridler et al., 2014) and the Auto-Tuned Land Data Assimilation System (ATLAS) (Crow and Yilmaz, 2014). The source codes of these systems are not available for the broad data assimilation community except for GEOS-5, DART, EO-LDAS, PDAF and OpenDA. However, GEOS-5, DART, EO-LDAS, PDAF and OpenDA are not specifically designed for land data assimilation, and complex code structures are used in GEOS-5, DART, PDAF and OpenDA for the purpose of versatility. More technical efforts are still necessary to adapt these systems for multisource and multivariate land data assimilation including parameter estimation or bias estimation. For example, observation operators are necessary to assimilate the radiance observation directly. However, observation operators are usually implemented with various programming languages like Fortran and Matlab. Yet additional efforts are required to integrate the observation operators into a single data assimilation framework.
Therefore, a ready to use multivariate land data assimilation system is
still necessary for the large land data assimilation community. In this
paper, we present the development of an open-source multivariate land data
assimilation framework (DasPy). The main programming language of DasPy is
Python (
The objective of this paper is to introduce the implementation of the DasPy land data assimilation framework. A real-world case study with land surface temperature assimilation including parameter estimation illustrates the capability of DasPy at the catchment scale. This paper is organized as follows: Sect. 2 introduces the framework of DasPy implementation; the land surface temperature assimilation with DasPy is shown in Sect. 3; Sect. 4 presents the discussion; and the final conclusion is summarized in Sect. 5.
The objective of the DasPy development is to integrate multisource measurements into a land surface model to obtain an improved characterization of the water and energy cycles. DasPy has been designed in a modular form and is integrated in a parallel computing framework.
DasPy contains 8 key components: (1) assimilation approaches, (2) model operator, (3) representation of model uncertainties, (4) observation operators, (5) multivariate data assimilation, (6) parallel computing, (7) organization of input and output, (8) visualization. The system flow is illustrated in Fig. 1.
The script programming language Python was used to implement the DasPy
structure, moreover DasPy benefits from the powerful Python-based ecosystem
of open-source software for mathematics, science and engineering (Fig. 2).
The Numpy package (
The ensemble Kalman filter (EnKF) (Evensen, 2003) has become the most popular data assimilation algorithm in land surface modelling because it is robust against non-linear model dynamics and can easily handle different uncertainty sources (Reichle, 2008). Many variants of EnKF have been proposed (Bishop et al., 2001; Whitaker and Hamill, 2002) since its classical version with observation perturbation (Burgers et al., 1998). The Local Ensemble Transform Kalman Filter (LETKF) without observation perturbation (Hunt et al., 2007; Miyoshi and Yamane, 2007; Ott et al., 2004) was chosen as the main data assimilation algorithm of DasPy. LETKF is very efficient for parallel computing (Miyoshi et al., 2014) and easy to implement because the data assimilation is handled separately for each model grid cell.
In LETKF, observation localization (OL) is used to assimilate spatially distributed observation data (Greybush et al., 2011; Han et al., 2015b). Spatial correlation is an intrinsic characteristic of land surface states such as soil moisture and soil temperature (Brocca et al., 2010a; Ryu and Famiglietti, 2006). Observation data can be extrapolated from the observed locations to unobserved locations in LETKF with OL (Greybush et al., 2011; X. Han et al., 2012; Hunt et al., 2007) on the basis of spatial correlation characteristics of model state variables or observation data. The combination of spatial correlation and land data assimilation was implemented by the OL technique in LETKF and DasPy (X. Han et al., 2012, 2015b). Spatial correlations are typically estimated using geostatistical analysis methods such as semivariogram modeling (Goovaerts, 1997). Several semivariogram models are available to characterize the spatial horizontal correlation of observation data. These are the Gaussian model, exponential model, spherical model and the more general Matern model (Goovaerts, 1997; X. Han et al., 2012; Minasny and McBratney, 2005). The popular 5th order polynomial correlation function is also implemented as an option in DasPy (Gaspari and Cohn, 1999; Reichle and Koster, 2003).
Filter inbreeding, the underestimation of ensemble variance in data
assimilation, is a common problem related to small ensemble sizes and the
inadequate characterization of all sources of uncertainty. As a consequence
of filter inbreeding, the ensemble spread narrows down along with the data
assimilation progress, ultimately even leading to filter divergence so that
the observation data cannot be fully exploited. Therefore the spread of
ensemble members should be kept to retain the ensemble variance and to avoid
the filter divergence (Anderson, 2009). The multiplicative
ensemble inflation method is implemented as follows (Whitaker and
Hamill, 2012):
A modified version of the LETKF code developed by Takemasa Miyoshi
(
The current version of DasPy is mainly designed for the Community Land Model
(CLM – version 4.5) (Oleson
et al., 2013), which is a commonly used land surface model and is still
under very active development (
CLM is loosely coupled into DasPy. DasPy calls the executable program of CLM directly, and the data transfer between DasPy and CLM is by the initial files of CLM in NetCDF format. There are two advantages with loose coupling: (1) the source code of CLM did not need to be modified (which is an advantage because its structure is very complex), (2) it can easily be adapted to future new versions of CLM and other model operators. The main disadvantage is the loss of computational efficiency, however, the binary NetCDF file format used by CLM could compensate for loss of computational performance.
The data assimilation performance depends on the suitable representation of
model uncertainties (Crow and Van Loon, 2006; Moradkhani, 2008). The
uncertainties from the atmospheric forcing data, model initial conditions
and model parameters are commonly considered in land data assimilation
(Reichle, 2008). These uncertainties can be represented by randomly
perturbing model parameters and atmospheric forcing data, for example by
additive noise and multiplicative noise. In order to facilitate the
application of DasPy, standard values for the perturbation of model forcings
and model parameters are implemented as follows:
Atmospheric forcing perturbation: multiplicative noise with a log-normal
distribution (mean Soil properties perturbation: the soil hydraulic and thermal properties
of CLM are derived from the soil properties of sand fraction, clay fraction
and organic matter density (Han et al., 2014; Oleson et al., 2013).
Additive perturbations are imposed on the soil properties, typically using
a uniformly distributed random noise U[ Vegetation properties: leaf area index is typically perturbed using
a multiplicative noise with uniform distribution U[0.75, 1.25]. If more
detailed information is available, the perturbation of model forcings, soil
parameters and vegetation parameters can be modified according to the data
basis.
Model forcings can also be perturbed including spatio-temporally correlated
noise, for example with help of the Fast Fourier Transform approach and
a first-order auto-regressive model. See (Reichle et al., 2010) for more
details. Spatial correlation of soil and vegetation properties can be
considered by geostatistical simulation methods (Goovaerts, 1997), this
part was implemented in DasPy with the R geostatistical package geoR
(
The measurements to be assimilated, such as soil moisture, land surface temperature and snow cover fraction, can provide direct information on the land surface states. However, some measurements may provide indirect information, examples are the passive microwave brightness temperature (Entekhabi et al., 2010) or the fast neutron intensity as measured by the cosmic-ray soil moisture probe (Zreda et al., 2012). The observation operators need to be integrated to model the relationship between the land surface states and the indirect measurements of different sensors. The spatial support of a measurement might also differ from the CLM spatial resolution. Therefore, the observation operators should include downscaling/upscaling procedures (Mascaro and Vivoni, 2012; Merlin et al., 2013).
Three observation operators (Fig. 3) have been coupled to DasPy. These observation operators handle the assimilation of passive microwave brightness temperature, land surface temperature and cosmic-ray neutron intensity observations:
The forward observation operator for L-band passive microwave brightness temperature of the land surface is the Community Microwave Emission Model (CMEM) developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Drusch et al., 2009; Holmes et al., 2008). CMEM can be used to assimilate the direct radiance measurements of L-band passive microwave sensors (Han et al., 2013, 2014), such as from the Soil Moisture Ocean Salinity (SMOS) (Kerr et al., 2010) and Soil Moisture Active Passive (SMAP) (Entekhabi et al., 2010) missions. CMEM was loosely coupled in DasPy, like CLM. The forward observation operator for land surface temperature products (e.g. Moderate-resolution Imaging Spectroradiometer, MODIS, and Geostationary Operational Environmental Satellites, GOES) is the so-called Two-Source Formulation (TSF) (Anderson et al., 2005; Kustas and Anderson, 2009). CLM only calculates the ground temperature and vegetation temperature, and does not model the composed land surface temperature from the view point of remote sensors. The TSF can be used to calculate the composed land surface temperature using ground temperature and vegetation temperature of CLM (Han et al., 2013). TSF is tightly coupled in DasPy and the data transfer between DasPy and TSF is by memory. The new developed cosmic-ray soil moisture probe is a noninvasive soil moisture sensor at the field scale (Zreda et al., 2012). DasPy integrates the COsmic-ray Soil Moisture Interaction Code (COSMIC) model (Rosolem et al., 2014; Shuttleworth et al., 2013) as the observation operator to assimilate the fast neutron intensity measured by the cosmic-ray soil moisture probe (Han et al., 2015a). COSMIC was developed under the COsmic-ray Soil Moisture Observing System project (COSMOS;
The multivariate data assimilation capabilities of DasPy include: (1) soil moisture data assimilation with direct soil moisture measurements (X. Han et al., 2012), assimilation of L-band passive microwave brightness temperature (Han et al., 2013, 2014) and cosmic-ray fast neutron intensity (Han et al., 2015a); (2) soil temperature data assimilation with direct soil temperature measurements and land surface temperature products of remote sensing (Han et al., 2015a, 2013); (3) joint estimation of soil moisture and soil properties (sand fraction, clay fraction and organic matter density) (Han et al., 2014); (4) joint estimation of soil temperature and leaf area index (Han et al., 2015a); (5) all the above capabilities combined together. Moreover, DasPy can easily be extended for the assimilation of other measurements like land surface fluxes, snow, ground water table and leaf area index. Parameter estimation can be further extended and is not limited to soil properties and leaf area index. The multivariate data assimilation of DasPy is summarized in Fig. 4.
DasPy can run on a standard PC in serial mode, and also on a cluster or supercomputer in parallel mode. Four components of DasPy were parallelized for efficient data assimilation applications: (1) the model operators and observation operators have to run in parallel for different ensemble members; (2) the spatial domain needs to be decomposed into small blocks in order to meet the memory limitation of each computer node of the cluster; (3) the file operations (input/output) of CLM need to be in parallel for large ensemble size; (4) the data assimilation of each model grid cell was parallelized in LETKF.
The parallel implementation of DasPy is illustrated in Fig. 5. First, the mpi4py library is used to initialize the global communicator (MPI_COMM_WORLD) of MPI, which controls the communication among different computer nodes, and then the servers of parallelpython (ppserver) at each computer node are started by MPI. The ensemble members of ppserver are utilized to organize the parallel computation of model simulation, filter operations and so on. The ensemble members of the model operator and the observation operators, and the file operations of DasPy are distributed on each computer node evenly. For data assimilation, the domain decomposition of DasPy needs to be specified according to the number of computer nodes and the memory requirement. The parallel calculation of each spatial block (according domain decomposition) for data assimilation is distributed to each node by ppserver.
The OpenMP technique is also implemented in DasPy for two reasons: (1) LETKF is parallelized using Fortran and OpenMP to do data assimilation for each model grid cell of each sub-domain (Fig. 5); (2) the model states, updated after data assimilation, are used to update the CLM initial files using C++ and OpenMP.
The NetCDF file format (
DasPy also supports online plotting. The scientific results need to be displayed graphically for a quick assessment of simulation results. Two visualization methods have been implemented using Python plotting library matplotlib: (1) 1-D plotting method for time series data, (2) 2-D plotting method for spatial data. The results can be checked in the online mode while DasPy is running. This allows a very fast visualization of analysis results and it releases researchers from the intensive work of data processing and plotting.
Land surface temperature is a key indicator of the Earth surface energy
budget, and has large implications on hydrological processes at the land
surface, such as sensible and latent heat fluxes. MODIS (Terra/Aqua)
measures the land surface temperature four times per day (twice during
daytime and twice during night) with a high spatial resolution (1
CLM calculates soil temperature and vegetation temperature from the incoming
and outgoing heat fluxes
(Oleson et al., 2013):
The heat flux
The change in vegetation temperature
From Eqs. (2) and (4) it follows that the calculated soil/vegetation temperature is sensitive to the sensible and latent heat fluxes. The sensitivity of sensible and latent heat fluxes to the CLM states/parameters has been evaluated in recent studies (Bonan et al., 2011; Hou et al., 2012; Schwinger et al., 2010). It was shown that the most sensitive CLM states/parameters were soil moisture, leaf area index, maximum rate of carboxylation (Vcmax), decay factor of subsurface runoff (Fdrai) and maximum drainage when the water table is at the surface (Qdrai). In this study, DasPy was extended to include the estimation of the three mentioned CLM-parameters above.
The parameter values of Vcmax, Fdrai, and Qdrai are tuned globally in CLM
(Oleson et al., 2013). These
values need to be calibrated for the specific applications at the smaller
catchment scale. Therefore, MODIS LST data were assimilated in CLM to update
these sensitive variables/parameters with joint state and parameter
estimation in DasPy. However, a limitation is that the leaf area index as
provided by MODIS underestimates the true LAI by 0.66 (
The data assimilation was carried out for the Rur catchment (2454
The land surface temperature measurements to be assimilated in CLM were obtained from the daily MODIS land surface temperature
product MOD11A1 in 2012 (
The spinup of CLM was done for 10 years using the forcing data from 2010 and 2011 repeatedly. Data assimilation for the year 2012 was carried out for the following seven simulation/assimilation scenarios: (1) Openloop – CLM open loop simulation without data assimilation, (2) LST_Vcmax – update Vcmax by assimilating LST; (3) LST_Fdrai – update Fdrai assimilating LST; (4) LST_Qdrai – update Qdrai by assimilating LST; (5) LST_LAI_SAI – update LAI and SAI (stem area index) by assimilating LST, (6) LST_Par_All – update Vcmax, Fdrai, Qdrai, LAI and SAI by assimilating LST, (7) LST_Feedback_Par_All – update Vcmax, Fdrai, Qdrai, LAI and SAI and soil moisture by assimilating LST. The soil temperature profiles were also updated in all assimilation scenarios.
The two-source formulation model (TSF, see Sect. 2.4) of DasPy was used as the observation operator. Fifty ensemble members of
atmospheric forcing data, soil properties and leaf area index were generated with the random perturbation method and using the
standard perturbation values introduced in Sect. 2.3. The parameter estimation of Vcmax, Fdrai, Qdrai,
LAI and SAI was carried out with help of the state augmentation approach. The update of the soil moisture profile (scenario
LST_Feedback_Par_All) was also done with the state augmentation approach. As Vcmax, Fdrai and Qdrai
are hard coded parameter values in CLM, the CLM source code was modified to allow the changes of these parameters. The empirical
values of Vcmax, Fdrai and Qdrai can vary between [10.75 (
DasPy was run on the JUROPA supercomputer platform
(
Field measurement data obtained from the Transregional Collaborative Research Centre 32 (TR32) project and the Terrestrial
Environmental Observatories (TERENO) were used for the validation. These data include: sensible and latent heat fluxes measured by
Eddy Covariance (EC) systems (Aubinet et al., 2012; Liu et al., 2011) at the stations of Kall-Sistig, Merzenhausen, Rollesbroich,
Niederzier and Selhausen (see Fig. 6), and soil moisture and soil temperature at three depths (5, 20 and 50
The Root Mean Square Error (RMSE) was used to compare simulation results:
Figure 7 shows the RMSE values of the latent heat flux for the open loop run and the different data assimilation scenarios. The RMSE values for the five EC-stations are summarized. The results show that only in case LAI and SAI were updated together with LST, the characterization of the latent and sensible heat fluxes improved slightly (small RMSE-reduction). Nevertheless, the RMSE values at the Merzenhausen site increased slightly.
Results for the sensible heat flux characterization are displayed in Fig. 8 for the same scenarios. The assimilation scenarios where only one parameter was updated together with LST (LST_Vcmax, LST_Fdrai and LST_Qdrai) show a clear reduction of the RMSE values. However, the results for three sites (Merzenhausen, Rollesbroich and Selhausen) show a slightly worse performance for the scenarios of LST_LAI_SAI, LST_Par_All and LST_Feedback_Par_All compared with the scenarios of LST_Vcmax, LST_Fdrai and LST_Qdrai. The results are however not worse than for the open loop scenario. It seems that for the sensible heat flux characterization, the calibration of Vcmax has the largest positive impact.
The RMSE values for the characterization of the soil moisture and soil temperature profile at the Rollesbroich site are shown in
Figs. 9 and 10. During the winter period, both the measurements and simulated values show a very high soil water content related
to frequent rainfall and low evapotranspiration. Interesting for an analysis of the differences between simulated and measured
soil moisture content is the period between April and September. The results of the open loop simulation for soil moisture content
at 20 and 50
Figure 11 shows the total annual actual evapotranspiration at five EC-stations for the different simulation scenarios (the basin
scale characterization of annual actual evapotranspiration is plotted in Fig. 12). Figure 11 shows that assimilation of LST in
combination with estimation of Vcmax, Fdrai or Qdrai alone hardly modified the performance of the
simulations. The calibration of LAI and SAI results in higher actual evapotranspiration compared with the open loop
simulation. The MODIS product of actual evapotranspiration shows the largest values. For the two grassland sites Rollesbroich and
Kall-Sistig the difference between CLM-simulations and the MODIS product is large, whereas for the other sites the difference can
be considered relatively small. Considering the relatively low RMSE values (around 40
The updated LAI for five sites is shown in Fig. 13. The red lines show the annual LAI from the MODIS product, interpolated linearly by CLM. The underestimation of LAI by the MODIS sensor is reduced by assimilating land surface temperature for four of the sites. However, for the forest areas no significant update was found.
The results of LST assimilation for the Rur catchment show the ability of the DasPy system for joint state and parameter
estimation. The characterization of the latent heat flux was (slightly) improved with the update of the soil temperature profile,
LAI and SAI. The calibration of Vcmax, Fdrai and Qdrai could (slightly) improve the sensible heat flux
estimation. The EC measurement data used for the validation were not corrected for the energy balance deficit. The EC system
usually underestimates the latent and sensible heat fluxes (Wilson et al., 2002). In addition, the EC-footprint at the sites may
not be consistent with the model spatial resolution. However, in this latter case we do not expect a systematic impact on the
results but rather a random impact. The assimilation could reduce the soil temperature RMSE by 0.3
Overall, the assimilation of LST data hardly improves the characterization of hydrological states and fluxes. These results are consistent with previous findings (Bosilovich et al., 2007; Ghent et al., 2010; Reichle et al., 2010; Xu et al., 2014). A main reason why LST assimilation most probably did not improve much the assimilation results is the limited information content of a single LST measurement. We expect that LST measurements carry more information on model parameters, whose update could help to more structurally improve the characterization of hydrological states and fluxes. This was done in this study. However, instead of assimilating data measured at a single time point, it could be better to assimilate data from a daily LST-cycle with help of a smoother approach. This was not tested in this work and is subject of future research including extension of DasPy. Other reasons for the limited impact of LST-assimilation are probably: (1) incorrect values for model parameters like soil texture, leaf area index, Vcmax, and others; (2) uncertainties of atmospheric forcing data, such as precipitation, air temperature, wind speed and incoming shortwave radiation; (3) model structural errors and (4) intrinsic errors related to model parameterizations. The bias/error of the MODIS LST products and the spatial and temporal mismatch of the CLM model and MODIS products could also have influenced the assimilation performance negatively. Soil moisture content and vegetation coverage are constantly high for the Rur catchment and the dense vegetation and wet soil could decrease the performance of LST assimilation as compared to areas with sparse vegetation and dry soil (Bateni and Entekhabi, 2012; Xu et al., 2014). The ground validation data could also be biased due to the scale mismatch of the CLM simulations and in-situ sensors like the EC footprint. All these factors could be reasons why the assimilation of LST did not improve the CLM simulations significantly (Reichle et al., 2010). The inclusion of other data types besides LST in the data assimilation, like for example soil moisture content, leaf area index, and the combination with parameter estimation could be promising to improve the results as compared to univariate data assimilation of LST (Bosilovich et al., 2007).
The development of DasPy is for the purpose of scientific research and not for operational application. The performance of DasPy
is influenced by many aspects: the ensemble generation (perturbations of soil and vegetation parameters and atmospheric forcing
data), characterization of the observation error, and the calibration of the model operator and observation operators. The users
should pay attention to these aspects in order to achieve good data assimilation results with DasPy. Although hybrid parallel
computation with MPI and OpenMP has been implemented in DasPy, the computational efficiency of DasPy is not very excellent because
of the Python script language, intensive data transfer among the different modules of DasPy by NetCDF files and loose coupling
methods for the model operator and the observations operators. For large scale applications, the data transfer among different
components should be changed from file to memory making use of the high performance global array package, e.g. petsc4py
(
DasPy provides a free multivariate land data assimilation framework to the community, whose development required several years of effort. DasPy is open source and modular. We leave more space for the users to improve or expand the performance of DasPy.
DasPy has been evaluated for spatial scales ranging from 100
The multivariate land data assimilation framework DasPy is introduced in this paper. DasPy has been utilized successfully in several multivariate land data assimilation studies of soil moisture, land surface temperature, L-band passive microwave brightness temperature, cosmic-ray neutron intensity, joint estimation of soil moisture and soil properties, and the joint estimation of soil temperature and leaf area index. It has been applied in applications at the point scale and the catchment scale. In this study, the performance of DasPy was illustrated for a new case study of the assimilation of a land surface temperature product from MODIS for the Rur catchment in western Germany.
DasPy is implemented in the Python programming language which is easy to learn and use for scientific computing. Three observation operators have been integrated for the assimilation of land surface temperature, L-band passive microwave brightness temperature and cosmic-ray neutron intensity, respectively. The LETKF algorithm is utilized as the main data assimilation method because of its easy parallelization, and the state augmentation approach for the joint state and parameter estimation is also implemented as an option for the data assimilation. DasPy can be executed in serial mode on a laptop but also in the high performance computing environments with many processors.
The current version of DasPy is designed for CLM specifically. DasPy is not limited to soil moisture and soil temperature assimilation. DasPy provides a flexible framework to extend assimilation of snow depth, terrestrial water storage and the ground water table depth and include the estimation of other model parameters. Assimilation of the normalized difference vegetation index, leaf area index, and net ecosystem exchange (NEE) measurements can also be integrated relatively easily if the biogeochemical version of CLM is used as model operator.
In summary, DasPy is a ready to use multivariate land data assimilation framework. DasPy is under continuous further development and more multivariate data assimilation capabilities will be added to DasPy in the nearby future. Further work is planned to enable the assimilation of active microwave backscatter, e.g. from the ASCAT (MetOp Advanced Scatterometer), SMAP or ESA Sentinel-1 satellite missions, and the adaptive error characterizations of model operator and observation data. The code is open-source for the community (Appendix).
The source code, user guide and example data of DasPy can be accessed from
The study of this work was supported by the DFG (Deutsche Forschungsgemeinschaft) Forschergruppe 2131 “Data Assimilation for Improved Characterization of Fluxes across Compartmental Interfaces”, the NSFC (National Natural Science Foundation of China) project (grant number: 41271357, 91425303), the Transregional Collaborative Research Centre 32 (TR32), financed by the DFG and the Terrestrial Environmental Observatories (TERENO). The MODIS data product is courtesy of the online Data Pool at the NASA Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota. The support of the supercomputing facilities of Forschungszentrum Julich (JUROPA) is gratefully acknowledged. Eddy covariance measurements were provided by the central service project Z3 and project C3 (Karl Schneider/Univ. Cologne) of the DFG TR32. Processing of the eddy covariance data was done by project Z3 of TR32. The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.
System Flow of DasPy (bold words represent the function names of DasPy).
Python packages used in DasPy.
The observation operators in DasPy.
Multivariate data assimilation with DasPy.
Parallel Implementation of DasPy.
Landcover types for the Rur Catchment and locations of measurement stations.
RMSE values of latent heat flux at five EC-stations for the open loop run, and assimilation scenarios of joint LST and Vcmax estimation (LST_Vcmax), joint LST and Fdrai estimation (LST_Fdrai), joint LST and Qdrai estimation (LST_Qdrai), joint LST, LAI and SAI estimation (LST_LAI_SAI), joint LST, Vcmax, Fdrai, Qdrai, LAI and SAI estimation (LST_Par_All) and joint LST, soil moisture, Vcmax, Fdrai, Qdrai, LAI and SAI estimation (LST_Feedback_Par_All).
RMSE values of sensible heat flux at five EC-stations for different scenarios (see Fig. 7).
RMSE values of soil moisture (from April to September) for 5, 20 and 50
RMSE values of soil temperature (from April to September) for 5, 20 and 50
Actual evapotranspiration (ET) at five EC-stations in the Rur catchment in 2012 for different simulation/assimilation scenarios and for the MODIS product (MODIS).
Catchment scale ET for the Rur catchment in 2012 for the open loop run (left), LST_SAI_LAI (middle) and MODIS (right).
Updated leaf area index for four scenarios at the five stations.