Dimension reduction for integrating data series in Bayesian inversion of geostatistical models
(2019)
This study explores methods with which multidimensional data, e.g. time series, can be effectively incorporated into a Bayesian framework for inferring geostatistical parameters. Such series are difficult to use directly in the likelihood estimation procedure due to their high dimensionality; thus, a dimension reduction approach is taken to utilize these measurements in the inference. Two synthetic scenarios from hydrology are explored in which pumping drawdown and concentration breakthrough curves are used to infer the global mean of a log-normally distributed hydraulic conductivity field. Both cases pursue the use of a parametric model to represent the shape of the observed time series with physically interpretable parameters (e.g. the time and magnitude of a concentration peak), which is compared to subsets of the observations with similar dimensionality. The results from both scenarios highlight the effectiveness of the shape-matching models in reducing dimensionality from 100+ dimensions down to fewer than five. The models outperform the alternative subset method, especially when the observations are noisy. This approach to incorporating time series observations in the Bayesian framework for inferring geostatistical parameters allows high-dimensional observations to be faithfully represented in lower-dimensional space for the non-parametric likelihood estimation procedure, which increases the applicability of the framework to more observation types. Although the scenarios are both from hydrogeology, the methodology is general in that no assumptions are made about the subject domain. Any application that requires the inference of geostatistical parameters using series in either time or space can use the approach described in this paper.
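The idea of replacing a long series with a few physically interpretable shape parameters can be illustrated with a minimal sketch. It is not the parametric shape-matching model of the study: it reads three summary values (peak time, peak magnitude, and width at half the peak) directly off a synthetic curve. The function name `summarize_curve` and all numerical values are assumptions made for illustration.

```python
import math


def summarize_curve(times, values):
    """Reduce a series to (peak_time, peak_value, width_at_half_peak)."""
    peak_index = max(range(len(values)), key=lambda i: values[i])
    peak_value = values[peak_index]
    half = 0.5 * peak_value
    # Sample times at which the curve is at or above half of its peak value.
    above = [t for t, v in zip(times, values) if v >= half]
    width = above[-1] - above[0]
    return times[peak_index], peak_value, width


# Synthetic, Gaussian-shaped breakthrough curve with 201 samples
# (illustrative values only, not data from the study).
times = [0.1 * i for i in range(201)]
values = [2.0 * math.exp(-0.5 * ((t - 8.0) / 1.5) ** 2) for t in times]

# 201 observations are condensed into three summary values.
summary = summarize_curve(times, values)
print(summary)
```

Reading the values off the curve directly is sensitive to noise; fitting a parametric curve, as the abstract describes, is one way to make such summaries more robust.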
Hydrogeological information about an aquifer is difficult and costly to obtain, yet essential for the efficient management of groundwater resources. Transferring information from sampled sites to a specific site of interest can provide information when site-specific data is lacking. Central to this approach is the notion of site similarity, which is necessary for determining relevant sites to include in the data transfer process. In this paper, we present a data-driven method for defining site similarity. We apply this method to selecting groups of similar sites from which to derive prior distributions for the Bayesian estimation of hydraulic conductivity measurements at sites of interest. We conclude that there is now a unique opportunity to combine hydrogeological expertise with data-driven methods to improve the predictive ability of stochastic hydrogeological models.
High-performance numerical codes are an indispensable tool for hydrogeologists when modeling subsurface flow and transport systems. But as they are written in compiled languages, like C/C++ or Fortran, established software packages are rarely user-friendly, limiting a wider adoption of such tools. OpenGeoSys (OGS), an open-source, finite-element solver for thermo-hydro-mechanical-chemical processes in porous and fractured media, is no exception. Graphical user interfaces may increase usability, but do so at a dramatic reduction of flexibility and are difficult or impossible to integrate into a larger workflow. Python offers an optimal trade-off between these goals by providing a highly flexible, yet comparatively user-friendly environment for software applications. Hence, we introduce ogs5py, a Python API for the OpenGeoSys 5 scientific modeling package. It provides a fully Python-based representation of an OGS project and a large array of convenience functions for users to interact with OGS, and it connects OGS to the scientific and computational environment of Python.
Groundwater travel time distributions (TTDs) provide a robust description of the subsurface mixing behavior and hydrological response of a subsurface system. Lagrangian particle tracking is often used to derive the groundwater TTDs. The reliability of this approach is subject to the uncertainty of external forcings, internal hydraulic properties, and the interplay between them. Here, we evaluate the uncertainty of catchment groundwater TTDs in an agricultural catchment using a 3-D groundwater model with an overall focus on revealing the relationship between external forcing, internal hydraulic properties, and TTD predictions. Eight recharge realizations are sampled from a high-resolution dataset of land surface fluxes and states. Calibration-constrained hydraulic conductivity fields (Ks fields) are stochastically generated using the null-space Monte Carlo (NSMC) method for each recharge realization. The random walk particle tracking (RWPT) method is used to track the pathways of particles and compute travel times. Moreover, an analytical model under the random sampling (RS) assumption is fit against the numerical solutions, serving as a reference for the mixing behavior of the model domain. The StorAge Selection (SAS) function is used to interpret the results in terms of quantifying the systematic preference for discharging young/old water. The simulation results reveal the primary effect of recharge on the predicted mean travel time (MTT). The different realizations of calibration-constrained Ks fields moderately magnify or attenuate the predicted MTTs. The analytical model does not properly replicate the numerical solution, and it underestimates the mean travel time. Simulated SAS functions indicate an overall preference for young water for all realizations. The spatial pattern of recharge controls the shape and breadth of simulated TTDs and SAS functions by changing the spatial distribution of particles' pathways.
In conclusion, overlooking the spatial nonuniformity and uncertainty of input (forcing) will result in biased travel time predictions. We also highlight the worth of reliable observations in reducing predictive uncertainty and the good interpretability of SAS functions in terms of understanding catchment transport processes.
The fluxes of water and solutes in the subsurface compartment of the Critical Zone are temporally dynamic, and it is unclear how this impacts microbially mediated nutrient cycling in the spatially heterogeneous subsurface. To investigate this, we undertook numerical modeling, simulating the transport in a wide range of spatially heterogeneous domains, and the biogeochemical transformation of organic carbon and nitrogen compounds using a complex microbial community with four distinct functional groups, in water-saturated subsurface compartments. We performed a comprehensive uncertainty analysis accounting for varying residence times and spatial heterogeneity. While the aggregated removal of chemical species in the domains over the entire simulation period was approximately the same as that in steady-state conditions, the sub-scale temporal variation of microbial biomass and chemical discharge from a domain depended strongly on the interplay of spatial heterogeneity and temporal dynamics of the forcing. We showed that the travel time and the Damköhler number (Da) can be used to predict the temporally varying chemical discharge from a spatially heterogeneous domain. In homogeneous domains, chemical discharge in temporally dynamic conditions could be double that in steady-state conditions, while microbial biomass varied up to 75% of that in steady-state conditions. In heterogeneous domains, the interquartile range of uncertainty in chemical discharge in reaction-dominated systems (log10 Da > 0) was double that in steady-state conditions. However, highly heterogeneous domains resulted in outliers where chemical discharge could be as high as 10-20 times that in steady-state conditions in high-flow periods. In transport-dominated systems (log10 Da < 0), the chemical discharge could be half of that in steady-state conditions in unusually low-flow conditions.
In conclusion, ignoring spatio-temporal heterogeneities in a numerical modeling approach may lead to increasingly inaccurate estimates of nutrient export and microbial biomass. The results are relevant to long-term field monitoring studies and to homogeneous soil column-scale experiments investigating the role of temporal dynamics in microbial redox dynamics.
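The regime distinction used in this abstract can be written out as a short sketch. It assumes the common definition of the Damköhler number as the ratio of a transport time scale (here the travel time) to a characteristic reaction time scale; the abstract does not state the exact definition used in the study, and the function names and input values are hypothetical.

```python
import math


def damkohler(travel_time, reaction_time):
    """Da as travel time over reaction time (assumed definition)."""
    return travel_time / reaction_time


def regime(da):
    """Classify a system by the sign of log10(Da), as in the abstract."""
    log_da = math.log10(da)
    if log_da > 0:
        return "reaction dominated"
    if log_da < 0:
        return "transport dominated"
    return "balanced"


# Illustrative values in arbitrary but consistent time units.
slow_flow = regime(damkohler(travel_time=10.0, reaction_time=2.0))
fast_flow = regime(damkohler(travel_time=1.0, reaction_time=5.0))
print(slow_flow, fast_flow)
```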
Geostatistics as a subfield of statistics accounts for the spatial correlations encountered in many applications of, for example, earth sciences. Valuable information can be extracted from these correlations, also helping to address the often encountered burden of data scarcity. Despite the value of additional data, the use of geostatistics still falls short of its potential. This problem is often connected to the lack of user-friendly software hampering the use and application of geostatistics. We therefore present GSTools, a Python-based software suite for solving a wide range of geostatistical problems. We chose Python due to its unique balance between usability, flexibility, and efficiency and due to its adoption in the scientific community. GSTools provides methods for generating random fields; it can perform kriging, variogram estimation and much more. We demonstrate its abilities by virtue of a series of example applications detailing their use.
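Variogram estimation, one of the tasks named above, can be sketched in plain Python. This is not the GSTools API; it is a minimal one-dimensional estimator of the empirical semivariance, gamma(h) = sum of squared differences / (2 N(h)) over the N(h) pairs whose separation falls in a lag bin. The function name `empirical_variogram` and the sample data are assumptions for illustration.

```python
def empirical_variogram(positions, values, bin_edges):
    """Return the empirical semivariance per lag bin (None for empty bins)."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            lag = abs(positions[i] - positions[j])
            # Assign the pair to the first bin whose interval contains the lag.
            for b in range(n_bins):
                if bin_edges[b] <= lag < bin_edges[b + 1]:
                    sums[b] += (values[i] - values[j]) ** 2
                    counts[b] += 1
                    break
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Alternating values on a regular 1-D grid (illustrative data).
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 1.0, 0.0, 1.0, 0.0]
gamma = empirical_variogram(positions, values, bin_edges=[0.5, 1.5, 2.5])
print(gamma)
```

A dedicated package would typically add multi-dimensional lags, directional binning, and model fitting on top of this estimator.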
Groundwater is the biggest single source of high-quality freshwater worldwide, yet it is continuously threatened by the changing climate. In this paper, we investigate the response of the regional groundwater system to climate change under three global warming levels (1.5, 2, and 3 °C) in a central German basin (Nägelstedt). This investigation is conducted by deploying an integrated modeling workflow that consists of a mesoscale hydrologic model (mHM) and a fully distributed groundwater model, OpenGeoSys (OGS). mHM is forced with climate simulations of five general circulation models under three representative concentration pathways. The diffuse recharges estimated by mHM are used as boundary forcings to the OGS groundwater model to compute changes in groundwater levels and travel time distributions. Simulation results indicate that groundwater recharges and levels are expected to increase slightly under future climate scenarios. Meanwhile, the mean travel time is expected to decrease compared to the historical average. However, the ensemble simulations do not all agree on the sign of relative change. Changes in mean travel time exhibit a larger variability than those in groundwater levels. The ensemble simulations do not show a systematic relationship between the projected change (in both groundwater levels and travel times) and the warming level, but they indicate increased variability in the projected changes as the warming level rises from 1.5 to 3 °C. Correspondingly, it is highly recommended to restrain the trend of global warming.
Machine learning (ML) algorithms are being increasingly used in Earth and Environmental modeling studies owing to the ever-increasing availability of diverse data sets and computational resources as well as advancement in ML algorithms. Despite advances in their predictive accuracy, the usefulness of ML algorithms for inference remains elusive. In this study, we employ two popular ML algorithms, artificial neural networks and random forest, to analyze a large data set of flood events across Germany, with the goal of assessing their predictive accuracy and their usability for providing insights into hydrologic system functioning. The results of the ML algorithms are contrasted against a parametric approach based on multiple linear regression. For analysis, we employ a model-agnostic framework named Permuted Feature Importance to derive the influence of models' predictors. This allows us to compare the results of different algorithms for the first time in the context of hydrology. Our main findings are that (1) the ML models achieve higher prediction accuracy than linear regression, (2) the results reflect basic hydrological principles, but (3) further inference is hindered by the heterogeneity of results across algorithms. Thus, we conclude that the problem of equifinality as known from classical hydrological modeling also exists for ML and severely hampers its potential for inference. To account for the observed problems, we propose that ML-based inference should use multiple algorithms and multiple methods, of which the latter should be embedded in a cross-validation routine.
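The model-agnostic idea behind permutation-based feature importance can be sketched as follows: shuffle one predictor at a time and record how much the prediction error grows. This is a generic illustration, not the implementation used in the study; the function names, the toy model, and the synthetic data are all assumptions.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over a data set."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each predictor column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [r[:col] + [s] + r[col + 1:] for r, s in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends on the first predictor only.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]


def toy_model(row):
    """Stand-in for a fitted model; any callable on a row works."""
    return 3.0 * row[0]


importances = permutation_importance(toy_model, rows, targets)
print(importances)
```

Because the procedure only needs predictions, the same function can be applied to a neural network, a random forest, or a linear regression, which is what makes the importances comparable across algorithms.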