TY - JOUR A1 - Savoy, Heather A1 - Heße, Falk T1 - Dimension reduction for integrating data series in Bayesian inversion of geostatistical models JF - Stochastic environmental research and risk assessment N2 - This study explores methods with which multidimensional data, e.g. time series, can be effectively incorporated into a Bayesian framework for inferring geostatistical parameters. Such series are difficult to use directly in the likelihood estimation procedure due to their high dimensionality; thus, a dimension reduction approach is taken to utilize these measurements in the inference. Two synthetic scenarios from hydrology are explored in which pumping drawdown and concentration breakthrough curves are used to infer the global mean of a log-normally distributed hydraulic conductivity field. Both cases pursue the use of a parametric model to represent the shape of the observed time series with physically-interpretable parameters (e.g. the time and magnitude of a concentration peak), which is compared to subsets of the observations with similar dimensionality. The results from both scenarios highlight the effectiveness of the shape-matching models in reducing dimensionality from 100+ dimensions down to fewer than five. The models outperform the alternative subset method, especially when the observations are noisy. This approach to incorporating time series observations in the Bayesian framework for inferring geostatistical parameters allows for high-dimensional observations to be faithfully represented in lower-dimensional space for the non-parametric likelihood estimation procedure, which increases the applicability of the framework to more observation types. Although the scenarios are both from hydrogeology, the methodology is general in that no assumptions are made about the subject domain. Any application that requires the inference of geostatistical parameters using series in either time or space can use the approach described in this paper. 
KW - Geostatistics KW - Stochastic hydrogeology KW - Dimension reduction KW - Bayesian inference Y1 - 2019 U6 - https://doi.org/10.1007/s00477-019-01697-9 SN - 1436-3240 SN - 1436-3259 VL - 33 IS - 7 SP - 1327 EP - 1344 PB - Springer CY - New York ER - TY - JOUR A1 - Müller, Sebastian A1 - Zech, Alraune A1 - Hesse, Falk T1 - ogs5py: A Python-API for the OpenGeoSys 5 Scientific Modeling Package JF - Groundwater : journal of the Association of Ground-Water Scientists and Engineers, a division of the National Ground Water Association N2 - High-performance numerical codes are an indispensable tool for hydrogeologists when modeling subsurface flow and transport systems. But as they are written in compiled languages, like C/C++ or Fortran, established software packages are rarely user-friendly, limiting a wider adoption of such tools. OpenGeoSys (OGS), an open-source, finite-element solver for thermo-hydro-mechanical-chemical processes in porous and fractured media, is no exception. Graphical user interfaces may increase usability, but do so at a dramatic reduction of flexibility and are difficult or impossible to integrate into a larger workflow. Python offers an optimal trade-off between these goals by providing a highly flexible, yet comparatively user-friendly environment for software applications. Hence, we introduce ogs5py, a Python-API for the OpenGeoSys 5 scientific modeling package. It provides a fully Python-based representation of an OGS project, a large array of convenience functions for users to interact with OGS and connects OGS to the scientific and computational environment of Python. 
Y1 - 2021 U6 - https://doi.org/10.1111/gwat.13017 SN - 0017-467X SN - 1745-6584 VL - 59 IS - 1 SP - 117 EP - 122 PB - Wiley CY - Hoboken ER - TY - JOUR A1 - Cucchi, Karina A1 - Hesse, Falk A1 - Kawa, Nura A1 - Wang, Changhong A1 - Rubin, Yoram T1 - Ex-situ priors: A Bayesian hierarchical framework for defining informative prior distributions in hydrogeology JF - Advances in water resources N2 - Stochastic modeling is a common practice for modeling uncertainty in hydrogeology. In stochastic modeling, aquifer properties are characterized by their probability density functions (PDFs). The Bayesian approach for inverse modeling is often used to assimilate information from field measurements collected at a site into properties’ posterior PDFs. This necessitates the definition of a prior PDF, characterizing the knowledge of hydrological properties before undertaking any investigation at the site, and usually coming from previous studies at similar sites. In this paper, we introduce a Bayesian hierarchical algorithm capable of assimilating various information–like point measurements, bounds and moments–into a single, informative PDF that we call ex-situ prior. This informative PDF summarizes the ex-situ information available about a hydrogeological parameter at a site of interest, which can then be used as a prior PDF in future studies at the site. We demonstrate the behavior of the algorithm on several synthetic case studies, compare it to other methods described in the literature, and illustrate the approach by applying it to a public open-access hydrogeological dataset. 
KW - Data assimilation KW - Data fusion KW - Bayesian hierarchical model KW - Informative prior KW - Databases Y1 - 2019 U6 - https://doi.org/10.1016/j.advwatres.2019.02.003 SN - 0309-1708 SN - 1872-9657 VL - 126 SP - 65 EP - 78 PB - Elsevier CY - Oxford ER - TY - JOUR A1 - Müller, Sebastian A1 - Schüler, Lennart A1 - Zech, Alraune A1 - Heße, Falk T1 - GSTools v1.3: a toolbox for geostatistical modelling in Python JF - Geoscientific model development : an interactive open access journal of the European Geosciences Union N2 - Geostatistics as a subfield of statistics accounts for the spatial correlations encountered in many applications of, for example, earth sciences. Valuable information can be extracted from these correlations, also helping to address the often encountered burden of data scarcity. Despite the value of additional data, the use of geostatistics still falls short of its potential. This problem is often connected to the lack of user-friendly software hampering the use and application of geostatistics. We therefore present GSTools, a Python-based software suite for solving a wide range of geostatistical problems. We chose Python due to its unique balance between usability, flexibility, and efficiency and due to its adoption in the scientific community. GSTools provides methods for generating random fields; it can perform kriging, variogram estimation and much more. We demonstrate its abilities by virtue of a series of example applications detailing their use. 
Y1 - 2022 U6 - https://doi.org/10.5194/gmd-15-3161-2022 SN - 1991-959X SN - 1991-9603 VL - 15 IS - 7 SP - 3161 EP - 3182 PB - Copernicus CY - Göttingen ER - TY - JOUR A1 - Khurana, Swamini A1 - Heße, Falk A1 - Hildebrandt, Anke A1 - Thullner, Martin T1 - Predicting the impact of spatial heterogeneity on microbially mediated nutrient cycling in the subsurface JF - Biogeosciences N2 - The subsurface is a temporally dynamic and spatially heterogeneous compartment of the Earth's critical zone, and biogeochemical transformations taking place in this compartment are crucial for the cycling of nutrients. The impact of spatial heterogeneity on such microbially mediated nutrient cycling is not well known, which imposes a severe challenge in the prediction of in situ biogeochemical transformation rates and further of nutrient loading contributed by the groundwater to the surface water bodies. Therefore, we used a numerical modelling approach to evaluate the sensitivity of groundwater microbial biomass distribution and nutrient cycling to spatial heterogeneity in different scenarios accounting for various residence times. The model results gave us an insight into domain characteristics with respect to the presence of oxic niches in predominantly anoxic zones and vice versa depending on the extent of spatial heterogeneity and the flow regime. The obtained results show that microbial abundance, distribution, and activity are sensitive to the applied flow regime and that the mobile (i.e. observable by groundwater sampling) fraction of microbial biomass is a varying, yet only a small, fraction of the total biomass in a domain. Furthermore, spatial heterogeneity resulted in anaerobic niches in the domain and shifts in microbial biomass between active and inactive states. The lack of consideration of spatial heterogeneity, thus, can result in inaccurate estimation of microbial activity. 
In most cases this leads to an overestimation of nutrient removal (up to twice the actual amount) along a flow path. We conclude that the governing factors for evaluating this are the residence time of solutes and the Damköhler number (Da) of the biogeochemical reactions in the domain. We propose a relationship to scale the impact of spatial heterogeneity on nutrient removal governed by the log10Da. This relationship may be applied in upscaled descriptions of microbially mediated nutrient cycling dynamics in the subsurface thereby resulting in more accurate predictions of, for example, carbon and nitrogen cycling in groundwater over long periods at the catchment scale. Y1 - 2022 U6 - https://doi.org/10.5194/bg-19-665-2022 SN - 1726-4170 SN - 1726-4189 VL - 19 IS - 3 SP - 665 EP - 688 PB - Copernicus CY - Göttingen ER - TY - JOUR A1 - Khurana, Swamini A1 - Hesse, Falk A1 - Kleidon-Hildebrandt, Anke A1 - Thullner, Martin T1 - Should we worry about surficial dynamics when assessing nutrient cycling in the groundwater? JF - Frontiers in water N2 - The fluxes of water and solutes in the subsurface compartment of the Critical Zone are temporally dynamic and it is unclear how this impacts microbially mediated nutrient cycling in the spatially heterogeneous subsurface. To investigate this, we undertook numerical modeling, simulating the transport in a wide range of spatially heterogeneous domains, and the biogeochemical transformation of organic carbon and nitrogen compounds using a complex microbial community with four (4) distinct functional groups, in water saturated subsurface compartments. We performed a comprehensive uncertainty analysis accounting for varying residence times and spatial heterogeneity. 
While the aggregated removal of chemical species in the domains over the entire simulation period was approximately the same as that in steady state conditions, the sub-scale temporal variation of microbial biomass and chemical discharge from a domain depended strongly on the interplay of spatial heterogeneity and temporal dynamics of the forcing. We showed that the travel time and the Damköhler number (Da) can be used to predict the temporally varying chemical discharge from a spatially heterogeneous domain. In homogeneous domains, chemical discharge in temporally dynamic conditions could be double of that in the steady state conditions while microbial biomass varied up to 75% of that in steady state conditions. In heterogeneous domains, the interquartile range of uncertainty in chemical discharge in reaction dominated systems (log(10)Da > 0) was double of that in steady state conditions. However, highly heterogeneous domains resulted in outliers where chemical discharge could be as high as 10-20 times of that in steady state conditions in high flow periods. And in transport dominated systems (log(10)Da < 0), the chemical discharge could be half of that in steady state conditions in unusually low flow conditions. In conclusion, ignoring spatio-temporal heterogeneities in a numerical modeling approach may exacerbate inaccurate estimation of nutrient export and microbial biomass. The results are relevant to long-term field monitoring studies, and for homogeneous soil column-scale experiments investigating the role of temporal dynamics on microbial redox dynamics. 
KW - reactive transport modeling KW - spatio-temporal heterogeneity KW - uncertainty KW - geomicrobial activity KW - nutrient export Y1 - 2022 U6 - https://doi.org/10.3389/frwa.2022.780297 SN - 2624-9375 VL - 4 PB - Frontiers Media CY - Lausanne ER - TY - GEN A1 - Heße, Falk A1 - Comunian, Alessandro A1 - Attinger, Sabine T1 - What We Talk About When We Talk About Uncertainty BT - Toward a Unified, Data-Driven Framework for Uncertainty Characterization in Hydrogeology T2 - Postprints der Universität Potsdam Mathematisch-Naturwissenschaftliche Reihe T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 754 KW - Bayesianism KW - uncertainty analysis KW - hydrogeology KW - data science KW - opinion KW - prior derivation Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-436582 SN - 1866-8372 IS - 754 ER - TY - GEN A1 - Jing, Miao A1 - Heße, Falk A1 - Kumar, Rohini A1 - Wang, Wenqing A1 - Fischer, Thomas A1 - Walther, Marc A1 - Zink, Matthias A1 - Zech, Alraune A1 - Samaniego, Luis A1 - Kolditz, Olaf A1 - Attinger, Sabine T1 - Improved regional-scale groundwater representation by the coupling of the mesoscale Hydrologic Model (mHM v5.7) to the groundwater model OpenGeoSys (OGS) T2 - Postprints der Universität Potsdam : Mathematisch Naturwissenschaftliche Reihe N2 - Most large-scale hydrologic models fall short in reproducing groundwater head dynamics and simulating transport process due to their oversimplified representation of groundwater flow. In this study, we aim to extend the applicability of the mesoscale Hydrologic Model (mHM v5.7) to subsurface hydrology by coupling it with the porous media simulator OpenGeoSys (OGS). The two models are one-way coupled through model interfaces GIS2FEM and RIV2FEM, by which the grid-based fluxes of groundwater recharge and the river-groundwater exchange generated by mHM are converted to fixed-flux boundary conditions of the groundwater model OGS. 
Specifically, the grid-based vertical reservoirs in mHM are completely preserved for the estimation of land-surface fluxes, while OGS acts as a plug-in to the original mHM modeling framework for groundwater flow and transport modeling. The applicability of the coupled model (mHM-OGS v1.0) is evaluated by a case study in the central European mesoscale river basin - Nägelstedt. Different time steps, i.e., daily in mHM and monthly in OGS, are used to account for fast surface flow and slow groundwater flow. Model calibration is conducted following a two-step procedure using discharge for mHM and long-term mean of groundwater head measurements for OGS. Based on the model summary statistics, namely the Nash-Sutcliffe model efficiency (NSE), the mean absolute error (MAE), and the interquartile range error (QRE), the coupled model is able to satisfactorily represent the dynamics of discharge and groundwater heads at several locations across the study basin. Our exemplary calculations show that the one-way coupled model can take advantage of the spatially explicit modeling capabilities of surface and groundwater hydrologic models and provide an adequate representation of the spatiotemporal behaviors of groundwater storage and heads, thus making it a valuable tool for addressing water resources and management problems. 
T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 851 KW - travel-time distributions KW - surface-water KW - land-surface KW - surface/subsurface flow KW - parameter-estimation KW - subsurface flow KW - transport model KW - climate-change KW - river-basins KW - catchment Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427030 SN - 1866-8372 IS - 851 SP - 1989 EP - 2007 ER - TY - GEN A1 - Jing, Miao A1 - Kumar, Rohini A1 - Heße, Falk A1 - Thober, Stephan A1 - Rakovec, Oldrich A1 - Samaniego, Luis A1 - Attinger, Sabine T1 - Assessing the response of groundwater quantity and travel time distribution to 1.5, 2, and 3 °C global warming in a mesoscale central German basin T2 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Groundwater is the biggest single source of high-quality freshwater worldwide, which is also continuously threatened by the changing climate. In this paper, we investigate the response of the regional groundwater system to climate change under three global warming levels (1.5, 2, and 3 °C) in a central German basin (Nägelstedt). This investigation is conducted by deploying an integrated modeling workflow that consists of a mesoscale hydrologic model (mHM) and a fully distributed groundwater model, OpenGeoSys (OGS). mHM is forced with climate simulations of five general circulation models under three representative concentration pathways. The diffuse recharges estimated by mHM are used as boundary forcings to the OGS groundwater model to compute changes in groundwater levels and travel time distributions. Simulation results indicate that groundwater recharges and levels are expected to increase slightly under future climate scenarios. Meanwhile, the mean travel time is expected to decrease compared to the historical average. However, the ensemble simulations do not all agree on the sign of relative change. 
Changes in mean travel time exhibit a larger variability than those in groundwater levels. The ensemble simulations do not show a systematic relationship between the projected change (in both groundwater levels and travel times) and the warming level, but they indicate an increased variability in projected changes with adjusting the enhanced warming level from 1.5 to 3 °C. Correspondingly, it is highly recommended to restrain the trend of global warming. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1402 KW - climate change impacts KW - hydrological models KW - coupled surface KW - water fluxes KW - catchment KW - recharge KW - dynamics KW - aquifer KW - flow KW - parameterization Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-509343 SN - 1866-8372 IS - 3 ER - TY - GEN A1 - Schmidt, Lennart A1 - Heße, Falk A1 - Attinger, Sabine A1 - Kumar, Rohini T1 - Challenges in applying machine learning models for hydrological inference: a case study for flooding events across Germany T2 - Postprints der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe N2 - Machine learning (ML) algorithms are being increasingly used in Earth and Environmental modeling studies owing to the ever-increasing availability of diverse data sets and computational resources as well as advancement in ML algorithms. Despite advances in their predictive accuracy, the usefulness of ML algorithms for inference remains elusive. In this study, we employ two popular ML algorithms, artificial neural networks and random forest, to analyze a large data set of flood events across Germany with the goals to analyze their predictive accuracy and their usability to provide insights to hydrologic system functioning. The results of the ML algorithms are contrasted against a parametric approach based on multiple linear regression. 
For analysis, we employ a model-agnostic framework named Permuted Feature Importance to derive the influence of models' predictors. This allows us to compare the results of different algorithms for the first time in the context of hydrology. Our main findings are that (1) the ML models achieve higher prediction accuracy than linear regression, (2) the results reflect basic hydrological principles, but (3) further inference is hindered by the heterogeneity of results across algorithms. Thus, we conclude that the problem of equifinality as known from classical hydrological modeling also exists for ML and severely hampers its potential for inference. To account for the observed problems, we propose that when employing ML for inference, this should be made by using multiple algorithms and multiple methods, of which the latter should be embedded in a cross-validation routine. T3 - Zweitveröffentlichungen der Universität Potsdam : Mathematisch-Naturwissenschaftliche Reihe - 1193 KW - machine learning KW - inference KW - floods Y1 - 2019 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-523843 SN - 1866-8372 IS - 5 ER - TY - JOUR A1 - Jing, Miao A1 - Kumar, Rohini A1 - Heße, Falk A1 - Thober, Stephan A1 - Rakovec, Oldrich A1 - Samaniego, Luis A1 - Attinger, Sabine T1 - Assessing the response of groundwater quantity and travel time distribution to 1.5, 2, and 3 °C global warming in a mesoscale central German basin JF - Hydrology and Earth System Sciences N2 - Groundwater is the biggest single source of high-quality freshwater worldwide, which is also continuously threatened by the changing climate. In this paper, we investigate the response of the regional groundwater system to climate change under three global warming levels (1.5, 2, and 3 °C) in a central German basin (Nägelstedt). 
This investigation is conducted by deploying an integrated modeling workflow that consists of a mesoscale hydrologic model (mHM) and a fully distributed groundwater model, OpenGeoSys (OGS). mHM is forced with climate simulations of five general circulation models under three representative concentration pathways. The diffuse recharges estimated by mHM are used as boundary forcings to the OGS groundwater model to compute changes in groundwater levels and travel time distributions. Simulation results indicate that groundwater recharges and levels are expected to increase slightly under future climate scenarios. Meanwhile, the mean travel time is expected to decrease compared to the historical average. However, the ensemble simulations do not all agree on the sign of relative change. Changes in mean travel time exhibit a larger variability than those in groundwater levels. The ensemble simulations do not show a systematic relationship between the projected change (in both groundwater levels and travel times) and the warming level, but they indicate an increased variability in projected changes with adjusting the enhanced warming level from 1.5 to 3 °C. Correspondingly, it is highly recommended to restrain the trend of global warming. KW - climate change impacts KW - hydrological models KW - coupled surface KW - water fluxes KW - catchment KW - recharge KW - dynamics KW - aquifer KW - flow KW - parameterization Y1 - 2020 U6 - https://doi.org/10.5194/hess-24-1511-2020 SN - 1607-7938 SN - 1027-5606 VL - 24 IS - 3 SP - 1511 EP - 1526 PB - Copernicus Publ. 
CY - Göttingen ER - TY - JOUR A1 - Heße, Falk A1 - Comunian, Alessandro A1 - Attinger, Sabine T1 - What We Talk About When We Talk About Uncertainty BT - Toward a Unified, Data-Driven Framework for Uncertainty Characterization in Hydrogeology JF - Frontiers in Earth Science KW - Bayesianism KW - uncertainty analysis KW - hydrogeology KW - data science KW - opinion KW - prior derivation Y1 - 2019 U6 - https://doi.org/10.3389/feart.2019.00118 SN - 2296-6463 VL - 7 PB - Frontiers Media CY - Lausanne ER - TY - JOUR A1 - Jing, Miao A1 - Hesse, Falk A1 - Kumar, Rohini A1 - Wang, Wenqing A1 - Fischer, Thomas A1 - Walther, Marc A1 - Zink, Matthias A1 - Zech, Alraune A1 - Samaniego, Luis A1 - Kolditz, Olaf A1 - Attinger, Sabine T1 - Improved regional-scale groundwater representation by the coupling of the mesoscale Hydrologic Model (mHM v5.7) to the groundwater model OpenGeoSys (OGS) JF - Geoscientific model development : an interactive open access journal of the European Geosciences Union N2 - Most large-scale hydrologic models fall short in reproducing groundwater head dynamics and simulating transport process due to their oversimplified representation of groundwater flow. In this study, we aim to extend the applicability of the mesoscale Hydrologic Model (mHM v5.7) to subsurface hydrology by coupling it with the porous media simulator OpenGeoSys (OGS). The two models are one-way coupled through model interfaces GIS2FEM and RIV2FEM, by which the grid-based fluxes of groundwater recharge and the river-groundwater exchange generated by mHM are converted to fixed-flux boundary conditions of the groundwater model OGS. Specifically, the grid-based vertical reservoirs in mHM are completely preserved for the estimation of land-surface fluxes, while OGS acts as a plug-in to the original mHM modeling framework for groundwater flow and transport modeling. 
The applicability of the coupled model (mHM-OGS v1.0) is evaluated by a case study in the central European mesoscale river basin - Nägelstedt. Different time steps, i.e., daily in mHM and monthly in OGS, are used to account for fast surface flow and slow groundwater flow. Model calibration is conducted following a two-step procedure using discharge for mHM and long-term mean of groundwater head measurements for OGS. Based on the model summary statistics, namely the Nash-Sutcliffe model efficiency (NSE), the mean absolute error (MAE), and the interquartile range error (QRE), the coupled model is able to satisfactorily represent the dynamics of discharge and groundwater heads at several locations across the study basin. Our exemplary calculations show that the one-way coupled model can take advantage of the spatially explicit modeling capabilities of surface and groundwater hydrologic models and provide an adequate representation of the spatiotemporal behaviors of groundwater storage and heads, thus making it a valuable tool for addressing water resources and management problems. Y1 - 2018 U6 - https://doi.org/10.5194/gmd-11-1989-2018 SN - 1991-959X SN - 1991-9603 VL - 11 IS - 5 SP - 1989 EP - 2007 PB - Copernicus CY - Göttingen ER - TY - JOUR A1 - Jing, Miao A1 - Hesse, Falk A1 - Kumar, Rohini A1 - Kolditz, Olaf A1 - Kalbacher, Thomas A1 - Attinger, Sabine T1 - Influence of input and parameter uncertainty on the prediction of catchment-scale groundwater travel time distributions JF - Hydrology and earth system sciences : HESS N2 - Groundwater travel time distributions (TTDs) provide a robust description of the subsurface mixing behavior and hydrological response of a subsurface system. Lagrangian particle tracking is often used to derive the groundwater TTDs. The reliability of this approach is subjected to the uncertainty of external forcings, internal hydraulic properties, and the interplay between them. 
Here, we evaluate the uncertainty of catchment groundwater TTDs in an agricultural catchment using a 3-D groundwater model with an overall focus on revealing the relationship between external forcing, internal hydraulic properties, and TTD predictions. Eight recharge realizations are sampled from a high-resolution dataset of land surface fluxes and states. Calibration-constrained hydraulic conductivity fields (Ks fields) are stochastically generated using the null-space Monte Carlo (NSMC) method for each recharge realization. The random walk particle tracking (RWPT) method is used to track the pathways of particles and compute travel times. Moreover, an analytical model under the random sampling (RS) assumption is fit against the numerical solutions, serving as a reference for the mixing behavior of the model domain. The StorAge Selection (SAS) function is used to interpret the results in terms of quantifying the systematic preference for discharging young/old water. The simulation results reveal the primary effect of recharge on the predicted mean travel time (MTT). The different realizations of calibration-constrained Ks fields moderately magnify or attenuate the predicted MTTs. The analytical model does not properly replicate the numerical solution, and it underestimates the mean travel time. Simulated SAS functions indicate an overall preference for young water for all realizations. The spatial pattern of recharge controls the shape and breadth of simulated TTDs and SAS functions by changing the spatial distribution of particles' pathways. In conclusion, overlooking the spatial nonuniformity and uncertainty of input (forcing) will result in biased travel time predictions. We also highlight the worth of reliable observations in reducing predictive uncertainty and the good interpretability of SAS functions in terms of understanding catchment transport processes. 
Y1 - 2019 U6 - https://doi.org/10.5194/hess-23-171-2019 SN - 1027-5606 SN - 1607-7938 VL - 23 IS - 1 SP - 171 EP - 190 PB - Copernicus CY - Göttingen ER - TY - JOUR A1 - Schmidt, Lennart A1 - Hesse, Falk A1 - Attinger, Sabine A1 - Kumar, Rohini T1 - Challenges in applying machine learning models for hydrological inference BT - a case study for flooding events across Germany JF - Water resources research N2 - Machine learning (ML) algorithms are being increasingly used in Earth and Environmental modeling studies owing to the ever-increasing availability of diverse data sets and computational resources as well as advancement in ML algorithms. Despite advances in their predictive accuracy, the usefulness of ML algorithms for inference remains elusive. In this study, we employ two popular ML algorithms, artificial neural networks and random forest, to analyze a large data set of flood events across Germany with the goals to analyze their predictive accuracy and their usability to provide insights to hydrologic system functioning. The results of the ML algorithms are contrasted against a parametric approach based on multiple linear regression. For analysis, we employ a model-agnostic framework named Permuted Feature Importance to derive the influence of models' predictors. This allows us to compare the results of different algorithms for the first time in the context of hydrology. Our main findings are that (1) the ML models achieve higher prediction accuracy than linear regression, (2) the results reflect basic hydrological principles, but (3) further inference is hindered by the heterogeneity of results across algorithms. Thus, we conclude that the problem of equifinality as known from classical hydrological modeling also exists for ML and severely hampers its potential for inference. 
To account for the observed problems, we propose that when employing ML for inference, this should be made by using multiple algorithms and multiple methods, of which the latter should be embedded in a cross-validation routine. KW - machine learning KW - inference KW - floods Y1 - 2020 U6 - https://doi.org/10.1029/2019WR025924 SN - 0043-1397 SN - 1944-7973 VL - 56 IS - 5 PB - American Geophysical Union CY - Washington ER - TY - JOUR A1 - Kawa, Nura A1 - Cucchi, Karina A1 - Rubin, Yoram A1 - Attinger, Sabine A1 - Hesse, Falk T1 - Defining Hydrogeological Site Similarity with Hierarchical Agglomerative Clustering JF - Groundwater : journal of the Association of Ground-Water Scientists and Engineers, a division of the National Ground Water Association N2 - Hydrogeological information about an aquifer is difficult and costly to obtain, yet essential for the efficient management of groundwater resources. Transferring information from sampled sites to a specific site of interest can provide information when site-specific data is lacking. Central to this approach is the notion of site similarity, which is necessary for determining relevant sites to include in the data transfer process. In this paper, we present a data-driven method for defining site similarity. We apply this method to selecting groups of similar sites from which to derive prior distributions for the Bayesian estimation of hydraulic conductivity measurements at sites of interest. We conclude that there is now a unique opportunity to combine hydrogeological expertise with data-driven methods to improve the predictive ability of stochastic hydrogeological models. 
Y1 - 2022 U6 - https://doi.org/10.1111/gwat.13261 SN - 0017-467X SN - 1745-6584 PB - Wiley CY - Hoboken ER - TY - JOUR A1 - Schmidt, Lennart A1 - Heße, Falk A1 - Attinger, Sabine A1 - Kumar, Rohini T1 - Challenges in applying machine learning models for hydrological inference: a case study for flooding events across Germany JF - Water Resources Research N2 - Machine learning (ML) algorithms are being increasingly used in Earth and Environmental modeling studies owing to the ever-increasing availability of diverse data sets and computational resources as well as advancement in ML algorithms. Despite advances in their predictive accuracy, the usefulness of ML algorithms for inference remains elusive. In this study, we employ two popular ML algorithms, artificial neural networks and random forest, to analyze a large data set of flood events across Germany with the goals to analyze their predictive accuracy and their usability to provide insights to hydrologic system functioning. The results of the ML algorithms are contrasted against a parametric approach based on multiple linear regression. For analysis, we employ a model-agnostic framework named Permuted Feature Importance to derive the influence of models' predictors. This allows us to compare the results of different algorithms for the first time in the context of hydrology. Our main findings are that (1) the ML models achieve higher prediction accuracy than linear regression, (2) the results reflect basic hydrological principles, but (3) further inference is hindered by the heterogeneity of results across algorithms. Thus, we conclude that the problem of equifinality as known from classical hydrological modeling also exists for ML and severely hampers its potential for inference. To account for the observed problems, we propose that when employing ML for inference, this should be made by using multiple algorithms and multiple methods, of which the latter should be embedded in a cross-validation routine. 
KW - machine learning KW - inference KW - floods Y1 - 2019 VL - 56 IS - 5 PB - John Wiley & Sons, Inc. CY - New Jersey ER -