### Refine

#### Keywords

- Aleatory variability (2)
- Epistemic uncertainty (2)
- Bayesian networks (1)
- Bayesian non-parametrics (1)
- Empirical ground-motion models (1)
- Europe (1)
- GMPE (1)
- Gaussian Process regression (1)
- Generalization error (1)
- Ground-motion models (1)

#### Institute

Modern natural hazards research requires dealing with several uncertainties that arise from limited process knowledge, measurement errors, censored and incomplete observations, and the intrinsic randomness of the governing processes. Nevertheless, deterministic analyses are still widely used in quantitative hazard assessments despite the pitfall of misestimating the hazard and any ensuing risks.
In this paper we show that Bayesian networks offer a flexible framework for capturing and expressing a broad range of uncertainties encountered in natural hazard assessments. Although Bayesian networks are well studied in theory, their application to real-world data is far from straightforward, and requires specific tailoring and adaptation of existing algorithms. We offer suggestions as how to tackle frequently arising problems in this context and mainly concentrate on the handling of continuous variables, incomplete data sets, and the interaction of both. By way of three case studies from earthquake, flood, and landslide research, we demonstrate the method of data-driven Bayesian network learning, and showcase the flexibility, applicability, and benefits of this approach.
Our results offer fresh and partly counterintuitive insights into well-studied multivariate problems of earthquake-induced ground motion prediction, accurate flood damage quantification, and spatially explicit landslide prediction at the regional scale. In particular, we highlight how Bayesian networks help to express information flow and independence assumptions between candidate predictors. Such knowledge is pivotal in providing scientists and decision makers with well-informed strategies for selecting adequate predictor variables for quantitative natural hazard assessments.

This paper presents a Bayesian non-parametric method based on Gaussian Process (GP) regression to derive ground-motion models for peak-ground parameters and response spectral ordinates. Due to its non-parametric nature there is no need to specify any fixed functional form as in parametric regression models. A GP defines a distribution over functions, which implicitly expresses the uncertainty over the underlying data generating process. An advantage of GP regression is that it is possible to capture the whole uncertainty involved in ground-motion modeling, both in terms of aleatory variability as well as epistemic uncertainty associated with the underlying functional form and data coverage. The distribution over functions is updated in a Bayesian way by computing the posterior distribution of the GP after observing ground-motion data, which in turn can be used to make predictions. The proposed GP regression models is evaluated on a subset of the RESORCE data base for the SIGMA project. The experiments show that GP models have a better generalization error than a simple parametric regression model. A visual assessment of different scenarios demonstrates that the inferred GP models are physically plausible.

This article presents comparisons among the five ground-motion models described in other articles within this special issue, in terms of data selection criteria, characteristics of the models and predicted peak ground and response spectral accelerations. Comparisons are also made with predictions from the Next Generation Attenuation (NGA) models to which the models presented here have similarities (e.g. a common master database has been used) but also differences (e.g. some models in this issue are nonparametric). As a result of the differing data selection criteria and derivation techniques the predicted median ground motions show considerable differences (up to a factor of two for certain scenarios), particularly for magnitudes and distances close to or beyond the range of the available observations. The predicted influence of style-of-faulting shows much variation among models whereas site amplification factors are more similar, with peak amplification at around 1s. These differences are greater than those among predictions from the NGA models. The models for aleatory variability (sigma), however, are similar and suggest that ground-motion variability from this region is slightly higher than that predicted by the NGA models, based primarily on data from California and Taiwan.

We apply and evaluate a recent machine learning method for the automatic classification of seismic waveforms. The method relies on Dynamic Bayesian Networks (DBN) and supervised learning to improve the detection capabilities at 3C seismic stations. A time-frequency decomposition provides the basis for the required signal characteristics we need in order to derive the features defining typical "signal" and "noise" patterns. Each pattern class is modeled by a DBN, specifying the interrelationships of the derived features in the time-frequency plane. Subsequently, the models are trained using previously labeled segments of seismic data. The DBN models can now be compared against in order to determine the likelihood of new incoming seismic waveform segments to be either signal or noise. As the noise characteristics of seismic stations varies smoothly in time (seasonal variation as well as anthropogenic influence), we accommodate in our approach for a continuous adaptation of the DBN model that is associated with the noise class. Given the difficulty for obtaining a golden standard for real data (ground truth) the proof of concept and evaluation is shown by conducting experiments based on 3C seismic data from the International Monitoring Stations, BOSA and LPAZ.

We investigate the usefulness of complex flood damage models for predicting relative damage to residential buildings in a spatial and temporal transfer context. We apply eight different flood damage models to predict relative building damage for five historic flood events in two different regions of Germany. Model complexity is measured in terms of the number of explanatory variables which varies from 1 variable up to 10 variables which are singled out from 28 candidate variables. Model validation is based on empirical damage data, whereas observation uncertainty is taken into consideration. The comparison of model predictive performance shows that additional explanatory variables besides the water depth improve the predictive capability in a spatial and temporal transfer context, i.e., when the models are transferred to different regions and different flood events. Concerning the trade-off between predictive capability and reliability the model structure seem more important than the number of explanatory variables. Among the models considered, the reliability of Bayesian network-based predictions in space-time transfer is larger than for the remaining models, and the uncertainties associated with damage predictions are reflected more completely.

Bayesian networks are a powerful and increasingly popular tool for reasoning under uncertainty, offering intuitive insight into (probabilistic) data-generating processes. They have been successfully applied to many different fields, including bioinformatics. In this paper, Bayesian networks are used to model the joint-probability distribution of selected earthquake, site, and ground-motion parameters. This provides a probabilistic representation of the independencies and dependencies between these variables. In particular, contrary to classical regression, Bayesian networks do not distinguish between target and predictors, treating each variable as random variable. The capability of Bayesian networks to model the ground-motion domain in probabilistic seismic hazard analysis is shown for a generic situation. A Bayesian network is learned based on a subset of the Next Generation Attenuation (NGA) dataset, using 3342 records from 154 earthquakes. Because no prior assumptions about dependencies between particular parameters are made, the learned network displays the most probable model given the data. The learned network shows that the ground-motion parameter (horizontal peak ground acceleration, PGA) is directly connected only to the moment magnitude, Joyner-Boore distance, fault mechanism, source-to-site azimuth, and depth to a shear-wave horizon of 2: 5 km/s (Z2.5). In particular, the effect of V-S30 is mediated by Z2.5. Comparisons of the PGA distributions based on the Bayesian networks with the NGA model of Boore and Atkinson (2008) show a reasonable agreement in ranges of good data coverage.

In probabilistic seismic-hazard analysis, epistemic uncertainties are commonly treated within a logic-tree framework in which the branch weights express the degree of belief of an expert in a set of models. For the calculation of the distribution of hazard curves, these branch weights represent subjective probabilities. A major challenge for experts is to provide logically consistent weight estimates (in the sense of Kolmogorovs axioms), to be aware of the multitude of heuristics, and to minimize the biases which affect human judgment under uncertainty. We introduce a platform-independent, interactive program enabling us to quantify, elicit, and transfer expert knowledge into a set of subjective probabilities by applying experimental design theory, following the approach of Curtis and Wood (2004). Instead of determining the set of probabilities for all models in a single step, the computer-driven elicitation process is performed as a sequence of evaluations of relative weights for small subsets of models. From these, the probabilities for the whole model set are determined as a solution of an optimization problem. The result of this process is a set of logically consistent probabilities together with a measure of confidence determined from the amount of conflicting information which is provided by the expert during the relative weighting process. We experiment with different scenarios simulating likely expert behaviors in the context of knowledge elicitation and show the impact this has on the results. The overall aim is to provide a smart elicitation technique, and our findings serve as a guide for practical applications.

Although the methodological framework of probabilistic seismic hazard analysis is well established, the selection of models to predict the ground motion at the sites of interest remains a major challenge. Information theory provides a powerful theoretical framework that can guide this selection process in a consistent way. From an information- theoretic perspective, the appropriateness of models can be expressed in terms of their relative information loss (Kullback-Leibler distance) and hence in physically meaningful units (bits). In contrast to hypothesis testing, information-theoretic model selection does not require ad hoc decisions regarding significance levels nor does it require the models to be mutually exclusive and collectively exhaustive. The key ingredient, the Kullback-Leibler distance, can be estimated from the statistical expectation of log-likelihoods of observations for the models under consideration. In the present study, data-driven ground-motion model selection based on Kullback-Leibler-distance differences is illustrated for a set of simulated observations of response spectra and macroseismic intensities. Information theory allows for a unified treatment of both quantities. The application of Kullback-Leibler-distance based model selection to real data using the model generating data set for the Abrahamson and Silva (1997) ground-motion model demonstrates the superior performance of the information-theoretic perspective in comparison to earlier attempts at data- driven model selection (e.g., Scherbaum et al., 2004).

Empirical ground-motion models used in seismic hazard analysis are commonly derived by regression of observed ground motions against a chosen set of predictor variables. Commonly, the model building process is based on residual analysis and/or expert knowledge and/or opinion, while the quality of the model is assessed by the goodness-of-fit to the data. Such an approach, however, bears no immediate relation to the predictive power of the model and with increasing complexity of the models is increasingly susceptible to the danger of overfitting. Here, a different, primarily data-driven method for the development of ground-motion models is proposed that makes use of the notion of generalization error to counteract the problem of overfitting. Generalization error directly estimates the average prediction error on data not used for the model generation and, thus, is a good criterion to assess the predictive capabilities of a model. The approach taken here makes only few a priori assumptions. At first, peak ground acceleration and response spectrum values are modeled by flexible, nonphysical functions (polynomials) of the predictor variables. The inclusion of a particular predictor and the order of the polynomials are based on minimizing generalization error. The approach is illustrated for the next generation of ground-motion attenuation dataset. The resulting model is rather complex, comprising 48 parameters, but has considerably lower generalization error than functional forms commonly used in ground-motion models. The model parameters have no physical meaning, but a visual interpretation is possible and can reveal relevant characteristics of the data, for example, the Moho bounce in the distance scaling. In a second step, the regression model is approximated by an equivalent stochastic model, making it physically interpretable. The resulting resolvable stochastic model parameters are comparable to published models for western North America. In general, for large datasets generalization error minimization provides a viable method for the development of empirical ground-motion models.

Considering the increasing number and complexity of ground-motion prediction equations available for seismic hazard assessment, there is a definite need for an efficient, quantitative, and robust method to select and rank these models for a particular region of interest. In a recent article, Scherbaum et al. (2009) have suggested an information- theoretic approach for this purpose that overcomes several shortcomings of earlier attempts at using data-driven ground- motion prediction equation selection procedures. The results of their theoretical study provides evidence that in addition to observed response spectra, macroseismic intensity data might be useful for model selection and ranking. We present here an applicability study for this approach using response spectra and macroseismic intensities from eight Californian earthquakes. A total of 17 ground-motion prediction equations, from different regions, for response spectra, combined with the equation of Atkinson and Kaka (2007) for macroseismic intensities are tested for their relative performance. The resulting data-driven rankings show that the models that best estimate ground motion in California are, as one would expect, Californian and western U. S. models, while some European models also perform fairly well. Moreover, the model performance appears to be strongly dependent on both distance and frequency. The relative information of intensity versus response spectral data is also explored. The strong correlation we obtain between intensity-based rankings and spectral-based ones demonstrates the great potential of macroseismic intensities data for model selection in the context of seismic hazard assessment.