TY - JOUR
A1 - Kühn, Nicolas M.
A1 - Scherbaum, Frank
A1 - Riggelsen, Carsten
T1 - Deriving empirical ground-motion models : balancing data constraints and physical assumptions to optimize prediction capability
N2 - Empirical ground-motion models used in seismic hazard analysis are commonly derived by regression of observed ground motions against a chosen set of predictor variables. Commonly, the model building process is based on residual analysis and/or expert knowledge and/or opinion, while the quality of the model is assessed by the goodness-of-fit to the data. Such an approach, however, bears no immediate relation to the predictive power of the model and with increasing complexity of the models is increasingly susceptible to the danger of overfitting. Here, a different, primarily data-driven method for the development of ground-motion models is proposed that makes use of the notion of generalization error to counteract the problem of overfitting. Generalization error directly estimates the average prediction error on data not used for the model generation and, thus, is a good criterion to assess the predictive capabilities of a model. The approach taken here makes only few a priori assumptions. At first, peak ground acceleration and response spectrum values are modeled by flexible, nonphysical functions (polynomials) of the predictor variables. The inclusion of a particular predictor and the order of the polynomials are based on minimizing generalization error. The approach is illustrated for the next generation of ground-motion attenuation dataset. The resulting model is rather complex, comprising 48 parameters, but has considerably lower generalization error than functional forms commonly used in ground-motion models. The model parameters have no physical meaning, but a visual interpretation is possible and can reveal relevant characteristics of the data, for example, the Moho bounce in the distance scaling.
In a second step, the regression model is approximated by an equivalent stochastic model, making it physically interpretable. The resulting resolvable stochastic model parameters are comparable to published models for western North America. In general, for large datasets generalization error minimization provides a viable method for the development of empirical ground-motion models.
Y1 - 2009
UR - http://bssa.geoscienceworld.org/
U6 - https://doi.org/10.1785/0120080136
SN - 0037-1106
ER -
TY - JOUR
A1 - Delavaud, Elise
A1 - Scherbaum, Frank
A1 - Kuehn, Nicolas
A1 - Riggelsen, Carsten
T1 - Information-theoretic selection of ground-motion prediction equations for seismic hazard analysis : an applicability study using Californian data
N2 - Considering the increasing number and complexity of ground-motion prediction equations available for seismic hazard assessment, there is a definite need for an efficient, quantitative, and robust method to select and rank these models for a particular region of interest. In a recent article, Scherbaum et al. (2009) have suggested an information-theoretic approach for this purpose that overcomes several shortcomings of earlier attempts at using data-driven ground-motion prediction equation selection procedures. The results of their theoretical study provide evidence that in addition to observed response spectra, macroseismic intensity data might be useful for model selection and ranking. We present here an applicability study for this approach using response spectra and macroseismic intensities from eight Californian earthquakes. A total of 17 ground-motion prediction equations, from different regions, for response spectra, combined with the equation of Atkinson and Kaka (2007) for macroseismic intensities, are tested for their relative performance. The resulting data-driven rankings show that the models that best estimate ground motion in California are, as one would expect, Californian and western U.S.
models, while some European models also perform fairly well. Moreover, the model performance appears to be strongly dependent on both distance and frequency. The relative information of intensity versus response spectral data is also explored. The strong correlation we obtain between intensity-based rankings and spectral-based ones demonstrates the great potential of macroseismic intensity data for model selection in the context of seismic hazard assessment.
Y1 - 2009
UR - http://bssa.geoscienceworld.org/
U6 - https://doi.org/10.1785/0120090055
SN - 0037-1106
ER -
TY - JOUR
A1 - Scherbaum, Frank
A1 - Delavaud, Elise
A1 - Riggelsen, Carsten
T1 - Model selection in seismic hazard analysis : an information-theoretic perspective
N2 - Although the methodological framework of probabilistic seismic hazard analysis is well established, the selection of models to predict the ground motion at the sites of interest remains a major challenge. Information theory provides a powerful theoretical framework that can guide this selection process in a consistent way. From an information-theoretic perspective, the appropriateness of models can be expressed in terms of their relative information loss (Kullback-Leibler distance) and hence in physically meaningful units (bits). In contrast to hypothesis testing, information-theoretic model selection does not require ad hoc decisions regarding significance levels, nor does it require the models to be mutually exclusive and collectively exhaustive. The key ingredient, the Kullback-Leibler distance, can be estimated from the statistical expectation of log-likelihoods of observations for the models under consideration. In the present study, data-driven ground-motion model selection based on Kullback-Leibler-distance differences is illustrated for a set of simulated observations of response spectra and macroseismic intensities. Information theory allows for a unified treatment of both quantities.
The application of Kullback-Leibler-distance-based model selection to real data using the model-generating data set for the Abrahamson and Silva (1997) ground-motion model demonstrates the superior performance of the information-theoretic perspective in comparison to earlier attempts at data-driven model selection (e.g., Scherbaum et al., 2004).
Y1 - 2009
UR - http://bssa.geoscienceworld.org/
U6 - https://doi.org/10.1785/0120080347
SN - 0037-1106
ER -