TY  - THES
A1  - Brill, Fabio Alexander
T1  - Applications of machine learning and open geospatial data in flood risk modelling
N2  - Der technologische Fortschritt erlaubt es, zunehmend komplexe Vorhersagemodelle auf Basis immer größerer Datensätze zu produzieren. Für das Risikomanagement von Naturgefahren sind eine Vielzahl von Modellen als Entscheidungsgrundlage notwendig, z.B. in der Auswertung von Beobachtungsdaten, für die Vorhersage von Gefahrenszenarien, oder zur statistischen Abschätzung der zu erwartenden Schäden. Es stellt sich also die Frage, inwiefern moderne Modellierungsansätze wie das maschinelle Lernen oder Data-Mining in diesem Themenbereich sinnvoll eingesetzt werden können. Zusätzlich ist im Hinblick auf die Datenverfügbarkeit und -zugänglichkeit ein Trend zur Öffnung (open data) zu beobachten. Thema dieser Arbeit ist daher, die Möglichkeiten und Grenzen des maschinellen Lernens und frei verfügbarer Geodaten auf dem Gebiet der Hochwasserrisikomodellierung im weiteren Sinne zu untersuchen. Da dieses übergeordnete Thema sehr breit ist, werden einzelne relevante Aspekte herausgearbeitet und detailliert betrachtet.

Eine prominente Datenquelle im Bereich Hochwasser ist die satellitenbasierte Kartierung von Überflutungsflächen, die z.B. über den Copernicus Service der Europäischen Union frei zur Verfügung gestellt werden. Große Hoffnungen werden in der wissenschaftlichen Literatur in diese Produkte gesetzt, sowohl für die akute Unterstützung der Einsatzkräfte im Katastrophenfall, als auch in der Modellierung mittels hydrodynamischer Modelle oder zur Schadensabschätzung. Daher wurde ein Fokus in dieser Arbeit auf die Untersuchung dieser Flutmasken gelegt. Aus der Beobachtung, dass die Qualität dieser Produkte in bewaldeten und urbanen Gebieten unzureichend ist, wurde ein Verfahren zur nachträglichen Verbesserung mittels maschinellem Lernen entwickelt. Das Verfahren basiert auf einem Klassifikationsalgorithmus, der nur Trainingsdaten von einer vorherzusagenden Klasse benötigt, im konkreten Fall also Daten von Überflutungsflächen, nicht jedoch von der negativen Klasse (trockene Gebiete). Die Anwendung für Hurricane Harvey in Houston zeigt großes Potenzial der Methode, abhängig von der Qualität der ursprünglichen Flutmaske.

Anschließend wird anhand einer prozessbasierten Modellkette untersucht, welchen Einfluss implementierte physikalische Prozessdetails auf das vorhergesagte statistische Risiko haben. Es wird anschaulich gezeigt, was eine Risikostudie basierend auf etablierten Modellen leisten kann. Solche Modellketten sind allerdings bereits für Flusshochwasser sehr komplex, und für zusammengesetzte oder kaskadierende Ereignisse mit Starkregen, Sturzfluten, und weiteren Prozessen, kaum vorhanden. Im vierten Kapitel dieser Arbeit wird daher getestet, ob maschinelles Lernen auf Basis von vollständigen Schadensdaten einen direkteren Weg zur Schadensmodellierung ermöglicht, der die explizite Konzeption einer solchen Modellkette umgeht. Dazu wird ein staatlich erhobener Datensatz der geschädigten Gebäude während des schweren El Niño Ereignisses 2017 in Peru verwendet. In diesem Kontext werden auch die Möglichkeiten des Data-Mining zur Extraktion von Prozessverständnis ausgelotet. Es kann gezeigt werden, dass diverse frei verfügbare Geodaten nützliche Informationen für die Gefahren- und Schadensmodellierung von komplexen Flutereignissen liefern, z.B. satellitenbasierte Regenmessungen, topographische und hydrographische Information, kartierte Siedlungsflächen, sowie Indikatoren aus Spektraldaten. Zudem zeigen sich Erkenntnisse zu den Schädigungsprozessen, die im Wesentlichen mit den vorherigen Erwartungen in Einklang stehen. Die maximale Regenintensität wirkt beispielsweise in Städten und steilen Schluchten stärker schädigend, während die Niederschlagssumme in tiefliegenden Flussgebieten und bewaldeten Regionen als aussagekräftiger befunden wurde. Ländliche Gebiete in Peru weisen in der präsentierten Studie eine höhere Vulnerabilität als die Stadtgebiete auf. Jedoch werden auch die grundsätzlichen Grenzen der Methodik und die Abhängigkeit von spezifischen Datensätzen und Algorithmen offenkundig.

In der übergreifenden Diskussion werden schließlich die verschiedenen Methoden – prozessbasierte Modellierung, prädiktives maschinelles Lernen, und Data-Mining – mit Blick auf die Gesamtfragestellungen evaluiert. Im Bereich der Gefahrenbeobachtung scheint eine Fokussierung auf neue Algorithmen sinnvoll. Im Bereich der Gefahrenmodellierung, insbesondere für Flusshochwasser, wird eher die Verbesserung von physikalischen Modellen, oder die Integration von prozessbasierten und statistischen Verfahren angeraten. In der Schadensmodellierung fehlen nach wie vor die großen repräsentativen Datensätze, die für eine breite Anwendung von maschinellem Lernen Voraussetzung sind. Daher ist die Verbesserung der Datengrundlage im Bereich der Schäden derzeit als wichtiger einzustufen als die Auswahl der Algorithmen.
N2  - Technological progress allows for producing ever more complex predictive models on the basis of increasingly large datasets. For risk management of natural hazards, a multitude of models is needed as a basis for decision-making, e.g. in the evaluation of observational data, for the prediction of hazard scenarios, or for statistical estimates of expected damage. The question arises how modern modelling approaches like machine learning or data-mining can be meaningfully deployed in this thematic field. In addition, with respect to data availability and accessibility, the trend is towards open data. The topic of this thesis is therefore to investigate the possibilities and limitations of machine learning and open geospatial data in the field of flood risk modelling in the broad sense. As this overarching topic is broad in scope, individual relevant aspects are identified and inspected in detail.

A prominent data source in the flood context is satellite-based mapping of inundated areas, for example made openly available by the Copernicus service of the European Union. Great expectations are directed towards these products in the scientific literature, both for acute support of relief forces during emergency response, and for modelling via hydrodynamic models or for damage estimation. Therefore, a focus of this work was set on evaluating these flood masks. From the observation that the quality of these products is insufficient in forested and built-up areas, a procedure for subsequent improvement via machine learning was developed. This procedure is based on a classification algorithm that only requires training data from the one class to be predicted, in this specific case data of flooded areas, but not of the negative class (dry areas). The application for Hurricane Harvey in Houston shows the high potential of this method, which depends on the quality of the initial flood mask.

Next, it is investigated how much the predicted statistical risk from a process-based model chain depends on implemented physical process details. Thereby it is demonstrated what a risk study based on established models can deliver. Even for fluvial flooding, however, such model chains are already quite complex, and they are hardly available for compound or cascading events comprising torrential rainfall, flash floods, and other processes. In the fourth chapter of this thesis it is therefore tested whether machine learning based on comprehensive damage data can offer a more direct path towards damage modelling that avoids explicit conception of such a model chain. For that purpose, a state-collected dataset of buildings damaged during the severe 2017 El Niño event in Peru is used. In this context, the possibilities of data-mining for extracting process knowledge are explored as well. It can be shown that various openly available geodata sources contain useful information for flood hazard and damage modelling for complex events, e.g. satellite-based rainfall measurements, topographic and hydrographic information, mapped settlement areas, as well as indicators from spectral data. Further, insights on damaging processes are discovered, which are mainly in line with prior expectations. The maximum intensity of rainfall, for example, has a stronger damaging effect in cities and steep canyons, while the sum of rain was found more informative in low-lying river catchments and forested areas. Rural areas of Peru exhibited higher vulnerability in the presented study compared to urban areas. However, the general limitations of the methods and the dependence on specific datasets and algorithms also become obvious.

In the overarching discussion, the different methods – process-based modelling, predictive machine learning, and data-mining – are evaluated with respect to the overall research questions. In the case of hazard observation it seems that a focus on novel algorithms makes sense for future research. In the subtopic of hazard modelling, especially for river floods, the improvement of physical models and the integration of process-based and statistical procedures are suggested. For damage modelling, the large and representative datasets necessary for the broad application of machine learning are still lacking. Therefore, the improvement of the data basis in the field of damage is currently regarded as more important than the selection of algorithms.
KW  - flood risk
KW  - machine learning
KW  - open data
KW  - damage modelling
KW  - data-mining
KW  - Schadensmodellierung
KW  - Data-Mining
KW  - Hochwasserrisiko
KW  - maschinelles Lernen
KW  - offene Daten
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-555943
ER  - 
TY  - THES
A1  - Sieg, Tobias
T1  - Reliability of flood damage estimations across spatial scales
T1  - Verlässlichkeit von Hochwasserschadensschätzungen über räumliche Skalen
N2  - Extreme Naturereignisse sind ein integraler Bestandteil der Natur der Erde. Sie werden erst dann zu Gefahren für die Gesellschaft, wenn sie diesen Ereignissen ausgesetzt ist. Dann allerdings können Naturgefahren verheerende Folgen für die Gesellschaft haben. Besonders hydro-meteorologische Gefahren wie zum Beispiel Flusshochwasser, Starkregenereignisse, Winterstürme, Orkane oder Tornados haben ein hohes Schadenspotential und treten rund um den Globus auf. Einhergehend mit einer immer wärmer werdenden Welt werden auch Extremwetterereignisse, welche potentiell Naturgefahren auslösen können, immer wahrscheinlicher. Allerdings trägt nicht nur eine sich verändernde Umwelt zur Erhöhung des Risikos von Naturgefahren bei, sondern auch eine sich verändernde Gesellschaft. Daher ist ein angemessenes Risikomanagement erforderlich, um die Gesellschaft auf jeder räumlichen Ebene an diese Veränderungen anzupassen. Ein essentieller Bestandteil dieses Managements ist die Abschätzung der ökonomischen Auswirkungen der Naturgefahren. Bisher allerdings fehlen verlässliche Methoden, um die Auswirkungen von hydro-meteorologischen Gefahren abzuschätzen. Ein Hauptbestandteil dieser Arbeit ist daher die Entwicklung und Anwendung einer neuen Methode, welche die Verlässlichkeit der Schadensschätzung verbessert. Die Methode wurde beispielhaft zur Schätzung der ökonomischen Auswirkungen eines Flusshochwassers auf einzelne Unternehmen bis hin zu den Auswirkungen auf das gesamte Wirtschaftssystem Deutschlands erfolgreich angewendet. Bestehende Methoden geben meist wenig Information über die Verlässlichkeit ihrer Schätzungen. Da diese Informationen Entscheidungen zur Anpassung an das Risiko erleichtern, wird die Verlässlichkeit der Schadensschätzungen mit der neuen Methode dargestellt. Die Verlässlichkeit bezieht sich dabei nicht nur auf die Schadensschätzung selber, sondern auch auf die Annahmen, die über betroffene Gebäude gemacht werden.
Nach diesem Prinzip kann auch die Verlässlichkeit von Annahmen über die Zukunft dargestellt werden; dies ist ein wesentlicher Aspekt für Prognosen. Die Darstellung der Verlässlichkeit und die erfolgreiche Anwendung zeigen das Potential der Methode zur Verwendung von Analysen für gegenwärtige und zukünftige hydro-meteorologische Gefahren.
N2  - Natural extreme events are an integral part of nature on planet Earth. Usually these events are only considered hazardous to humans when humans are exposed to them. In this case, however, natural hazards can have devastating impacts on human societies. Hydro-meteorological hazards in particular have a high damage potential, for example in the form of riverine and pluvial floods, winter storms, hurricanes and tornadoes, which can occur all over the globe. Along with an increasingly warm climate, an increase in extreme weather that potentially triggers natural hazards can also be expected. Yet, not only changing natural systems, but also changing societal systems contribute to an increasing risk associated with these hazards. These changes can comprise increasing exposure and possibly also increasing vulnerability to the impacts of natural events. Thus, appropriate risk management is required to adapt all parts of society to existing and upcoming risks at various spatial scales. One essential part of risk management is the risk assessment, including the estimation of the economic impacts. However, reliable methods for the estimation of economic impacts due to hydro-meteorological hazards are still missing. Therefore, this thesis deals with the question of how the reliability of hazard damage estimates can be improved, represented and propagated across all spatial scales. This question is investigated using the specific example of economic impacts to companies as a result of riverine floods in Germany.

Flood damage models aim to describe the damage processes during a given flood event. In other words, they describe the vulnerability of a specific object to a flood. The models can be based on empirical data sets collected after flood events. In this thesis, tree-based models trained with survey data are used for the estimation of direct economic flood impacts on the objects. It is found that these machine learning models, in conjunction with increasing sizes of the data sets used to derive the models, outperform state-of-the-art damage models. However, despite the performance improvements induced by using multiple variables and more data points, large prediction errors remain at the object level. The occurrence of the high errors was explained by a further investigation using distributions derived from tree-based models. The investigation showed that direct economic impacts to individual objects cannot be modeled by a normal distribution. Yet, most state-of-the-art approaches assume a normal distribution and take mean values as point estimators. Consequently, the predictions are unlikely values within the distributions, resulting in high errors. At larger spatial scales, more objects are considered for the damage estimation. This leads to a better fit of the damage estimates to a normal distribution. Consequently, the performance of the point estimators also gets better, although large errors can still occur due to the variance of the normal distribution. It is recommended to use distributions instead of point estimates in order to represent the reliability of damage estimates.

In addition, current approaches mostly ignore the uncertainty associated with the characteristics of the hazard and the exposed objects. For a given flood event, for example, the estimation of the water level at a certain building is prone to uncertainties. Current approaches define exposed objects mostly by the use of land use data sets. These data sets often show inconsistencies, which introduce additional uncertainties. Furthermore, state-of-the-art approaches suffer from a lack of consistency when predicting the damage at different spatial scales. This is due to the use of different types of exposure data sets for model derivation and application. In order to address these issues, a novel object-based method was developed in this thesis. The method enables a seamless estimation of hydro-meteorological hazard damage across spatial scales, including uncertainty quantification. The application and validation of the method resulted in plausible estimations at all spatial scales without overestimating the uncertainty.

It is mainly newly available data sets containing individual buildings that make the application of the method possible, as they allow for the identification of flood-affected objects by overlaying the data sets with water masks. However, the identification of affected objects with two different water masks revealed huge differences in the number of identified objects. Thus, more effort is needed for their identification, since the number of objects affected determines the order of magnitude of the economic flood impacts to a large extent.

In general, the method represents the uncertainties associated with the three components of risk, namely hazard, exposure and vulnerability, in the form of probability distributions. The object-based approach enables a consistent propagation of these uncertainties in space. Aside from the propagation of damage estimates and their uncertainties across spatial scales, a propagation between models estimating direct and indirect economic impacts was demonstrated. This enables the inclusion of uncertainties associated with the direct economic impacts within the estimation of the indirect economic impacts. Consequently, the modeling procedure facilitates the representation of the reliability of estimated total economic impacts. The representation of the estimates' reliability prevents reasoning based on a false certainty, which might be attributed to point estimates. Therefore, the developed approach facilitates meaningful flood risk management and adaptation planning.

The successful post-event application and the representation of the uncertainties also qualify the method for use in future risk assessments. Thus, the developed method enables the representation of the assumptions made for future risk assessments, which is crucial information for future risk management. This is an important step forward, since the representation of reliability associated with all components of risk is currently lacking in all state-of-the-art methods assessing future risk.

In conclusion, the use of object-based methods giving results in the form of distributions instead of point estimates is recommended. The improvement of the model performance by means of multi-variable models and additional data points is possible, but small. Uncertainties associated with all components of damage estimation should be included and represented within the results. Furthermore, the findings of the thesis suggest that, at larger scales, the influence of the uncertainty associated with the vulnerability is smaller than that of the uncertainties associated with the hazard and exposure. This leads to the conclusion that for an increased reliability of flood damage estimations and risk assessments, the improvement and active inclusion of hazard and exposure, including their uncertainties, is needed in addition to the improvements of the models describing the vulnerability of the objects.
KW  - hydro-meteorological risk
KW  - damage modeling
KW  - uncertainty
KW  - probabilistic approach
KW  - economic impacts
KW  - OpenStreetMap
KW  - hydro-meteorologische Risiken
KW  - Schadensmodellierung
KW  - Unsicherheiten
KW  - probabilistischer Ansatz
KW  - ökonomische Auswirkungen
KW  - OpenStreetMap
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-426161
ER  -