In the era of social networks, the Internet of Things and location-based services, many online services produce huge amounts of data that carry valuable objective information, such as geographic coordinates and timestamps. Combined with a textual parameter, these characteristics (parameters) pose a challenge for the discovery of geospatiotemporal knowledge. This challenge requires efficient methods for clustering and pattern mining in spatial, temporal and textual spaces.
In this thesis, we address the challenge of providing methods and frameworks for geospatiotemporal data analytics. As an initial step, we address the challenges of geospatial data processing: data gathering, normalization, geolocation, and storage. That initial step is the foundation for tackling the next challenge, geospatial clustering. The first step of this challenge is to design a method for online clustering of georeferenced data. This algorithm can be used as a server-side clustering algorithm for online maps that visualize massive georeferenced data. As the second step, we develop an extension of this method that additionally considers the temporal aspect of the data. For that, we propose a density- and intensity-based geospatiotemporal clustering algorithm with a fixed distance and time radius.
Each version of the clustering algorithm has its own use case that we show in the thesis.
In the next chapter of the thesis, we look at spatiotemporal analytics from the perspective of the sequential rule mining challenge. We design and implement a framework that transforms data into textual geospatiotemporal data, that is, data that contain geographic coordinates, time and textual parameters. In this way, we address the challenge of applying pattern/rule mining algorithms in geospatiotemporal space. As an applicable use case study, we propose spatiotemporal crime analytics: the discovery of spatiotemporal patterns of crimes in publicly available crime data.
We dedicate the second part of the thesis to applications and use case studies. We design and implement an application that uses the proposed clustering algorithms to discover knowledge in data. Together with the application, we propose use case studies for the analysis of georeferenced data in terms of situational and public safety awareness.
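
To make the idea of clustering with a fixed distance and time radius concrete, the following is a minimal sketch of a generic DBSCAN-style procedure. It is not the algorithm proposed in the thesis, and it ignores the intensity aspect. The function names (`haversine_km`, `st_cluster`), the parameters (`dist_km`, `time_s`, `min_pts`) and the sample events are all assumptions made for illustration.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon, ...) records."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def st_cluster(events, dist_km, time_s, min_pts):
    """Label events (lat, lon, unix_time); -1 marks noise.

    Two events are neighbours if they lie within dist_km of each other
    AND within time_s seconds of each other (fixed radii, assumed here).
    """
    def neighbours(i):
        return [j for j, e in enumerate(events)
                if abs(e[2] - events[i][2]) <= time_s
                and haversine_km(e, events[i]) <= dist_km]

    labels = [None] * len(events)
    cluster = -1
    for i in range(len(events)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical sample: three events close in space and time, one at the
# same place a day later, and one far away.
events = [
    (52.5200, 13.4050, 0),
    (52.5205, 13.4055, 60),
    (52.5198, 13.4047, 120),
    (52.5201, 13.4051, 86400),
    (48.1372, 11.5756, 30),
]
labels = st_cluster(events, dist_km=1.0, time_s=600, min_pts=3)
```

The neighbourhood query here is a linear scan, so the sketch is quadratic in the number of events; a server-side implementation for massive data would presumably rely on a spatial index instead.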

Scientific inquiry requires that we formulate not only what we know, but also what we do not know and by how much. In climate data analysis, this involves an accurate specification of measured quantities and a consequent analysis that consciously propagates the measurement errors at each step. The dissertation presents a thorough analytical method to quantify errors of measurement inherent in paleoclimate data. An additional focus is on the uncertainties in assessing the coupling between different factors that influence the global mean temperature (GMT).
Paleoclimate studies critically rely on 'proxy variables' that record climatic signals in natural archives. However, such proxy records inherently involve uncertainties in determining the age of the signal. We present a generic Bayesian approach to analytically determine the proxy record along with its associated uncertainty, resulting in a time-ordered sequence of correlated probability distributions rather than a precise time series. We further develop a recurrence-based method to detect dynamical events from the proxy probability distributions. The methods are validated with synthetic examples and demonstrated with real-world proxy records. The proxy estimation step reveals the interrelations between proxy variability and uncertainty. The recurrence analysis of the East Asian Summer Monsoon during the last 9000 years confirms the well-known 'dry' events at 8200 and 4400 BP, plus an additional significantly dry event at 6900 BP.
We also analyze the network of dependencies surrounding GMT. We find an intricate, directed network with multiple links between the different factors at multiple time delays. We further uncover a significant feedback from the GMT to the El Niño Southern Oscillation at quasi-biennial timescales. The analysis highlights the need for a more nuanced formulation of influences between different climatic factors, as well as the limitations in trying to estimate such dependencies.

The H.E.S.S. array is a third-generation Imaging Atmospheric Cherenkov Telescope (IACT) array. It is located in the Khomas Highland in Namibia and measures very high energy (VHE) gamma-rays. In Phase I, the array started data taking in 2004 with its four identical 13 m telescopes. Since then, H.E.S.S. has emerged as the most successful IACT experiment to date. Among the almost 150 sources of VHE gamma-ray radiation found so far, even the oldest detection, the Crab Nebula, keeps surprising the scientific community with unexplained phenomena such as the recently discovered very energetic flares of high energy gamma-ray radiation. During its most recent flare, which was detected by the Fermi satellite in March 2013, the Crab Nebula was simultaneously observed with the H.E.S.S. array for six nights. The results of the observations will be discussed in detail during the course of this work. During the nights of the flare, the new 24 m × 32 m H.E.S.S. II telescope was still being commissioned, but participated in the data taking for one night. To be able to reconstruct and analyze the data of the H.E.S.S. Phase II array, the algorithms and software used by the H.E.S.S. Phase I array had to be adapted. The most prominent advanced shower reconstruction technique developed by de Naurois and Rolland, the template-based model analysis, compares real shower images taken by the Cherenkov telescope cameras with shower templates obtained using a semi-analytical model. To find the best-fitting image and, therefore, the relevant parameters that describe the air shower best, a pixel-wise log-likelihood fit is done. The adaptation of this advanced shower reconstruction technique to the heterogeneous H.E.S.S. Phase II array for stereo events (i.e. air showers seen by at least two telescopes of any kind), its performance using Monte Carlo simulations, as well as its application to real data, will be described.

In the presented thesis, the most advanced photon reconstruction technique of ground-based γ-ray astronomy is adapted to the H.E.S.S. 28 m telescope. The method is based on a semi-analytical model of electromagnetic particle showers in the atmosphere. The properties of cosmic γ-rays are reconstructed by comparing the camera image of the telescope with the Cherenkov emission that is expected from the shower model. To suppress the dominant background from charged cosmic rays, events are selected based on several criteria. The performance of the analysis is evaluated with simulated events. The method is then applied to two sources that are known to emit γ-rays. The first of these is the Crab Nebula, the standard candle of ground-based γ-ray astronomy. The results of this source confirm the expected performance of the reconstruction method, where the much lower energy threshold compared to H.E.S.S. I is of particular importance. A second analysis is performed on the region around the Galactic Centre. The analysis results emphasise the capabilities of the new telescope to measure γ-rays in an energy range that is interesting for both theoretical and experimental astrophysics. The presented analysis features the lowest energy threshold that has ever been reached in ground-based γ-ray astronomy, opening a new window to the precise measurement of the physical properties of time-variable sources at energies of several tens of GeV.

Water management and environmental protection are vulnerable to extreme low flows during streamflow droughts. During the last decades, summer runoff and low flows have decreased in most rivers of Central Europe. Discharge projections agree that a future decrease in runoff is likely for catchments in Brandenburg, Germany. Depending on the first-order controls on low flows, different adaptation measures are expected to be appropriate. Small catchments were analyzed because they are expected to be more vulnerable to a changing climate than larger rivers. They are mainly headwater catchments with smaller groundwater storage. Local characteristics are more important at this scale and can increase vulnerability. This thesis mutually evaluates potential adaptation measures to sustain minimum runoff in small catchments of Brandenburg, Germany, and similarities of these catchments regarding low flows. The following guiding questions are addressed: (i) Which first-order controls on low flows and related time scales exist? (ii) What are the differences between small catchments regarding low flow vulnerability? (iii) Which adaptation measures to sustain minimum runoff in small catchments of Brandenburg are appropriate considering regional low flow patterns? Potential adaptation measures to sustain minimum runoff during periods of low flows can be classified into three categories: (i) increase of groundwater recharge and subsequent baseflow by land use change, land management and artificial groundwater recharge, (ii) increase of water storage with regulated outflow by reservoirs, lakes and wetland water management and (iii) measures with multiple purposes (urban water management, waste water recycling and inter-basin water transfer), whose planning has to take regional low flow patterns into account. The question remained whether water management of areas with shallow groundwater tables can efficiently sustain minimum runoff.
As an example, water management scenarios of a ditch-irrigated area were evaluated using the model Hydrus-2D. Increasing antecedent water levels and stopping ditch irrigation during periods of low flows increased fluxes from the pasture to the stream, but storage was depleted faster during the summer months due to higher evapotranspiration. Fluxes from this approx. 1 km long pasture with an area of approx. 13 ha ranged from 0.3 to 0.7 l/s depending on the scenario. This demonstrates that numerous such small decentralized measures are necessary to sustain minimum runoff in meso-scale catchments. Differences in the low flow risk of catchments and meteorological low flow predictors were analyzed. A principal component analysis was applied to the daily discharge of 37 catchments between 1991 and 2006. Flows decreased more in Southeast Brandenburg according to meteorological forcing. Low flow risk was highest in a region east of Berlin because of the intersection of a more continental climate and the specific geohydrology. In these catchments, flows decreased faster during summer and the low flow period was prolonged. A non-linear support vector machine regression was applied to iteratively select meteorological predictors for annual 30-day minimum runoff in 16 catchments between 1965 and 2006. The potential evapotranspiration sum of the previous 48 months was the most important predictor (r²=0.28). The potential evapotranspiration of the previous 3 months and the precipitation of the previous 3 months and last year increased model performance (r²=0.49, including all four predictors). Model performance was higher for catchments with low yield and more damped runoff. In catchments with high low flow risk, the explanatory power of long-term potential evapotranspiration was high. Catchments with a high low flow risk as well as catchments with a considerable decrease in flows in southeast Brandenburg have the highest demand for adaptation.
Measures increasing groundwater recharge are to be preferred. Catchments with high low flow risk showed relatively deep and decreasing groundwater heads allowing increased groundwater recharge at recharge areas with higher altitude away from the streams. Low flows are expected to stay low or decrease even further because long term potential evapotranspiration was the most important low flow predictor and is projected to increase during climate change. Differences in low flow risk and runoff dynamics between catchments have to be considered for management and planning of measures which do not only have the task to sustain minimum runoff.

One of the most exciting predictions of Einstein's theory of gravitation that has not yet been confirmed experimentally by a direct detection is the existence of gravitational waves. These are tiny distortions of spacetime itself, and a world-wide effort to directly measure them for the first time with a network of large-scale laser interferometers is currently ongoing and expected to provide positive results within this decade. One potential source of measurable gravitational waves is the inspiral and merger of two compact objects, such as binary black holes. Successfully finding their signature in the noise-dominated data of the detectors crucially relies on accurate predictions of what we are looking for. In this thesis, we present a detailed study of how the most complete waveform templates can be constructed by combining the results from (A) analytical expansions within the post-Newtonian framework and (B) numerical simulations of the full relativistic dynamics. We analyze various strategies to construct complete hybrid waveforms that consist of a post-Newtonian inspiral part matched to numerical-relativity data. We elaborate on existing approaches for nonspinning systems by extending the accessible parameter space and introducing an alternative scheme based in the Fourier domain. Our methods can now be readily applied to multiple spherical-harmonic modes and precessing systems. In addition, we analyze in detail the accuracy of hybrid waveforms with the goal of quantifying how numerous sources of error in the approximation techniques affect the application of such templates in real gravitational-wave searches. This is of major importance for the future construction of improved models, but also for the correct interpretation of gravitational-wave observations that are made utilizing any complete waveform family.
In particular, we comprehensively discuss how long the numerical-relativity contribution to the signal has to be in order to make the resulting hybrids accurate enough, and for currently feasible simulation lengths we assess the physics one can potentially do with template-based searches.

In the present work, synchronization phenomena in complex dynamical systems exhibiting multiple time scales have been analyzed. Multiple time scales can be active in different manners. Three different systems have been analyzed with different methods from data analysis. The first system studied is a large heterogeneous network of bursting neurons, that is, a system with two predominant time scales: the fast firing of action potentials (spikes) and the burst of repetitive spikes followed by a quiescent phase. This system has been integrated numerically and analyzed with methods based on recurrence in phase space. An interesting result is that the transitions to synchrony differ between the two distinct time scales. Moreover, an anomalous synchronization effect can be observed in the fast time scale, i.e. there is a range of the coupling strength where desynchronization occurs. The second system analyzed, numerically as well as experimentally, is a pair of coupled CO₂ lasers in a chaotic bursting regime. This system is interesting due to its similarity with epidemic models. We explain the bursts by different time scales generated from unstable periodic orbits embedded in the chaotic attractor and perform a synchronization analysis of these different orbits utilizing the continuous wavelet transform. We find a diverse route to synchrony of these different observed time scales. The last system studied is a small network motif of limit cycle oscillators. More precisely, we have studied a hub motif, which serves as an elementary building block for scale-free networks, a type of network found in many real-world applications. These hubs are of special importance for communication and information transfer in complex networks. Here, a detailed study on the mechanism of synchronization in oscillatory networks with a broad frequency distribution has been carried out. In particular, we find a remote synchronization of nodes in the network which are not directly coupled.
We also explain the responsible mechanism and its limitations and constraints. Further we derive an analytic expression for it and show that information transmission in pure phase oscillators, such as the Kuramoto type, is limited. In addition to the numerical and analytic analysis an experiment consisting of electrical circuits has been designed. The obtained results confirm the former findings.

Phase space reconstruction is a method that allows the phase space of a system to be reconstructed using only a one-dimensional time series as input. It can be used for calculating Lyapunov exponents and detecting chaos. It helps to understand complex dynamics and their behavior, and it can reproduce data sets that were not measured. There are many different methods that produce correct reconstructions, such as time delay, Hilbert transformation, differentiation and integration. The most used one is time delay, but all methods have special properties which are useful in different situations. Hence, every reconstruction method has some situations where it is the best choice. Looking at all these different methods, the questions are: Why can all these different-looking methods be used for the same purpose? Is there any connection between all these functions? The answer is found in the frequency domain: after a Fourier transformation, all these methods take a similar shape. Every presented reconstruction method can be described as a multiplication in the frequency domain with a frequency-dependent reconstruction function. This structure is also known as a filter. From this point of view, every reconstructed dimension can be seen as a filtered version of the measured time series. It contains the original data but applies a new focus: some parts are amplified and other parts are reduced. Furthermore, I show that not every function can be used for reconstruction. In the thesis, three characteristics are identified that are mandatory for the reconstruction function. Taking these restrictions into account, one obtains a whole range of new reconstruction functions. It is thus possible to reduce noise within the reconstruction process itself, or to use some advantages of already known reconstruction methods while suppressing their unwanted characteristics.
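
As an illustration of the time-delay method mentioned above, here is a minimal sketch that builds delay vectors from a scalar series. The function name `delay_embed`, its parameters `dim` and `tau`, and the sine-wave test signal are assumptions for this example and do not come from the thesis.

```python
import math


def delay_embed(series, dim, tau):
    """Return the delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(n_vectors)]


# A sine wave with period 40 samples, embedded in two dimensions with a
# delay of a quarter period. The second coordinate is then the cosine,
# so the reconstructed trajectory lies on the unit circle.
signal = [math.sin(2 * math.pi * t / 40) for t in range(200)]
vectors = delay_embed(signal, dim=2, tau=10)
```

In the filter picture described in the abstract, a delay by `tau` samples corresponds to multiplying the spectrum by a pure phase factor, which is one way to see why the delayed coordinate carries the same data with a different emphasis.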

Complex network theory provides an elegant and powerful framework to statistically investigate the topology of local and long range dynamical interrelationships, i.e., teleconnections, in the climate system. Employing a refined methodology relying on linear and nonlinear measures of time series analysis, the intricate correlation structure within a multivariate climatological data set is cast into network form. Within this graph theoretical framework, vertices are identified with grid points taken from the data set representing a region on the Earth's surface, and edges correspond to strong statistical interrelationships between the dynamics on pairs of grid points. The resulting climate networks are neither perfectly regular nor completely random, but display the intriguing and nontrivial characteristics of complexity commonly found in real world networks such as the internet, citation and acquaintance networks, food webs and cortical networks in the mammalian brain. Among other interesting properties, climate networks exhibit the "small-world" effect and possess a broad degree distribution with dominating super-nodes as well as a pronounced community structure. We have performed an extensive and detailed graph theoretical analysis of climate networks on the global topological scale, focussing on the flow and centrality measure betweenness, which is locally defined at each vertex but includes global topological information by relying on the distribution of shortest paths between all pairs of vertices in the network. The betweenness centrality field reveals a rich internal structure in complex climate networks constructed from reanalysis and atmosphere-ocean coupled general circulation model (AOGCM) surface air temperature data.
Our novel approach uncovers an elaborately woven meta-network of highly localized channels of strong dynamical information flow, which we relate to global surface ocean currents and dub the backbone of the climate network in analogy to the homonymous data highways of the internet. This finding points to a major role of the oceanic surface circulation in coupling and stabilizing the global temperature field in the long term mean (140 years for the model run and 60 years for reanalysis data). Carefully comparing the backbone structures detected in climate networks constructed using linear Pearson correlation and nonlinear mutual information, we argue that the high sensitivity of betweenness with respect to small changes in network structure may allow the detection of footprints of strongly nonlinear physical interactions in the climate system. The results presented in this thesis are thoroughly founded and substantiated using a hierarchy of statistical significance tests on the level of time series and networks, i.e., by tests based on time series surrogates as well as network surrogates. This is particularly relevant when working with real world data. Specifically, we developed new types of network surrogates to include the additional constraints imposed by the spatial embedding of vertices in a climate network. Our methodology is of potential interest for a broad audience within the physics community and various applied fields, because it is universal in the sense of being valid for any spatially extended dynamical system. It can help to understand the localized flow of dynamical information in any such system by combining multivariate time series analysis, a complex network approach and the information flow measure betweenness centrality. Possible fields of application include fluid dynamics (turbulence), plasma physics and biological physics (population models, neural networks, cell models).
Furthermore, the climate network approach is equally relevant for experimental data and model simulations and hence introduces a novel perspective on model evaluation and data-driven model building. Our work is timely in the context of the current debate on climate change within the scientific community, since it allows the regional vulnerability and stability of the climate system to be assessed from a new perspective while relying on global and not only on regional knowledge. The methodology developed in this thesis hence has the potential to substantially contribute to the understanding of the local effect of extreme events and tipping points in the earth system within a holistic global framework.

Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. The discovery of such relationships is difficult, because in principle for each pair of attributes in the database each pair of data values must be compared. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present Spider, an algorithm that efficiently finds all INDs in a given relational database. It leverages the sorting facilities of the DBMS but performs the actual comparisons outside of the database to save computation. Spider analyzes very large databases up to an order of magnitude faster than previous approaches. We also evaluate in detail the effectiveness of several heuristics to reduce the number of necessary comparisons. Furthermore, we generalize Spider to find composite INDs covering multiple attributes, and partial INDs, which are true INDs for all but a certain number of values. This last type is particularly relevant when integrating dirty data, as is often the case in the life sciences domain, our driving motivation.
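
To show what an inclusion dependency check amounts to, the sketch below tests every ordered pair of attributes with plain set containment. This is the naive baseline that the abstract describes as expensive, not the Spider algorithm itself. The function name `unary_inds` and the sample attribute names and values are hypothetical.

```python
from itertools import permutations


def unary_inds(columns):
    """Find unary INDs among attributes.

    columns maps an attribute name to its values. The result lists the
    pairs (dependent, referenced) for which every non-null value of the
    dependent attribute also occurs in the referenced attribute.
    """
    value_sets = {name: {v for v in values if v is not None}
                  for name, values in columns.items()}
    return sorted(
        (dep, ref)
        for dep, ref in permutations(value_sets, 2)
        if value_sets[dep] and value_sets[dep] <= value_sets[ref]
    )


# Hypothetical toy database with one foreign-key candidate.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["Ann", "Bob", "Cy", "Di"],
}
inds = unary_inds(columns)
```

An IND found this way is only a precondition for a foreign key, as the abstract notes; the pair still has to be judged for semantic plausibility.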

The occurrence of earthquakes is characterized by a high degree of spatiotemporal complexity. Although numerous patterns, e.g. fore- and aftershock sequences, are well-known, the underlying mechanisms are not observable and thus not understood. Because the recurrence times of large earthquakes are usually decades or centuries, the number of such events in corresponding data sets is too small to draw conclusions with reasonable statistical significance. Therefore, the present study combines numerical modeling and the analysis of real data in order to unveil the relationships between physical mechanisms and observational quantities. The key hypothesis is the validity of the so-called "critical point concept" for earthquakes, which assumes large earthquakes to occur as phase transitions in a spatially extended many-particle system, similar to percolation models. New concepts are developed to detect critical states in simulated and in natural data sets. The results indicate that important features of seismicity like the frequency-size distribution and the temporal clustering of earthquakes depend on frictional and structural fault parameters. In particular, the degree of quenched spatial disorder (the "roughness") of a fault zone determines whether large earthquakes occur quasiperiodically or more clustered. This illustrates the power of numerical models to identify regions in parameter space that are relevant for natural seismicity. The critical point concept is verified for both synthetic and natural seismicity, in terms of a critical state which precedes a large earthquake: a gradual roughening of the (unobservable) stress field leads to a scale-free (observable) frequency-size distribution. Furthermore, a growth of the spatial correlation length and an acceleration of the seismic energy release prior to large events are found. The predictive power of these precursors is, however, limited.
Instead of forecasting time, location, and magnitude of individual events, a contribution to a broad multiparameter approach is encouraging.

Recurrence plots, a rather promising tool of data analysis, were introduced by Eckmann et al. in 1987. They visualise recurrences in phase space and give an overview of the system's dynamics. Two features have made the method rather popular. Firstly, they are rather simple to compute, and secondly, they are putatively easy to interpret. However, the straightforward interpretation of recurrence plots for some systems yields rather surprising results. For example, indications of low-dimensional chaos have been reported for stock market data, based on recurrence plots. In this work we exploit recurrences, or "naturally occurring analogues" as they were termed by E. Lorenz, to obtain three key results. The first is that the most striking structures found in recurrence plots are linked to the correlation entropy and the correlation dimension of the underlying system. Even though a possible embedding changes the structures in recurrence plots considerably, these dynamical invariants can be estimated independently of the special parameters used for the computation. The second key result is that the attractor can be reconstructed from the recurrence plot. This means that it contains all topological information of the system in question in the limit of long time series. The graphical representation of the recurrences can also help to develop new algorithms and exploit specific structures. This feature has helped to obtain the third key result of this study. Based on recurrences to points which have the same "recurrence structure", it is possible to generate surrogates of the system which capture all relevant dynamical characteristics, such as entropies, dimensions and characteristic frequencies of the system. The surrogates generated in this way are shadowed by a trajectory of the system which starts at different initial conditions than the time series in question. They can then be used to test for complex synchronisation.
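
The remark that recurrence plots are simple to compute can be illustrated with a short sketch: a recurrence matrix marks every pair of times at which the trajectory returns to within a threshold distance of itself. The function name `recurrence_matrix`, the threshold `eps` and the circular test trajectory are assumptions chosen for this example.

```python
import math


def recurrence_matrix(points, eps):
    """R[i][j] = 1 if phase-space points i and j are closer than eps."""
    n = len(points)
    return [[1 if math.dist(points[i], points[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]


# A periodic trajectory on the unit circle with a period of 8 samples.
# Its recurrence plot consists of diagonal lines spaced 8 steps apart.
period = 8
points = [(math.cos(2 * math.pi * t / period),
           math.sin(2 * math.pi * t / period))
          for t in range(24)]
R = recurrence_matrix(points, eps=0.1)
```

For a periodic signal the uninterrupted diagonals make the period directly readable from the matrix; for chaotic systems the diagonals break up, and it is the statistics of such line structures that the abstract relates to dynamical invariants.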

In a classical context, synchronization means adjustment of rhythms of self-sustained periodic oscillators due to their weak interaction. The history of synchronization goes back to the 17th century when the famous Dutch scientist Christiaan Huygens reported on his observation of synchronization of pendulum clocks: when two such clocks were put on a common support, their pendula moved in perfect agreement. In rigorous terms, it means that due to coupling the clocks started to oscillate with identical frequencies and tightly related phases. Although it is probably the oldest scientifically studied nonlinear effect, synchronization was understood only in the 1920s, when E. V. Appleton and B. Van der Pol systematically studied synchronization of triode generators, both theoretically and experimentally. Since then, the theory has been well developed and has found many applications. Nowadays it is well-known that certain systems, even rather simple ones, can exhibit chaotic behaviour. It means that their rhythms are irregular and cannot be characterized by only one frequency. However, as is shown in the Habilitation work, one can extend the notion of phase for systems of this class as well and observe their synchronization, i.e., agreement of their (still irregular!) rhythms: due to very weak interaction there appear relations between the phases and average frequencies. This effect, called phase synchronization, was later confirmed in laboratory experiments of other scientific groups. Understanding of synchronization of irregular oscillators allowed us to address an important problem of data analysis: how to reveal weak interaction between systems if we cannot influence them, but can only passively observe them by measuring some signals. This situation is very often encountered in biology, where synchronization phenomena appear on every level, from cells to macroscopic physiological systems, in normal states as well as in severe pathologies.
With our methods we found that cardiovascular and respiratory systems in humans can adjust their rhythms; the strength of their interaction increases with maturation. Next, we used our algorithms to analyse brain activity of Parkinsonian patients. The results of this collaborative work with neuroscientists show that different brain areas synchronize just before the onset of pathological tremor. Moreover, we succeeded in localizing the brain areas responsible for tremor generation.