Refine
Document Type
- Doctoral Thesis (5)
- Article (4)
- Postprint (3)
- Review (1)
Language
- English (13) (remove)
Keywords
- classification (13) (remove)
Institute
Laser-induced breakdown spectroscopy (LIBS) analysers are becoming increasingly common for material classification purposes. However, to achieve good classification accuracy, mostly noncompact units are used based on their stability and reproducibility. In addition, computational algorithms that require significant hardware resources are commonly applied. For performing measurement campaigns in hard-to-access environments, such as mining sites, there is a need for compact, portable, or even handheld devices capable of reaching high measurement accuracy. The optics and hardware of small (i.e., handheld) devices are limited by space and power consumption and require a compromise of the achievable spectral quality. As long as the size of such a device is a major constraint, the software is the primary field for improvement. In this study, we propose a novel combination of handheld LIBS with non-negative tensor factorisation to investigate its classification capabilities of copper minerals. The proposed approach is based on the extraction of source spectra for each mineral (with the use of tensor methods) and their labelling based on the percentage contribution within the dataset. These latent spectra are then used in a regression model for validation purposes. The application of such an approach leads to an increase in the classification score by approximately 5% compared to that obtained using commonly used classifiers such as support vector machines, linear discriminant analysis, and the k-nearest neighbours algorithm.
The protein fractions of cocoa have been implicated influencing both the bioactive potential and sensory properties of cocoa and cocoa products. The objective of the present review is to show the impact of different stages of cultivation and processing with regard to the changes induced in the protein fractions. Special focus has been laid on the major seed storage proteins throughout the different stages of processing. The study starts with classical introduction of the extraction and the characterization methods used, while addressing classification approaches of cocoa proteins evolved during the timeline. The changes in protein composition during ripening and maturation of cocoa seeds, together with the possible modifications during the post-harvest processing (fermentation, drying, and roasting), have been documented. Finally, the bioactive potential arising directly or indirectly from cocoa proteins has been elucidated. The state of the art suggests that exploration of other potentially bioactive components in cocoa needs to be undertaken, while considering the complexity of reaction products occurring during the roasting phase of the post-harvest processing. Finally, the utilization of partially processed cocoa beans (e.g., fermented, conciliatory thermal treatment) can be recommended, providing a large reservoir of bioactive potentials arising from the protein components that could be instrumented in functionalizing foods.
Hydrometric networks play a vital role in providing information for decision-making in water resource management. They should be set up optimally to provide as much information as possible that is as accurate as possible and, at the same time, be cost-effective. Although the design of hydrometric networks is a well-identified problem in hydrometeorology and has received considerable attention, there is still scope for further advancement. In this study, we use complex network analysis, defined as a collection of nodes interconnected by links, to propose a new measure that identifies critical nodes of station networks. The approach can support the design and redesign of hydrometric station networks. The science of complex networks is a relatively young field and has gained significant momentum over the last few years in different areas such as brain networks, social networks, technological networks, or climate networks. The identification of influential nodes in complex networks is an important field of research. We propose a new node-ranking measure – the weighted degree–betweenness (WDB) measure – to evaluate the importance of nodes in a network. It is compared to previously proposed measures used on synthetic sample networks and then applied to a real-world rain gauge network comprising 1229 stations across Germany to demonstrate its applicability. The proposed measure is evaluated using the decline rate of the network efficiency and the kriging error. The results suggest that WDB effectively quantifies the importance of rain gauges, although the benefits of the method need to be investigated in more detail.
Hydrometric networks play a vital role in providing information for decision-making in water resource management. They should be set up optimally to provide as much information as possible that is as accurate as possible and, at the same time, be cost-effective. Although the design of hydrometric networks is a well-identified problem in hydrometeorology and has received considerable attention, there is still scope for further advancement. In this study, we use complex network analysis, defined as a collection of nodes interconnected by links, to propose a new measure that identifies critical nodes of station networks. The approach can support the design and redesign of hydrometric station networks. The science of complex networks is a relatively young field and has gained significant momentum over the last few years in different areas such as brain networks, social networks, technological networks, or climate networks. The identification of influential nodes in complex networks is an important field of research. We propose a new node-ranking measure – the weighted degree–betweenness (WDB) measure – to evaluate the importance of nodes in a network. It is compared to previously proposed measures used on synthetic sample networks and then applied to a real-world rain gauge network comprising 1229 stations across Germany to demonstrate its applicability. The proposed measure is evaluated using the decline rate of the network efficiency and the kriging error. The results suggest that WDB effectively quantifies the importance of rain gauges, although the benefits of the method need to be investigated in more detail.
The Himalayas are a region that is most dependent, but also frequently prone to hazards from changing meltwater resources. This mountain belt hosts the highest mountain peaks on earth, has the largest reserve of ice outside the polar regions, and is home to a rapidly growing population in recent decades. One source of hazard has attracted scientific research in particular in the past two decades: glacial lake outburst floods (GLOFs) occurred rarely, but mostly with fatal and catastrophic consequences for downstream communities and infrastructure. Such GLOFs can suddenly release several million cubic meters of water from naturally impounded meltwater lakes. Glacial lakes have grown in number and size by ongoing glacial mass losses in the Himalayas. Theory holds that enhanced meltwater production may increase GLOF frequency, but has never been tested so far. The key challenge to test this notion are the high altitudes of >4000 m, at which lakes occur, making field work impractical. Moreover, flood waves can attenuate rapidly in mountain channels downstream, so that many GLOFs have likely gone unnoticed in past decades. Our knowledge on GLOFs is hence likely biased towards larger, destructive cases, which challenges a detailed quantification of their frequency and their response to atmospheric warming. Robustly quantifying the magnitude and frequency of GLOFs is essential for risk assessment and management along mountain rivers, not least to implement their return periods in building design codes.
Motivated by this limited knowledge of GLOF frequency and hazard, I developed an algorithm that efficiently detects GLOFs from satellite images. In essence, this algorithm classifies land cover in 30 years (~1988–2017) of continuously recorded Landsat images over the Himalayas, and calculates likelihoods for rapidly shrinking water bodies in the stack of land cover images. I visually assessed such detected tell-tale sites for sediment fans in the river channel downstream, a second key diagnostic of GLOFs. Rigorous tests and validation with known cases from roughly 10% of the Himalayas suggested that this algorithm is robust against frequent image noise, and hence capable to identify previously unknown GLOFs. Extending the search radius to the entire Himalayan mountain range revealed some 22 newly detected GLOFs. I thus more than doubled the existing GLOF count from 16 previously known cases since 1988, and found a dominant cluster of GLOFs in the Central and Eastern Himalayas (Bhutan and Eastern Nepal), compared to the rarer affected ranges in the North. Yet, the total of 38 GLOFs showed no change in the annual frequency, so that the activity of GLOFs per unit glacial lake area has decreased in the past 30 years. I discussed possible drivers for this finding, but left a further attribution to distinct GLOF-triggering mechanisms open to future research.
This updated GLOF frequency was the key input for assessing GLOF hazard for the entire Himalayan mountain belt and several subregions. I used standard definitions in flood hydrology, describing hazard as the annual exceedance probability of a given flood peak discharge [m3 s-1] or larger at the breach location. I coupled the empirical frequency of GLOFs per region to simulations of physically plausible peak discharges from all existing ~5,000 lakes in the Himalayas. Using an extreme-value model, I could hence calculate flood return periods. I found that the contemporary 100-year GLOF discharge (the flood level that is reached or exceeded on average once in 100 years) is 20,600+2,200/–2,300 m3 s-1 for the entire Himalayas. Given the spatial and temporal distribution of historic GLOFs, contemporary GLOF hazard is highest in the Eastern Himalayas, and lower for regions with rarer GLOF abundance. I also calculated GLOF hazard for some 9,500 overdeepenings, which could expose and fill with water, if all Himalayan glaciers have melted eventually. Assuming that the current GLOF rate remains unchanged, the 100-year GLOF discharge could double (41,700+5,500/–4,700 m3 s-1), while the regional GLOF hazard may increase largest in the Karakoram.
To conclude, these three stages–from GLOF detection, to analysing their frequency and estimating regional GLOF hazard–provide a framework for modern GLOF hazard assessment. Given the rapidly growing population, infrastructure, and hydropower projects in the Himalayas, this thesis assists in quantifying the purely climate-driven contribution to hazard and risk from GLOFs.
Cocoa Bean Proteins
(2019)
The protein fractions of cocoa have been implicated influencing both the bioactive potential and sensory properties of cocoa and cocoa products. The objective of the present review is to show the impact of different stages of cultivation and processing with regard to the changes induced in the protein fractions. Special focus has been laid on the major seed storage proteins throughout the different stages of processing. The study starts with classical introduction of the extraction and the characterization methods used, while addressing classification approaches of cocoa proteins evolved during the timeline. The changes in protein composition during ripening and maturation of cocoa seeds, together with the possible modifications during the post-harvest processing (fermentation, drying, and roasting), have been documented. Finally, the bioactive potential arising directly or indirectly from cocoa proteins has been elucidated. The “state of the art” suggests that exploration of other potentially bioactive components in cocoa needs to be undertaken, while considering the complexity of reaction products occurring during the roasting phase of the post-harvest processing. Finally, the utilization of partially processed cocoa beans (e.g., fermented, conciliatory thermal treatment) can be recommended, providing a large reservoir of bioactive potentials arising from the protein components that could be instrumented in functionalizing foods.
Cocoa Bean Proteins
(2019)
The protein fractions of cocoa have been implicated influencing both the bioactive potential and sensory properties of cocoa and cocoa products. The objective of the present review is to show the impact of different stages of cultivation and processing with regard to the changes induced in the protein fractions. Special focus has been laid on the major seed storage proteins throughout the different stages of processing. The study starts with classical introduction of the extraction and the characterization methods used, while addressing classification approaches of cocoa proteins evolved during the timeline. The changes in protein composition during ripening and maturation of cocoa seeds, together with the possible modifications during the post-harvest processing (fermentation, drying, and roasting), have been documented. Finally, the bioactive potential arising directly or indirectly from cocoa proteins has been elucidated. The “state of the art” suggests that exploration of other potentially bioactive components in cocoa needs to be undertaken, while considering the complexity of reaction products occurring during the roasting phase of the post-harvest processing. Finally, the utilization of partially processed cocoa beans (e.g., fermented, conciliatory thermal treatment) can be recommended, providing a large reservoir of bioactive potentials arising from the protein components that could be instrumented in functionalizing foods.
Remote sensing technology, such as airborne, mobile, or terrestrial laser scanning, and photogrammetric techniques, are fundamental approaches for efficient, automatic creation of digital representations of spatial environments. For example, they allow us to generate 3D point clouds of landscapes, cities, infrastructure networks, and sites. As essential and universal category of geodata, 3D point clouds are used and processed by a growing number of applications, services, and systems such as in the domains of urban planning, landscape architecture, environmental monitoring, disaster management, virtual geographic environments as well as for spatial analysis and simulation.
While the acquisition processes for 3D point clouds become more and more reliable and widely-used, applications and systems are faced with more and more 3D point cloud data. In addition, 3D point clouds, by their very nature, are raw data, i.e., they do not contain any structural or semantics information. Many processing strategies common to GIS such as deriving polygon-based 3D models generally do not scale for billions of points. GIS typically reduce data density and precision of 3D point clouds to cope with the sheer amount of data, but that results in a significant loss of valuable information at the same time.
This thesis proposes concepts and techniques designed to efficiently store and process massive 3D point clouds. To this end, object-class segmentation approaches are presented to attribute semantics to 3D point clouds, used, for example, to identify building, vegetation, and ground structures and, thus, to enable processing, analyzing, and visualizing 3D point clouds in a more effective and efficient way. Similarly, change detection and updating strategies for 3D point clouds are introduced that allow for reducing storage requirements and incrementally updating 3D point cloud databases. In addition, this thesis presents out-of-core, real-time rendering techniques used to interactively explore 3D point clouds and related analysis results. All techniques have been implemented based on specialized spatial data structures, out-of-core algorithms, and GPU-based processing schemas to cope with massive 3D point clouds having billions of points.
All proposed techniques have been evaluated and demonstrated their applicability to the field of geospatial applications and systems, in particular for tasks such as classification, processing, and visualization. Case studies for 3D point clouds of entire cities with up to 80 billion points show that the presented approaches open up new ways to manage and apply large-scale, dense, and time-variant 3D point clouds as required by a rapidly growing number of applications and systems.
To understand past flood changes in the Rhine catchment and in particular the role of anthropogenic climate change in extreme flows, an attribution study relying on a proper GCM (general circulation model) downscaling is needed. A downscaling based on conditioning a stochastic weather generator on weather patterns is a promising approach. This approach assumes a strong link between weather patterns and local climate, and sufficient GCM skill in reproducing weather pattern climatology. These presuppositions are unprecedentedly evaluated here using 111 years of daily climate data from 490 stations in the Rhine basin and comprehensively testing the number of classification parameters and GCM weather pattern characteristics. A classification based on a combination of mean sea level pressure, temperature, and humidity from the ERA20C reanalysis of atmospheric fields over central Europe with 40 weather types was found to be the most appropriate for stratifying six local climate variables. The corresponding skill is quite diverse though, ranging from good for radiation to poor for precipitation. Especially for the latter it was apparent that pressure fields alone cannot sufficiently stratify local variability. To test the skill of the latest generation of GCMs from the CMIP5 ensemble in reproducing the frequency, seasonality, and persistence of the derived weather patterns, output from 15 GCMs is evaluated. Most GCMs are able to capture these characteristics well, but some models showed consistent deviations in all three evaluation criteria and should be excluded from further attribution analysis.
In recent years, the ever-growing amount of documents on the Web as well as in closed systems for private or business contexts led to a considerable increase of valuable textual information about topics, events, and entities. It is a truism that the majority of information (i.e., business-relevant data) is only available in unstructured textual form. The text mining research field comprises various practice areas that have the common goal of harvesting high-quality information from textual data. These information help addressing users' information needs.
In this thesis, we utilize the knowledge represented in user-generated content (UGC) originating from various social media services to improve text mining results. These social media platforms provide a plethora of information with varying focuses. In many cases, an essential feature of such platforms is to share relevant content with a peer group. Thus, the data exchanged in these communities tend to be focused on the interests of the user base. The popularity of social media services is growing continuously and the inherent knowledge is available to be utilized. We show that this knowledge can be used for three different tasks.
Initially, we demonstrate that when searching persons with ambiguous names, the information from Wikipedia can be bootstrapped to group web search results according to the individuals occurring in the documents. We introduce two models and different means to handle persons missing in the UGC source. We show that the proposed approaches outperform traditional algorithms for search result clustering. Secondly, we discuss how the categorization of texts according to continuously changing community-generated folksonomies helps users to identify new information related to their interests. We specifically target temporal changes in the UGC and show how they influence the quality of different tag recommendation approaches. Finally, we introduce an algorithm to attempt the entity linking problem, a necessity for harvesting entity knowledge from large text collections. The goal is the linkage of mentions within the documents with their real-world entities. A major focus lies on the efficient derivation of coherent links.
For each of the contributions, we provide a wide range of experiments on various text corpora as well as different sources of UGC.
The evaluation shows the added value that the usage of these sources provides and confirms the appropriateness of leveraging user-generated content to serve different information needs.
The genus Vanda and its affiliated taxa are a diverse group of horticulturally important species of orchids occurring mainly in South-East Asia, for which generic limits are poorly defined. Here, we present a molecular study using sequence data from three plastid DNA regions. It is shown that Vanda s.l. forms a clade containing approximately 73 species, including the previously accepted genera Ascocentrum, Euanthe, Christensonia, Neofinetia and Trudelia, and the species Aerides flabellata. Resolution of the phylogenetic relationships of species in Vanda s.l. is relatively poor, but existing morphological classifications for Vanda are incongruent with the results produced. Some novel species relationships are revealed, and a new morphological sectional classification is proposed based on support for these groupings and corresponding morphological characters shared by taxa and their geographical distributions. The putative occurrence of multiple pollination syndromes in this group of taxa, combined with complex biogeographical history of the South-East Asian region, is discussed in the context of these results.(c) 2013 The Linnean Society of London, Botanical Journal of the Linnean Society, 2013, 173, 549-572.
Planetary research is often user-based and requires considerable skill, time, and effort. Unfortunately, self-defined boundary conditions, definitions, and rules are often not documented or not easy to comprehend due to the complexity of research. This makes a comparison to other studies, or an extension of the already existing research, complicated. Comparisons are often distorted, because results rely on different, not well defined, or even unknown boundary conditions. The purpose of this research is to develop a standardized analysis method for planetary surfaces, which is adaptable to several research topics. The method provides a consistent quality of results. This also includes achieving reliable and comparable results and reducing the time and effort of conducting such studies. A standardized analysis method is provided by automated analysis tools that focus on statistical parameters. Specific key parameters and boundary conditions are defined for the tool application. The analysis relies on a database in which all key parameters are stored. These databases can be easily updated and adapted to various research questions. This increases the flexibility, reproducibility, and comparability of the research. However, the quality of the database and reliability of definitions directly influence the results. To ensure a high quality of results, the rules and definitions need to be well defined and based on previously conducted case studies. The tools then produce parameters, which are obtained by defined geostatistical techniques (measurements, calculations, classifications). The idea of an automated statistical analysis is tested to proof benefits but also potential problems of this method. In this study, I adapt automated tools for floor-fractured craters (FFCs) on Mars. These impact craters show a variety of surface features, occurring in different Martian environments, and having different fracturing origins. They provide a complex morphological and geological field of application. 433 FFCs are classified by the analysis tools due to their fracturing process. Spatial data, environmental context, and crater interior data are analyzed to distinguish between the processes involved in floor fracturing. Related geologic processes, such as glacial and fluvial activity, are too similar to be separately classified by the automated tools. Glacial and fluvial fracturing processes are merged together for the classification. The automated tools provide probability values for each origin model. To guarantee the quality and reliability of the results, classification tools need to achieve an origin probability above 50 %. This analysis method shows that 15 % of the FFCs are fractured by intrusive volcanism, 20 % by tectonic activity, and 43 % by water & ice related processes. In total, 75 % of the FFCs are classified to an origin type. This can be explained by a combination of origin models, superposition or erosion of key parameters, or an unknown fracturing model. Those features have to be manually analyzed in detail. Another possibility would be the improvement of key parameters and rules for the classification. This research shows that it is possible to conduct an automated statistical analysis of morphologic and geologic features based on analysis tools. Analysis tools provide additional information to the user and are therefore considered assistance systems.
Merapi volcano is one of the most active and dangerous volcanoes of the earth. Located in central part of Java island (Indonesia), even a moderate eruption of Merapi poses a high risk to the highly populated area. Due to the close relationship between the volcanic unrest and the occurrence of seismic events at Mt. Merapi, the monitoring of Merapi's seismicity plays an important role for recognizing major changes in the volcanic activity. An automatic seismic event detection and classification system, which is capable to characterize the actual seismic activity in near real-time, is an important tool which allows the scientists in charge to take immediate decisions during a volcanic crisis. In order to accomplish the task of detecting and classifying volcano-seismic signals automatically in the continuous data streams, a pattern recognition approach has been used. It is based on the method of hidden Markov models (HMM), a technique, which has proven to provide high recognition rates at high confidence levels in classification tasks of similar complexity (e.g. speech recognition). Any pattern recognition system relies on the appropriate representation of the input data in order to allow a reasonable class-decision by means of a mathematical test function. Based on the experiences from seismological observatory practice, a parametrization scheme of the seismic waveform data is derived using robust seismological analysis techniques. The wavefield parameters are summarized into a real-valued feature vector per time step. The time series of this feature vector build the basis for the HMM-based classification system. In order to make use of discrete hidden Markov (DHMM) techniques, the feature vectors are further processed by applying a de-correlating and prewhitening transformation and additional vector quantization. The seismic wavefield is finally represented as a discrete symbol sequence with a finite alphabet. This sequence is subject to a maximum likelihood test against the discrete hidden Markov models, learned from a representative set of training sequences for each seismic event type of interest. A time period from July, 1st to July, 5th, 1998 of rapidly increasing seismic activity prior to the eruptive cycle between July, 10th and July, 19th, 1998 at Merapi volcano is selected for evaluating the performance of this classification approach. Three distinct types of seismic events according to the established classification scheme of the Volcanological Survey of Indonesia (VSI) have been observed during this time period. Shallow volcano-tectonic events VTB (h < 2.5 km), very shallow dome-growth related seismic events MP (h < 1 km) and seismic signals connected to rockfall activity originating from the active lava dome, termed Guguran. The special configuration of the digital seismic station network at Merapi volcano, a combination of small-aperture array deployments surrounding Merapi's summit region, allows the use of array methods to parametrize the continuously recorded seismic wavefield. The individual signal parameters are analyzed to determine their relevance for the discrimination of seismic event classes. For each of the three observed event types a set of DHMMs has been trained using a selected set of seismic events with varying signal to noise ratios and signal durations. Additionally, two sets of discrete hidden Markov models have been derived for the seismic noise, incorporating the fact, that the wavefield properties of the ambient vibrations differ considerably during working hours and night time. A total recognition accuracy of 67% is obtained. The mean false alarm (FA) rate can be given by 41 FA/class/day. However, variations in the recognition capabilities for the individual seismic event classes are significant. Shallow volcano-tectonic signals (VTB) show very distinct wavefield properties and (at least in the selected time period) a stable time pattern of wavefield attributes. The DHMM-based classification performs therefore best for VTB-type events, with almost 89% recognition accuracy and 2 FA/day. Seismic signals of the MP- and Guguran-classes are more difficult to detect and classify. Around 64% of MP-events and 74% of Guguran signals are recognized correctly. The average false alarm rate for MP-events is 87 FA/day, whereas for Guguran signals 33 FA/day are obtained. However, the majority of missed events and false alarms for both MP and Guguran events are due to confusion errors between these two event classes in the recognition process. The confusion of MP and Guguran events is interpreted as being a consequence of the selected parametrization approach for the continuous seismic data streams. The observed patterns of the analyzed wavefield attributes for MP and Guguran events show a significant amount of similarity, thus providing not sufficient discriminative information for the numerical classification. The similarity of wavefield parameters obtained for seismic events of MP and Guguran type reflect the commonly observed dominance of path effects on the seismic wave propagation in volcanic environments. The recognition rates obtained for the five-day period of increasing seismicity show, that the presented DHMM-based automatic classification system is a promising approach for the difficult task of classifying volcano-seismic signals. Compared to standard signal detection algorithms, the most significant advantage of the discussed technique is, that the entire seismogram is detected and classified in a single step.