Hundreds of experiments have now manipulated the species richness (SR) of various groups of organisms and examined how this aspect of biological diversity influences ecosystem functioning. Ecologists have recently expanded this field to ask whether phylogenetic diversity (PD) among species, often quantified as the sum of branch lengths on a molecular phylogeny leading to all species in a community, also predicts ecological function. Some have hypothesized that phylogenetic divergence should be a better predictor of ecological function than SR because evolutionary relatedness reflects the degree of ecological and functional differentiation among species. However, studies to date have provided mixed support for this hypothesis. Here, we reanalyse data from 16 experiments that manipulated plant SR in grassland ecosystems and examined the impact on above-ground biomass production over multiple time points. Using a new molecular phylogeny of the plant species used in these experiments, we quantified how the PD of plants affects both average community biomass production and the stability of community biomass production through time. Using four complementary analyses, we show that, after statistically controlling for variation in SR, PD is related neither to mean community biomass nor to the temporal stability of biomass. These results run counter to past claims. However, after controlling for SR, PD was positively related to variation in community biomass over time, owing to an increase in the variances of individual species; this relationship was not strong enough to influence community stability. In contrast to the non-significant relationships between PD, biomass and stability, our analyses show that SR per se tends to increase the mean biomass production of plant communities after controlling for PD.
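The PD measure described above (the sum of branch lengths connecting a community's species, commonly called Faith's PD) can be illustrated with a minimal sketch. It assumes a rooted tree stored as a child-to-parent map and the root-inclusive variant of the measure; the toy tree, node names and branch lengths are hypothetical and not taken from the study's phylogeny. The point is that PD can vary while SR is held fixed, which is what makes it possible to control for one while testing the other.

```python
from typing import Dict, Iterable, Optional, Tuple

# node -> (parent, length of the branch from parent to node); the root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]

def faith_pd(tree: Tree, species: Iterable[str]) -> float:
    """Sum of branch lengths on the union of root-to-tip paths of `species`.

    Each branch is counted once, however many of the species share it.
    """
    counted = set()  # nodes whose branch to the parent is already in the total
    total = 0.0
    for tip in species:
        node = tip
        # Walk towards the root; stop at the first node already counted,
        # because every branch above it has been counted too.
        while node is not None and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            if parent is not None:
                total += length
            node = parent
    return total

# Hypothetical four-species tree with two clades; branch lengths are arbitrary.
TREE: Tree = {
    "root": (None, 0.0),
    "clade_A": ("root", 2.0),
    "clade_B": ("root", 3.0),
    "sp1": ("clade_A", 1.0),
    "sp2": ("clade_A", 1.5),
    "sp3": ("clade_B", 0.5),
    "sp4": ("clade_B", 0.5),
}

print(faith_pd(TREE, ["sp1", "sp2"]))  # 4.5: two close relatives
print(faith_pd(TREE, ["sp1", "sp3"]))  # 6.5: same SR, species from different clades
```

Both communities have SR = 2, but the pair drawn from different clades has higher PD because it spans both long branches near the root.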
The relationship between SR and temporal variation in community biomass was positive, non-significant or negative, depending on which analysis was used. However, the increases in community biomass with SR, independent of PD, always led to increased stability. These results suggest that PD is no better a predictor of ecosystem functioning than SR.

Synthesis. Our study on grasslands offers a cautionary tale for attempts to relate PD to ecosystem functioning, suggesting that there may be ecologically important trait and functional variation among species that is not explained by phylogenetic relatedness. Our results fail to support the hypothesis that conserving evolutionarily distinct species would be more effective than conserving SR as a way to maintain productive and stable communities under changing environmental conditions.
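The abstract does not define temporal stability; a common definition in this literature is the temporal mean of community biomass divided by its standard deviation. Under that assumption, the sketch below shows why higher mean biomass raises stability even when the variance is unchanged, and how community variance splits into summed species variances plus summed covariances. The biomass values and the function name `stability_summary` are invented for illustration.

```python
import numpy as np

def stability_summary(species_biomass: np.ndarray) -> dict:
    """species_biomass: shape (n_time_points, n_species), biomass per species per time point."""
    community = species_biomass.sum(axis=1)            # community biomass at each time point
    mean = community.mean()
    variance = community.var(ddof=1)                   # sample variance over time
    cov = np.atleast_2d(np.cov(species_biomass, rowvar=False, ddof=1))
    species_variance = float(np.trace(cov))            # sum of individual species variances
    covariance = float(cov.sum() - species_variance)   # sum of covariances over all pairs i != j
    return {
        "mean": float(mean),
        "variance": float(variance),
        "species_variance": species_variance,
        "covariance": covariance,
        "stability": float(mean / np.sqrt(variance)),  # assumed definition: mean / SD
    }

# Invented biomass for 2 species over 4 time points.
biomass = np.array([[100.0, 50.0], [120.0, 40.0], [90.0, 70.0], [110.0, 60.0]])
base = stability_summary(biomass)
# Community variance = summed species variances + summed covariances.
assert np.isclose(base["variance"], base["species_variance"] + base["covariance"])

# Adding a constant to every species raises the mean but leaves the variance
# unchanged, so stability (mean / SD) increases.
shifted = stability_summary(biomass + 10.0)
print(base["stability"], shifted["stability"])
```

In this toy example the two species fluctuate in opposite directions, so the negative covariance term offsets much of the summed species variance.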
Text is ubiquitous in our world and daily life. We encounter it nearly everywhere: in shops, on the street, or in our flats. Nowadays, more and more text is contained in digital images. These images are taken either with cameras, e.g., smartphone cameras, or with scanning devices such as document scanners. The sheer amount of available data, e.g., millions of images taken by Google Street View, prohibits manual analysis and metadata extraction. Although much progress has been made in optical character recognition (OCR) for printed text in documents, broad areas of OCR are still not fully explored and hold many research challenges. With the mainstream use of machine learning, and especially deep learning, one of the most pressing problems is the availability and acquisition of annotated ground truth for training machine learning models, because obtaining training data through manual annotation is time-consuming and costly. In this thesis, we address the question of how to reduce the cost of acquiring ground truth annotations when applying state-of-the-art machine learning methods to optical character recognition pipelines. To this end, we investigate how to reduce the annotation cost by using only a fraction of the typically required ground truth annotations, e.g., for scene text recognition systems. We also investigate how synthetic data can reduce the need for manual annotation work, e.g., in document analysis of archival material. In the area of scene text recognition, we have developed a novel end-to-end scene text recognition system that can be trained using inexact supervision and shows competitive or state-of-the-art performance on standard scene text recognition benchmark datasets. Our method consists of two independent neural networks, combined using spatial transformer networks.
Both networks are trained jointly to perform text localization and text recognition simultaneously, using annotations only for the recognition task. We apply our model to end-to-end scene text recognition (i.e., localization and recognition of words) and to pure scene text recognition without any changes to the network architecture.
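In a typical spatial transformer setup, a localization network predicts transformation parameters, and a differentiable grid generator and bilinear sampler use them to cut the predicted region out of the image for the next network. The sketch below shows only the forward pass of that sampler in NumPy, assuming an affine transform, a single-channel image and the align-corners coordinate convention; the abstract does not say which transform the thesis uses. Here `theta` is set by hand rather than predicted, and the function names are illustrative, not the thesis's code.

```python
import numpy as np

def affine_grid(theta: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Map each output pixel to source coordinates in normalized [-1, 1] space."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3) homogeneous coords
    return grid @ theta.T                                  # (out_h, out_w, 2): source (x, y)

def _gather(image: np.ndarray, yy: np.ndarray, xx: np.ndarray) -> np.ndarray:
    # Read pixels, treating anything outside the image as zero.
    h, w = image.shape
    valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
    out = np.zeros(yy.shape, dtype=float)
    out[valid] = image[yy[valid], xx[valid]]
    return out

def bilinear_sample(image: np.ndarray, src: np.ndarray) -> np.ndarray:
    h, w = image.shape
    # Normalized [-1, 1] -> pixel coordinates (align-corners convention).
    x = (src[..., 0] + 1.0) * (w - 1) / 2.0
    y = (src[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    wx, wy = x - x0, y - y0
    # Weighted average of the four neighbouring pixels.
    return ((1 - wx) * (1 - wy) * _gather(image, y0, x0)
            + wx * (1 - wy) * _gather(image, y0, x0 + 1)
            + (1 - wx) * wy * _gather(image, y0 + 1, x0)
            + wx * wy * _gather(image, y0 + 1, x0 + 1))

# Hand-set parameters: scale x by 0.5 and shift right, i.e. crop the right half.
theta = np.array([[0.5, 0.0, 0.5],
                  [0.0, 1.0, 0.0]])
image = np.tile(np.arange(5, dtype=float), (3, 1))  # each row is 0, 1, 2, 3, 4
crop = bilinear_sample(image, affine_grid(theta, out_h=3, out_w=3))
print(crop)  # every row is [2, 3, 4]
```

Because the output is a weighted sum of pixel values with weights that depend smoothly on `theta`, gradients from the recognition loss can flow back into the parameters, which is what would let localization be learned from recognition annotations alone.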
In the second part of this thesis, we introduce novel approaches for using and generating synthetic data to analyze handwriting in archival data. First, we propose a novel preprocessing method to determine whether a given document page contains any handwriting. We propose a novel data synthesis strategy to train a classification model and show that this strategy is viable by evaluating the trained model on real images from an archive. Second, we introduce the new analysis task of handwriting classification, which entails classifying a given handwritten word image into classes such as date, word, or number. Such an analysis step allows us to select the best-fitting recognition model for subsequent text recognition; it also allows us to reason about the semantic content of a given document page without the need for fine-grained text recognition and further analysis steps, such as Named Entity Recognition. We show that our proposed approaches work well when trained on synthetic data. Further, we propose a flexible metric learning approach that allows zero-shot classification of classes unseen during the network's training. Last, we propose a novel data synthesis algorithm to train off-the-shelf pixel-wise semantic segmentation networks for documents. Our data synthesis pipeline is based on the StyleGAN architecture and can synthesize realistic document images together with their corresponding segmentation annotations without requiring any annotated data.
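One common way to classify classes unseen during training with metric learning is to embed images so that samples of the same class lie close together, then assign a query to the nearest class prototype; a new class needs only reference embeddings, not retraining. The thesis's exact formulation is not described here, so this is a generic sketch: the embedding network is replaced by fixed, hypothetical 3-dimensional vectors, and treating `word` as the unseen class is purely illustrative.

```python
from typing import Dict

import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def build_prototypes(reference: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    # One prototype per class: the re-normalized mean of its reference embeddings.
    return {name: l2_normalize(l2_normalize(emb).mean(axis=0))
            for name, emb in reference.items()}

def classify(query: np.ndarray, prototypes: Dict[str, np.ndarray]) -> str:
    # Nearest prototype by cosine similarity.
    names = list(prototypes)
    protos = np.stack([prototypes[n] for n in names])
    sims = protos @ l2_normalize(query)
    return names[int(np.argmax(sims))]

# Hypothetical embeddings of classes seen during training.
prototypes = build_prototypes({
    "date": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "number": np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
})

# Add a class unseen during training by embedding reference images; no retraining.
prototypes.update(build_prototypes({"word": np.array([[0.0, 0.0, 1.0]])}))

print(classify(np.array([0.95, 0.1, 0.0]), prototypes))   # date
print(classify(np.array([0.05, 0.1, 0.95]), prototypes))  # word
```

The design choice here is that the classifier's label set lives outside the network: extending it is a dictionary update, which is what makes classes unseen during training reachable at all.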