Refine
Year of publication
Document Type
- Article (19)
- Postprint (10)
- Doctoral Thesis (8)
- Conference Proceeding (1)
- Review (1)
Is part of the Bibliography
- yes (39) (remove)
Keywords
- prediction (39) (remove)
Institute
- Department Psychologie (5)
- Institut für Biochemie und Biologie (5)
- Institut für Physik und Astronomie (4)
- Institut für Umweltwissenschaften und Geographie (3)
- Mathematisch-Naturwissenschaftliche Fakultät (3)
- Department Linguistik (2)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (2)
- Hochschulambulanz (2)
- Institut für Ernährungswissenschaft (2)
- Institut für Mathematik (2)
This Thesis was devoted to the study of the coupled system composed by El Niño/Southern Oscillation and the Annual Cycle. More precisely, the work was focused on two main problems: 1. How to separate both oscillations into an affordable model for understanding the behaviour of the whole system. 2. How to model the system in order to achieve a better understanding of the interaction, as well as to predict future states of the system. We focused our efforts in the Sea Surface Temperature equations, considering that atmospheric effects were secondary to the ocean dynamics. The results found may be summarised as follows: 1. Linear methods are not suitable for characterising the dimensionality of the sea surface temperature in the tropical Pacific Ocean. Therefore they do not help to separate the oscillations by themselves. Instead, nonlinear methods of dimensionality reduction are proven to be better in defining a lower limit for the dimensionality of the system as well as in explaining the statistical results in a more physical way [1]. In particular, Isomap, a nonlinear modification of Multidimensional Scaling methods, provides a physically appealing method of decomposing the data, as it substitutes the euclidean distances in the manifold by an approximation of the geodesic distances. We expect that this method could be successfully applied to other oscillatory extended systems and, in particular, to meteorological systems. 2. A three dimensional dynamical system could be modeled, using a backfitting algorithm, for describing the dynamics of the sea surface temperature in the tropical Pacific Ocean. We observed that, although there were few data points available, we could predict future behaviours of the coupled ENSO-Annual Cycle system with an accuracy of less than six months, although the constructed system presented several drawbacks: few data points to input in the backfitting algorithm, untrained model, lack of forcing with external data and simplification using a close system. Anyway, ensemble prediction techniques showed that the prediction skills of the three dimensional time series were as good as those found in much more complex models. This suggests that the climatological system in the tropics is mainly explained by ocean dynamics, while the atmosphere plays a secondary role in the physics of the process. Relevant predictions for short lead times can be made using a low dimensional system, despite its simplicity. The analysis of the SST data suggests that nonlinear interaction between the oscillations is small, and that noise plays a secondary role in the fundamental dynamics of the oscillations [2]. A global view of the work shows a general procedure to face modeling of climatological systems. First, we should find a suitable method of either linear or nonlinear dimensionality reduction. Then, low dimensional time series could be extracted out of the method applied. Finally, a low dimensional model could be found using a backfitting algorithm in order to predict future states of the system.
A large number and wide variety of lake ecosystem models have been developed and published during the past four decades. We identify two challenges for making further progress in this field. One such challenge is to avoid developing more models largely following the concept of others ('reinventing the wheel'). The other challenge is to avoid focusing on only one type of model, while ignoring new and diverse approaches that have become available ('having tunnel vision'). In this paper, we aim at improving the awareness of existing models and knowledge of concurrent approaches in lake ecosystem modelling, without covering all possible model tools and avenues. First, we present a broad variety of modelling approaches. To illustrate these approaches, we give brief descriptions of rather arbitrarily selected sets of specific models. We deal with static models (steady state and regression models), complex dynamic models (CAEDYM, CE-QUAL-W2, Delft 3D-ECO, LakeMab, LakeWeb, MyLake, PCLake, PROTECH, SALMO), structurally dynamic models and minimal dynamic models. We also discuss a group of approaches that could all be classified as individual based: super-individual models (Piscator, Charisma), physiologically structured models, stage-structured models and traitbased models. We briefly mention genetic algorithms, neural networks, Kalman filters and fuzzy logic. Thereafter, we zoom in, as an in-depth example, on the multi-decadal development and application of the lake ecosystem model PCLake and related models (PCLake Metamodel, Lake Shira Model, IPH-TRIM3D-PCLake). In the discussion, we argue that while the historical development of each approach and model is understandable given its 'leading principle', there are many opportunities for combining approaches. We take the point of view that a single 'right' approach does not exist and should not be strived for. Instead, multiple modelling approaches, applied concurrently to a given problem, can help develop an integrative view on the functioning of lake ecosystems. We end with a set of specific recommendations that may be of help in the further development of lake ecosystem models.
A large number and wide variety of lake ecosystem models have been developed and published during the past four decades. We identify two challenges for making further progress in this field. One such challenge is to avoid developing more models largely following the concept of others ('reinventing the wheel'). The other challenge is to avoid focusing on only one type of model, while ignoring new and diverse approaches that have become available ('having tunnel vision'). In this paper, we aim at improving the awareness of existing models and knowledge of concurrent approaches in lake ecosystem modelling, without covering all possible model tools and avenues. First, we present a broad variety of modelling approaches. To illustrate these approaches, we give brief descriptions of rather arbitrarily selected sets of specific models. We deal with static models (steady state and regression models), complex dynamic models (CAEDYM, CE-QUAL-W2, Delft 3D-ECO, LakeMab, LakeWeb, MyLake, PCLake, PROTECH, SALMO), structurally dynamic models and minimal dynamic models. We also discuss a group of approaches that could all be classified as individual based: super-individual models (Piscator, Charisma), physiologically structured models, stage-structured models and traitbased models. We briefly mention genetic algorithms, neural networks, Kalman filters and fuzzy logic. Thereafter, we zoom in, as an in-depth example, on the multi-decadal development and application of the lake ecosystem model PCLake and related models (PCLake Metamodel, Lake Shira Model, IPH-TRIM3D-PCLake). In the discussion, we argue that while the historical development of each approach and model is understandable given its 'leading principle', there are many opportunities for combining approaches. We take the point of view that a single 'right' approach does not exist and should not be strived for. Instead, multiple modelling approaches, applied concurrently to a given problem, can help develop an integrative view on the functioning of lake ecosystems. We end with a set of specific recommendations that may be of help in the further development of lake ecosystem models.
SLocX predicting subcellular localization of Arabidopsis proteins leveraging gene expression data
(2011)
Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence derived information, expression behavior, or a combination of these data and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mito-chondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that the advances in measuring gene expression technology will make our method applicable for all Arabidopsis proteins.
Species respond to environmental change by dynamically adjusting their geographical ranges. Robust predictions of these changes are prerequisites to inform dynamic and sustainable conservation strategies. Correlative species distribution models (SDMs) relate species’ occurrence records to prevailing environmental factors to describe the environmental niche. They have been widely applied in global change context as they have comparably low data requirements and allow for rapid assessments of potential future species’ distributions. However, due to their static nature, transient responses to environmental change are essentially ignored in SDMs. Furthermore, neither dispersal nor demographic processes and biotic interactions are explicitly incorporated. Therefore, it has often been suggested to link statistical and mechanistic modelling approaches in order to make more realistic predictions of species’ distributions for scenarios of environmental change. In this thesis, I present two different ways of such linkage. (i) Mechanistic modelling can act as virtual playground for testing statistical models and allows extensive exploration of specific questions. I promote this ‘virtual ecologist’ approach as a powerful evaluation framework for testing sampling protocols, analyses and modelling tools. Also, I employ such an approach to systematically assess the effects of transient dynamics and ecological properties and processes on the prediction accuracy of SDMs for climate change projections. That way, relevant mechanisms are identified that shape the species’ response to altered environmental conditions and which should hence be considered when trying to project species’ distribution through time. (ii) I supplement SDM projections of potential future habitat for black grouse in Switzerland with an individual-based population model. By explicitly considering complex interactions between habitat availability and demographic processes, this allows for a more direct assessment of expected population response to environmental change and associated extinction risks. However, predictions were highly variable across simulations emphasising the need for principal evaluation tools like sensitivity analysis to assess uncertainty and robustness in dynamic range predictions. Furthermore, I identify data coverage of the environmental niche as a likely cause for contrasted range predictions between SDM algorithms. SDMs may fail to make reliable predictions for truncated and edge niches, meaning that portions of the niche are not represented in the data or niche edges coincide with data limits. Overall, my thesis contributes to an improved understanding of uncertainty factors in predictions of range dynamics and presents ways how to deal with these. Finally I provide preliminary guidelines for predictive modelling of dynamic species’ response to environmental change, identify key challenges for future research and discuss emerging developments.
Background
High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is so far insufficiently understood. The development of so called 'omics' approaches in the recent years promises to identify molecular markers and to further understand the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that the application of multivariate statistical approaches is highly recommended to fully capture the complexity of data gained using high-throughput methods.
Methods
We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS), and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline, and after an average of six years. We did correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.
Results
We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. However, external validation is eventually desirable. Although not all metabolites belonging to the final pattern are identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.
Conclusions
We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, not single, but a complex pattern of metabolites propels the prediction and therefore reflects the complexity of the underlying molecular mechanisms. This result could only be captured by application of multivariate statistical approaches. Therefore, we highly recommend the usage of statistical methods that seize the complexity of the information given by high-throughput methods.
Predicting the actions of other individuals is crucial for our daily interactions. Recent evidence suggests that the prediction of object-directed arm and full-body actions employs the dorsal premotor cortex (PMd). Thus, the neural substrate involved in action control may also be essential for action prediction. Here, we aimed to address this issue and hypothesized that disrupting the PMd impairs action prediction. Using fMRI-guided coil navigation, rTMS (five pulses, 10Hz) was applied over the left PMd and over the vertex (control region) while participants observed everyday actions in video clips that were transiently occluded for 1s. The participants detected manipulations in the time course of occluded actions, which required them to internally predict the actions during occlusion. To differentiate between functional roles that the PMd could play in prediction, rTMS was either delivered at occluder-onset (TMS-early), affecting the initiation of action prediction, or 300 ms later during occlusion(TMS-late), affecting the maintenance of anongoing prediction. TMS-early over the left PMd produced more prediction errors than TMS-early over the vertex. TMS-late had no effect on prediction performance, suggesting that the left PMd might be involved particularly during the initiation of internally guided action prediction but may play a subordinate role in maintaining ongoing prediction. These findings open a new perspective on the role of the left PMd in action prediction which is in line with its functions in action control and in cognitive tasks. In the discussion, there levance of the left PMd for integrating external action parameters with the observer's motor repertoire is emphasized. Overall, the results are in line with the notion that premotor functions are employed in both action control and action observation.
Aim Biotic interactions within guilds or across trophic levels have widely been ignored in species distribution models (SDMs). This synthesis outlines the development of species interaction distribution models (SIDMs), which aim to incorporate multispecies interactions at large spatial extents using interaction matrices. Location Local to global. Methods We review recent approaches for extending classical SDMs to incorporate biotic interactions, and identify some methodological and conceptual limitations. To illustrate possible directions for conceptual advancement we explore three principal ways of modelling multispecies interactions using interaction matrices: simple qualitative linkages between species, quantitative interaction coefficients reflecting interaction strengths, and interactions mediated by interaction currencies. We explain methodological advancements for static interaction data and multispecies time series, and outline methods to reduce complexity when modelling multispecies interactions. Results Classical SDMs ignore biotic interactions and recent SDM extensions only include the unidirectional influence of one or a few species. However, novel methods using error matrices in multivariate regression models allow interactions between multiple species to be modelled explicitly with spatial co-occurrence data. If time series are available, multivariate versions of population dynamic models can be applied that account for the effects and relative importance of species interactions and environmental drivers. These methods need to be extended by incorporating the non-stationarity in interaction coefficients across space and time, and are challenged by the limited empirical knowledge on spatio-temporal variation in the existence and strength of species interactions. Model complexity may be reduced by: (1) using prior ecological knowledge to set a subset of interaction coefficients to zero, (2) modelling guilds and functional groups rather than individual species, and (3) modelling interaction currencies and species effect and response traits. Main conclusions There is great potential for developing novel approaches that incorporate multispecies interactions into the projection of species distributions and community structure at large spatial extents. Progress can be made by: (1) developing statistical models with interaction matrices for multispecies co-occurrence datasets across large-scale environmental gradients, (2) testing the potential and limitations of methods for complexity reduction, and (3) sampling and monitoring comprehensive spatio-temporal data on biotic interactions in multispecies communities.
Learning a model for the relationship between the attributes and the annotated labels of data examples serves two purposes. Firstly, it enables the prediction of the label for examples without annotation. Secondly, the parameters of the model can provide useful insights into the structure of the data. If the data has an inherent partitioned structure, it is natural to mirror this structure in the model. Such mixture models predict by combining the individual predictions generated by the mixture components which correspond to the partitions in the data. Often the partitioned structure is latent, and has to be inferred when learning the mixture model. Directly evaluating the accuracy of the inferred partition structure is, in many cases, impossible because the ground truth cannot be obtained for comparison. However it can be assessed indirectly by measuring the prediction accuracy of the mixture model that arises from it. This thesis addresses the interplay between the improvement of predictive accuracy by uncovering latent cluster structure in data, and further addresses the validation of the estimated structure by measuring the accuracy of the resulting predictive model. In the application of filtering unsolicited emails, the emails in the training set are latently clustered into advertisement campaigns. Uncovering this latent structure allows filtering of future emails with very low false positive rates. In order to model the cluster structure, a Bayesian clustering model for dependent binary features is developed in this thesis. Knowing the clustering of emails into campaigns can also aid in uncovering which emails have been sent on behalf of the same network of captured hosts, so-called botnets. This association of emails to networks is another layer of latent clustering. Uncovering this latent structure allows service providers to further increase the accuracy of email filtering and to effectively defend against distributed denial-of-service attacks. To this end, a discriminative clustering model is derived in this thesis that is based on the graph of observed emails. The partitionings inferred using this model are evaluated through their capacity to predict the campaigns of new emails. Furthermore, when classifying the content of emails, statistical information about the sending server can be valuable. Learning a model that is able to make use of it requires training data that includes server statistics. In order to also use training data where the server statistics are missing, a model that is a mixture over potentially all substitutions thereof is developed. Another application is to predict the navigation behavior of the users of a website. Here, there is no a priori partitioning of the users into clusters, but to understand different usage scenarios and design different layouts for them, imposing a partitioning is necessary. The presented approach simultaneously optimizes the discriminative as well as the predictive power of the clusters. Each model is evaluated on real-world data and compared to baseline methods. The results show that explicitly modeling the assumptions about the latent cluster structure leads to improved predictions compared to the baselines. It is beneficial to incorporate a small number of hyperparameters that can be tuned to yield the best predictions in cases where the prediction accuracy can not be optimized directly.
Despite recent growth of research on the effects of prosocial media, processes underlying these effects are not well understood. Two studies explored theoretically relevant mediators and moderators of the effects of prosocial media on helping. Study 1 examined associations among prosocial- and violent-media use, empathy, and helping in samples from seven countries. Prosocial-media use was positively associated with helping. This effect was mediated by empathy and was similar across cultures. Study 2 explored longitudinal relations among prosocial-video-game use, violent-video-game use, empathy, and helping in a large sample of Singaporean children and adolescents measured three times across 2 years. Path analyses showed significant longitudinal effects of prosocial- and violent-video-game use on prosocial behavior through empathy. Latent-growth-curve modeling for the 2-year period revealed that change in video-game use significantly affected change in helping, and that this relationship was mediated by change in empathy.