This dissertation investigates the working memory mechanism subserving human sentence processing and its relative contribution to processing difficulty as compared to syntactic prediction. Within the last decades, evidence for a content-addressable memory system underlying human cognition in general has accumulated (e.g., Anderson et al., 2004). In sentence processing research, it has been proposed that this general content-addressable architecture is also used for language processing (e.g., McElree, 2000).
Although there is a growing body of evidence from various kinds of linguistic dependencies that is consistent with a general content-addressable memory subserving sentence processing (e.g., McElree et al., 2003; VanDyke2006), the case of reflexive-antecedent dependencies has challenged this view. It has been proposed that in the processing of reflexive-antecedent dependencies, a syntactic-structure based memory access is used rather than cue-based retrieval within a content-addressable framework (e.g., Sturt, 2003).
Two eye-tracking experiments on Chinese reflexives were designed to tease apart accounts assuming a syntactic-structure based memory access mechanism from cue-based retrieval (implemented in ACT-R as proposed by Lewis and Vasishth (2005)).
In both experiments, interference effects were observed from noun phrases which syntactically do not qualify as the reflexive's antecedent but match the animacy requirement the reflexive imposes on its antecedent. These results are interpreted as evidence against a purely syntactic-structure based memory access. However, the exact pattern of effects observed in the data is only partially compatible with the Lewis and Vasishth cue-based parsing model.
Therefore, an extension of the Lewis and Vasishth model is proposed. Two principles are added to the original model, namely 'cue confusion' and 'distractor prominence'.
Although interference effects are generally interpreted in favor of a content-addressable memory architecture, an alternative explanation for interference effects in reflexive processing has been proposed which, crucially, might reconcile interference effects with a structure-based account.
It has been argued that interference effects do not necessarily reflect cue-based retrieval interference in a content-addressable memory but might equally well be accounted for by interference effects which have already occurred at the moment of encoding the antecedent in memory (Dillon, 2011).
Three experiments (eye-tracking and self-paced reading) on German reflexives and Swedish possessives were designed to tease apart cue-based retrieval interference from encoding interference. The results of all three experiments suggest that there is no evidence that encoding interference affects the retrieval of a reflexive's antecedent.
Taken together, these findings suggest that the processing of reflexives can be explained with the same cue-based retrieval mechanism that has been invoked to explain syntactic dependency resolution in a range of other structures. This supports the view that the language processing system is located within a general cognitive architecture, with a general-purpose content-addressable working memory system operating on linguistic expressions.
Finally, two experiments (self-paced reading and eye-tracking) using Chinese relative clauses were conducted to determine the relative contribution to sentence processing difficulty of working-memory processes as compared to syntactic prediction during incremental parsing.
Chinese has the cross-linguistically rare property of being a language with subject-verb-object word order and pre-nominal relative clauses. This property leads to opposing predictions of expectation-based accounts and memory-based accounts with respect to the relative processing difficulty of subject vs. object relatives.
Previous studies showed contradictory results, which have been attributed to different kinds of local ambiguities confounding the materials (Lin and Bever, 2011). The two experiments presented are the first to compare Chinese relative clauses in syntactically unambiguous contexts.
The results of both experiments were consistent with the predictions of the expectation-based account of sentence processing but not with the memory-based account. From these findings, I conclude that any theory of human sentence processing needs to take into account the power of predictive processes unfolding in the human mind.
This article examines whether and to what extent early language skills predict numerical skills. In 72 three-year-old children, numerical, productive and receptive verbal, and grammatical abilities were assessed twice, three months apart. Structural equation models show that language and numerical abilities are still barely distinct at this age. Numerical skills already show high interindividual developmental stability at this age. No significant influence of language competence on the growth of mathematical competence in the fourth year of life could be demonstrated. We discuss the results against the background of current hypotheses on the relationship between language and numeracy in development.
Dynamic resource management is an essential requirement for private and public cloud computing environments. With dynamic resource management, the assignment of physical resources to the cloud's virtual resources depends on the actual needs of the applications or running services, which improves the utilization of the cloud's physical resources and reduces the cost of the offered services. In addition, virtual resources can be moved across different physical resources in the cloud environment without an obvious impact on the running applications or services. This means that the availability of the services and applications running in the cloud is independent of failures of hardware resources, including servers, switches and storage. This increases the reliability of cloud services compared to classical data-center environments.
In this thesis, we briefly discuss dynamic resource management and then focus in depth on live migration as the core mechanism of dynamic compute resource management. Live migration is a commonly used and essential feature in cloud and virtual data-center environments. Load balancing, power saving and fault tolerance in cloud computing all depend on live migration to optimize the usage of virtual and physical resources. As we discuss in this thesis, live migration brings many benefits to cloud and virtual data-center environments; however, its cost cannot be ignored. This cost includes migration time, downtime, network overhead, increased power consumption and CPU overhead.
IT admins run virtual machine live migrations without knowing the migration cost. As a result, resource bottlenecks, higher migration costs and migration failures can occur. The first problem that we discuss in this thesis is how to model the cost of virtual machine live migration. Secondly, we investigate how machine learning techniques can help cloud admins estimate this cost before initiating the migration of one or multiple virtual machines. We also discuss the optimal timing for live-migrating a specific virtual machine to another server. Finally, we propose practical solutions that cloud admins can integrate with cloud administration portals to answer the research questions raised above.
Our research methodology is to propose empirical models based on VMware test beds with different benchmark tools. We then use machine learning techniques to propose a prediction approach for the cost of virtual machine live migration. Timing optimization for live migration is also proposed, based on the cost prediction and on a prediction of data-center network utilization. Live migration with persistent memory clusters is discussed at the end of the thesis. The cost prediction and timing optimization techniques proposed in this thesis could be integrated in practice with the VMware vSphere cluster portal, so that IT admins can use the cost prediction feature and the timing optimization option before proceeding with a virtual machine live migration.
Test results show that our proposed approach for predicting VM live migration cost achieves acceptable results, with less than 20% prediction error, and can be easily implemented and integrated with VMware vSphere, an example of a commonly used resource management portal for virtual data centers and private cloud environments. The results also show that our proposed timing optimization technique could save up to 51% of the migration time for memory-intensive workloads and up to 27% for network-intensive workloads. This technique can help network admins save migration time by using a higher network rate and achieving a higher probability of success.
At the end of this thesis, we discuss persistent memory as a new trend in server memory technology. Persistent memory modes of operation and configurations are discussed in detail to explain how live migration works between servers with different memory configurations. We then build a VMware cluster containing servers with persistent memory as well as DRAM-only servers to show the difference in live migration cost between VMs with DRAM only and VMs with persistent memory.
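To make the cost components listed above concrete, the following is a minimal sketch of the idealized textbook pre-copy migration model. It is an assumption for illustration, not the empirical model of the thesis and not a description of VMware internals; the function name, parameters and the stop threshold are hypothetical.

```python
def precopy_migration_cost(mem_mb, dirty_mb_per_s, bandwidth_mb_per_s,
                           stop_threshold_mb=50.0, max_rounds=30):
    """Return (total_time_s, downtime_s, data_sent_mb) for an idealized pre-copy run.

    Assumes a constant page-dirtying rate and a constant network bandwidth,
    which real workloads and networks do not guarantee.
    """
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    to_send = float(mem_mb)  # the first round copies the whole memory
    copy_time = 0.0
    data_sent = 0.0
    for _ in range(max_rounds):
        if to_send <= stop_threshold_mb:
            break  # remainder is small enough for the final stop-and-copy
        round_time = to_send / bandwidth_mb_per_s
        copy_time += round_time
        data_sent += to_send
        # Memory dirtied while this round was copied must be sent again.
        to_send = min(dirty_mb_per_s * round_time, float(mem_mb))
    # Stop-and-copy phase: the VM is paused while the remainder is transferred.
    downtime = to_send / bandwidth_mb_per_s
    return copy_time + downtime, downtime, data_sent + to_send


total_s, downtime_s, sent_mb = precopy_migration_cost(4096, 100, 1000)
print(total_s, downtime_s, sent_mb)
```

In this sketch the migration converges only when the dirtying rate is below the bandwidth; otherwise the loop runs until `max_rounds` and the downtime grows to a full memory copy, which mirrors why memory-intensive workloads are expensive to migrate.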
Aim Biotic interactions within guilds or across trophic levels have widely been ignored in species distribution models (SDMs). This synthesis outlines the development of species interaction distribution models (SIDMs), which aim to incorporate multispecies interactions at large spatial extents using interaction matrices. Location Local to global. Methods We review recent approaches for extending classical SDMs to incorporate biotic interactions, and identify some methodological and conceptual limitations. To illustrate possible directions for conceptual advancement we explore three principal ways of modelling multispecies interactions using interaction matrices: simple qualitative linkages between species, quantitative interaction coefficients reflecting interaction strengths, and interactions mediated by interaction currencies. We explain methodological advancements for static interaction data and multispecies time series, and outline methods to reduce complexity when modelling multispecies interactions. Results Classical SDMs ignore biotic interactions and recent SDM extensions only include the unidirectional influence of one or a few species. However, novel methods using error matrices in multivariate regression models allow interactions between multiple species to be modelled explicitly with spatial co-occurrence data. If time series are available, multivariate versions of population dynamic models can be applied that account for the effects and relative importance of species interactions and environmental drivers. These methods need to be extended by incorporating the non-stationarity in interaction coefficients across space and time, and are challenged by the limited empirical knowledge on spatio-temporal variation in the existence and strength of species interactions. 
Model complexity may be reduced by: (1) using prior ecological knowledge to set a subset of interaction coefficients to zero, (2) modelling guilds and functional groups rather than individual species, and (3) modelling interaction currencies and species effect and response traits. Main conclusions There is great potential for developing novel approaches that incorporate multispecies interactions into the projection of species distributions and community structure at large spatial extents. Progress can be made by: (1) developing statistical models with interaction matrices for multispecies co-occurrence datasets across large-scale environmental gradients, (2) testing the potential and limitations of methods for complexity reduction, and (3) sampling and monitoring comprehensive spatio-temporal data on biotic interactions in multispecies communities.
Many prediction tasks can be done based on users’ trace data. In this paper, we explored convergent thinking as a personality-related attribute and its relation to features gathered in interactive and non-interactive tasks of an online course. This is an under-utilized attribute that could be used for adapting online courses according to the creativity level to enhance the motivation of learners. Therefore, we used the logfile data of a 60-minute Moodle course with N=128 learners, combined with the Remote Associates Test (RAT). We explored the trace data and found a weak correlation between interactive tasks and the RAT score, which was the highest considering the overall dataset. We trained a Random Forest Regressor to predict convergent thinking based on the trace data and analyzed the feature importance. The results show that the interactive tasks have the highest importance in prediction, but the accuracy is very low. We discuss the potential for personalizing online courses and address further steps to improve the applicability.
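As an illustration of the correlation step, here is a minimal sketch of a Pearson correlation between one trace feature and a test score. The abstract does not state which correlation coefficient was used, so Pearson's r is an assumption, and the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: clicks in interactive tasks vs. test score per learner.
clicks = [12, 30, 7, 22, 15]
scores = [5, 9, 6, 7, 8]
print(pearson_r(clicks, scores))
```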
Language processing requires memory retrieval to integrate current input with previous context and to make predictions about upcoming input. We propose that prediction and retrieval are two sides of the same coin, i.e. functionally the same, as they both activate memory representations. Under this assumption, memory retrieval and prediction should interact: Retrieval interference can only occur at a word that triggers retrieval and a fully predicted word would not do that. The present study investigated the proposed interaction with event-related potentials (ERPs) during the processing of sentence pairs in German. Predictability was measured via cloze probability. Memory retrieval was manipulated via the position of a distractor inducing proactive or retroactive similarity-based interference. Linear mixed model analyses provided evidence for the hypothesised interaction in a broadly distributed negativity, which we discuss in relation to the interference ERP literature. Our finding supports the proposal that memory retrieval and prediction are functionally the same.
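Cloze probability is conventionally the proportion of participants who complete a sentence fragment with a given word. The sketch below shows that computation; the normalization (trimming and lower-casing) and the example words are assumptions for illustration, not details taken from the study.

```python
from collections import Counter


def cloze_probability(completions, target):
    """Share of non-empty completions that match the target word."""
    normalized = [c.strip().lower() for c in completions if c.strip()]
    if not normalized:
        raise ValueError("no completions given")
    return Counter(normalized)[target.strip().lower()] / len(normalized)


# Hypothetical responses from five participants to one sentence fragment.
responses = ["Ball", "ball", "Stock", "Ball ", "Knochen"]
print(cloze_probability(responses, "Ball"))
```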
In the present work, we use symbolic regression for automated modeling of dynamical systems. Symbolic regression is a powerful and general method suitable for data-driven identification of mathematical expressions. In particular, the structure and parameters of those expressions are identified simultaneously.
We consider two main variants of symbolic regression: sparse regression-based and genetic programming-based symbolic regression. Both are applied to identification, prediction and control of dynamical systems.
We introduce a new methodology for the data-driven identification of nonlinear dynamics for systems undergoing abrupt changes. Building on a sparse regression algorithm derived earlier, the model after the change is defined as a minimum update with respect to a reference model of the system identified prior to the change. The technique is successfully exemplified on the chaotic Lorenz system and the van der Pol oscillator. Issues such as computational complexity, robustness against noise and requirements with respect to data volume are investigated.
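The generic idea behind sparse regression-based identification can be sketched as sequentially thresholded least squares: fit the derivative on a library of candidate terms, zero out small coefficients, and refit. This is a simplified stand-in under stated assumptions (a one-dimensional system, a hand-picked library, an arbitrary threshold); it does not implement the minimum-update method for abrupt changes described above.

```python
import math


def solve_least_squares(rows, targets):
    """Least squares via normal equations and Gauss-Jordan elimination.

    Raises ZeroDivisionError if the normal-equation matrix is singular.
    """
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


def sparse_fit(rows, targets, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares over the library columns."""
    n = len(rows[0])
    active = list(range(n))
    for _ in range(iterations):
        sol = solve_least_squares([[r[j] for j in active] for r in rows], targets)
        kept = [j for j, c in zip(active, sol) if abs(c) >= threshold]
        if kept == active:
            break  # no coefficient fell below the threshold
        active = kept
        if not active:
            return [0.0] * n
    # Final refit on the surviving terms only.
    sol = solve_least_squares([[r[j] for j in active] for r in rows], targets)
    coeffs = [0.0] * n
    for j, c in zip(active, sol):
        coeffs[j] = c
    return coeffs


# Synthetic data for dx/dt = -2x with a small deterministic perturbation.
xs = [0.1 * k for k in range(1, 21)]
library = [[1.0, x, x * x] for x in xs]  # candidate terms: 1, x, x^2
derivs = [-2.0 * x + 0.001 * math.sin(7 * k) for k, x in enumerate(xs)]
coeffs = sparse_fit(library, derivs, threshold=0.1)
print(coeffs)
```

The threshold trades sparsity against fidelity: too low and spurious terms survive, too high and genuine dynamics are pruned.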
We show how symbolic regression can be used for time series prediction. Again, issues such as robustness against noise and convergence rate are investigated using the harmonic oscillator as a toy problem. In combination with embedding, we demonstrate the prediction of a propagating front in coupled FitzHugh-Nagumo oscillators. Additionally, we show how we can enhance numerical weather predictions to commercially forecast power production of green energy power plants.
We employ symbolic regression for synchronization control in coupled van der Pol oscillators. Different coupling topologies are investigated. We address issues such as plausibility and stability of the control laws found. The toolkit has been made open source and is used in turbulence control applications.
Genetic programming based symbolic regression is very versatile and can be adapted to many optimization problems. The heuristic-based algorithm allows for cost efficient optimization of complex tasks.
We emphasize the ability of symbolic regression to yield white-box models. In contrast to black-box models, such models are accessible and interpretable which allows the usage of established tool chains.
SLocX: predicting subcellular localization of Arabidopsis proteins leveraging gene expression data
(2011)
Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence derived information, expression behavior, or a combination of these data and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that the advances in measuring gene expression technology will make our method applicable for all Arabidopsis proteins.
The Limpopo Basin in southern Africa is prone to droughts which affect the livelihood of millions of people in South Africa, Botswana, Zimbabwe and Mozambique. Seasonal drought early warning is thus vital for the whole region. In this study, the predictability of hydrological droughts during the main runoff period from December to May is assessed using statistical approaches. Three methods (multiple linear models, artificial neural networks, random forest regression trees) are compared in terms of their ability to forecast streamflow with up to 12 months of lead time. The following four main findings result from the study.
1. There are stations in the basin at which standardised streamflow is predictable with lead times up to 12 months. The results show high inter-station differences of forecast skill but reach a coefficient of determination as high as 0.73 (cross validated).
2. A large range of potential predictors is considered in this study, comprising well-established climate indices, customised teleconnection indices derived from sea surface temperatures and antecedent streamflow as a proxy of catchment conditions. El Niño and customised indices, representing sea surface temperature in the Atlantic and Indian oceans, prove to be important teleconnection predictors for the region. Antecedent streamflow is a strong predictor in small catchments (with median 42% explained variance), whereas teleconnections exert a stronger influence in large catchments.
3. Multiple linear models show the best forecast skill in this study and the greatest robustness compared to artificial neural networks and random forest regression trees, even though the latter two are capable of representing nonlinear relationships.
4. Employed in early warning, the models can be used to forecast a specific drought level. Even if the coefficient of determination is low, the forecast models have a skill better than a climatological forecast, which is shown by analysis of receiver operating characteristics (ROCs). Seasonal statistical forecasts in the Limpopo show promising results, and thus it is recommended to employ them as complementary to existing forecasts in order to strengthen preparedness for droughts.
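A cross-validated coefficient of determination, as reported in finding 1, can be sketched as follows. The study does not specify its cross-validation scheme here, so leave-one-out is an assumption, as is the reduction to a single predictor (the study uses multiple linear models); the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_cv_r2(xs, ys):
    """Leave-one-out cross-validated R^2: 1 - SS_res / SS_tot on held-out predictions."""
    predictions = []
    for i in range(len(xs)):
        # Fit on all points except i, then predict the held-out point.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical climate index values and standardised streamflow.
index = [1.0, 2.0, 3.0, 4.0, 5.0]
flow = [2.1, 3.9, 6.2, 7.8, 10.1]
print(loo_cv_r2(index, flow))
```

Because every prediction is made for a point the model has not seen, this score can be lower than the in-sample R² and can even become negative for a model that forecasts worse than the mean.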
Background:
Endomyocardial biopsy is considered the gold standard in patients with suspected myocarditis. We aimed to evaluate the impact of bioptic findings on prediction of successful return to work.
Methods:
In 1153 patients (48.9 ± 12.4 years, 66.2% male), who were hospitalized due to symptoms of left heart failure between 2005 and 2012, an endomyocardial biopsy was performed. Routine clinical and laboratory data, sociodemographic parameters, and noninvasive and invasive cardiac variables including endomyocardial biopsy were registered. Data were linked with return to work data from the German statutory pension insurance program and analyzed by Cox regression.
Results:
A total of 220 patients had a complete data set of hospital and insurance information. Three quarters of patients were virus-positive (54.2% parvovirus B19, other or mixed infection 16.7%). Mean invasive left ventricular ejection fraction was 47.1% ± 18.6% (left ventricular ejection fraction <45% in 46.3%). Return to work was achieved after a mean interval of 168.8 ± 347.7 days in 220 patients (after 6, 12, and 24 months in 61.3%, 72.2%, and 76.4%). In multivariate regression analysis, only age (per 10 years, hazard ratio, 1.27; 95% confidence interval, 1.10–1.46; p = 0.001) and left ventricular ejection fraction (per 5% increase, hazard ratio, 1.07; 95% confidence interval, 1.03–1.12; p = 0.002) were associated with increased probability of return to work, and elevated work intensity (heavy vs light, hazard ratio, 0.58; 95% confidence interval, 0.34–0.99; p < 0.049) with decreased probability of return to work. None of the endomyocardial biopsy–derived parameters was significantly associated with return to work in the total group as well as in the subgroup of patients with biopsy-proven myocarditis.
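Hazard ratios reported per fixed increment can be rescaled to another increment, assuming the usual Cox model with a log-linear covariate effect. The sketch below illustrates that arithmetic for the reported ejection fraction estimate; the helper name is hypothetical and the rescaling is not part of the study itself.

```python
import math


def rescale_hazard_ratio(hr, from_units, to_units):
    """Convert a hazard ratio per `from_units` into one per `to_units`.

    Valid only if the log hazard is linear in the covariate.
    """
    if hr <= 0 or from_units == 0:
        raise ValueError("hr must be positive and from_units non-zero")
    return math.exp(math.log(hr) * to_units / from_units)


# Reported: hazard ratio 1.07 per 5% increase in ejection fraction.
# Under the stated assumption, a 10% increase corresponds to 1.07 ** 2.
print(rescale_hazard_ratio(1.07, 5, 10))
```

The same function applies to the confidence limits, since they are computed on the log scale as well.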
Conclusion:
Added to established predictors, bioptic data demonstrated no additional impact on the probability of return to work. Thus, socio-medical evaluation of patients with suspected myocarditis remains an individually oriented process based primarily on clinical and functional parameters.