004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (78)
- Doctoral Thesis (62)
- Postprint (6)
- Conference Proceeding (5)
- Master's Thesis (3)
- Other (3)
- Bachelor Thesis (1)
- Monograph/Edited Volume (1)
- Habilitation Thesis (1)
- Preprint (1)
Language
- English (161) (remove)
Keywords
- Machine Learning (7)
- Maschinelles Lernen (7)
- Computer Science Education (5)
- answer set programming (5)
- Antwortmengenprogrammierung (4)
- Answer Set Programming (3)
- Competence Measurement (3)
- DPLL (3)
- Optimization (3)
- Secondary Education (3)
Institute
- Institut für Informatik und Computational Science (161) (remove)
Accurately solving classification problems nowadays is likely to be the most relevant machine learning task. Binary classification separating two classes only is algorithmically simpler but has fewer potential applications as many real-world problems are multi-class. On the reverse, separating only a subset of classes simplifies the classification task. Even though existing multi-class machine learning algorithms are very flexible regarding the number of classes, they assume that the target set Y is fixed and cannot be restricted once the training is finished. On the other hand, existing state-of-the-art production environments are becoming increasingly interconnected with the advance of Industry 4.0 and related technologies such that additional information can simplify the respective classification problems. In light of this, the main aim of this thesis is to introduce dynamic classification that generalizes multi-class classification such that the target class set can be restricted arbitrarily to a non-empty class subset M of Y at any time between two consecutive predictions.
This task is solved by a combination of two algorithmic approaches. First, classifier calibration, which transforms predictions into posterior probability estimates that are intended to be well calibrated. The analysis provided focuses on monotonic calibration and in particular corrects wrong statements that appeared in the literature. It also reveals that bin-based evaluation metrics, which became popular in recent years, are unjustified and should not be used at all. Next, the validity of Platt scaling, which is the most relevant parametric calibration approach, is analyzed in depth. In particular, its optimality for classifier predictions distributed according to four different families of probability distributions as well its equivalence with Beta calibration up to a sigmoidal preprocessing are proven. For non-monotonic calibration, extended variants on kernel density estimation and the ensemble method EKDE are introduced. Finally, the calibration techniques are evaluated using a simulation study with complete information as well as on a selection of 46 real-world data sets.
Building on this, classifier calibration is applied as part of decomposition-based classification that aims to reduce multi-class problems to simpler (usually binary) prediction tasks. For the involved fusing step performed at prediction time, a new approach based on evidence theory is presented that uses classifier calibration to model mass functions. This allows the analysis of decomposition-based classification against a strictly formal background and to prove closed-form equations for the overall combinations. Furthermore, the same formalism leads to a consistent integration of dynamic class information, yielding a theoretically justified and computationally tractable dynamic classification model. The insights gained from this modeling are combined with pairwise coupling, which is one of the most relevant reduction-based classification approaches, such that all individual predictions are combined with a weight. This not only generalizes existing works on pairwise coupling but also enables the integration of dynamic class information.
Lastly, a thorough empirical study is performed that compares all newly introduced approaches to existing state-of-the-art techniques. For this, evaluation metrics for dynamic classification are introduced that depend on corresponding sampling strategies. Thereafter, these are applied during a three-part evaluation. First, support vector machines and random forests are applied on 26 data sets from the UCI Machine Learning Repository. Second, two state-of-the-art deep neural networks are evaluated on five benchmark data sets from a relatively recent reference work. Here, computationally feasible strategies to apply the presented algorithms in combination with large-scale models are particularly relevant because a naive application is computationally intractable. Finally, reference data from a real-world process allowing the inclusion of dynamic class information are collected and evaluated. The results show that in combination with support vector machines and random forests, pairwise coupling approaches yield the best results, while in combination with deep neural networks, differences between the different approaches are mostly small to negligible. Most importantly, all results empirically confirm that dynamic classification succeeds in improving the respective prediction accuracies. Therefore, it is crucial to pass dynamic class information in respective applications, which requires an appropriate digital infrastructure.
M-rate 0L systems are interactionless Lindenmayer systems together with a function assigning to every string a set of multisets of productions that may be applied simultaneously to the string. Some questions that have been left open in the forerunner papers are examined, and the computational power of deterministic M-rate 0L systems is investigated, where also tabled and extended variants are taken into consideration.
The highly structured nature of the educational sector demands effective policy mechanisms close to the needs of the field. That is why evidence-based policy making, endorsed by the European Commission under Erasmus+ Key Action 3, aims to make an alignment between the domains of policy and practice. Against this background, this article addresses two issues: First, that there is a vertical gap in the translation of higher-level policies to local strategies and regulations. Second, that there is a horizontal gap between educational domains regarding the policy awareness of individual players. This was analyzed in quantitative and qualitative studies with domain experts from the fields of virtual mobility and teacher training. From our findings, we argue that the combination of both gaps puts the academic bridge from secondary to tertiary education at risk, including the associated knowledge proficiency levels. We discuss the role of digitalization in the academic bridge by asking the question: which value does the involved stakeholders expect from educational policies? As a theoretical basis, we rely on the model of value co-creation for and by stakeholders. We describe the used instruments along with the obtained results and proposed benefits. Moreover, we reflect on the methodology applied, and we finally derive recommendations for future academic bridge policies.
Research publications and data nowadays should be publicly available on the internet and, theoretically, usable for everyone to develop further research, products, or services. The long-term accessibility of research data is, therefore, fundamental in the economy of the research production process. However, the availability of data is not sufficient by itself, but also their quality must be verifiable. Measures to ensure reuse and reproducibility need to include the entire research life cycle, from the experimental design to the generation of data, quality control, statistical analysis, interpretation, and validation of the results. Hence, high-quality records, particularly for providing a string of documents for the verifiable origin of data, are essential elements that can act as a certificate for potential users (customers). These records also improve the traceability and transparency of data and processes, therefore, improving the reliability of results. Standards for data acquisition, analysis, and documentation have been fostered in the last decade driven by grassroot initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, what is still largely missing in the life science academic research are agreed procedures for complex routine research workflows. Here, well-crafted documentation like standard operating procedures (SOPs) offer clear direction and instructions specifically designed to avoid deviations as an absolute necessity for reproducibility. Therefore, this paper provides a standardized workflow that explains step by step how to write an SOP to be used as a starting point for appropriate research documentation.
Research publications and data nowadays should be publicly available on the internet and, theoretically, usable for everyone to develop further research, products, or services. The long-term accessibility of research data is, therefore, fundamental in the economy of the research production process. However, the availability of data is not sufficient by itself, but also their quality must be verifiable. Measures to ensure reuse and reproducibility need to include the entire research life cycle, from the experimental design to the generation of data, quality control, statistical analysis, interpretation, and validation of the results. Hence, high-quality records, particularly for providing a string of documents for the verifiable origin of data, are essential elements that can act as a certificate for potential users (customers). These records also improve the traceability and transparency of data and processes, therefore, improving the reliability of results. Standards for data acquisition, analysis, and documentation have been fostered in the last decade driven by grassroot initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, what is still largely missing in the life science academic research are agreed procedures for complex routine research workflows. Here, well-crafted documentation like standard operating procedures (SOPs) offer clear direction and instructions specifically designed to avoid deviations as an absolute necessity for reproducibility. Therefore, this paper provides a standardized workflow that explains step by step how to write an SOP to be used as a starting point for appropriate research documentation.
The aim of our project design space exploration with answer set programming is to develop a general framework based on Answer Set Programming (ASP) that finds valid solutions to the system design problem and simultaneously performs Design Space Exploration (DSE) to find the most favorable alternatives. We leverage recent developments in ASP solving that allow for tight integration of background theories to create a holistic framework for effective DSE.
The Potsdam answer set solving collection, or Potassco for short, bundles various tools implementing and/or applying answer set programming. The article at hand succeeds an earlier description of the Potassco project published in Gebser et al. (AI Commun 24(2):107-124, 2011). Hence, we concentrate in what follows on the major features of the most recent, fifth generation of the ASP system clingo and highlight some recent resulting application systems.