Institut für Informatik und Computational Science
Refine
Has Fulltext
- yes (126) (remove)
Year of publication
Document Type
- Doctoral Thesis (65)
- Article (40)
- Postprint (10)
- Conference Proceeding (5)
- Master's Thesis (3)
- Bachelor Thesis (1)
- Habilitation Thesis (1)
- Preprint (1)
Language
- English (126) (remove)
Keywords
- Maschinelles Lernen (7)
- Antwortmengenprogrammierung (5)
- Computer Science Education (5)
- Machine Learning (5)
- answer set programming (4)
- Answer Set Programming (3)
- Competence Measurement (3)
- DPLL (3)
- Secondary Education (3)
- machine learning (3)
This thesis presents an attempt to use source code synthesised from Coq formalisations of device drivers for existing (micro)kernel operating systems, with a particular focus on the Linux Kernel.
In the first part, the technical background and related work are described. The focus is here on the possible approaches to synthesising certified software with Coq, namely the extraction to functional languages using the Coq extraction plugin and the extraction to Clight code using the CertiCoq plugin. It is noted that the implementation of CertiCoq is verified, whereas this is not the case for the Coq extraction plugin. Consequently, there is a correctness guarantee for the generated Clight code which does not hold for the code being generated by the Coq extraction plugin. Furthermore, the differences between user space and kernel space software are discussed in relation to Linux device drivers. It is elaborated that it is not possible to generate working Linux kernel module components using the Coq extraction plugin without significant modifications. In contrast, it is possible to produce working user space drivers both with the Coq extraction plugin and CertiCoq. The subsequent parts describe the main contributions of the thesis.
In the second part, it is demonstrated how to extend the Coq extraction plugin to synthesise foreign function calls between the functional language OCaml and the imperative language C. This approach has the potential to improve the type-safety of user space drivers. Furthermore, it is shown that the code being synthesised by CertiCoq cannot be used in kernel space without modifications to the necessary runtime. Consequently, the necessary modifications to the runtimes of CertiCoq and VeriFFI are introduced, resulting in the runtimes becoming compatible components of a Linux kernel module. Furthermore, justifications for the transformations are provided and possible further extensions to both plugins and solutions to failing garbage collection calls in kernel space are discussed.
The third part presents a proof of concept device driver for the Linux Kernel. To achieve this, the event handler of the original PC Speaker driver is partially formalised in Coq. Furthermore, some relevant formal properties of the formalised functionality are discussed. Subsequently, a kernel module is defined, utilising the modified variants of CertiCoq and VeriFFI to compile a working device driver. It is furthermore shown that it is possible to compile the synthesised code with CompCert, thereby extending the guarantee of correctness to the assembly layer. This is followed by a performance evaluation that compares a naive formalisation of the PC speaker functionality with the original PC Speaker driver pointing out the weaknesses in the formalisation and possible improvements. The part closes with a summary of the results, their implications and open questions being raised.
The last part lists all used sources, separated into scientific literature, documentations or reference manuals and artifacts, i.e. source code.
Due to anthropogenic greenhouse gas emissions, Earth’s average surface temperature is steadily increasing. As a consequence, many weather extremes are likely to become more frequent and intense. This poses a threat to natural and human systems, with local impacts capable of destroying exposed assets and infrastructure, and disrupting economic and societal activity. Yet, these effects are not locally confined to the directly affected regions, as they can trigger indirect economic repercussions through loss propagation along supply chains. As a result, local extremes yield a potentially global economic response. To build economic resilience and design effective adaptation measures that mitigate adverse socio-economic impacts of ongoing climate change, it is crucial to gain a comprehensive understanding of indirect impacts and the underlying economic mechanisms.
Presenting six articles in this thesis, I contribute towards this understanding. To this end, I expand on local impacts under current and future climate, the resulting global economic response, as well as the methods and tools to analyze this response.
Starting with a traditional assessment of weather extremes under climate change, the first article investigates extreme snowfall in the Northern Hemisphere until the end of the century. Analyzing an ensemble of global climate model projections reveals an increase of the most extreme snowfall, while mean snowfall decreases.
Assessing repercussions beyond local impacts, I employ numerical simulations to compute indirect economic effects from weather extremes with the numerical agent-based shock propagation model Acclimate. This model is used in conjunction with the recently emerged storyline framework, which involves analyzing the impacts of a particular reference extreme event and comparing them to impacts in plausible counterfactual scenarios under various climate or socio-economic conditions. Using this approach, I introduce three primary storylines that shed light on the complex mechanisms underlying economic loss propagation.
In the second and third articles of this thesis, I analyze storylines for the historical Hurricanes Sandy (2012) and Harvey (2017) in the USA. For this, I first estimate local economic output losses and then simulate the resulting global economic response with Acclimate. The storyline for Hurricane Sandy thereby focuses on global consumption price anomalies and the resulting changes in consumption. I find that the local economic disruption leads to a global wave-like economic price ripple, with upstream effects propagating in the supplier direction and downstream effects in the buyer direction. Initially, an upstream demand reduction causes consumption price decreases, followed by a downstream supply shortage and increasing prices, before the anomalies decay in a normalization phase. A dominant upstream or downstream effect leads to net consumption gains or losses of a region, respectively. Moreover, I demonstrate that a longer direct economic shock intensifies the downstream effect for many regions, leading to an overall consumption loss.
The third article of my thesis builds upon the developed loss estimation method by incorporating projections to future global warming levels. I use these projections to explore how the global production response to Hurricane Harvey would change under further increased global warming. The results show that, while the USA is able to nationally offset direct losses in the reference configuration, other countries have to compensate for increasing shares of counterfactual future losses. This compensation is mainly achieved by large exporting countries, but gradually shifts towards smaller regions. These findings not only highlight the economy’s ability to flexibly mitigate disaster losses to a certain extent, but also reveal the vulnerability and economic disadvantage of regions that are exposed to extreme weather events.
The storyline in the fourth article of my thesis investigates the interaction between global economic stress and the propagation of losses from weather extremes. I examine indirect impacts of weather extremes — tropical cyclones, heat stress, and river floods — worldwide under two different economic conditions: an unstressed economy and a globally stressed economy, as seen during the Covid-19 pandemic. I demonstrate that the adverse effects of weather extremes on global consumption are strongly amplified when the economy is under stress. Specifically, consumption losses in the USA and China double and triple, respectively, due to the global economy’s decreased capacity for disaster loss compensation. An aggravated scarcity intensifies the price response, causing consumption losses to increase.
Advancing on the methods and tools used here, the final two articles in my thesis extend the agent-based model Acclimate and formalize the storyline approach. With the model extension described in the fifth article, regional consumers make rational choices on the goods bought such that their utility is maximized under a constrained budget. In an out-of-equilibrium economy, these rational consumers are shown to temporarily increase consumption of certain goods in spite of rising prices.
The sixth article of my thesis proposes a formalization of the storyline framework, drawing on multiple studies including storylines presented in this thesis. The proposed guideline defines eight central elements that can be used to construct a storyline.
Overall, this thesis contributes towards a better understanding of economic repercussions of weather extremes. It achieves this by providing assessments of local direct impacts, highlighting mechanisms and impacts of loss propagation, and advancing on methods and tools used.
In this bachelor’s thesis I implement the automatic theorem prover nanoCoP-Ω. This system is the result of porting arithmetic and equality handling procedures first introduced in the automatic theorem prover with arithmetic leanCoP-Ω into the similar system nanoCoP 2.0. To understand these procedures, I first introduce the mathematical background to both automatic theorem proving and arithmetic expressions. I present the predecessor projects leanCoP, nanoCoP and leanCoP-Ω, out of which nanCoP-Ω was developed. This is followed by an extensive description of the concepts the non-clausal connection calculus needed to be extended by, to allow for proving arithmetic expressions and equalities, as well as of their implementation into nanoCoP-Ω. An extensive comparison between both the runtimes and the number of solved problems of the systems nanoCoP-Ω and leanCoP-Ω was made. I come to the conclusion, that nanoCoP-Ω is considerably faster than leanCoP-Ω for small problems, though less well suited for larger problems. Additionally, I was able to construct a non-theorem that nanoCoP-Ω generates a false proof for. I discuss how this pressing issue could be resolved, as well as some possible optimizations and expansions of the system.
Reliable and robust data processing is one of the hardest requirements for systems in fields such as medicine, security, automotive, aviation, and space, to prevent critical system failures caused by changes in operating or environmental conditions. In particular, Signal Integrity (SI) effects such as crosstalk may distort the signal information in sensitive mixed-signal designs. A challenge for hardware systems used in the space are radiation effects. Namely, Single Event Effects (SEEs) induced by high-energy particle hits may lead to faulty computation, corrupted configuration settings, undesired system behavior, or even total malfunction.
Since these applications require an extra effort in design and implementation, it is beneficial to master the standard cell design process and corresponding design flow methodologies optimized for such challenges. Especially for reliable, low-noise differential signaling logic such as Current Mode Logic (CML), a digital design flow is an orthogonal approach compared to traditional manual design. As a consequence, mandatory preliminary considerations need to be addressed in more detail. First of all, standard cell library concepts with suitable cell extensions for reliable systems and robust space applications have to be elaborated. Resulting design concepts at the cell level should enable the logical synthesis for differential logic design or improve the radiation-hardness. In parallel, the main objectives of the proposed cell architectures are to reduce the occupied area, power, and delay overhead. Second, a special setup for standard cell characterization is additionally required for a proper and accurate logic gate modeling. Last but not least, design methodologies for mandatory design flow stages such as logic synthesis and place and route need to be developed for the respective hardware systems to keep the reliability or the radiation-hardness at an acceptable level.
This Thesis proposes and investigates standard cell-based design methodologies and techniques for reliable and robust hardware systems implemented in a conventional semi-conductor technology. The focus of this work is on reliable differential logic design and robust radiation-hardening-by-design circuits. The synergistic connections of the digital design flow stages are systematically addressed for these two types of hardware systems. In more detail, a library for differential logic is extended with single-ended pseudo-gates for intermediate design steps to support the logic synthesis and layout generation with commercial Computer-Aided Design (CAD) tools. Special cell layouts are proposed to relax signal routing. A library set for space applications is similarly extended by novel Radiation-Hardening-by-Design (RHBD) Triple Modular Redundancy (TMR) cells, enabling a one fault correction. Therein, additional optimized architectures for glitch filter cells, robust scannable and self-correcting flip-flops, and clock-gates are proposed. The circuit concepts and the physical layout representation views of the differential logic gates and the RHBD cells are discussed. However, the quality of results of designs depends implicitly on the accuracy of the standard cell characterization which is examined for both types therefore. The entire design flow is elaborated from the hardware design description to the layout representations. A 2-Phase routing approach together with an intermediate design conversion step is proposed after the initial place and route stage for reliable, pure differential designs, whereas a special constraining for RHBD applications in a standard technology is presented.
The digital design flow for differential logic design is successfully demonstrated on a reliable differential bipolar CML application. A balanced routing result of its differential signal pairs is obtained by the proposed 2-Phase-routing approach. Moreover, the elaborated standard cell concepts and design methodology for RHBD circuits are applied to the digital part of a 7.5-15.5 MSPS 14-bit Analog-to-Digital Converter (ADC) and a complex microcontroller architecture. The ADC is implemented in an unhardened standard semiconductor technology and successfully verified by electrical measurements. The overhead of the proposed hardening approach is additionally evaluated by design exploration of the microcontroller application. Furthermore, the first obtained related measurement results of novel RHBD-∆TMR flip-flops show a radiation-tolerance up to a threshold Linear Energy Transfer (LET) of 46.1, 52.0, and 62.5 MeV cm2 mg-1 and savings in silicon area of 25-50 % for selected TMR standard cell candidates.
As a conclusion, the presented design concepts at the cell and library levels, as well as the design flow modifications are adaptable and transferable to other technology nodes. In particular, the design of hybrid solutions with integrated reliable differential logic modules together with robust radiation-tolerant circuit parts is enabled by the standard cell concepts and design methods proposed in this work.
Accurately solving classification problems nowadays is likely to be the most relevant machine learning task. Binary classification separating two classes only is algorithmically simpler but has fewer potential applications as many real-world problems are multi-class. On the reverse, separating only a subset of classes simplifies the classification task. Even though existing multi-class machine learning algorithms are very flexible regarding the number of classes, they assume that the target set Y is fixed and cannot be restricted once the training is finished. On the other hand, existing state-of-the-art production environments are becoming increasingly interconnected with the advance of Industry 4.0 and related technologies such that additional information can simplify the respective classification problems. In light of this, the main aim of this thesis is to introduce dynamic classification that generalizes multi-class classification such that the target class set can be restricted arbitrarily to a non-empty class subset M of Y at any time between two consecutive predictions.
This task is solved by a combination of two algorithmic approaches. First, classifier calibration, which transforms predictions into posterior probability estimates that are intended to be well calibrated. The analysis provided focuses on monotonic calibration and in particular corrects wrong statements that appeared in the literature. It also reveals that bin-based evaluation metrics, which became popular in recent years, are unjustified and should not be used at all. Next, the validity of Platt scaling, which is the most relevant parametric calibration approach, is analyzed in depth. In particular, its optimality for classifier predictions distributed according to four different families of probability distributions as well its equivalence with Beta calibration up to a sigmoidal preprocessing are proven. For non-monotonic calibration, extended variants on kernel density estimation and the ensemble method EKDE are introduced. Finally, the calibration techniques are evaluated using a simulation study with complete information as well as on a selection of 46 real-world data sets.
Building on this, classifier calibration is applied as part of decomposition-based classification that aims to reduce multi-class problems to simpler (usually binary) prediction tasks. For the involved fusing step performed at prediction time, a new approach based on evidence theory is presented that uses classifier calibration to model mass functions. This allows the analysis of decomposition-based classification against a strictly formal background and to prove closed-form equations for the overall combinations. Furthermore, the same formalism leads to a consistent integration of dynamic class information, yielding a theoretically justified and computationally tractable dynamic classification model. The insights gained from this modeling are combined with pairwise coupling, which is one of the most relevant reduction-based classification approaches, such that all individual predictions are combined with a weight. This not only generalizes existing works on pairwise coupling but also enables the integration of dynamic class information.
Lastly, a thorough empirical study is performed that compares all newly introduced approaches to existing state-of-the-art techniques. For this, evaluation metrics for dynamic classification are introduced that depend on corresponding sampling strategies. Thereafter, these are applied during a three-part evaluation. First, support vector machines and random forests are applied on 26 data sets from the UCI Machine Learning Repository. Second, two state-of-the-art deep neural networks are evaluated on five benchmark data sets from a relatively recent reference work. Here, computationally feasible strategies to apply the presented algorithms in combination with large-scale models are particularly relevant because a naive application is computationally intractable. Finally, reference data from a real-world process allowing the inclusion of dynamic class information are collected and evaluated. The results show that in combination with support vector machines and random forests, pairwise coupling approaches yield the best results, while in combination with deep neural networks, differences between the different approaches are mostly small to negligible. Most importantly, all results empirically confirm that dynamic classification succeeds in improving the respective prediction accuracies. Therefore, it is crucial to pass dynamic class information in respective applications, which requires an appropriate digital infrastructure.
As a result of CMOS scaling, radiation-induced Single-Event Effects (SEEs) in electronic circuits became a critical reliability issue for modern Integrated Circuits (ICs) operating under harsh radiation conditions. SEEs can be triggered in combinational or sequential logic by the impact of high-energy particles, leading to destructive or non-destructive faults, resulting in data corruption or even system failure. Typically, the SEE mitigation methods are deployed statically in processing architectures based on the worst-case radiation conditions, which is most of the time unnecessary and results in a resource overhead. Moreover, the space radiation conditions are dynamically changing, especially during Solar Particle Events (SPEs). The intensity of space radiation can differ over five orders of magnitude within a few hours or days, resulting in several orders of magnitude fault probability variation in ICs during SPEs. This thesis introduces a comprehensive approach for designing a self-adaptive fault resilient multiprocessing system to overcome the static mitigation overhead issue. This work mainly addresses the following topics: (1) Design of on-chip radiation particle monitor for real-time radiation environment detection, (2) Investigation of space environment predictor, as support for solar particle events forecast, (3) Dynamic mode configuration in the resilient multiprocessing system. Therefore, according to detected and predicted in-flight space radiation conditions, the target system can be configured to use no mitigation or low-overhead mitigation during non-critical periods of time. The redundant resources can be used to improve system performance or save power. On the other hand, during increased radiation activity periods, such as SPEs, the mitigation methods can be dynamically configured appropriately depending on the real-time space radiation environment, resulting in higher system reliability. Thus, a dynamic trade-off in the target system between reliability, performance and power consumption in real-time can be achieved. All results of this work are evaluated in a highly reliable quad-core multiprocessing system that allows the self-adaptive setting of optimal radiation mitigation mechanisms during run-time. Proposed methods can serve as a basis for establishing a comprehensive self-adaptive resilient system design process. Successful implementation of the proposed design in the quad-core multiprocessor shows its application perspective also in the other designs.
The highly structured nature of the educational sector demands effective policy mechanisms close to the needs of the field. That is why evidence-based policy making, endorsed by the European Commission under Erasmus+ Key Action 3, aims to make an alignment between the domains of policy and practice. Against this background, this article addresses two issues: First, that there is a vertical gap in the translation of higher-level policies to local strategies and regulations. Second, that there is a horizontal gap between educational domains regarding the policy awareness of individual players. This was analyzed in quantitative and qualitative studies with domain experts from the fields of virtual mobility and teacher training. From our findings, we argue that the combination of both gaps puts the academic bridge from secondary to tertiary education at risk, including the associated knowledge proficiency levels. We discuss the role of digitalization in the academic bridge by asking the question: which value does the involved stakeholders expect from educational policies? As a theoretical basis, we rely on the model of value co-creation for and by stakeholders. We describe the used instruments along with the obtained results and proposed benefits. Moreover, we reflect on the methodology applied, and we finally derive recommendations for future academic bridge policies.
Large-scale databases that report the inhibitory capacities of many combinations of candidate drug compounds and cultivated cancer cell lines have driven the development of preclinical drug-sensitivity models based on machine learning. However, cultivated cell lines have devolved from human cancer cells over years or even decades under selective pressure in culture conditions. Moreover, models that have been trained on in vitro data cannot account for interactions with other types of cells. Drug-response data that are based on patient-derived cell cultures, xenografts, and organoids, on the other hand, are not available in the quantities that are needed to train high-capacity machine-learning models. We found that pre-training deep neural network models of drug sensitivity on in vitro drug-sensitivity databases before fine-tuning the model parameters on patient-derived data improves the models’ accuracy and improves the biological plausibility of the features, compared to training only on patient-derived data. From our experiments, we can conclude that pre-trained models outperform models that have been trained on the target domains in the vast majority of cases.
With the downscaling of CMOS technologies, the radiation-induced Single Event Transient (SET) effects in combinational logic have become a critical reliability issue for modern integrated circuits (ICs) intended for operation under harsh radiation conditions. The SET pulses generated in combinational logic may propagate through the circuit and eventually result in soft errors. It has thus become an imperative to address the SET effects in the early phases of the radiation-hard IC design. In general, the soft error mitigation solutions should accommodate both static and dynamic measures to ensure the optimal utilization of available resources. An efficient soft-error-aware design should address synergistically three main aspects: (i) characterization and modeling of soft errors, (ii) multi-level soft error mitigation, and (iii) online soft error monitoring. Although significant results have been achieved, the effectiveness of SET characterization methods, accuracy of predictive SET models, and efficiency of SET mitigation measures are still critical issues. Therefore, this work addresses the following topics: (i) Characterization and modeling of SET effects in standard combinational cells, (ii) Static mitigation of SET effects in standard combinational cells, and (iii) Online particle detection, as a support for dynamic soft error mitigation.
Since the standard digital libraries are widely used in the design of radiation-hard ICs, the characterization of SET effects in standard cells and the availability of accurate SET models for the Soft Error Rate (SER) evaluation are the main prerequisites for efficient radiation-hard design. This work introduces an approach for the SPICE-based standard cell characterization with the reduced number of simulations, improved SET models and optimized SET sensitivity database. It has been shown that the inherent similarities in the SET response of logic cells for different input levels can be utilized to reduce the number of required simulations. Based on characterization results, the fitting models for the SET sensitivity metrics (critical charge, generated SET pulse width and propagated SET pulse width) have been developed. The proposed models are based on the principle of superposition, and they express explicitly the dependence of the SET sensitivity of individual combinational cells on design, operating and irradiation parameters. In contrast to the state-of-the-art characterization methodologies which employ extensive look-up tables (LUTs) for storing the simulation results, this work proposes the use of LUTs for storing the fitting coefficients of the SET sensitivity models derived from the characterization results. In that way the amount of characterization data in the SET sensitivity database is reduced significantly.
The initial step in enhancing the robustness of combinational logic is the application of gate-level mitigation techniques. As a result, significant improvement of the overall SER can be achieved with minimum area, delay and power overheads. For the SET mitigation in standard cells, it is essential to employ the techniques that do not require modifying the cell structure. This work introduces the use of decoupling cells for improving the robustness of standard combinational cells. By insertion of two decoupling cells at the output of a target cell, the critical charge of the cell’s output node is increased and the attenuation of short SETs is enhanced. In comparison to the most common gate-level techniques (gate upsizing and gate duplication), the proposed approach provides better SET filtering. However, as there is no single gate-level mitigation technique with optimal performance, a combination of multiple techniques is required. This work introduces a comprehensive characterization of gate-level mitigation techniques aimed to quantify their impact on the SET robustness improvement, as well as introduced area, delay and power overhead per gate. By characterizing the gate-level mitigation techniques together with the standard cells, the required effort in subsequent SER analysis of a target design can be reduced. The characterization database of the hardened standard cells can be utilized as a guideline for selection of the most appropriate mitigation solution for a given design.
As a support for dynamic soft error mitigation techniques, it is important to enable the online detection of energetic particles causing the soft errors. This allows activating the power-greedy fault-tolerant configurations based on N-modular redundancy only at the high radiation levels. To enable such a functionality, it is necessary to monitor both the particle flux and the variation of particle LET, as these two parameters contribute significantly to the system SER. In this work, a particle detection approach based on custom-sized pulse stretching inverters is proposed. Employing the pulse stretching inverters connected in parallel enables to measure the particle flux in terms of the number of detected SETs, while the particle LET variations can be estimated from the distribution of SET pulse widths. This approach requires a purely digital processing logic, in contrast to the standard detectors which require complex mixed-signal processing. Besides the possibility of LET monitoring, additional advantages of the proposed particle detector are low detection latency and power consumption, and immunity to error accumulation.
The results achieved in this thesis can serve as a basis for establishment of an overall soft-error-aware database for a given digital library, and a comprehensive multi-level radiation-hard design flow that can be implemented with the standard IC design tools. The following step will be to evaluate the achieved results with the irradiation experiments.
The Internet of Things (IoT) is a system of physical objects that can be discovered, monitored, controlled, or interacted with by electronic devices that communicate over various networking interfaces and eventually can be connected to the wider Internet. [Guinard and Trifa, 2016]. IoT devices are equipped with sensors and/or actuators and may be constrained in terms of memory, computational power, network bandwidth, and energy. Interoperability can help to manage such heterogeneous devices. Interoperability is the ability of different types of systems to work together smoothly. There are four levels of interoperability: physical, network and transport, integration, and data. The data interoperability is subdivided into syntactic and semantic data. Semantic data describes the meaning of data and the common understanding of vocabulary e.g. with the help of dictionaries, taxonomies, ontologies. To achieve interoperability, semantic interoperability is necessary.
Many organizations and companies are working on standards and solutions for interoperability in the IoT. However, the commercial solutions produce a vendor lock-in. They focus on centralized approaches such as cloud-based solutions. This thesis proposes a decentralized approach namely Edge Computing. Edge Computing is based on the concepts of mesh networking and distributed processing. This approach has an advantage that information collection and processing are placed closer to the sources of this information. The goals are to reduce traffic, latency, and to be robust against a lossy or failed Internet connection.
We see management of IoT devices from the network configuration management perspective. This thesis proposes a framework for network configuration management of heterogeneous, constrained IoT devices by using semantic descriptions for interoperability. The MYNO framework is an acronym for MQTT, YANG, NETCONF and Ontology. The NETCONF protocol is the IETF standard for network configuration management. The MQTT protocol is the de-facto standard in the IoT. We picked up the idea of the NETCONF-MQTT bridge, originally proposed by Scheffler and Bonneß[2017], and extended it with semantic device descriptions. These device descriptions provide a description of the device capabilities. They are based on the oneM2M Base ontology and formalized by the Semantic Web Standards.
The novel approach is using a ontology-based device description directly on a constrained device in combination with the MQTT protocol. The bridge was extended in order to query such descriptions. Using a semantic annotation, we achieved that the device capabilities are self-descriptive, machine readable and re-usable.
The concept of a Virtual Device was introduced and implemented, based on semantic device descriptions. A Virtual Device aggregates the capabilities of all devices at the edge network and contributes therefore to the scalability. Thus, it is possible to control all devices via a single RPC call.
The model-driven NETCONF Web-Client is generated automatically from this YANG model which is generated by the bridge based on the semantic device description. The Web-Client provides a user-friendly interface, offers RPC calls and displays sensor values. We demonstrate the feasibility of this approach in different use cases: sensor and actuator scenarios, as well as event configuration and triggering.
The semantic approach results in increased memory overhead. Therefore, we evaluated CBOR and RDF HDT for optimization of ontology-based device descriptions for use on constrained devices. The evaluation shows that CBOR is not suitable for long strings and RDF HDT is a promising candidate but is still a W3C Member Submission. Finally, we used an optimized JSON-LD format for the syntax of the device descriptions.
One of the security tasks of network management is the distribution of firmware updates. The MYNO Update Protocol (MUP) was developed and evaluated on constrained devices CC2538dk and 6LoWPAN. The MYNO update process is focused on freshness and authenticity of the firmware. The evaluation shows that it is challenging but feasible to bring the firmware updates to constrained devices using MQTT. As a new requirement for the next MQTT version, we propose to add a slicing feature for the better support of constrained devices. The MQTT broker should slice data to the maximum packet size specified by the device and transfer it slice-by-slice.
For the performance and scalability evaluation of MYNO framework, we setup the High Precision Agriculture demonstrator with 10 ESP-32 NodeMCU boards at the edge of the network. The ESP-32 NodeMCU boards, connected by WLAN, were equipped with six sensors and two actuators. The performance evaluation shows that the processing of ontology-based descriptions on a Raspberry Pi 3B with the RDFLib is a challenging task regarding computational power. Nevertheless, it is feasible because it must be done only once per device during the discovery process.
The MYNO framework was tested with heterogeneous devices such as CC2538dk from Texas Instruments, Arduino Yún Rev 3, and ESP-32 NodeMCU, and IP-based networks such as 6LoWPAN and WLAN.
Summarizing, with the MYNO framework we could show that the semantic approach on constrained devices is feasible in the IoT.
Research publications and data nowadays should be publicly available on the internet and, theoretically, usable for everyone to develop further research, products, or services. The long-term accessibility of research data is, therefore, fundamental in the economy of the research production process. However, the availability of data is not sufficient by itself, but also their quality must be verifiable. Measures to ensure reuse and reproducibility need to include the entire research life cycle, from the experimental design to the generation of data, quality control, statistical analysis, interpretation, and validation of the results. Hence, high-quality records, particularly for providing a string of documents for the verifiable origin of data, are essential elements that can act as a certificate for potential users (customers). These records also improve the traceability and transparency of data and processes, therefore, improving the reliability of results. Standards for data acquisition, analysis, and documentation have been fostered in the last decade driven by grassroot initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, what is still largely missing in the life science academic research are agreed procedures for complex routine research workflows. Here, well-crafted documentation like standard operating procedures (SOPs) offer clear direction and instructions specifically designed to avoid deviations as an absolute necessity for reproducibility. Therefore, this paper provides a standardized workflow that explains step by step how to write an SOP to be used as a starting point for appropriate research documentation.
In the last decades, there was a notable progress in solving the well-known Boolean satisfiability (Sat) problem, which can be witnessed by powerful Sat solvers. One of the reasons why these solvers are so fast are structural properties of instances that are utilized by the solver’s interna. This thesis deals with the well-studied structural property treewidth, which measures the closeness of an instance to being a tree. In fact, there are many problems parameterized by treewidth that are solvable in polynomial time in the instance size when parameterized by treewidth.
In this work, we study advanced treewidth-based methods and tools for problems in knowledge representation and reasoning (KR). Thereby, we provide means to establish precise runtime results (upper bounds) for canonical problems relevant to KR. Then, we present a new type of problem reduction, which we call decomposition-guided (DG) that
allows us to precisely monitor the treewidth when reducing from one problem to another problem. This new reduction type will be the basis for a long-open lower bound result for quantified Boolean formulas and allows us to design a new methodology for establishing runtime lower bounds for problems parameterized by treewidth.
Finally, despite these lower bounds, we provide an efficient implementation of algorithms that adhere to treewidth. Our approach finds suitable abstractions of instances, which are subsequently refined in a recursive fashion, and it uses Sat solvers for solving subproblems. It turns out that our resulting solver is quite competitive for two canonical counting problems related to Sat.
A common feature in Answer Set Programming is the use of a second negation, stronger than default negation and sometimes called explicit, strong or classical negation. This explicit negation is normally used in front of atoms, rather than allowing its use as a regular operator. In this paper we consider the arbitrary combination of explicit negation with nested expressions, as those defined by Lifschitz, Tang and Turner. We extend the concept of reduct for this new syntax and then prove that it can be captured by an extension of Equilibrium Logic with this second negation. We study some properties of this variant and compare to the already known combination of Equilibrium Logic with Nelson's strong negation.
MUP
(2020)
Message Queuing Telemetry Transport (MQTT) is one of the dominating protocols for edge- and cloud-based Internet of Things (IoT) solutions. When a security vulnerability of an IoT device is known, it has to be fixed as soon as possible. This requires a firmware update procedure. In this paper, we propose a secure update protocol for MQTT-connected devices which ensures the freshness of the firmware, authenticates the new firmware and considers constrained devices. We show that the update protocol is easy to integrate in an MQTT-based IoT network using a semantic approach. The feasibility of our approach is demonstrated by a detailed performance analysis of our prototype implementation on a IoT device with 32 kB RAM. Thereby, we identify design issues in MQTT 5 which can help to improve the support of constrained devices.
In a recent line of research, two familiar concepts from logic programming semantics (unfounded sets and splitting) were extrapolated to the case of epistemic logic programs. The property of epistemic splitting provides a natural and modular way to understand programs without epistemic cycles but, surprisingly, was only fulfilled by Gelfond's original semantics (G91), among the many proposals in the literature. On the other hand, G91 may suffer from a kind of self-supported, unfounded derivations when epistemic cycles come into play. Recently, the absence of these derivations was also formalised as a property of epistemic semantics called foundedness. Moreover, a first semantics proved to satisfy foundedness was also proposed, the so-called Founded Autoepistemic Equilibrium Logic (FAEEL). In this paper, we prove that FAEEL also satisfies the epistemic splitting property something that, together with foundedness, was not fulfilled by any other approach up to date. To prove this result, we provide an alternative characterisation of FAEEL as a combination of G91 with a simpler logic we called Founded Epistemic Equilibrium Logic (FEEL), which is somehow an extrapolation of the stable model semantics to the modal logic S5.
Parsing of argumentative structures has become a very active line of research in recent years. Like discourse parsing or any other natural language task that requires prediction of linguistic structures, most approaches choose to learn a local model and then perform global decoding over the local probability distributions, often imposing constraints that are specific to the task at hand. Specifically for argumentation parsing, two decoding approaches have been recently proposed: Minimum Spanning Trees (MST) and Integer Linear Programming (ILP), following similar trends in discourse parsing. In contrast to discourse parsing though, where trees are not always used as underlying annotation schemes, argumentation structures so far have always been represented with trees. Using the 'argumentative microtext corpus' [in: Argumentation and Reasoned Action: Proceedings of the 1st European Conference on Argumentation, Lisbon 2015 / Vol. 2, College Publications, London, 2016, pp. 801-815] as underlying data and replicating three different decoding mechanisms, in this paper we propose a novel ILP decoder and an extension to our earlier MST work, and then thoroughly compare the approaches. The result is that our new decoder outperforms related work in important respects, and that in general, ILP and MST yield very similar performance.
Flux-P
(2012)
Quantitative knowledge of intracellular fluxes in metabolic networks is invaluable for inferring metabolic system behavior and the design principles of biological systems. However, intracellular reaction rates can not often be calculated directly but have to be estimated; for instance, via 13C-based metabolic flux analysis, a model-based interpretation of stable carbon isotope patterns in intermediates of metabolism. Existing software such as FiatFlux, OpenFLUX or 13CFLUX supports experts in this complex analysis, but requires several steps that have to be carried out manually, hence restricting the use of this software for data interpretation to a rather small number of experiments. In this paper, we present Flux-P as an approach to automate and standardize 13C-based metabolic flux analysis, using the Bio-jETI workflow framework. Exemplarily based on the FiatFlux software, it demonstrates how services can be created that carry out the different analysis steps autonomously and how these can subsequently be assembled into software workflows that perform automated, high-throughput intracellular flux analysis of high quality and reproducibility. Besides significant acceleration and standardization of the data analysis, the agile workflow-based realization supports flexible changes of the analysis workflows on the user level, making it easy to perform custom analyses.
A central insight from psychological studies on human eye movements is that eye movement patterns are highly individually characteristic. They can, therefore, be used as a biometric feature, that is, subjects can be identified based on their eye movements. This thesis introduces new machine learning methods to identify subjects based on their eye movements while viewing arbitrary content. The thesis focuses on probabilistic modeling of the problem, which has yielded the best results in the most recent literature. The thesis studies the problem in three phases by proposing a purely probabilistic, probabilistic deep learning, and probabilistic deep metric learning approach. In the first phase, the thesis studies models that rely on psychological concepts about eye movements. Recent literature illustrates that individual-specific distributions of gaze patterns can be used to accurately identify individuals. In these studies, models were based on a simple parametric family of distributions. Such simple parametric models can be robustly estimated from sparse data, but have limited flexibility to capture the differences between individuals. Therefore, this thesis proposes a semiparametric model of gaze patterns that is flexible yet robust for individual identification. These patterns can be understood as domain knowledge derived from psychological literature. Fixations and saccades are examples of simple gaze patterns. The proposed semiparametric densities are drawn under a Gaussian process prior centered at a simple parametric distribution. Thus, the model will stay close to the parametric class of densities if little data is available, but it can also deviate from this class if enough data is available, increasing the flexibility of the model. The proposed method is evaluated on a large-scale dataset, showing significant improvements over the state-of-the-art. Later, the thesis replaces the model based on gaze patterns derived from psychological concepts with a deep neural network that can learn more informative and complex patterns from raw eye movement data. As previous work has shown that the distribution of these patterns across a sequence is informative, a novel statistical aggregation layer called the quantile layer is introduced. It explicitly fits the distribution of deep patterns learned directly from the raw eye movement data. The proposed deep learning approach is end-to-end learnable, such that the deep model learns to extract informative, short local patterns while the quantile layer learns to approximate the distributions of these patterns. Quantile layers are a generic approach that can converge to standard pooling layers or have a more detailed description of the features being pooled, depending on the problem. The proposed model is evaluated in a large-scale study using the eye movements of subjects viewing arbitrary visual input. The model improves upon the standard pooling layers and other statistical aggregation layers proposed in the literature. It also improves upon the state-of-the-art eye movement biometrics by a wide margin. Finally, for the model to identify any subject — not just the set of subjects it is trained on — a metric learning approach is developed. Metric learning learns a distance function over instances. The metric learning model maps the instances into a metric space, where sequences of the same individual are close, and sequences of different individuals are further apart. This thesis introduces a deep metric learning approach with distributional embeddings. The approach represents sequences as a set of continuous distributions in a metric space; to achieve this, a new loss function based on Wasserstein distances is introduced. The proposed method is evaluated on multiple domains besides eye movement biometrics. This approach outperforms the state of the art in deep metric learning in several domains while also outperforming the state of the art in eye movement biometrics.
Emotions are a central element of human experience. They occur with high frequency in everyday life and play an important role in decision making. However, currently there is no consensus among researchers on what constitutes an emotion and on how emotions should be investigated. This dissertation identifies three problems of current emotion research: the problem of ground truth, the problem of incomplete constructs and the problem of optimal representation. I argue for a focus on the detailed measurement of emotion manifestations with computer-aided methods to solve these problems. This approach is demonstrated in three research projects, which describe the development of methods specific to these problems as well as their application to concrete research questions.
The problem of ground truth describes the practice to presuppose a certain structure of emotions as the a priori ground truth. This determines the range of emotion descriptions and sets a standard for the correct assignment of these descriptions. The first project illustrates how this problem can be circumvented with a multidimensional emotion perception paradigm which stands in contrast to the emotion recognition paradigm typically employed in emotion research. This paradigm allows to calculate an objective difficulty measure and to collect subjective difficulty ratings for the perception of emotional stimuli. Moreover, it enables the use of an arbitrary number of emotion stimuli categories as compared to the commonly used six basic emotion categories. Accordingly, we collected data from 441 participants using dynamic facial expression stimuli from 40 emotion categories. Our findings suggest an increase in emotion perception difficulty with increasing actor age and provide evidence to suggest that young adults, the elderly and men underestimate their emotion perception difficulty. While these effects were predicted from the literature, we also found unexpected and novel results. In particular, the increased difficulty on the objective difficulty measure for female actors and observers stood in contrast to reported findings. Exploratory analyses revealed low relevance of person-specific variables for the prediction of emotion perception difficulty, but highlighted the importance of a general pleasure dimension for the ease of emotion perception.
The second project targets the problem of incomplete constructs which relates to vaguely defined psychological constructs on emotion with insufficient ties to tangible manifestations. The project exemplifies how a modern data collection method such as face tracking data can be used to sharpen these constructs on the example of arousal, a long-standing but fuzzy construct in emotion research. It describes how measures of distance, speed and magnitude of acceleration can be computed from face tracking data and investigates their intercorrelations. We find moderate to strong correlations among all measures of static information on one hand and all measures of dynamic information on the other. The project then investigates how self-rated arousal is tied to these measures in 401 neurotypical individuals and 19 individuals with autism. Distance to the neutral face was predictive of arousal ratings in both groups. Lower mean arousal ratings were found for the autistic group, but no difference in correlation of the measures and arousal ratings could be found between groups. Results were replicated in a high autistic traits group consisting of 41 participants. The findings suggest a qualitatively similar perception of arousal for individuals with and without autism. No correlations between valence ratings and any of the measures could be found which emphasizes the specificity of our tested measures for the construct of arousal.
The problem of optimal representation refers to the search for the best representation of emotions and the assumption that there is a one-fits-all solution. In the third project we introduce partial least squares analysis as a general method to find an optimal representation to relate two high-dimensional data sets to each other. The project demonstrates its applicability to emotion research on the question of emotion perception differences between men and women. The method was used with emotion rating data from 441 participants and face tracking data computed on 306 videos. We found quantitative as well as qualitative differences in the perception of emotional facial expressions between these groups. We showed that women’s emotional perception systematically captured more of the variance in facial expressions. Additionally, we could show that significant differences exist in the way that women and men perceive some facial expressions which could be visualized as concrete facial expression sequences. These expressions suggest differing perceptions of masked and ambiguous facial expressions between the sexes. In order to facilitate use of the developed method by the research community, a package for the statistical environment R was written. Furthermore, to call attention to the method and its usefulness for emotion research, a website was designed that allows users to explore a model of emotion ratings and facial expression data in an interactive fashion.
PLATON
(2019)
Lesson planning is both an important and demanding task—especially as part of teacher training. This paper presents the requirements for a lesson planning system and evaluates existing systems regarding these requirements. One major drawback of existing software tools is that most are limited to a text- or form-based representation of the lesson designs. In this article, a new approach with a graphical, time-based representation with (automatic) analyses methods is proposed and the system architecture and domain model are described in detail. The approach is implemented in an interactive, web-based prototype called PLATON, which additionally supports the management of lessons in units as well as the modelling of teacher and student-generated resources. The prototype was evaluated in a study with 61 prospective teachers (bachelor’s and master’s preservice teachers as well as teacher trainees in post-university teacher training) in Berlin, Germany, with a focus on usability. The results show that this approach proofed usable for lesson planning and offers positive effects for the perception of time and self-reflection.
In this thesis we introduce the concept of the degree of formality. It is directed against a dualistic point of view, which only distinguishes between formal and informal proofs. This dualistic attitude does not respect the differences between the argumentations classified as informal and it is unproductive because the individual potential of the respective argumentation styles cannot be appreciated and remains untapped.
This thesis has two parts. In the first of them we analyse the concept of the degree of formality (including a discussion about the respective benefits for each degree) while in the second we demonstrate its usefulness in three case studies. In the first case study we will repair Haskell B. Curry's view of mathematics, which incidentally is of great importance in the first part of this thesis, in light of the different degrees of formality. In the second case study we delineate how awareness of the different degrees of formality can be used to help students to learn how to prove. Third, we will show how the advantages of proofs of different degrees of formality can be combined by the development of so called tactics having a medium degree of formality. Together the three case studies show that the degrees of formality provide a convincing solution to the problem of untapped potential.
Physical computing covers the design and realization of interactive objects and installations and allows learners to develop concrete, tangible products of the real world, which arise from their imagination. This can be used in computer science education to provide learners with interesting and motivating access to the different topic areas of the subject in constructionist and creative learning environments. However, if at all, physical computing has so far mostly been taught in afternoon clubs or other extracurricular settings. Thus, for the majority of students so far there are no opportunities to design and create their own interactive objects in regular school lessons.
Despite its increasing popularity also for schools, the topic has not yet been clearly and sufficiently characterized in the context of computer science education. The aim of this doctoral thesis therefore is to clarify physical computing from the perspective of computer science education and to adequately prepare the topic both content-wise and methodologically for secondary school teaching. For this purpose, teaching examples, activities, materials and guidelines for classroom use are developed, implemented and evaluated in schools.
In the theoretical part of the thesis, first the topic is examined from a technical point of view. A structured literature analysis shows that basic concepts used in physical computing can be derived from embedded systems, which are the core of a large field of different application areas and disciplines. Typical methods of physical computing in professional settings are analyzed and, from an educational perspective, elements suitable for computer science teaching in secondary schools are extracted, e. g. tinkering and prototyping. The investigation and classification of suitable tools for school teaching show that microcontrollers and mini computers, often with extensions that greatly facilitate the handling of additional components, are particularly attractive tools for secondary education. Considering the perspectives of science, teachers, students and society, in addition to general design principles, exemplary teaching approaches for school education and suitable learning materials are developed and the design, production and evaluation of a physical computing construction kit suitable for teaching is described.
In the practical part of this thesis, with “My Interactive Garden”, an exemplary approach to integrate physical computing in computer science teaching is tested and evaluated in different courses and refined based on the findings in a design-based research approach. In a series of workshops on physical computing, which is based on a concept for constructionist professional development that is developed specifically for this purpose, teachers are empowered and encouraged to develop and conduct physical computing lessons suitable for their particular classroom settings. Based on their in-class experiences, a process model of physical computing teaching is derived. Interviews with those teachers illustrate that benefits of physical computing, including the tangibility of crafted objects and creativity in the classroom, outweigh possible drawbacks like longer preparation times, technical difficulties or difficult assessment. Hurdles in the classroom are identified and possible solutions discussed.
Empirical investigations in the different settings reveal that “My Interactive Garden” and physical computing in general have a positive impact, among others, on learner motivation, fun and interest in class and perceived competencies.
Finally, the results from all evaluations are combined to evaluate the design principles for physical computing teaching and to provide a perspective on the development of decision-making aids for physical computing activities in school education.
Answer Set Programming (ASP) is a declarative problem solving approach, combining a rich yet simple modeling language with high-performance solving capabilities. Although this has already resulted in various applications, certain aspects of such applications are more naturally modeled using variables over finite domains, for accounting for resources, fine timings, coordinates, or functions. Our goal is thus to extend ASP with constraints over integers while preserving its declarative nature. This allows for fast prototyping and elaboration tolerant problem descriptions of resource related applications. The resulting paradigm is called Constraint Answer Set Programming (CASP).
We present three different approaches for solving CASP problems. The first one, a lazy, modular approach combines an ASP solver with an external system for handling constraints. This approach has the advantage that two state of the art technologies work hand in hand to solve the problem, each concentrating on its part of the problem. The drawback is that inter-constraint dependencies cannot be communicated back to the ASP solver, impeding its learning algorithm. The second approach translates all constraints to ASP. Using the appropriate encoding techniques, this results in a very fast, monolithic system. Unfortunately, due to the large, explicit representation of constraints and variables, translation techniques are restricted to small and mid-sized domains. The third approach merges the lazy and the translational approach, combining the strength of both while removing their weaknesses. To this end, we enhance the dedicated learning techniques of an ASP solver with the inferences of the translating approach in a lazy way. That is, the important knowledge is only made explicit when needed.
By using state of the art techniques from neighboring fields, we provide ways to tackle real world, industrial size problems. By extending CASP to reactive solving, we open up new application areas such as online planning with continuous domains and durations.
Contemporary multi-core processors are parallel systems that also provide shared memory for programs running on them. Both the increasing number of cores in so-called many-core systems and the still growing computational power of the cores demand for memory systems that are able to deliver high bandwidths. Caches are essential components to satisfy this requirement. Nevertheless, hardware-based cache coherence in many-core chips faces practical limits to provide both coherence and high memory bandwidths. In addition, a shift away from global coherence can be observed. As a result, alternative architectures and suitable programming models need to be investigated.
This thesis focuses on fast communication for non-cache-coherent many-core architectures. Experiments are conducted on the Single-Chip Cloud Computer (SCC), a non-cache-coherent many-core processor with 48 mesh-connected cores. Although originally designed for message passing, the results of this thesis show that shared memory can be efficiently used for one-sided communication on this kind of architecture. One-sided communication enables data exchanges between processes where the receiver is not required to know the details of the performed communication. In the notion of the Message Passing Interface (MPI) standard, this type of communication allows to access memory of remote processes. In order to support this communication scheme on non-cache-coherent architectures, both an efficient process synchronization and a communication scheme with software-managed cache coherence are designed and investigated.
The process synchronization realizes the concept of the general active target synchronization scheme from the MPI standard. An existing classification of implementation approaches is extended and used to identify an appropriate class for the non-cache-coherent shared memory platform. Based on this classification, existing implementations are surveyed in order to find beneficial concepts, which are then used to design a lightweight synchronization protocol for the SCC that uses shared memory and uncached memory accesses. The proposed scheme is not prone to process skew and also enables direct communication as soon as both communication partners are ready. Experimental results show very good scaling properties and up to five times lower synchronization latency compared to a tuned message-based MPI implementation for the SCC.
For the communication, SCOSCo, a shared memory approach with software-managed cache coherence, is presented. According requirements for the coherence that fulfill MPI's separate memory model are formulated, and a lightweight implementation exploiting SCC hard- and software features is developed. Despite a discovered malfunction in the SCC's memory subsystem, the experimental evaluation of the design reveals up to five times better bandwidths and nearly four times lower latencies in micro-benchmarks compared to the SCC-tuned but message-based MPI library. For application benchmarks, like a parallel 3D fast Fourier transform, the runtime share of communication can be reduced by a factor of up to five. In addition, this thesis postulates beneficial hardware concepts that would support software-managed coherence for one-sided communication on future non-cache-coherent architectures where coherence might be only available in local subdomains but not on a global processor level.
Although it has become common practice to build applications based on the reuse of existing components or services, technical complexity and semantic challenges constitute barriers to ensuring a successful and wide reuse of components and services. In the geospatial application domain, the barriers are self-evident due to heterogeneous geographic data, a lack of interoperability and complex analysis processes.
Constructing workflows manually and discovering proper services and data that match user intents and preferences is difficult and time-consuming especially for users who are not trained in software development. Furthermore, considering the multi-objective nature of environmental modeling for the assessment of climate change impacts and the various types of geospatial data (e.g., formats, scales, and georeferencing systems) increases the complexity challenges.
Automatic service composition approaches that provide semantics-based assistance in the process of workflow design have proven to be a solution to overcome these challenges and have become a frequent demand especially by end users who are not IT experts. In this light, the major contributions of this thesis are:
(i) Simplification of service reuse and workflow design of applications for climate impact analysis by following the eXtreme Model-Driven Development (XMDD) paradigm.
(ii) Design of a semantic domain model for climate impact analysis applications that comprises specifically designed services, ontologies that provide domain-specific vocabulary for referring to types and services, and the input/output annotation of the services using the terms defined in the ontologies.
(iii) Application of a constraint-driven method for the automatic composition of workflows for analyzing the impacts of sea-level rise. The application scenario demonstrates the impact of domain modeling decisions on the results and the performance of the synthesis algorithm.
Services that operate over the Internet are under constant threat of being exposed to fraudulent use. Maintaining good user experience for legitimate users often requires the classification of entities as malicious or legitimate in order to initiate countermeasures. As an example, inbound email spam filters decide for spam or non-spam. They can base their decision on both the content of each email as well as on features that summarize prior emails received from the sending server. In general, discriminative classification methods learn to distinguish positive from negative entities. Each decision for a label may be based on features of the entity and related entities. When labels of related entities have strong interdependencies---as can be assumed e.g. for emails being delivered by the same user---classification decisions should not be made independently and dependencies should be modeled in the decision function. This thesis addresses the formulation of discriminative classification problems that are tailored for the specific demands of the following three Internet security applications. Theoretical and algorithmic solutions are devised to protect an email service against flooding of user inboxes, to mitigate abusive usage of outbound email servers, and to protect web servers against distributed denial of service attacks.
In the application of filtering an inbound email stream for unsolicited emails, utilizing features that go beyond each individual email's content can be valuable. Information about each sending mail server can be aggregated over time and may help in identifying unwanted emails. However, while this information will be available to the deployed email filter, some parts of the training data that are compiled by third party providers may not contain this information. The missing features have to be estimated at training time in order to learn a classification model. In this thesis an algorithm is derived that learns a decision function that integrates over a distribution of values for each missing entry. The distribution of missing values is a free parameter that is optimized to learn an optimal decision function.
The outbound stream of emails of an email service provider can be separated by the customer IDs that ask for delivery. All emails that are sent by the same ID in the same period of time are related, both in content and in label. Hijacked customer accounts may send batches of unsolicited emails to other email providers, which in turn might blacklist the sender's email servers after detection of incoming spam emails. The risk of being blocked from further delivery depends on the rate of outgoing unwanted emails and the duration of high spam sending rates. An optimization problem is developed that minimizes the expected cost for the email provider by learning a decision function that assigns a limit on the sending rate to customers based on the each customer's email stream.
Identifying attacking IPs during HTTP-level DDoS attacks allows to block those IPs from further accessing the web servers. DDoS attacks are usually carried out by infected clients that are members of the same botnet and show similar traffic patterns. HTTP-level attacks aim at exhausting one or more resources of the web server infrastructure, such as CPU time. If the joint set of attackers cannot increase resource usage close to the maximum capacity, no effect will be experienced by legitimate users of hosted web sites. However, if the additional load raises the computational burden towards the critical range, user experience will degrade until service may be unavailable altogether. As the loss of missing one attacker depends on block decisions for other attackers---if most other attackers are detected, not blocking one client will likely not be harmful---a structured output model has to be learned. In this thesis an algorithm is developed that learns a structured prediction decoder that searches the space of label assignments, guided by a policy.
Each model is evaluated on real-world data and is compared to reference methods. The results show that modeling each classification problem according to the specific demands of the task improves performance over solutions that do not consider the constraints inherent to an application.
Personal fabrication tools, such as 3D printers, are on the way of enabling a future in which non-technical users will be able to create custom objects. However, while the hardware is there, the current interaction model behind existing design tools is not suitable for non-technical users. Today, 3D printers are operated by fabricating the object in one go, which tends to take overnight due to the slow 3D printing technology. Consequently, the current interaction model requires users to think carefully before printing as every mistake may imply another overnight print. Planning every step ahead, however, is not feasible for non-technical users as they lack the experience to reason about the consequences of their design decisions.
In this dissertation, we propose changing the interaction model around personal fabrication tools to better serve this user group. We draw inspiration from personal computing and argue that the evolution of personal fabrication may resemble the evolution of personal computing: Computing started with machines that executed a program in one go before returning the result to the user. By decreasing the interaction unit to single requests, turn-taking systems such as the command line evolved, which provided users with feedback after every input. Finally, with the introduction of direct-manipulation interfaces, users continuously interacted with a program receiving feedback about every action in real-time. In this dissertation, we explore whether these interaction concepts can be applied to personal fabrication as well.
We start with fabricating an object in one go and investigate how to tighten the feedback-cycle on an object-level: We contribute a method called low-fidelity fabrication, which saves up to 90% fabrication time by creating objects as fast low-fidelity previews, which are sufficient to evaluate key design aspects. Depending on what is currently being tested, we propose different conversions that enable users to focus on different parts: faBrickator allows for a modular design in the early stages of prototyping; when users move on WirePrint allows quickly testing an object's shape, while Platener allows testing an object's technical function. We present an interactive editor for each technique and explain the underlying conversion algorithms.
By interacting on smaller units, such as a single element of an object, we explore what it means to transition from systems that fabricate objects in one go to turn-taking systems. We start with a 2D system called constructable: Users draw with a laser pointer onto the workpiece inside a laser cutter. The drawing is captured with an overhead camera. As soon as the the user finishes drawing an element, such as a line, the constructable system beautifies the path and cuts it--resulting in physical output after every editing step. We extend constructable towards 3D editing by developing a novel laser-cutting technique for 3D objects called LaserOrigami that works by heating up the workpiece with the defocused laser until the material becomes compliant and bends down under gravity. While constructable and LaserOrigami allow for fast physical feedback, the interaction is still best described as turn-taking since it consists of two discrete steps: users first create an input and afterwards the system provides physical output.
By decreasing the interaction unit even further to a single feature, we can achieve real-time physical feedback: Input by the user and output by the fabrication device are so tightly coupled that no visible lag exists. This allows us to explore what it means to transition from turn-taking interfaces, which only allow exploring one option at a time, to direct manipulation interfaces with real-time physical feedback, which allow users to explore the entire space of options continuously with a single interaction. We present a system called FormFab, which allows for such direct control. FormFab is based on the same principle as LaserOrigami: It uses a workpiece that when warmed up becomes compliant and can be reshaped. However, FormFab achieves the reshaping not based on gravity, but through a pneumatic system that users can control interactively. As users interact, they see the shape change in real-time.
We conclude this dissertation by extrapolating the current evolution into a future in which large numbers of people use the new technology to create objects. We see two additional challenges on the horizon: sustainability and intellectual property. We investigate sustainability by demonstrating how to print less and instead patch physical objects. We explore questions around intellectual property with a system called Scotty that transfers objects without creating duplicates, thereby preserving the designer's copyright.
Computer Security deals with the detection and mitigation of threats to computer networks, data, and computing hardware. This
thesis addresses the following two computer security problems: email spam campaign and malware detection.
Email spam campaigns can easily be generated using popular dissemination tools by specifying simple grammars that serve as message templates. A grammar is disseminated to nodes of a bot net, the nodes create messages by instantiating the grammar at random. Email spam campaigns can encompass huge data volumes and therefore pose a threat to the stability of the infrastructure of email service providers that have to store them. Malware -software that serves a malicious purpose- is affecting web servers, client computers via active content, and client computers through executable files. Without the help of malware detection systems it would be easy for malware creators to collect sensitive information or to infiltrate computers.
The detection of threats -such as email-spam messages, phishing messages, or malware- is an adversarial and therefore intrinsically
difficult problem. Threats vary greatly and evolve over time. The detection of threats based on manually-designed rules is therefore
difficult and requires a constant engineering effort. Machine-learning is a research area that revolves around the analysis of data and the discovery of patterns that describe aspects of the data. Discriminative learning methods extract prediction models from data that are optimized to predict a target attribute as accurately as possible. Machine-learning methods hold the promise of automatically identifying patterns that robustly and accurately detect threats. This thesis focuses on the design and analysis of discriminative learning methods for the two computer-security problems under investigation: email-campaign and malware detection.
The first part of this thesis addresses email-campaign detection. We focus on regular expressions as a syntactic framework, because regular expressions are intuitively comprehensible by security engineers and administrators, and they can be applied as a detection mechanism in an extremely efficient manner. In this setting, a prediction model is provided with exemplary messages from an email-spam campaign. The prediction model has to generate a regular expression that reveals the syntactic pattern that underlies the entire campaign, and that a security engineers finds comprehensible and feels confident enough to use the expression to blacklist further messages at the email server. We model this problem as two-stage learning problem with structured input and output spaces which can be solved using standard cutting plane methods. Therefore we develop an appropriate loss function, and derive a decoder for the resulting optimization problem.
The second part of this thesis deals with the problem of predicting whether a given JavaScript or PHP file is malicious or benign. Recent malware analysis techniques use static or dynamic features, or both. In fully dynamic analysis, the software or script is executed and observed for malicious behavior in a sandbox environment. By contrast, static analysis is based on features that can be extracted directly from the program file. In order to bypass static detection mechanisms, code obfuscation techniques are used to spread a malicious program file in many different syntactic variants. Deobfuscating the code before applying a static classifier can be subjected to mostly static code analysis and can overcome the problem of obfuscated malicious code, but on the other hand increases the computational costs of malware detection by an order of magnitude. In this thesis we present a cascaded architecture in which a classifier first performs a static analysis of the original code and -based on the outcome of this first classification step- the code may be deobfuscated and classified again. We explore several types of features including token $n$-grams, orthogonal sparse bigrams, subroutine-hashings, and syntax-tree features and study the robustness of detection methods and feature types against the evolution of malware over time. The developed tool scans very large file collections quickly and accurately.
Each model is evaluated on real-world data and compared to reference methods. Our approach of inferring regular expressions to filter emails belonging to an email spam campaigns leads to models with a high true-positive rate at a very low false-positive rate that is an order of magnitude lower than that of a commercial content-based filter. Our presented system -REx-SVMshort- is being used by a commercial email service provider and complements content-based and IP-address based filtering.
Our cascaded malware detection system is evaluated on a high-quality data set of almost 400,000 conspicuous PHP files and a collection of more than 1,00,000 JavaScript files. From our case study we can conclude that our system can quickly and accurately process large data collections at a low false-positive rate.
Geospatial data has become a natural part of a growing number of information systems and services in the economy, society, and people's personal lives. In particular, virtual 3D city and landscape models constitute valuable information sources within a wide variety of applications such as urban planning, navigation, tourist information, and disaster management. Today, these models are often visualized in detail to provide realistic imagery. However, a photorealistic rendering does not automatically lead to high image quality, with respect to an effective information transfer, which requires important or prioritized information to be interactively highlighted in a context-dependent manner.
Approaches in non-photorealistic renderings particularly consider a user's task and camera perspective when attempting optimal expression, recognition, and communication of important or prioritized information. However, the design and implementation of non-photorealistic rendering techniques for 3D geospatial data pose a number of challenges, especially when inherently complex geometry, appearance, and thematic data must be processed interactively. Hence, a promising technical foundation is established by the programmable and parallel computing architecture of graphics processing units.
This thesis proposes non-photorealistic rendering techniques that enable both the computation and selection of the abstraction level of 3D geospatial model contents according to user interaction and dynamically changing thematic information. To achieve this goal, the techniques integrate with hardware-accelerated rendering pipelines using shader technologies of graphics processing units for real-time image synthesis. The techniques employ principles of artistic rendering, cartographic generalization, and 3D semiotics—unlike photorealistic rendering—to synthesize illustrative renditions of geospatial feature type entities such as water surfaces, buildings, and infrastructure networks. In addition, this thesis contributes a generic system that enables to integrate different graphic styles—photorealistic and non-photorealistic—and provide their seamless transition according to user tasks, camera view, and image resolution.
Evaluations of the proposed techniques have demonstrated their significance to the field of geospatial information visualization including topics such as spatial perception, cognition, and mapping. In addition, the applications in illustrative and focus+context visualization have reflected their potential impact on optimizing the information transfer regarding factors such as cognitive load, integration of non-realistic information, visualization of uncertainty, and visualization on small displays.
The main objective of this dissertation is to analyse prerequisites, expectations, apprehensions, and attitudes of students studying computer science, who are willing to gain a bachelor degree. The research will also investigate in the students’ learning style according to the Felder-Silverman model. These investigations fall in the attempt to make an impact on reducing the “dropout”/shrinkage rate among students, and to suggest a better learning environment.
The first investigation starts with a survey that has been made at the computer science department at the University of Baghdad to investigate the attitudes of computer science students in an environment dominated by women, showing the differences in attitudes between male and female students in different study years. Students are accepted to university studies via a centrally controlled admission procedure depending mainly on their final score at school. This leads to a high percentage of students studying subjects they do not want. Our analysis shows that 75% of the female students do not regret studying computer science although it was not their first choice. And according to statistics over previous years, women manage to succeed in their study and often graduate on top of their class. We finish with a comparison of attitudes between the freshman students of two different cultures and two different university enrolment procedures (University of Baghdad, in Iraq, and the University of Potsdam, in Germany) both with opposite gender majority.
The second step of investigation took place at the department of computer science at the University of Potsdam in Germany and analyzes the learning styles of students studying the three major fields of study offered by the department (computer science, business informatics, and computer science teaching). Investigating the differences in learning styles between the students of those study fields who usually take some joint courses is important to be aware of which changes are necessary to be adopted in the teaching methods to address those different students. It was a two stage study using two questionnaires; the main one is based on the Index of Learning Styles Questionnaire of B. A. Solomon and R. M. Felder, and the second questionnaire was an investigation on the students’ attitudes towards the findings of their personal first questionnaire. Our analysis shows differences in the preferences of learning style between male and female students of the different study fields, as well as differences between students with the different specialties (computer science, business informatics, and computer science teaching).
The third investigation looks closely into the difficulties, issues, apprehensions and expectations of freshman students studying computer science. The study took place at the computer science department at the University of Potsdam with a volunteer sample of students. The goal is to determine and discuss the difficulties and issues that they are facing in their study that may lead them to think in dropping-out, changing the study field, or changing the university. The research continued with the same sample of students (with business informatics students being the majority) through more than three semesters. Difficulties and issues during the study were documented, as well as students’ attitudes, apprehensions, and expectations. Some of the professors and lecturers opinions and solutions to some students’ problems were also documented. Many participants had apprehensions and difficulties, especially towards informatics subjects. Some business informatics participants began to think of changing the university, in particular when they reached their third semester, others thought about changing their field of study. Till the end of this research, most of the participants continued in their studies (the study they have started with or the new study they have changed to) without leaving the higher education system.
A lot has been published about the competencies needed by
students in the 21st century (Ravenscroft et al., 2012). However, equally
important are the competencies needed by educators in the new era
of digital education. We review the key competencies for educators in
light of the new methods of teaching and learning proposed by Massive
Open Online Courses (MOOCs) and their on-campus counterparts,
Small Private Online Courses (SPOCs).
Participants of this workshop will be confronted exemplarily
with a considerable inconsistency of global Informatics education at
lower secondary level. More importantly, they are invited to contribute
actively on this issue in form of short case studies of their countries.
Until now, very few countries have been successful in implementing
Informatics or Computing at primary and lower secondary level. The
spectrum from digital literacy to informatics, particularly as a discipline
in its own right, has not really achieved a breakthrough and seems to
be underrepresented for these age groups. The goal of this workshop
is not only to discuss the anamnesis and diagnosis of this fragmented
field, but also to discuss and suggest viable forms of therapy in form of
setting educational standards. Making visible good practices in some
countries and comparing successful approaches are rewarding tasks for
this workshop.
Discussing and defining common educational standards on a transcontinental
level for the age group of 14 to 15 years old students in a readable,
assessable and acceptable form should keep the participants of this
workshop active beyond the limited time at the workshop.
Let’s talk about CS!
(2015)
To communicate about a science is the most important key
competence in education for any science. Without communication we
cannot teach, so teachers should reflect about the language they use in
class properly. But the language students and teachers use to communicate
about their CS courses is very heterogeneous, inconsistent and
deeply influenced by tool names. There is a big lack of research and
discussion in CS education regarding the terminology and the role of
concepts and tools in our science. We don’t have a consistent set of
terminology that we agree on to be helpful for learning our science.
This makes it nearly impossible to do research on CS competencies as
long as we have not agreed on the names we use to describe these. This
workshop intends to provide room to fill with discussion and first ideas
for future research in this field.
ProtoSense
(2015)
The poster and abstract describe the importance of teaching
information security in school. After a short description of information
security and important aspects, I will show, how information security
fits into different guidelines or models for computer science educations
and that it is therefore on of the key competencies. Afterwards I will
present you a rough insight of teaching information security in Austria.
Current curricular trends require teachers in Baden-
Wuerttemberg (Germany) to integrate Computer Science (CS) into
traditional subjects, such as Physical Science. However, concrete guidelines
are missing. To fill this gap, we outline an approach where a
microcontroller is used to perform and evaluate measurements in the
Physical Science classroom.
Using the open-source Arduino platform, we expect students to acquire
and develop both CS and Physical Science competencies by using a
self-programmed microcontroller. In addition to this combined development
of competencies in Physical Science and CS, the subject matter
will be embedded in suitable contexts and learning environments,
such as weather and climate.
Think logarithmically!
(2015)
We discuss here a number of algorithmic topics which we
use in our teaching and in learning of mathematics and informatics to
illustrate and document the power of logarithm in designing very efficient
algorithms and computations – logarithmic thinking is one of the
most important key competencies for solving real world practical problems.
We demonstrate also how to introduce logarithm independently
of mathematical formalism using a conceptual model for reducing a
problem size by at least half. It is quite surprising that the idea, which
leads to logarithm, is present in Euclid’s algorithm described almost
2000 years before John Napier invented logarithm.
A project involving the composition of a number of pieces
of music by public participants revealed levels of engagement with and
mastery of complex music technologies by a number of secondary student
volunteers. This paper reports briefly on some initial findings of
that project and seeks to illuminate an understanding of computational
thinking across the curriculum.
Physical computing covers the design and realization of interactive
objects and installations and allows students to develop concrete,
tangible products of the real world that arise from the learners’
imagination. This way, constructionist learning is raised to a level that
enables students to gain haptic experience and thereby concretizes the
virtual. In this paper the defining characteristics of physical computing
are described. Key competences to be gained with physical computing
will be identified.
Mentoring in a Digital World
(2015)
This paper focuses on the results of the evaluation of the first
pilot of an e-mentoring unit designed by the Hands-On ICT consortium,
funded by the EU LLL programme. The overall aim of this two-year
activity is to investigate the value for professional learning of Massive
Online Open Courses (MOOCs) and Community Online Open Courses
(COOCs) in the context of a ‘community of practice’. Three units in the
first pilot covered aspects of using digital technologies to develop creative
thinking skills. The findings in this paper relate to the fourth unit
about e-mentoring, a skill that was important to delivering the course
content in the other three units. Findings about the e-mentoring unit
included: the students’ request for detailed profiles so that participants
can get to know each other; and, the need to reconcile the different
interpretations of e-mentoring held by the participants when the course
begins. The evaluators concluded that the major issues were that: not all
professional learners would self-organise and network; and few would
wish to mentor their colleagues voluntarily. Therefore, the e-mentoring
issues will need careful consideration in pilots two and three to identify
how e-mentoring will be organised.
The study reported in this paper involved the employment
of specific in-class exercises using a Personal Response System (PRS).
These exercises were designed with two goals: to enhance students’
capabilities of tracing a given code and of explaining a given code in
natural language with some abstraction. The paper presents evidence
from the actual use of the PRS along with students’ subjective impressions
regarding both the use of the PRS and the special exercises. The
conclusions from the findings are followed with a short discussion on
benefits of PRS-based mental processing exercises for learning programming
and beyond.
In this paper we describe the recent state of our research
project concerning computer science teachers’ knowledge on students’
cognition. We did a comprehensive analysis of textbooks, curricula
and other resources, which give teachers guidance to formulate assignments.
In comparison to other subjects there are only a few concepts
and strategies taught to prospective computer science teachers in university.
We summarize them and given an overview on our empirical
approach to measure this knowledge.
How does the Implementation of a Literacy Learning Tool Kit influence Literacy Skill Acquisition?
(2015)
This study aimed at following how teachers transfer skills
into results while using ABRA literacy software. This was done in
the second part of the pilot study whose aim was to provide equity to
control group teachers and students by exposing them to the ABRACADABRA
treatment after the end of phase 1. This opportunity was
used to follow the phase 1 teachers to see how the skills learned were
being transformed into results. A standard three-day initial training and
planning session on how to use ABRA to teach literacy was held at the
beginning of each phase for ABRA teachers (phase 1 experimental and
phase 2 delayed ABRA). Teachers were provided with teaching materials
including a tentative ABRA curriculum developed to align with the
Kenyan English Language requirements for year 1 and 3 students. Results
showed that although there was no significant difference between
the groups in vocabulary-related subscales which include word reading
and meaning as well as sentence comprehension, students in ABRACADABRA
classes improved their scores at a significantly higher rate
than students in control classes in comprehension related scores. An
average student in the ABRACADABRA group improved by 12 and
16 percentile points respectively compared to their counterparts in the
control group.
The Technology Proficiency Self-Assessment (TPSA) questionnaire
has been used for 15 years in the USA and other nations as a
self-efficacy measure for proficiencies fundamental to effective technology
integration in the classroom learning environment. Internal consistency
reliabilities for each of the five-item scales have typically ranged
from .73 to .88 for preservice or inservice technology-using teachers.
Due to changing technologies used in education, researchers sought to
renovate partially obsolete items and extend self-efficacy assessment to
new areas, such as social media and mobile learning. Analysis of 2014
data gathered on a new, 34 item version of the TPSA indicates that the
four established areas of email, World Wide Web (WWW), integrated
applications, and teaching with technology continue to form consistent
scales with reliabilities ranging from .81 to .93, while the 14 new items
gathered to represent emerging technologies and media separate into
two scales, each with internal consistency reliabilities greater than .9.
The renovated TPSA is deemed to be worthy of continued use in the
teaching with technology context.
Computational Thinking
(2015)
Digital technology has radically changed the way people
work in industry, finance, services, media and commerce. Informatics
has contributed to the scientific and technological development of our
society in general and to the digital revolution in particular. Computational
thinking is the term indicating the key ideas of this discipline that
might be included in the key competencies underlying the curriculum
of compulsory education. The educational potential of informatics has
a history dating back to the sixties. In this article, we briefly revisit this
history looking for lessons learned. In particular, we focus on experiences
of teaching and learning programming. However, computational
thinking is more than coding. It is a way of thinking and practicing interactive
dynamic modeling with computers. We advocate that learners
can practice computational thinking in playful contexts where they can
develop personal projects, for example building videogames and/or robots,
share and discuss their construction with others. In our view, this
approach allows an integration of computational thinking in the K-12
curriculum across disciplines.
How Things Work
(2015)
Recognizing and defining functionality is a key competence
adopted in all kinds of programming projects. This study investigates
how far students without specific informatics training are able to identify
and verbalize functions and parameters. It presents observations
from classroom activities on functional modeling in high school chemistry
lessons with altogether 154 students. Finally it discusses the potential
of functional modelling to improve the comprehension of scientific
content.
This paper originated from discussions about the need for
important changes in the curriculum for Computing including two focus
group meetings at IFIP conferences over the last two years. The
paper examines how recent developments in curriculum, together with
insights from curriculum thinking in other subject areas, especially mathematics
and science, can inform curriculum design for Computing.
The analysis presented in the paper provides insights into the complexity
of curriculum design as well as identifying important constraints and
considerations for the ongoing development of a vision and framework
for a Computing curriculum.