004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (80)
- Doctoral Thesis (62)
- Postprint (6)
- Conference Proceeding (5)
- Master's Thesis (3)
- Other (3)
- Bachelor Thesis (1)
- Monograph/Edited Volume (1)
- Habilitation Thesis (1)
- Preprint (1)
Language
- English (163) (remove)
Keywords
- Machine Learning (7)
- Maschinelles Lernen (7)
- Computer Science Education (5)
- answer set programming (5)
- Antwortmengenprogrammierung (4)
- Answer Set Programming (3)
- Competence Measurement (3)
- DPLL (3)
- Optimization (3)
- Secondary Education (3)
Institute
- Institut für Informatik und Computational Science (163) (remove)
Answer Set Programming faces an increasing popularity for problem solving in various domains. While its modeling language allows us to express many complex problems in an easy way, its solving technology enables their effective resolution. In what follows, we detail some of the key factors of its success. Answer Set Programming [ASP; Brewka et al. Commun ACM 54(12):92–103, (2011)] is seeing a rapid proliferation in academia and industry due to its easy and flexible way to model and solve knowledge-intense combinatorial (optimization) problems. To this end, ASP offers a high-level modeling language paired with high-performance solving technology. As a result, ASP systems provide out-off-the-box, general-purpose search engines that allow for enumerating (optimal) solutions. They are represented as answer sets, each being a set of atoms representing a solution. The declarative approach of ASP allows a user to concentrate on a problem’s specification rather than the computational means to solve it. This makes ASP a prime candidate for rapid prototyping and an attractive tool for teaching key AI techniques since complex problems can be expressed in a succinct and elaboration tolerant way. This is eased by the tuning of ASP’s modeling language to knowledge representation and reasoning (KRR). The resulting impact is nicely reflected by a growing range of successful applications of ASP [Erdem et al. AI Mag 37(3):53–68, 2016; Falkner et al. Industrial applications of answer set programming. K++nstliche Intelligenz (2018)]
The field of machine learning studies algorithms that infer predictive models from data. Predictive models are applicable for many practical tasks such as spam filtering, face and handwritten digit recognition, and personalized product recommendation. In general, they are used to predict a target label for a given data instance. In order to make an informed decision about the deployment of a predictive model, it is crucial to know the model’s approximate performance. To evaluate performance, a set of labeled test instances is required that is drawn from the distribution the model will be exposed to at application time. In many practical scenarios, unlabeled test instances are readily available, but the process of labeling them can be a time- and cost-intensive task and may involve a human expert. This thesis addresses the problem of evaluating a given predictive model accurately with minimal labeling effort. We study an active model evaluation process that selects certain instances of the data according to an instrumental sampling distribution and queries their labels. We derive sampling distributions that minimize estimation error with respect to different performance measures such as error rate, mean squared error, and F-measures. An analysis of the distribution that governs the estimator leads to confidence intervals, which indicate how precise the error estimation is. Labeling costs may vary across different instances depending on certain characteristics of the data. For instance, documents differ in their length, comprehensibility, and technical requirements; these attributes affect the time a human labeler needs to judge relevance or to assign topics. To address this, the sampling distribution is extended to incorporate instance-specific costs. We empirically study conditions under which the active evaluation processes are more accurate than a standard estimate that draws equally many instances from the test distribution. We also address the problem of comparing the risks of two predictive models. The standard approach would be to draw instances according to the test distribution, label the selected instances, and apply statistical tests to identify significant differences. Drawing instances according to an instrumental distribution affects the power of a statistical test. We derive a sampling procedure that maximizes test power when used to select instances, and thereby minimizes the likelihood of choosing the inferior model. Furthermore, we investigate the task of comparing several alternative models; the objective of an evaluation could be to rank the models according to the risk that they incur or to identify the model with lowest risk. An experimental study shows that the active procedure leads to higher test power than the standard test in many application domains. Finally, we study the problem of evaluating the performance of ranking functions, which are used for example for web search. In practice, ranking performance is estimated by applying a given ranking model to a representative set of test queries and manually assessing the relevance of all retrieved items for each query. We apply the concepts of active evaluation and active comparison to ranking functions and derive optimal sampling distributions for the commonly used performance measures Discounted Cumulative Gain and Expected Reciprocal Rank. Experiments on web search engine data illustrate significant reductions in labeling costs.
Software-as-a-Service (SaaS) offers several advantages to both service providers and users. Service providers can benefit from the reduction of Total Cost of Ownership (TCO), better scalability, and better resource utilization. On the other hand, users can use the service anywhere and anytime, and minimize upfront investment by following the pay-as-you-go model. Despite the benefits of SaaS, users still have concerns about the security and privacy of their data. Due to the nature of SaaS and the Cloud in general, the data and the computation are beyond the users' control, and hence data security becomes a vital factor in this new paradigm. Furthermore, in multi-tenant SaaS applications, the tenants become more concerned about the confidentiality of their data since several tenants are co-located onto a shared infrastructure.
To address those concerns, we start protecting the data from the provisioning process by controlling how tenants are being placed in the infrastructure. We present a resource allocation algorithm designed to minimize the risk of co-resident tenants called SecPlace. It enables the SaaS provider to control the resource (i.e., database instance) allocation process while taking into account the security of tenants as a requirement.
Due to the design principles of the multi-tenancy model, tenants follow some degree of sharing on both application and infrastructure levels. Thus, strong security-isolation should be present. Therefore, we develop SignedQuery, a technique that prevents one tenant from accessing others' data. We use the Signing Concept to create a signature that is used to sign the tenant's request, then the server can verifies the signature and recognizes the requesting tenant, and hence ensures that the data to be accessed is belonging to the legitimate tenant.
Finally, Data confidentiality remains a critical concern due to the fact that data in the Cloud is out of users' premises, and hence beyond their control. Cryptography is increasingly proposed as a potential approach to address such a challenge. Therefore, we present SecureDB, a system designed to run SQL-based applications over an encrypted database. SecureDB captures the schema design and analyzes it to understand the internal structure of the data (i.e., relationships between the tables and their attributes). Moreover, we determine the appropriate partialhomomorphic encryption scheme for each attribute where computation is possible even when the data is encrypted.
To evaluate our work, we conduct extensive experiments with di↵erent settings. The main use case in our work is a popular open source HRM application, called OrangeHRM. The results show that our multi-layered approach is practical, provides enhanced security and isolation among tenants, and have a moderate complexity in terms of processing encrypted data.
This article shows a discussion about the key competencies
in informatics and ICT viewed from a philosophical foundation presented
by Martha Nussbaum, which is known as ‘ten central capabilities’.
Firstly, the outline of ‘The Capability Approach’, which has been presented
by Amartya Sen and Nussbaum as a theoretical framework of
assessing the state of social welfare, will be explained. Secondly, the
body of Nussbaum’s ten central capabilities and the reason for being
applied as the basis of discussion will be shown. Thirdly, the relationship
between the concept of ‘capability’ and ‘competency’ is to be
discussed. After that, the author’s assumption of the key competencies
in informatics and ICT led from the examination of Nussbaum’s ten
capabilities will be presented.
A project involving the composition of a number of pieces
of music by public participants revealed levels of engagement with and
mastery of complex music technologies by a number of secondary student
volunteers. This paper reports briefly on some initial findings of
that project and seeks to illuminate an understanding of computational
thinking across the curriculum.
This paper describes the implementation of a workflow model for service-oriented computing of potential areas for wind turbines in jABC. By implementing a re-executable model the manual effort of a multi-criteria site analysis can be reduced. The aim is to determine the shift of typical geoprocessing tools of geographic information systems (GIS) from the desktop to the web. The analysis is based on a vector data set and mainly uses web services of the “Center for Spatial Information Science and Systems” (CSISS). This paper discusses effort, benefits and problems associated with the use of the web services.
The protein classification workflow described in this report enables users to get information about a novel protein sequence automatically. The information is derived by different bioinformatic analysis tools which calculate or predict features of a protein sequence. Also, databases are used to compare the novel sequence with known proteins.
In this bachelor’s thesis I implement the automatic theorem prover nanoCoP-Ω. This system is the result of porting arithmetic and equality handling procedures first introduced in the automatic theorem prover with arithmetic leanCoP-Ω into the similar system nanoCoP 2.0. To understand these procedures, I first introduce the mathematical background to both automatic theorem proving and arithmetic expressions. I present the predecessor projects leanCoP, nanoCoP and leanCoP-Ω, out of which nanCoP-Ω was developed. This is followed by an extensive description of the concepts the non-clausal connection calculus needed to be extended by, to allow for proving arithmetic expressions and equalities, as well as of their implementation into nanoCoP-Ω. An extensive comparison between both the runtimes and the number of solved problems of the systems nanoCoP-Ω and leanCoP-Ω was made. I come to the conclusion, that nanoCoP-Ω is considerably faster than leanCoP-Ω for small problems, though less well suited for larger problems. Additionally, I was able to construct a non-theorem that nanoCoP-Ω generates a false proof for. I discuss how this pressing issue could be resolved, as well as some possible optimizations and expansions of the system.
Physical computing covers the design and realization of interactive
objects and installations and allows students to develop concrete,
tangible products of the real world that arise from the learners’
imagination. This way, constructionist learning is raised to a level that
enables students to gain haptic experience and thereby concretizes the
virtual. In this paper the defining characteristics of physical computing
are described. Key competences to be gained with physical computing
will be identified.
This work introduces novel internal and external memory algorithms for computing voxel skeletons of massive voxel objects with complex network-like architecture and for converting these voxel skeletons to piecewise linear geometry, that is triangle meshes and piecewise straight lines. The presented techniques help to tackle the challenge of visualizing and analyzing 3d images of increasing size and complexity, which are becoming more and more important in, for example, biological and medical research. Section 2.3.1 contributes to the theoretical foundations of thinning algorithms with a discussion of homotopic thinning in the grid cell model. The grid cell model explicitly represents a cell complex built of faces, edges, and vertices shared between voxels. A characterization of pairs of cells to be deleted is much simpler than characterizations of simple voxels were before. The grid cell model resolves topologically unclear voxel configurations at junctions and locked voxel configurations causing, for example, interior voxels in sets of non-simple voxels. A general conclusion is that the grid cell model is superior to indecomposable voxels for algorithms that need detailed control of topology. Section 2.3.2 introduces a noise-insensitive measure based on the geodesic distance along the boundary to compute two-dimensional skeletons. The measure is able to retain thin object structures if they are geometrically important while ignoring noise on the object's boundary. This combination of properties is not known of other measures. The measure is also used to guide erosion in a thinning process from the boundary towards lines centered within plate-like structures. Geodesic distance based quantities seem to be well suited to robustly identify one- and two-dimensional skeletons. Chapter 6 applies the method to visualization of bone micro-architecture. Chapter 3 describes a novel geometry generation scheme for representing voxel skeletons, which retracts voxel skeletons to piecewise linear geometry per dual cube. The generated triangle meshes and graphs provide a link to geometry processing and efficient rendering of voxel skeletons. The scheme creates non-closed surfaces with boundaries, which contain fewer triangles than a representation of voxel skeletons using closed surfaces like small cubes or iso-surfaces. A conclusion is that thinking specifically about voxel skeleton configurations instead of generic voxel configurations helps to deal with the topological implications. The geometry generation is one foundation of the applications presented in Chapter 6. Chapter 5 presents a novel external memory algorithm for distance ordered homotopic thinning. The presented method extends known algorithms for computing chamfer distance transformations and thinning to execute I/O-efficiently when input is larger than the available main memory. The applied block-wise decomposition schemes are quite simple. Yet it was necessary to carefully analyze effects of block boundaries to devise globally correct external memory variants of known algorithms. In general, doing so is superior to naive block-wise processing ignoring boundary effects. Chapter 6 applies the algorithms in a novel method based on confocal microscopy for quantitative study of micro-vascular networks in the field of microcirculation.
Mentoring in a Digital World
(2015)
This paper focuses on the results of the evaluation of the first
pilot of an e-mentoring unit designed by the Hands-On ICT consortium,
funded by the EU LLL programme. The overall aim of this two-year
activity is to investigate the value for professional learning of Massive
Online Open Courses (MOOCs) and Community Online Open Courses
(COOCs) in the context of a ‘community of practice’. Three units in the
first pilot covered aspects of using digital technologies to develop creative
thinking skills. The findings in this paper relate to the fourth unit
about e-mentoring, a skill that was important to delivering the course
content in the other three units. Findings about the e-mentoring unit
included: the students’ request for detailed profiles so that participants
can get to know each other; and, the need to reconcile the different
interpretations of e-mentoring held by the participants when the course
begins. The evaluators concluded that the major issues were that: not all
professional learners would self-organise and network; and few would
wish to mentor their colleagues voluntarily. Therefore, the e-mentoring
issues will need careful consideration in pilots two and three to identify
how e-mentoring will be organised.
Computer Security deals with the detection and mitigation of threats to computer networks, data, and computing hardware. This
thesis addresses the following two computer security problems: email spam campaign and malware detection.
Email spam campaigns can easily be generated using popular dissemination tools by specifying simple grammars that serve as message templates. A grammar is disseminated to nodes of a bot net, the nodes create messages by instantiating the grammar at random. Email spam campaigns can encompass huge data volumes and therefore pose a threat to the stability of the infrastructure of email service providers that have to store them. Malware -software that serves a malicious purpose- is affecting web servers, client computers via active content, and client computers through executable files. Without the help of malware detection systems it would be easy for malware creators to collect sensitive information or to infiltrate computers.
The detection of threats -such as email-spam messages, phishing messages, or malware- is an adversarial and therefore intrinsically
difficult problem. Threats vary greatly and evolve over time. The detection of threats based on manually-designed rules is therefore
difficult and requires a constant engineering effort. Machine-learning is a research area that revolves around the analysis of data and the discovery of patterns that describe aspects of the data. Discriminative learning methods extract prediction models from data that are optimized to predict a target attribute as accurately as possible. Machine-learning methods hold the promise of automatically identifying patterns that robustly and accurately detect threats. This thesis focuses on the design and analysis of discriminative learning methods for the two computer-security problems under investigation: email-campaign and malware detection.
The first part of this thesis addresses email-campaign detection. We focus on regular expressions as a syntactic framework, because regular expressions are intuitively comprehensible by security engineers and administrators, and they can be applied as a detection mechanism in an extremely efficient manner. In this setting, a prediction model is provided with exemplary messages from an email-spam campaign. The prediction model has to generate a regular expression that reveals the syntactic pattern that underlies the entire campaign, and that a security engineers finds comprehensible and feels confident enough to use the expression to blacklist further messages at the email server. We model this problem as two-stage learning problem with structured input and output spaces which can be solved using standard cutting plane methods. Therefore we develop an appropriate loss function, and derive a decoder for the resulting optimization problem.
The second part of this thesis deals with the problem of predicting whether a given JavaScript or PHP file is malicious or benign. Recent malware analysis techniques use static or dynamic features, or both. In fully dynamic analysis, the software or script is executed and observed for malicious behavior in a sandbox environment. By contrast, static analysis is based on features that can be extracted directly from the program file. In order to bypass static detection mechanisms, code obfuscation techniques are used to spread a malicious program file in many different syntactic variants. Deobfuscating the code before applying a static classifier can be subjected to mostly static code analysis and can overcome the problem of obfuscated malicious code, but on the other hand increases the computational costs of malware detection by an order of magnitude. In this thesis we present a cascaded architecture in which a classifier first performs a static analysis of the original code and -based on the outcome of this first classification step- the code may be deobfuscated and classified again. We explore several types of features including token $n$-grams, orthogonal sparse bigrams, subroutine-hashings, and syntax-tree features and study the robustness of detection methods and feature types against the evolution of malware over time. The developed tool scans very large file collections quickly and accurately.
Each model is evaluated on real-world data and compared to reference methods. Our approach of inferring regular expressions to filter emails belonging to an email spam campaigns leads to models with a high true-positive rate at a very low false-positive rate that is an order of magnitude lower than that of a commercial content-based filter. Our presented system -REx-SVMshort- is being used by a commercial email service provider and complements content-based and IP-address based filtering.
Our cascaded malware detection system is evaluated on a high-quality data set of almost 400,000 conspicuous PHP files and a collection of more than 1,00,000 JavaScript files. From our case study we can conclude that our system can quickly and accurately process large data collections at a low false-positive rate.
Multi-sided platforms (MSP) strongly affect markets and play a crucial part within the digital and networked economy. Although empirical evidence indicates their occurrence in many industries, research has not investigated the game-changing impact of MSP on traditional markets to a sufficient extent. More specifically, we have little knowledge of how MSP affect value creation and customer interaction in entire markets, exploiting the potential of digital technologies to offer new value propositions. Our paper addresses this research gap and provides an initial systematic approach to analyze the impact of MSP on the insurance industry. For this purpose, we analyze the state of the art in research and practice in order to develop a reference model of the value network for the insurance industry. On this basis, we conduct a case-study analysis to discover and analyze roles which are occupied or even newly created by MSP. As a final step, we categorize MSP with regard to their relation to traditional insurance companies, resulting in a classification scheme with four MSP standard types: Competition, Coordination, Cooperation, Collaboration.
Structuring process models
(2012)
One can fairly adopt the ideas of Donald E. Knuth to conclude that process modeling is both a science and an art. Process modeling does have an aesthetic sense. Similar to composing an opera or writing a novel, process modeling is carried out by humans who undergo creative practices when engineering a process model. Therefore, the very same process can be modeled in a myriad number of ways. Once modeled, processes can be analyzed by employing scientific methods. Usually, process models are formalized as directed graphs, with nodes representing tasks and decisions, and directed arcs describing temporal constraints between the nodes. Common process definition languages, such as Business Process Model and Notation (BPMN) and Event-driven Process Chain (EPC) allow process analysts to define models with arbitrary complex topologies. The absence of structural constraints supports creativity and productivity, as there is no need to force ideas into a limited amount of available structural patterns. Nevertheless, it is often preferable that models follow certain structural rules. A well-known structural property of process models is (well-)structuredness. A process model is (well-)structured if and only if every node with multiple outgoing arcs (a split) has a corresponding node with multiple incoming arcs (a join), and vice versa, such that the set of nodes between the split and the join induces a single-entry-single-exit (SESE) region; otherwise the process model is unstructured. The motivations for well-structured process models are manifold: (i) Well-structured process models are easier to layout for visual representation as their formalizations are planar graphs. (ii) Well-structured process models are easier to comprehend by humans. (iii) Well-structured process models tend to have fewer errors than unstructured ones and it is less probable to introduce new errors when modifying a well-structured process model. (iv) Well-structured process models are better suited for analysis with many existing formal techniques applicable only for well-structured process models. (v) Well-structured process models are better suited for efficient execution and optimization, e.g., when discovering independent regions of a process model that can be executed concurrently. Consequently, there are process modeling languages that encourage well-structured modeling, e.g., Business Process Execution Language (BPEL) and ADEPT. However, the well-structured process modeling implies some limitations: (i) There exist processes that cannot be formalized as well-structured process models. (ii) There exist processes that when formalized as well-structured process models require a considerable duplication of modeling constructs. Rather than expecting well-structured modeling from start, we advocate for the absence of structural constraints when modeling. Afterwards, automated methods can suggest, upon request and whenever possible, alternative formalizations that are "better" structured, preferably well-structured. In this thesis, we study the problem of automatically transforming process models into equivalent well-structured models. The developed transformations are performed under a strong notion of behavioral equivalence which preserves concurrency. The findings are implemented in a tool, which is publicly available.
This talk will describe My Digital Life (TU100), a distance learning module that introduces computer science through immediate engagement with ubiquitous computing (ubicomp). This talk will describe some of the principles and concepts we have adopted for this modern computing introduction: the idea of the ‘informed digital citizen’; engagement through narrative; playful pedagogy; making the power of ubicomp available to novices; setting technical skills in real contexts. It will also trace how the pedagogy is informed by experiences and research in Computer Science education.
The objectives of this study were to examine (a) the effect
of dynamic assessment (DA) in a 3D Immersive Virtual Reality
(IVR) environment as compared with computerized 2D and noncomputerized
(NC) situations on cognitive modifiability, and (b) the
transfer effects of these conditions on more difficult problem solving
administered two weeks later in a non-computerized environment. A
sample of 117 children aged 6:6-9:0 years were randomly assigned
into three experimental groups of DA conditions: 3D, 2D, and NC, and
one control group (C). All groups received the pre- and post-teaching
Analogies subtest of the Cognitive Modifiability Battery (CMB-AN).
The experimental groups received a teaching phase in conditions similar
to the pre-and post-teaching phases. The findings showed that cognitive
modifiability, in a 3D IVR, was distinctively higher than in the two
other experimental groups (2D computer group and NC group). It was
also found that the 3D group showed significantly higher performance
in transfer problems than the 2D and NC groups.
The study reported in this paper involved the employment
of specific in-class exercises using a Personal Response System (PRS).
These exercises were designed with two goals: to enhance students’
capabilities of tracing a given code and of explaining a given code in
natural language with some abstraction. The paper presents evidence
from the actual use of the PRS along with students’ subjective impressions
regarding both the use of the PRS and the special exercises. The
conclusions from the findings are followed with a short discussion on
benefits of PRS-based mental processing exercises for learning programming
and beyond.
BugHunt
(2015)
Competencies related to operating systems and computer
security are usually taught systematically. In this paper we present
a different approach, in which students have to remove virus-like
behaviour on their respective computers, which has been induced by
software developed for this purpose. They have to develop appropriate
problem-solving strategies and thereby explore essential elements of
the operating system. The approach was implemented exemplarily in
two computer science courses at a regional general upper secondary
school and showed great motivation and interest in the participating
students.