004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (80)
- Doctoral Thesis (62)
- Postprint (6)
- Conference Proceeding (5)
- Master's Thesis (3)
- Other (3)
- Bachelor Thesis (1)
- Monograph/Edited Volume (1)
- Habilitation Thesis (1)
- Preprint (1)
Language
- English (163) (remove)
Keywords
- Machine Learning (7)
- Maschinelles Lernen (7)
- Computer Science Education (5)
- answer set programming (5)
- Antwortmengenprogrammierung (4)
- Answer Set Programming (3)
- Competence Measurement (3)
- DPLL (3)
- Optimization (3)
- Secondary Education (3)
Institute
- Institut für Informatik und Computational Science (163) (remove)
User Experience (UX) describes the holistic experience of a user before, during, and after interaction with a platform, product, or service. UX adds value and attraction to their sole functionality and is therefore highly relevant for firms. The increased interest in UX has produced a vast amount of scholarly research since 1983. The research field is, therefore, complex and scattered. Conducting a bibliometric analysis, we aim at structuring the field quantitatively and rather abstractly. We employed citation analyses, co-citation analyses, and content analyses to evaluate productivity and impact of extant research. We suggest that future research should focus more on business and management related topics.
We study the concept of reversibility in connection with parallel communicating systems of finite automata (PCFA in short). We define the notion of reversibility in the case of PCFA (also covering the non-deterministic case) and discuss the relationship of the reversibility of the systems and the reversibility of its components. We show that a system can be reversible with non-reversible components, and the other way around, the reversibility of the components does not necessarily imply the reversibility of the system as a whole. We also investigate the computational power of deterministic centralized reversible PCFA. We show that these very simple types of PCFA (returning or non-returning) can recognize regular languages which cannot be accepted by reversible (deterministic) finite automata, and that they can even accept languages that are not context-free. We also separate the deterministic and non-deterministic variants in the case of systems with non-returning communication. We show that there are languages accepted by non-deterministic centralized PCFA, which cannot be recognized by any deterministic variant of the same type.
Accurately solving classification problems nowadays is likely to be the most relevant machine learning task. Binary classification separating two classes only is algorithmically simpler but has fewer potential applications as many real-world problems are multi-class. On the reverse, separating only a subset of classes simplifies the classification task. Even though existing multi-class machine learning algorithms are very flexible regarding the number of classes, they assume that the target set Y is fixed and cannot be restricted once the training is finished. On the other hand, existing state-of-the-art production environments are becoming increasingly interconnected with the advance of Industry 4.0 and related technologies such that additional information can simplify the respective classification problems. In light of this, the main aim of this thesis is to introduce dynamic classification that generalizes multi-class classification such that the target class set can be restricted arbitrarily to a non-empty class subset M of Y at any time between two consecutive predictions.
This task is solved by a combination of two algorithmic approaches. First, classifier calibration, which transforms predictions into posterior probability estimates that are intended to be well calibrated. The analysis provided focuses on monotonic calibration and in particular corrects wrong statements that appeared in the literature. It also reveals that bin-based evaluation metrics, which became popular in recent years, are unjustified and should not be used at all. Next, the validity of Platt scaling, which is the most relevant parametric calibration approach, is analyzed in depth. In particular, its optimality for classifier predictions distributed according to four different families of probability distributions as well its equivalence with Beta calibration up to a sigmoidal preprocessing are proven. For non-monotonic calibration, extended variants on kernel density estimation and the ensemble method EKDE are introduced. Finally, the calibration techniques are evaluated using a simulation study with complete information as well as on a selection of 46 real-world data sets.
Building on this, classifier calibration is applied as part of decomposition-based classification that aims to reduce multi-class problems to simpler (usually binary) prediction tasks. For the involved fusing step performed at prediction time, a new approach based on evidence theory is presented that uses classifier calibration to model mass functions. This allows the analysis of decomposition-based classification against a strictly formal background and to prove closed-form equations for the overall combinations. Furthermore, the same formalism leads to a consistent integration of dynamic class information, yielding a theoretically justified and computationally tractable dynamic classification model. The insights gained from this modeling are combined with pairwise coupling, which is one of the most relevant reduction-based classification approaches, such that all individual predictions are combined with a weight. This not only generalizes existing works on pairwise coupling but also enables the integration of dynamic class information.
Lastly, a thorough empirical study is performed that compares all newly introduced approaches to existing state-of-the-art techniques. For this, evaluation metrics for dynamic classification are introduced that depend on corresponding sampling strategies. Thereafter, these are applied during a three-part evaluation. First, support vector machines and random forests are applied on 26 data sets from the UCI Machine Learning Repository. Second, two state-of-the-art deep neural networks are evaluated on five benchmark data sets from a relatively recent reference work. Here, computationally feasible strategies to apply the presented algorithms in combination with large-scale models are particularly relevant because a naive application is computationally intractable. Finally, reference data from a real-world process allowing the inclusion of dynamic class information are collected and evaluated. The results show that in combination with support vector machines and random forests, pairwise coupling approaches yield the best results, while in combination with deep neural networks, differences between the different approaches are mostly small to negligible. Most importantly, all results empirically confirm that dynamic classification succeeds in improving the respective prediction accuracies. Therefore, it is crucial to pass dynamic class information in respective applications, which requires an appropriate digital infrastructure.
M-rate 0L systems are interactionless Lindenmayer systems together with a function assigning to every string a set of multisets of productions that may be applied simultaneously to the string. Some questions that have been left open in the forerunner papers are examined, and the computational power of deterministic M-rate 0L systems is investigated, where also tabled and extended variants are taken into consideration.
The highly structured nature of the educational sector demands effective policy mechanisms close to the needs of the field. That is why evidence-based policy making, endorsed by the European Commission under Erasmus+ Key Action 3, aims to make an alignment between the domains of policy and practice. Against this background, this article addresses two issues: First, that there is a vertical gap in the translation of higher-level policies to local strategies and regulations. Second, that there is a horizontal gap between educational domains regarding the policy awareness of individual players. This was analyzed in quantitative and qualitative studies with domain experts from the fields of virtual mobility and teacher training. From our findings, we argue that the combination of both gaps puts the academic bridge from secondary to tertiary education at risk, including the associated knowledge proficiency levels. We discuss the role of digitalization in the academic bridge by asking the question: which value does the involved stakeholders expect from educational policies? As a theoretical basis, we rely on the model of value co-creation for and by stakeholders. We describe the used instruments along with the obtained results and proposed benefits. Moreover, we reflect on the methodology applied, and we finally derive recommendations for future academic bridge policies.
Research publications and data nowadays should be publicly available on the internet and, theoretically, usable for everyone to develop further research, products, or services. The long-term accessibility of research data is, therefore, fundamental in the economy of the research production process. However, the availability of data is not sufficient by itself, but also their quality must be verifiable. Measures to ensure reuse and reproducibility need to include the entire research life cycle, from the experimental design to the generation of data, quality control, statistical analysis, interpretation, and validation of the results. Hence, high-quality records, particularly for providing a string of documents for the verifiable origin of data, are essential elements that can act as a certificate for potential users (customers). These records also improve the traceability and transparency of data and processes, therefore, improving the reliability of results. Standards for data acquisition, analysis, and documentation have been fostered in the last decade driven by grassroot initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, what is still largely missing in the life science academic research are agreed procedures for complex routine research workflows. Here, well-crafted documentation like standard operating procedures (SOPs) offer clear direction and instructions specifically designed to avoid deviations as an absolute necessity for reproducibility. Therefore, this paper provides a standardized workflow that explains step by step how to write an SOP to be used as a starting point for appropriate research documentation.
Research publications and data nowadays should be publicly available on the internet and, theoretically, usable for everyone to develop further research, products, or services. The long-term accessibility of research data is, therefore, fundamental in the economy of the research production process. However, the availability of data is not sufficient by itself, but also their quality must be verifiable. Measures to ensure reuse and reproducibility need to include the entire research life cycle, from the experimental design to the generation of data, quality control, statistical analysis, interpretation, and validation of the results. Hence, high-quality records, particularly for providing a string of documents for the verifiable origin of data, are essential elements that can act as a certificate for potential users (customers). These records also improve the traceability and transparency of data and processes, therefore, improving the reliability of results. Standards for data acquisition, analysis, and documentation have been fostered in the last decade driven by grassroot initiatives of researchers and organizations such as the Research Data Alliance (RDA). Nevertheless, what is still largely missing in the life science academic research are agreed procedures for complex routine research workflows. Here, well-crafted documentation like standard operating procedures (SOPs) offer clear direction and instructions specifically designed to avoid deviations as an absolute necessity for reproducibility. Therefore, this paper provides a standardized workflow that explains step by step how to write an SOP to be used as a starting point for appropriate research documentation.
The aim of our project design space exploration with answer set programming is to develop a general framework based on Answer Set Programming (ASP) that finds valid solutions to the system design problem and simultaneously performs Design Space Exploration (DSE) to find the most favorable alternatives. We leverage recent developments in ASP solving that allow for tight integration of background theories to create a holistic framework for effective DSE.
The Potsdam answer set solving collection, or Potassco for short, bundles various tools implementing and/or applying answer set programming. The article at hand succeeds an earlier description of the Potassco project published in Gebser et al. (AI Commun 24(2):107-124, 2011). Hence, we concentrate in what follows on the major features of the most recent, fifth generation of the ASP system clingo and highlight some recent resulting application systems.
Answer Set Programming faces an increasing popularity for problem solving in various domains. While its modeling language allows us to express many complex problems in an easy way, its solving technology enables their effective resolution. In what follows, we detail some of the key factors of its success. Answer Set Programming [ASP; Brewka et al. Commun ACM 54(12):92–103, (2011)] is seeing a rapid proliferation in academia and industry due to its easy and flexible way to model and solve knowledge-intense combinatorial (optimization) problems. To this end, ASP offers a high-level modeling language paired with high-performance solving technology. As a result, ASP systems provide out-off-the-box, general-purpose search engines that allow for enumerating (optimal) solutions. They are represented as answer sets, each being a set of atoms representing a solution. The declarative approach of ASP allows a user to concentrate on a problem’s specification rather than the computational means to solve it. This makes ASP a prime candidate for rapid prototyping and an attractive tool for teaching key AI techniques since complex problems can be expressed in a succinct and elaboration tolerant way. This is eased by the tuning of ASP’s modeling language to knowledge representation and reasoning (KRR). The resulting impact is nicely reflected by a growing range of successful applications of ASP [Erdem et al. AI Mag 37(3):53–68, 2016; Falkner et al. Industrial applications of answer set programming. K++nstliche Intelligenz (2018)]
In the last decades, there was a notable progress in solving the well-known Boolean satisfiability (Sat) problem, which can be witnessed by powerful Sat solvers. One of the reasons why these solvers are so fast are structural properties of instances that are utilized by the solver’s interna. This thesis deals with the well-studied structural property treewidth, which measures the closeness of an instance to being a tree. In fact, there are many problems parameterized by treewidth that are solvable in polynomial time in the instance size when parameterized by treewidth.
In this work, we study advanced treewidth-based methods and tools for problems in knowledge representation and reasoning (KR). Thereby, we provide means to establish precise runtime results (upper bounds) for canonical problems relevant to KR. Then, we present a new type of problem reduction, which we call decomposition-guided (DG) that
allows us to precisely monitor the treewidth when reducing from one problem to another problem. This new reduction type will be the basis for a long-open lower bound result for quantified Boolean formulas and allows us to design a new methodology for establishing runtime lower bounds for problems parameterized by treewidth.
Finally, despite these lower bounds, we provide an efficient implementation of algorithms that adhere to treewidth. Our approach finds suitable abstractions of instances, which are subsequently refined in a recursive fashion, and it uses Sat solvers for solving subproblems. It turns out that our resulting solver is quite competitive for two canonical counting problems related to Sat.
Discriminative Models for Biometric Identification using Micro- and Macro-Movements of the Eyes
(2021)
Human visual perception is an active process. Eye movements either alternate between fixations and saccades or follow a smooth pursuit movement in case of moving targets. Besides these macroscopic gaze patterns, the eyes perform involuntary micro-movements during fixations which are commonly categorized into micro-saccades, drift and tremor. Eye movements are frequently studied in cognitive psychology, because they reflect a complex interplay of perception, attention and oculomotor control.
A common insight of psychological research is that macro-movements are highly individual. Inspired by this finding, there has been a considerable amount of prior research on oculomotoric biometric identification. However, the accuracy of known approaches is too low and the time needed for identification is too long for any practical application. This thesis explores discriminative models for the task of biometric identification.
Discriminative models optimize a quality measure of the predictions and are usually superior to generative approaches in discriminative tasks. However, using discriminative models requires to select a suitable form of data representation for sequential eye gaze data; i.e., by engineering features or constructing a sequence kernel and the performance of the classification model strongly depends on the data representation. We study two fundamentally different ways of representing eye gaze within a discriminative framework. In the first part of this thesis, we explore the integration of data and psychological background knowledge in the form of generative models to construct representations. To this end, we first develop generative statistical models of gaze behavior during reading and scene viewing that account for viewer-specific distributional properties of gaze patterns. In a second step, we develop a discriminative identification model by deriving Fisher kernel functions from these and several baseline models. We find that an SVM with Fisher kernel is able to reliably identify users based on their eye gaze during reading and scene viewing. However, since the generative models are constrained to use low-frequency macro-movements, they discard a significant amount of information contained in the raw eye tracking signal at a high cost: identification requires about one minute of input recording, which makes it inapplicable for real world biometric systems. In the second part of this thesis, we study a purely data-driven modeling approach. Here, we aim at automatically discovering the individual pattern hidden in the raw eye tracking signal. To this end, we develop a deep convolutional neural network DeepEyedentification that processes yaw and pitch gaze velocities and learns a representation end-to-end. Compared to prior work, this model increases the identification accuracy by one order of magnitude and the time to identification decreases to only seconds. The DeepEyedentificationLive model further improves upon the identification performance by processing binocular input and it also detects presentation-attacks.
We find that by learning a representation, the performance of oculomotoric identification and presentation-attack detection can be driven close to practical relevance for biometric applications. Eye tracking devices with high sampling frequency and precision are expensive and the applicability of eye movement as a biometric feature heavily depends on cost of recording devices.
In the last part of this thesis, we therefore study the requirements on data quality by evaluating the performance of the DeepEyedentificationLive network under reduced spatial and temporal resolution. We find that the method still attains a high identification accuracy at a temporal resolution of only 250 Hz and a precision of 0.03 degrees. Reducing both does not have an additive deteriorating effect.
We introduce a type and effect system, for an imperative object calculus, which infers sharing possibly introduced by the evaluation of an expression, represented as an equivalence relation among its free variables. This direct representation of sharing effects at the syntactic level allows us to express in a natural way, and to generalize, widely-used notions in literature, notably uniqueness and borrowing. Moreover, the calculus is pure in the sense that reduction is defined on language terms only, since they directly encode store. The advantage of this non-standard execution model with respect to a behaviorally equivalent standard model using a global auxiliary structure is that reachability relations among references are partly encoded by scoping. (C) 2018 Elsevier B.V. All rights reserved.
plasp 3
(2019)
We describe the new version of the Planning Domain Definition Language (PDDL)-to-Answer Set Programming (ASP) translator plasp. First, it widens the range of accepted PDDL features. Second, it contains novel planning encodings, some inspired by Satisfiability Testing (SAT) planning and others exploiting ASP features such as well-foundedness. All of them are designed for handling multivalued fluents in order to capture both PDDL as well as SAS planning formats. Third, enabled by multishot ASP solving, it offers advanced planning algorithms also borrowed from SAT planning. As a result, plasp provides us with an ASP-based framework for studying a variety of planning techniques in a uniform setting. Finally, we demonstrate in an empirical analysis that these techniques have a significant impact on the performance of ASP planning.
A common feature in Answer Set Programming is the use of a second negation, stronger than default negation and sometimes called explicit, strong or classical negation. This explicit negation is normally used in front of atoms, rather than allowing its use as a regular operator. In this paper we consider the arbitrary combination of explicit negation with nested expressions, as those defined by Lifschitz, Tang and Turner. We extend the concept of reduct for this new syntax and then prove that it can be captured by an extension of Equilibrium Logic with this second negation. We study some properties of this variant and compare to the already known combination of Equilibrium Logic with Nelson's strong negation.
In a recent line of research, two familiar concepts from logic programming semantics (unfounded sets and splitting) were extrapolated to the case of epistemic logic programs. The property of epistemic splitting provides a natural and modular way to understand programs without epistemic cycles but, surprisingly, was only fulfilled by Gelfond's original semantics (G91), among the many proposals in the literature. On the other hand, G91 may suffer from a kind of self-supported, unfounded derivations when epistemic cycles come into play. Recently, the absence of these derivations was also formalised as a property of epistemic semantics called foundedness. Moreover, a first semantics proved to satisfy foundedness was also proposed, the so-called Founded Autoepistemic Equilibrium Logic (FAEEL). In this paper, we prove that FAEEL also satisfies the epistemic splitting property something that, together with foundedness, was not fulfilled by any other approach up to date. To prove this result, we provide an alternative characterisation of FAEEL as a combination of G91 with a simpler logic we called Founded Epistemic Equilibrium Logic (FEEL), which is somehow an extrapolation of the stable model semantics to the modal logic S5.
Parsing of argumentative structures has become a very active line of research in recent years. Like discourse parsing or any other natural language task that requires prediction of linguistic structures, most approaches choose to learn a local model and then perform global decoding over the local probability distributions, often imposing constraints that are specific to the task at hand. Specifically for argumentation parsing, two decoding approaches have been recently proposed: Minimum Spanning Trees (MST) and Integer Linear Programming (ILP), following similar trends in discourse parsing. In contrast to discourse parsing though, where trees are not always used as underlying annotation schemes, argumentation structures so far have always been represented with trees. Using the 'argumentative microtext corpus' [in: Argumentation and Reasoned Action: Proceedings of the 1st European Conference on Argumentation, Lisbon 2015 / Vol. 2, College Publications, London, 2016, pp. 801-815] as underlying data and replicating three different decoding mechanisms, in this paper we propose a novel ILP decoder and an extension to our earlier MST work, and then thoroughly compare the approaches. The result is that our new decoder outperforms related work in important respects, and that in general, ILP and MST yield very similar performance.
Multi-sided platforms (MSP) strongly affect markets and play a crucial part within the digital and networked economy. Although empirical evidence indicates their occurrence in many industries, research has not investigated the game-changing impact of MSP on traditional markets to a sufficient extent. More specifically, we have little knowledge of how MSP affect value creation and customer interaction in entire markets, exploiting the potential of digital technologies to offer new value propositions. Our paper addresses this research gap and provides an initial systematic approach to analyze the impact of MSP on the insurance industry. For this purpose, we analyze the state of the art in research and practice in order to develop a reference model of the value network for the insurance industry. On this basis, we conduct a case-study analysis to discover and analyze roles which are occupied or even newly created by MSP. As a final step, we categorize MSP with regard to their relation to traditional insurance companies, resulting in a classification scheme with four MSP standard types: Competition, Coordination, Cooperation, Collaboration.
Emotions are a central element of human experience. They occur with high frequency in everyday life and play an important role in decision making. However, currently there is no consensus among researchers on what constitutes an emotion and on how emotions should be investigated. This dissertation identifies three problems of current emotion research: the problem of ground truth, the problem of incomplete constructs and the problem of optimal representation. I argue for a focus on the detailed measurement of emotion manifestations with computer-aided methods to solve these problems. This approach is demonstrated in three research projects, which describe the development of methods specific to these problems as well as their application to concrete research questions.
The problem of ground truth describes the practice to presuppose a certain structure of emotions as the a priori ground truth. This determines the range of emotion descriptions and sets a standard for the correct assignment of these descriptions. The first project illustrates how this problem can be circumvented with a multidimensional emotion perception paradigm which stands in contrast to the emotion recognition paradigm typically employed in emotion research. This paradigm allows to calculate an objective difficulty measure and to collect subjective difficulty ratings for the perception of emotional stimuli. Moreover, it enables the use of an arbitrary number of emotion stimuli categories as compared to the commonly used six basic emotion categories. Accordingly, we collected data from 441 participants using dynamic facial expression stimuli from 40 emotion categories. Our findings suggest an increase in emotion perception difficulty with increasing actor age and provide evidence to suggest that young adults, the elderly and men underestimate their emotion perception difficulty. While these effects were predicted from the literature, we also found unexpected and novel results. In particular, the increased difficulty on the objective difficulty measure for female actors and observers stood in contrast to reported findings. Exploratory analyses revealed low relevance of person-specific variables for the prediction of emotion perception difficulty, but highlighted the importance of a general pleasure dimension for the ease of emotion perception.
The second project targets the problem of incomplete constructs which relates to vaguely defined psychological constructs on emotion with insufficient ties to tangible manifestations. The project exemplifies how a modern data collection method such as face tracking data can be used to sharpen these constructs on the example of arousal, a long-standing but fuzzy construct in emotion research. It describes how measures of distance, speed and magnitude of acceleration can be computed from face tracking data and investigates their intercorrelations. We find moderate to strong correlations among all measures of static information on one hand and all measures of dynamic information on the other. The project then investigates how self-rated arousal is tied to these measures in 401 neurotypical individuals and 19 individuals with autism. Distance to the neutral face was predictive of arousal ratings in both groups. Lower mean arousal ratings were found for the autistic group, but no difference in correlation of the measures and arousal ratings could be found between groups. Results were replicated in a high autistic traits group consisting of 41 participants. The findings suggest a qualitatively similar perception of arousal for individuals with and without autism. No correlations between valence ratings and any of the measures could be found which emphasizes the specificity of our tested measures for the construct of arousal.
The problem of optimal representation refers to the search for the best representation of emotions and the assumption that there is a one-fits-all solution. In the third project we introduce partial least squares analysis as a general method to find an optimal representation to relate two high-dimensional data sets to each other. The project demonstrates its applicability to emotion research on the question of emotion perception differences between men and women. The method was used with emotion rating data from 441 participants and face tracking data computed on 306 videos. We found quantitative as well as qualitative differences in the perception of emotional facial expressions between these groups. We showed that women’s emotional perception systematically captured more of the variance in facial expressions. Additionally, we could show that significant differences exist in the way that women and men perceive some facial expressions which could be visualized as concrete facial expression sequences. These expressions suggest differing perceptions of masked and ambiguous facial expressions between the sexes. In order to facilitate use of the developed method by the research community, a package for the statistical environment R was written. Furthermore, to call attention to the method and its usefulness for emotion research, a website was designed that allows users to explore a model of emotion ratings and facial expression data in an interactive fashion.
The usage of mobile devices is rapidly growing with Android being the most prevalent mobile operating system. Thanks to the vast variety of mobile applications, users are preferring smartphones over desktops for day to day tasks like Internet surfing. Consequently, smartphones store a plenitude of sensitive data. This data together with the high values of smartphones make them an attractive target for device/data theft (thieves/malicious applications).
Unfortunately, state-of-the-art anti-theft solutions do not work if they do not have an active network connection, e.g., if the SIM card was removed from the device. In the majority of these cases, device owners permanently lose their smartphone together with their personal data, which is even worse.
Apart from that malevolent applications perform malicious activities to steal sensitive information from smartphones. Recent research considered static program analysis to detect dangerous data leaks. These analyses work well for data leaks due to inter-component communication, but suffer from shortcomings for inter-app communication with respect to precision, soundness, and scalability.
This thesis focuses on enhancing users' privacy on Android against physical device loss/theft and (un)intentional data leaks. It presents three novel frameworks: (1) ThiefTrap, an anti-theft framework for Android, (2) IIFA, a modular inter-app intent information flow analysis of Android applications, and (3) PIAnalyzer, a precise approach for PendingIntent vulnerability analysis.
ThiefTrap is based on a novel concept of an anti-theft honeypot account that protects the owner's data while preventing a thief from resetting the device.
We implemented the proposed scheme and evaluated it through an empirical user study with 35 participants. In this study, the owner's data could be protected, recovered, and anti-theft functionality could be performed unnoticed from the thief in all cases.
IIFA proposes a novel approach for Android's inter-component/inter-app communication (ICC/IAC) analysis. Our main contribution is the first fully automatic, sound, and precise ICC/IAC information flow analysis that is scalable for realistic apps due to modularity, avoiding combinatorial explosion: Our approach determines communicating apps using short summaries rather than inlining intent calls between components and apps, which requires simultaneously analyzing all apps installed on a device.
We evaluate IIFA in terms of precision, recall, and demonstrate its scalability to a large corpus of real-world apps. IIFA reports 62 problematic ICC-/IAC-related information flows via two or more apps/components.
PIAnalyzer proposes a novel approach to analyze PendingIntent related vulnerabilities. PendingIntents are a powerful and universal feature of Android for inter-component communication. We empirically evaluate PIAnalyzer on a set of 1000 randomly selected applications and find 1358 insecure usages of PendingIntents, including 70 severe vulnerabilities.
Solving problems combining task and motion planning requires searching across a symbolic search space and a geometric search space. Because of the semantic gap between symbolic and geometric representations, symbolic sequences of actions are not guaranteed to be geometrically feasible. This compels us to search in the combined search space, in which frequent backtracks between symbolic and geometric levels make the search inefficient.We address this problem by guiding symbolic search with rich information extracted from the geometric level through culprit detection mechanisms.
Services that operate over the Internet are under constant threat of being exposed to fraudulent use. Maintaining good user experience for legitimate users often requires the classification of entities as malicious or legitimate in order to initiate countermeasures. As an example, inbound email spam filters decide for spam or non-spam. They can base their decision on both the content of each email as well as on features that summarize prior emails received from the sending server. In general, discriminative classification methods learn to distinguish positive from negative entities. Each decision for a label may be based on features of the entity and related entities. When labels of related entities have strong interdependencies---as can be assumed e.g. for emails being delivered by the same user---classification decisions should not be made independently and dependencies should be modeled in the decision function. This thesis addresses the formulation of discriminative classification problems that are tailored for the specific demands of the following three Internet security applications. Theoretical and algorithmic solutions are devised to protect an email service against flooding of user inboxes, to mitigate abusive usage of outbound email servers, and to protect web servers against distributed denial of service attacks.
In the application of filtering an inbound email stream for unsolicited emails, utilizing features that go beyond each individual email's content can be valuable. Information about each sending mail server can be aggregated over time and may help in identifying unwanted emails. However, while this information will be available to the deployed email filter, some parts of the training data that are compiled by third party providers may not contain this information. The missing features have to be estimated at training time in order to learn a classification model. In this thesis an algorithm is derived that learns a decision function that integrates over a distribution of values for each missing entry. The distribution of missing values is a free parameter that is optimized to learn an optimal decision function.
The outbound stream of emails of an email service provider can be separated by the customer IDs that ask for delivery. All emails that are sent by the same ID in the same period of time are related, both in content and in label. Hijacked customer accounts may send batches of unsolicited emails to other email providers, which in turn might blacklist the sender's email servers after detection of incoming spam emails. The risk of being blocked from further delivery depends on the rate of outgoing unwanted emails and the duration of high spam sending rates. An optimization problem is developed that minimizes the expected cost for the email provider by learning a decision function that assigns a limit on the sending rate to customers based on the each customer's email stream.
Identifying attacking IPs during HTTP-level DDoS attacks allows to block those IPs from further accessing the web servers. DDoS attacks are usually carried out by infected clients that are members of the same botnet and show similar traffic patterns. HTTP-level attacks aim at exhausting one or more resources of the web server infrastructure, such as CPU time. If the joint set of attackers cannot increase resource usage close to the maximum capacity, no effect will be experienced by legitimate users of hosted web sites. However, if the additional load raises the computational burden towards the critical range, user experience will degrade until service may be unavailable altogether. As the loss of missing one attacker depends on block decisions for other attackers---if most other attackers are detected, not blocking one client will likely not be harmful---a structured output model has to be learned. In this thesis an algorithm is developed that learns a structured prediction decoder that searches the space of label assignments, guided by a policy.
Each model is evaluated on real-world data and is compared to reference methods. The results show that modeling each classification problem according to the specific demands of the task improves performance over solutions that do not consider the constraints inherent to an application.
Personal fabrication tools, such as 3D printers, are on the way of enabling a future in which non-technical users will be able to create custom objects. However, while the hardware is there, the current interaction model behind existing design tools is not suitable for non-technical users. Today, 3D printers are operated by fabricating the object in one go, which tends to take overnight due to the slow 3D printing technology. Consequently, the current interaction model requires users to think carefully before printing as every mistake may imply another overnight print. Planning every step ahead, however, is not feasible for non-technical users as they lack the experience to reason about the consequences of their design decisions.
In this dissertation, we propose changing the interaction model around personal fabrication tools to better serve this user group. We draw inspiration from personal computing and argue that the evolution of personal fabrication may resemble the evolution of personal computing: Computing started with machines that executed a program in one go before returning the result to the user. By decreasing the interaction unit to single requests, turn-taking systems such as the command line evolved, which provided users with feedback after every input. Finally, with the introduction of direct-manipulation interfaces, users continuously interacted with a program receiving feedback about every action in real-time. In this dissertation, we explore whether these interaction concepts can be applied to personal fabrication as well.
We start with fabricating an object in one go and investigate how to tighten the feedback-cycle on an object-level: We contribute a method called low-fidelity fabrication, which saves up to 90% fabrication time by creating objects as fast low-fidelity previews, which are sufficient to evaluate key design aspects. Depending on what is currently being tested, we propose different conversions that enable users to focus on different parts: faBrickator allows for a modular design in the early stages of prototyping; when users move on WirePrint allows quickly testing an object's shape, while Platener allows testing an object's technical function. We present an interactive editor for each technique and explain the underlying conversion algorithms.
By interacting on smaller units, such as a single element of an object, we explore what it means to transition from systems that fabricate objects in one go to turn-taking systems. We start with a 2D system called constructable: Users draw with a laser pointer onto the workpiece inside a laser cutter. The drawing is captured with an overhead camera. As soon as the the user finishes drawing an element, such as a line, the constructable system beautifies the path and cuts it--resulting in physical output after every editing step. We extend constructable towards 3D editing by developing a novel laser-cutting technique for 3D objects called LaserOrigami that works by heating up the workpiece with the defocused laser until the material becomes compliant and bends down under gravity. While constructable and LaserOrigami allow for fast physical feedback, the interaction is still best described as turn-taking since it consists of two discrete steps: users first create an input and afterwards the system provides physical output.
By decreasing the interaction unit even further to a single feature, we can achieve real-time physical feedback: Input by the user and output by the fabrication device are so tightly coupled that no visible lag exists. This allows us to explore what it means to transition from turn-taking interfaces, which only allow exploring one option at a time, to direct manipulation interfaces with real-time physical feedback, which allow users to explore the entire space of options continuously with a single interaction. We present a system called FormFab, which allows for such direct control. FormFab is based on the same principle as LaserOrigami: It uses a workpiece that when warmed up becomes compliant and can be reshaped. However, FormFab achieves the reshaping not based on gravity, but through a pneumatic system that users can control interactively. As users interact, they see the shape change in real-time.
We conclude this dissertation by extrapolating the current evolution into a future in which large numbers of people use the new technology to create objects. We see two additional challenges on the horizon: sustainability and intellectual property. We investigate sustainability by demonstrating how to print less and instead patch physical objects. We explore questions around intellectual property with a system called Scotty that transfers objects without creating duplicates, thereby preserving the designer's copyright.
Computer Security deals with the detection and mitigation of threats to computer networks, data, and computing hardware. This
thesis addresses the following two computer security problems: email spam campaign and malware detection.
Email spam campaigns can easily be generated using popular dissemination tools by specifying simple grammars that serve as message templates. A grammar is disseminated to nodes of a bot net, the nodes create messages by instantiating the grammar at random. Email spam campaigns can encompass huge data volumes and therefore pose a threat to the stability of the infrastructure of email service providers that have to store them. Malware -software that serves a malicious purpose- is affecting web servers, client computers via active content, and client computers through executable files. Without the help of malware detection systems it would be easy for malware creators to collect sensitive information or to infiltrate computers.
The detection of threats -such as email-spam messages, phishing messages, or malware- is an adversarial and therefore intrinsically
difficult problem. Threats vary greatly and evolve over time. The detection of threats based on manually-designed rules is therefore
difficult and requires a constant engineering effort. Machine-learning is a research area that revolves around the analysis of data and the discovery of patterns that describe aspects of the data. Discriminative learning methods extract prediction models from data that are optimized to predict a target attribute as accurately as possible. Machine-learning methods hold the promise of automatically identifying patterns that robustly and accurately detect threats. This thesis focuses on the design and analysis of discriminative learning methods for the two computer-security problems under investigation: email-campaign and malware detection.
The first part of this thesis addresses email-campaign detection. We focus on regular expressions as a syntactic framework, because regular expressions are intuitively comprehensible by security engineers and administrators, and they can be applied as a detection mechanism in an extremely efficient manner. In this setting, a prediction model is provided with exemplary messages from an email-spam campaign. The prediction model has to generate a regular expression that reveals the syntactic pattern that underlies the entire campaign, and that a security engineers finds comprehensible and feels confident enough to use the expression to blacklist further messages at the email server. We model this problem as two-stage learning problem with structured input and output spaces which can be solved using standard cutting plane methods. Therefore we develop an appropriate loss function, and derive a decoder for the resulting optimization problem.
The second part of this thesis deals with the problem of predicting whether a given JavaScript or PHP file is malicious or benign. Recent malware analysis techniques use static or dynamic features, or both. In fully dynamic analysis, the software or script is executed and observed for malicious behavior in a sandbox environment. By contrast, static analysis is based on features that can be extracted directly from the program file. In order to bypass static detection mechanisms, code obfuscation techniques are used to spread a malicious program file in many different syntactic variants. Deobfuscating the code before applying a static classifier can be subjected to mostly static code analysis and can overcome the problem of obfuscated malicious code, but on the other hand increases the computational costs of malware detection by an order of magnitude. In this thesis we present a cascaded architecture in which a classifier first performs a static analysis of the original code and -based on the outcome of this first classification step- the code may be deobfuscated and classified again. We explore several types of features including token $n$-grams, orthogonal sparse bigrams, subroutine-hashings, and syntax-tree features and study the robustness of detection methods and feature types against the evolution of malware over time. The developed tool scans very large file collections quickly and accurately.
Each model is evaluated on real-world data and compared to reference methods. Our approach of inferring regular expressions to filter emails belonging to an email spam campaigns leads to models with a high true-positive rate at a very low false-positive rate that is an order of magnitude lower than that of a commercial content-based filter. Our presented system -REx-SVMshort- is being used by a commercial email service provider and complements content-based and IP-address based filtering.
Our cascaded malware detection system is evaluated on a high-quality data set of almost 400,000 conspicuous PHP files and a collection of more than 1,00,000 JavaScript files. From our case study we can conclude that our system can quickly and accurately process large data collections at a low false-positive rate.
Geospatial data has become a natural part of a growing number of information systems and services in the economy, society, and people's personal lives. In particular, virtual 3D city and landscape models constitute valuable information sources within a wide variety of applications such as urban planning, navigation, tourist information, and disaster management. Today, these models are often visualized in detail to provide realistic imagery. However, a photorealistic rendering does not automatically lead to high image quality, with respect to an effective information transfer, which requires important or prioritized information to be interactively highlighted in a context-dependent manner.
Approaches in non-photorealistic renderings particularly consider a user's task and camera perspective when attempting optimal expression, recognition, and communication of important or prioritized information. However, the design and implementation of non-photorealistic rendering techniques for 3D geospatial data pose a number of challenges, especially when inherently complex geometry, appearance, and thematic data must be processed interactively. Hence, a promising technical foundation is established by the programmable and parallel computing architecture of graphics processing units.
This thesis proposes non-photorealistic rendering techniques that enable both the computation and selection of the abstraction level of 3D geospatial model contents according to user interaction and dynamically changing thematic information. To achieve this goal, the techniques integrate with hardware-accelerated rendering pipelines using shader technologies of graphics processing units for real-time image synthesis. The techniques employ principles of artistic rendering, cartographic generalization, and 3D semiotics—unlike photorealistic rendering—to synthesize illustrative renditions of geospatial feature type entities such as water surfaces, buildings, and infrastructure networks. In addition, this thesis contributes a generic system that enables to integrate different graphic styles—photorealistic and non-photorealistic—and provide their seamless transition according to user tasks, camera view, and image resolution.
Evaluations of the proposed techniques have demonstrated their significance to the field of geospatial information visualization including topics such as spatial perception, cognition, and mapping. In addition, the applications in illustrative and focus+context visualization have reflected their potential impact on optimizing the information transfer regarding factors such as cognitive load, integration of non-realistic information, visualization of uncertainty, and visualization on small displays.
The main objective of this dissertation is to analyse prerequisites, expectations, apprehensions, and attitudes of students studying computer science, who are willing to gain a bachelor degree. The research will also investigate in the students’ learning style according to the Felder-Silverman model. These investigations fall in the attempt to make an impact on reducing the “dropout”/shrinkage rate among students, and to suggest a better learning environment.
The first investigation starts with a survey that has been made at the computer science department at the University of Baghdad to investigate the attitudes of computer science students in an environment dominated by women, showing the differences in attitudes between male and female students in different study years. Students are accepted to university studies via a centrally controlled admission procedure depending mainly on their final score at school. This leads to a high percentage of students studying subjects they do not want. Our analysis shows that 75% of the female students do not regret studying computer science although it was not their first choice. And according to statistics over previous years, women manage to succeed in their study and often graduate on top of their class. We finish with a comparison of attitudes between the freshman students of two different cultures and two different university enrolment procedures (University of Baghdad, in Iraq, and the University of Potsdam, in Germany) both with opposite gender majority.
The second step of investigation took place at the department of computer science at the University of Potsdam in Germany and analyzes the learning styles of students studying the three major fields of study offered by the department (computer science, business informatics, and computer science teaching). Investigating the differences in learning styles between the students of those study fields who usually take some joint courses is important to be aware of which changes are necessary to be adopted in the teaching methods to address those different students. It was a two stage study using two questionnaires; the main one is based on the Index of Learning Styles Questionnaire of B. A. Solomon and R. M. Felder, and the second questionnaire was an investigation on the students’ attitudes towards the findings of their personal first questionnaire. Our analysis shows differences in the preferences of learning style between male and female students of the different study fields, as well as differences between students with the different specialties (computer science, business informatics, and computer science teaching).
The third investigation looks closely into the difficulties, issues, apprehensions and expectations of freshman students studying computer science. The study took place at the computer science department at the University of Potsdam with a volunteer sample of students. The goal is to determine and discuss the difficulties and issues that they are facing in their study that may lead them to think in dropping-out, changing the study field, or changing the university. The research continued with the same sample of students (with business informatics students being the majority) through more than three semesters. Difficulties and issues during the study were documented, as well as students’ attitudes, apprehensions, and expectations. Some of the professors and lecturers opinions and solutions to some students’ problems were also documented. Many participants had apprehensions and difficulties, especially towards informatics subjects. Some business informatics participants began to think of changing the university, in particular when they reached their third semester, others thought about changing their field of study. Till the end of this research, most of the participants continued in their studies (the study they have started with or the new study they have changed to) without leaving the higher education system.
In this project I constructed a workflow that takes a DNA sequence as input and provides a phylogenetic tree, consisting of the input sequence and other sequences which were found during a database search. In this phylogenetic tree the sequences are arranged depending on similarities. In bioinformatics, constructing phylogenetic trees is often used to explore the evolutionary relationships of genes or organisms and to understand the mechanisms of evolution itself.
Spotlocator is a game wherein people have to guess the spots of where photos were taken. The photos of a defined area for each game are from panoramio.com. They are published at http://spotlocator. drupalgardens.com with an ID. Everyone can guess the photo spots by sending a special tweet via Twitter that contains the hashtag #spotlocator, the guessed coordinates and the ID of the photo. An evaluation is published for all tweets. The players are informed about the distance to the real photo spots and the positions are shown on a map.
Exploratory Data Analysis
(2014)
In bioinformatics the term exploratory data analysis refers to different methods to get an overview of large biological data sets. Hence, it helps to create a framework for further analysis and hypothesis testing. The workflow facilitates this first important step of the data analysis created by high-throughput technologies. The results are different plots showing the structure of the measurements. The goal of the workflow is the automatization of the exploratory data analysis, but also the flexibility should be guaranteed. The basic tool is the free software R.
The protein classification workflow described in this report enables users to get information about a novel protein sequence automatically. The information is derived by different bioinformatic analysis tools which calculate or predict features of a protein sequence. Also, databases are used to compare the novel sequence with known proteins.
Lessons Learned
(2014)
This chapter summarizes the experience and the lessons we learned concerning the application of the jABC as a framework for design and execution of scientific workflows. It reports experiences from the domain modeling (especially service integration) and workflow design phases and evaluates the resulting models statistically with respect to the SIB library and hierarchy levels.
The Course's SIB Libraries
(2014)
This chapter gives a detailed description of the service framework underlying all the example projects that form the foundation of this book. It describes the different SIB libraries that we made available for the course “Process modeling in the natural sciences” to provide the functionality that was required for the envisaged applications. The students used these SIB libraries to realize their projects.
A major part of the scientific experiments that are carried out today requires thorough computational support. While database and algorithm providers face the problem of bundling resources to create and sustain powerful computation nodes, the users have to deal with combining sets of (remote) services into specific data analysis and transformation processes. Today’s attention to “big data” amplifies the issues of size, heterogeneity, and process-level diversity/integration. In the last decade, especially workflow-based approaches to deal with these processes have enjoyed great popularity. This book concerns a particularly agile and model-driven approach to manage scientific workflows that is based on the XMDD paradigm. In this chapter we explain the scope and purpose of the book, briefly describe the concepts and technologies of the XMDD paradigm, explain the principal differences to related approaches, and outline the structure of the book.
We summarize here the main characteristics and features of the jABC framework, used in the case studies as a graphical tool for modeling scientific processes and workflows. As a comprehensive environment for service-oriented modeling and design according to the XMDD (eXtreme Model-Driven Design) paradigm, the jABC offers much more than the pure modeling capability. Associated technologies and plugins provide in fact means for a rich variety of supporting functionality, such as remote service integration, taxonomical service classification, model execution, model verification, model synthesis, and model compilation. We describe here in short both the essential jABC features and the service integration philosophy followed in the environment. In our work over the last years we have seen that this kind of service definition and provisioning platform has the potential to become a core technology in interdisciplinary service orchestration and technology transfer: Domain experts, like scientists not specially trained in computer science, directly define complex service orchestrations as process models and use efficient and complex domain-specific tools in a simple and intuitive way.
Software-as-a-Service (SaaS) offers several advantages to both service providers and users. Service providers can benefit from the reduction of Total Cost of Ownership (TCO), better scalability, and better resource utilization. On the other hand, users can use the service anywhere and anytime, and minimize upfront investment by following the pay-as-you-go model. Despite the benefits of SaaS, users still have concerns about the security and privacy of their data. Due to the nature of SaaS and the Cloud in general, the data and the computation are beyond the users' control, and hence data security becomes a vital factor in this new paradigm. Furthermore, in multi-tenant SaaS applications, the tenants become more concerned about the confidentiality of their data since several tenants are co-located onto a shared infrastructure.
To address those concerns, we start protecting the data from the provisioning process by controlling how tenants are being placed in the infrastructure. We present a resource allocation algorithm designed to minimize the risk of co-resident tenants called SecPlace. It enables the SaaS provider to control the resource (i.e., database instance) allocation process while taking into account the security of tenants as a requirement.
Due to the design principles of the multi-tenancy model, tenants follow some degree of sharing on both application and infrastructure levels. Thus, strong security-isolation should be present. Therefore, we develop SignedQuery, a technique that prevents one tenant from accessing others' data. We use the Signing Concept to create a signature that is used to sign the tenant's request, then the server can verifies the signature and recognizes the requesting tenant, and hence ensures that the data to be accessed is belonging to the legitimate tenant.
Finally, Data confidentiality remains a critical concern due to the fact that data in the Cloud is out of users' premises, and hence beyond their control. Cryptography is increasingly proposed as a potential approach to address such a challenge. Therefore, we present SecureDB, a system designed to run SQL-based applications over an encrypted database. SecureDB captures the schema design and analyzes it to understand the internal structure of the data (i.e., relationships between the tables and their attributes). Moreover, we determine the appropriate partialhomomorphic encryption scheme for each attribute where computation is possible even when the data is encrypted.
To evaluate our work, we conduct extensive experiments with di↵erent settings. The main use case in our work is a popular open source HRM application, called OrangeHRM. The results show that our multi-layered approach is practical, provides enhanced security and isolation among tenants, and have a moderate complexity in terms of processing encrypted data.
A lot has been published about the competencies needed by
students in the 21st century (Ravenscroft et al., 2012). However, equally
important are the competencies needed by educators in the new era
of digital education. We review the key competencies for educators in
light of the new methods of teaching and learning proposed by Massive
Open Online Courses (MOOCs) and their on-campus counterparts,
Small Private Online Courses (SPOCs).
Participants of this workshop will be confronted exemplarily
with a considerable inconsistency of global Informatics education at
lower secondary level. More importantly, they are invited to contribute
actively on this issue in form of short case studies of their countries.
Until now, very few countries have been successful in implementing
Informatics or Computing at primary and lower secondary level. The
spectrum from digital literacy to informatics, particularly as a discipline
in its own right, has not really achieved a breakthrough and seems to
be underrepresented for these age groups. The goal of this workshop
is not only to discuss the anamnesis and diagnosis of this fragmented
field, but also to discuss and suggest viable forms of therapy in form of
setting educational standards. Making visible good practices in some
countries and comparing successful approaches are rewarding tasks for
this workshop.
Discussing and defining common educational standards on a transcontinental
level for the age group of 14 to 15 years old students in a readable,
assessable and acceptable form should keep the participants of this
workshop active beyond the limited time at the workshop.
Let’s talk about CS!
(2015)
To communicate about a science is the most important key
competence in education for any science. Without communication we
cannot teach, so teachers should reflect about the language they use in
class properly. But the language students and teachers use to communicate
about their CS courses is very heterogeneous, inconsistent and
deeply influenced by tool names. There is a big lack of research and
discussion in CS education regarding the terminology and the role of
concepts and tools in our science. We don’t have a consistent set of
terminology that we agree on to be helpful for learning our science.
This makes it nearly impossible to do research on CS competencies as
long as we have not agreed on the names we use to describe these. This
workshop intends to provide room to fill with discussion and first ideas
for future research in this field.
ProtoSense
(2015)
The poster and abstract describe the importance of teaching
information security in school. After a short description of information
security and important aspects, I will show, how information security
fits into different guidelines or models for computer science educations
and that it is therefore on of the key competencies. Afterwards I will
present you a rough insight of teaching information security in Austria.