004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (80)
- Doctoral Thesis (62)
- Postprint (6)
- Conference Proceeding (5)
- Master's Thesis (3)
- Other (3)
- Bachelor Thesis (1)
- Monograph/Edited Volume (1)
- Habilitation Thesis (1)
- Preprint (1)
Language
- English (163) (remove)
Keywords
- Machine Learning (7)
- Maschinelles Lernen (7)
- Computer Science Education (5)
- answer set programming (5)
- Antwortmengenprogrammierung (4)
- Answer Set Programming (3)
- Competence Measurement (3)
- DPLL (3)
- Optimization (3)
- Secondary Education (3)
Institute
- Institut für Informatik und Computational Science (163) (remove)
GraffDok is an application helping to maintain an overview over sprayed images somewhere in a city. At the time of writing it aims at vandalism rather than at beautiful photographic graffiti in an underpass. Looking at hundreds of tags and scribbles on monuments, house walls, etc. it would be interesting to not only record them in writing but even make them accessible electronically, including images.
GraffDok’s workflow is simple and only requires an EXIF-GPS-tagged photograph of a graffito. It automatically determines its location by using reverse geocoding with the given GPS-coordinates and the Gisgraphy WebService. While asking the user for some more meta data, GraffDok analyses the image in parallel with this and tries to detect fore- and background – before extracting the drawing lines and make them stand alone. The command line based tool ImageMagick is used here as well as for accessing EXIF data.
Any meta data is written to csv-files, which will stay easily accessible and can be integrated in TeX-files as well. The latter ones are converted to PDF at the end of the workflow, containing a table about all graffiti and a summary for each – including the generated characteristic graffiti pattern image.
Boolean constraint solving technology has made tremendous progress over the last decade, leading to industrial-strength solvers, for example, in the areas of answer set programming (ASP), the constraint satisfaction problem (CSP), propositional satisfiability (SAT) and satisfiability of quantified Boolean formulas (QBF). However, in all these areas, there exist multiple solving strategies that work well on different applications; no strategy dominates all other strategies. Therefore, no individual solver shows robust state-of-the-art performance in all kinds of applications. Additionally, the question arises how to choose a well-performing solving strategy for a given application; this is a challenging question even for solver and domain experts. One way to address this issue is the use of portfolio solvers, that is, a set of different solvers or solver configurations. We present three new automatic portfolio methods: (i) automatic construction of parallel portfolio solvers (ACPP) via algorithm configuration,(ii) solving the $NP$-hard problem of finding effective algorithm schedules with Answer Set Programming (aspeed), and (iii) a flexible algorithm selection framework (claspfolio2) allowing for fair comparison of different selection approaches. All three methods show improved performance and robustness in comparison to individual solvers on heterogeneous instance sets from many different applications. Since parallel solvers are important to effectively solve hard problems on parallel computation systems (e.g., multi-core processors), we extend all three approaches to be effectively applicable in parallel settings. We conducted extensive experimental studies different instance sets from ASP, CSP, MAXSAT, Operation Research (OR), SAT and QBF that indicate an improvement in the state-of-the-art solving heterogeneous instance sets. Last but not least, from our experimental studies, we deduce practical advice regarding the question when to apply which of our methods.
This document presents a formula selection system for classical first order theorem proving based on the relevance of formulae for the proof of a conjecture. It is based on unifiability of predicates and is also able to use a linguistic approach for the selection. The scope of the technique is the reduction of the set of formulae and the increase of the amount of provable conjectures in a given time. Since the technique generates a subset of the formula set, it can be used as a preprocessor for automated theorem proving. The document contains the conception, implementation and evaluation of both selection concepts. While the one concept generates a search graph over the negation normal forms or Skolem normal forms of the given formulae, the linguistic concept analyses the formulae and determines frequencies of lexemes and uses a tf-idf weighting algorithm to determine the relevance of the formulae. Though the concept is built for first order logic, it is not limited to it. The concept can be used for higher order and modal logik, too, with minimal adoptions. The system was also evaluated at the world championship of automated theorem provers (CADE ATP Systems Competition, CASC-24) in combination with the leanCoP theorem prover and the evaluation of the results of the CASC and the benchmarks with the problems of the CASC of the year 2012 (CASC-J6) show that the concept of the system has positive impact to the performance of automated theorem provers. Also, the benchmarks with two different theorem provers which use different calculi have shown that the selection is independent from the calculus. Moreover, the concept of TEMPLAR has shown to be competitive to some extent with the concept of SinE and even helped one of the theorem provers to solve problems that were not (or slower) solved with SinE selection in the CASC. Finally, the evaluation implies that the combination of the unification based and linguistic selection yields more improved results though no optimisation was done for the problems.
This document presents an axiom selection technique for classic first order theorem proving based on the relevance of axioms for the proof of a conjecture. It is based on unifiability of predicates and does not need statistical information like symbol frequency. The scope of the technique is the reduction of the set of axioms and the increase of the amount of provable conjectures in a given time. Since the technique generates a subset of the axiom set, it can be used as a preprocessor for automated theorem proving. This technical report describes the conception, implementation and evaluation of ARDE. The selection method, which is based on a breadth-first graph search by unifiability of predicates, is a weakened form of the connection calculus and uses specialised variants or unifiability to speed up the selection. The implementation of the concept is evaluated with comparison to the results of the world championship of theorem provers of the year 2012 (CASC J6). It is shown that both the theorem prover leanCoP which uses the connection calculus and E which uses equality reasoning, can benefit from the selection approach. Also, the evaluation shows that the concept is applyable for theorem proving problems with thousands of formulae and that the selection is independent from the calculus used by the theorem prover.
Deciphering the functioning of biological networks is one of the central tasks in systems biology. In particular, signal transduction networks are crucial for the understanding of the cellular response to external and internal perturbations. Importantly, in order to cope with the complexity of these networks, mathematical and computational modeling is required. We propose a computational modeling framework in order to achieve more robust discoveries in the context of logical signaling networks. More precisely, we focus on modeling the response of logical signaling networks by means of automated reasoning using Answer Set Programming (ASP). ASP provides a declarative language for modeling various knowledge representation and reasoning problems. Moreover, available ASP solvers provide several reasoning modes for assessing the multitude of answer sets. Therefore, leveraging its rich modeling language and its highly efficient solving capacities, we use ASP to address three challenging problems in the context of logical signaling networks: learning of (Boolean) logical networks, experimental design, and identification of intervention strategies. Overall, the contribution of this thesis is three-fold. Firstly, we introduce a mathematical framework for characterizing and reasoning on the response of logical signaling networks. Secondly, we contribute to a growing list of successful applications of ASP in systems biology. Thirdly, we present a software providing a complete pipeline for automated reasoning on the response of logical signaling networks.
The objective and motivation behind this research is to provide applications with easy-to-use interfaces to communities of deaf and functionally illiterate users, which enables them to work without any human assistance. Although recent years have witnessed technological advancements, the availability of technology does not ensure accessibility to information and communication technologies (ICT). Extensive use of text from menus to document contents means that deaf or functionally illiterate can not access services implemented on most computer software. Consequently, most existing computer applications pose an accessibility barrier to those who are unable to read fluently. Online technologies intended for such groups should be developed in continuous partnership with primary users and include a thorough investigation into their limitations, requirements and usability barriers. In this research, I investigated existing tools in voice, web and other multimedia technologies to identify learning gaps and explored ways to enhance the information literacy for deaf and functionally illiterate users. I worked on the development of user-centered interfaces to increase the capabilities of deaf and low literacy users by enhancing lexical resources and by evaluating several multimedia interfaces for them. The interface of the platform-independent Italian Sign Language (LIS) Dictionary has been developed to enhance the lexical resources for deaf users. The Sign Language Dictionary accepts Italian lemmas as input and provides their representation in the Italian Sign Language as output. The Sign Language dictionary has 3082 signs as set of Avatar animations in which each sign is linked to a corresponding Italian lemma. I integrated the LIS lexical resources with MultiWordNet (MWN) database to form the first LIS MultiWordNet(LMWN). LMWN contains information about lexical relations between words, semantic relations between lexical concepts (synsets), correspondences between Italian and sign language lexical concepts and semantic fields (domains). The approach enhances the deaf users’ understanding of written Italian language and shows that a relatively small set of lexicon can cover a significant portion of MWN. Integration of LIS signs with MWN made it useful tool for computational linguistics and natural language processing. The rule-based translation process from written Italian text to LIS has been transformed into service-oriented system. The translation process is composed of various modules including parser, semantic interpreter, generator, and spatial allocation planner. This translation procedure has been implemented in the Java Application Building Center (jABC), which is a framework for extreme model driven design (XMDD). The XMDD approach focuses on bringing software development closer to conceptual design, so that the functionality of a software solution could be understood by someone who is unfamiliar with programming concepts. The transformation addresses the heterogeneity challenge and enhances the re-usability of the system. For enhancing the e-participation of functionally illiterate users, two detailed studies were conducted in the Republic of Rwanda. In the first study, the traditional (textual) interface was compared with the virtual character-based interactive interface. The study helped to identify usability barriers and users evaluated these interfaces according to three fundamental areas of usability, i.e. effectiveness, efficiency and satisfaction. In another study, we developed four different interfaces to analyze the usability and effects of online assistance (consistent help) for functionally illiterate users and compared different help modes including textual, vocal and virtual character on the performance of semi-literate users. In our newly designed interfaces the instructions were automatically translated in Swahili language. All the interfaces were evaluated on the basis of task accomplishment, time consumption, System Usability Scale (SUS) rating and number of times the help was acquired. The results show that the performance of semi-literate users improved significantly when using the online assistance. The dissertation thus introduces a new development approach in which virtual characters are used as additional support for barely literate or naturally challenged users. Such components enhanced the application utility by offering a variety of services like translating contents in local language, providing additional vocal information, and performing automatic translation from text to sign language. Obviously, there is no such thing as one design solution that fits for all in the underlying domain. Context sensitivity, literacy and mental abilities are key factors on which I concentrated and the results emphasize that computer interfaces must be based on a thoughtful definition of target groups, purposes and objectives.
Learning a model for the relationship between the attributes and the annotated labels of data examples serves two purposes. Firstly, it enables the prediction of the label for examples without annotation. Secondly, the parameters of the model can provide useful insights into the structure of the data. If the data has an inherent partitioned structure, it is natural to mirror this structure in the model. Such mixture models predict by combining the individual predictions generated by the mixture components which correspond to the partitions in the data. Often the partitioned structure is latent, and has to be inferred when learning the mixture model. Directly evaluating the accuracy of the inferred partition structure is, in many cases, impossible because the ground truth cannot be obtained for comparison. However it can be assessed indirectly by measuring the prediction accuracy of the mixture model that arises from it. This thesis addresses the interplay between the improvement of predictive accuracy by uncovering latent cluster structure in data, and further addresses the validation of the estimated structure by measuring the accuracy of the resulting predictive model. In the application of filtering unsolicited emails, the emails in the training set are latently clustered into advertisement campaigns. Uncovering this latent structure allows filtering of future emails with very low false positive rates. In order to model the cluster structure, a Bayesian clustering model for dependent binary features is developed in this thesis. Knowing the clustering of emails into campaigns can also aid in uncovering which emails have been sent on behalf of the same network of captured hosts, so-called botnets. This association of emails to networks is another layer of latent clustering. Uncovering this latent structure allows service providers to further increase the accuracy of email filtering and to effectively defend against distributed denial-of-service attacks. To this end, a discriminative clustering model is derived in this thesis that is based on the graph of observed emails. The partitionings inferred using this model are evaluated through their capacity to predict the campaigns of new emails. Furthermore, when classifying the content of emails, statistical information about the sending server can be valuable. Learning a model that is able to make use of it requires training data that includes server statistics. In order to also use training data where the server statistics are missing, a model that is a mixture over potentially all substitutions thereof is developed. Another application is to predict the navigation behavior of the users of a website. Here, there is no a priori partitioning of the users into clusters, but to understand different usage scenarios and design different layouts for them, imposing a partitioning is necessary. The presented approach simultaneously optimizes the discriminative as well as the predictive power of the clusters. Each model is evaluated on real-world data and compared to baseline methods. The results show that explicitly modeling the assumptions about the latent cluster structure leads to improved predictions compared to the baselines. It is beneficial to incorporate a small number of hyperparameters that can be tuned to yield the best predictions in cases where the prediction accuracy can not be optimized directly.
This thesis presents novel ideas and research findings for the Web of Data – a global data space spanning many so-called Linked Open Data sources. Linked Open Data adheres to a set of simple principles to allow easy access and reuse for data published on the Web. Linked Open Data is by now an established concept and many (mostly academic) publishers adopted the principles building a powerful web of structured knowledge available to everybody. However, so far, Linked Open Data does not yet play a significant role among common web technologies that currently facilitate a high-standard Web experience. In this work, we thoroughly discuss the state-of-the-art for Linked Open Data and highlight several shortcomings – some of them we tackle in the main part of this work. First, we propose a novel type of data source meta-information, namely the topics of a dataset. This information could be published with dataset descriptions and support a variety of use cases, such as data source exploration and selection. For the topic retrieval, we present an approach coined Annotated Pattern Percolation (APP), which we evaluate with respect to topics extracted from Wikipedia portals. Second, we contribute to entity linking research by presenting an optimization model for joint entity linking, showing its hardness, and proposing three heuristics implemented in the LINked Data Alignment (LINDA) system. Our first solution can exploit multi-core machines, whereas the second and third approach are designed to run in a distributed shared-nothing environment. We discuss and evaluate the properties of our approaches leading to recommendations which algorithm to use in a specific scenario. The distributed algorithms are among the first of their kind, i.e., approaches for joint entity linking in a distributed fashion. Also, we illustrate that we can tackle the entity linking problem on the very large scale with data comprising more than 100 millions of entity representations from very many sources. Finally, we approach a sub-problem of entity linking, namely the alignment of concepts. We again target a method that looks at the data in its entirety and does not neglect existing relations. Also, this concept alignment method shall execute very fast to serve as a preprocessing for further computations. Our approach, called Holistic Concept Matching (HCM), achieves the required speed through grouping the input by comparing so-called knowledge representations. Within the groups, we perform complex similarity computations, relation conclusions, and detect semantic contradictions. The quality of our result is again evaluated on a large and heterogeneous dataset from the real Web. In summary, this work contributes a set of techniques for enhancing the current state of the Web of Data. All approaches have been tested on large and heterogeneous real-world input.
Cloud computing is a model for enabling on-demand access to a shared pool of computing resources. With virtually limitless on-demand resources, a cloud environment enables the hosted Internet application to quickly cope when there is an increase in the workload. However, the overhead of provisioning resources exposes the Internet application to periods of under-provisioning and performance degradation. Moreover, the performance interference, due to the consolidation in the cloud environment, complicates the performance management of the Internet applications. In this dissertation, we propose two approaches to mitigate the impact of the resources provisioning overhead. The first approach employs control theory to scale resources vertically and cope fast with workload. This approach assumes that the provider has knowledge and control over the platform running in the virtual machines (VMs), which limits it to Platform as a Service (PaaS) and Software as a Service (SaaS) providers. The second approach is a customer-side one that deals with the horizontal scalability in an Infrastructure as a Service (IaaS) model. It addresses the trade-off problem between cost and performance with a multi-goal optimization solution. This approach finds the scale thresholds that achieve the highest performance with the lowest increase in the cost. Moreover, the second approach employs a proposed time series forecasting algorithm to scale the application proactively and avoid under-utilization periods. Furthermore, to mitigate the interference impact on the Internet application performance, we developed a system which finds and eliminates the VMs suffering from performance interference. The developed system is a light-weight solution which does not imply provider involvement. To evaluate our approaches and the designed algorithms at large-scale level, we developed a simulator called (ScaleSim). In the simulator, we implemented scalability components acting as the scalability components of Amazon EC2. The current scalability implementation in Amazon EC2 is used as a reference point for evaluating the improvement in the scalable application performance. ScaleSim is fed with realistic models of the RUBiS benchmark extracted from the real environment. The workload is generated from the access logs of the 1998 world cup website. The results show that optimizing the scalability thresholds and adopting proactive scalability can mitigate 88% of the resources provisioning overhead impact with only a 9% increase in the cost.
This thesis proposes a privacy protection framework for the controlled distribution and use of personal private data. The framework is based on the idea that privacy policies can be set directly by the data owner and can be automatically enforced against the data user. Data privacy continues to be a very important topic, as our dependency on electronic communication maintains its current growth, and private data is shared between multiple devices, users and locations. The growing amount and the ubiquitous availability of personal private data increases the likelihood of data misuse. Early privacy protection techniques, such as anonymous email and payment systems have focused on data avoidance and anonymous use of services. They did not take into account that data sharing cannot be avoided when people participate in electronic communication scenarios that involve social interactions. This leads to a situation where data is shared widely and uncontrollably and in most cases the data owner has no control over further distribution and use of personal private data. Previous efforts to integrate privacy awareness into data processing workflows have focused on the extension of existing access control frameworks with privacy aware functions or have analysed specific individual problems such as the expressiveness of policy languages. So far, very few implementations of integrated privacy protection mechanisms exist and can be studied to prove their effectiveness for privacy protection. Second level issues that stem from practical application of the implemented mechanisms, such as usability, life-time data management and changes in trustworthiness have received very little attention so far, mainly because they require actual implementations to be studied. Most existing privacy protection schemes silently assume that it is the privilege of the data user to define the contract under which personal private data is released. Such an approach simplifies policy management and policy enforcement for the data user, but leaves the data owner with a binary decision to submit or withhold his or her personal data based on the provided policy. We wanted to empower the data owner to express his or her privacy preferences through privacy policies that follow the so-called Owner-Retained Access Control (ORAC) model. ORAC has been proposed by McCollum, et al. as an alternate access control mechanism that leaves the authority over access decisions by the originator of the data. The data owner is given control over the release policy for his or her personal data, and he or she can set permissions or restrictions according to individually perceived trust values. Such a policy needs to be expressed in a coherent way and must allow the deterministic policy evaluation by different entities. The privacy policy also needs to be communicated from the data owner to the data user, so that it can be enforced. Data and policy are stored together as a Protected Data Object that follows the Sticky Policy paradigm as defined by Mont, et al. and others. We developed a unique policy combination approach that takes usability aspects for the creation and maintenance of policies into consideration. Our privacy policy consists of three parts: A Default Policy provides basic privacy protection if no specific rules have been entered by the data owner. An Owner Policy part allows the customisation of the default policy by the data owner. And a so-called Safety Policy guarantees that the data owner cannot specify disadvantageous policies, which, for example, exclude him or her from further access to the private data. The combined evaluation of these three policy-parts yields the necessary access decision. The automatic enforcement of privacy policies in our protection framework is supported by a reference monitor implementation. We started our work with the development of a client-side protection mechanism that allows the enforcement of data-use restrictions after private data has been released to the data user. The client-side enforcement component for data-use policies is based on a modified Java Security Framework. Privacy policies are translated into corresponding Java permissions that can be automatically enforced by the Java Security Manager. When we later extended our work to implement server-side protection mechanisms, we found several drawbacks for the privacy enforcement through the Java Security Framework. We solved this problem by extending our reference monitor design to use Aspect-Oriented Programming (AOP) and the Java Reflection API to intercept data accesses in existing applications and provide a way to enforce data owner-defined privacy policies for business applications.
3D from 2D touch
(2013)
While interaction with computers used to be dominated by mice and keyboards, new types of sensors now allow users to interact through touch, speech, or using their whole body in 3D space. These new interaction modalities are often referred to as "natural user interfaces" or "NUIs." While 2D NUIs have experienced major success on billions of mobile touch devices sold, 3D NUI systems have so far been unable to deliver a mobile form factor, mainly due to their use of cameras. The fact that cameras require a certain distance from the capture volume has prevented 3D NUI systems from reaching the flat form factor mobile users expect. In this dissertation, we address this issue by sensing 3D input using flat 2D sensors. The systems we present observe the input from 3D objects as 2D imprints upon physical contact. By sampling these imprints at very high resolutions, we obtain the objects' textures. In some cases, a texture uniquely identifies a biometric feature, such as the user's fingerprint. In other cases, an imprint stems from the user's clothing, such as when walking on multitouch floors. By analyzing from which part of the 3D object the 2D imprint results, we reconstruct the object's pose in 3D space. While our main contribution is a general approach to sensing 3D input on 2D sensors upon physical contact, we also demonstrate three applications of our approach. (1) We present high-accuracy touch devices that allow users to reliably touch targets that are a third of the size of those on current touch devices. We show that different users and 3D finger poses systematically affect touch sensing, which current devices perceive as random input noise. We introduce a model for touch that compensates for this systematic effect by deriving the 3D finger pose and the user's identity from each touch imprint. We then investigate this systematic effect in detail and explore how users conceptually touch targets. Our findings indicate that users aim by aligning visual features of their fingers with the target. We present a visual model for touch input that eliminates virtually all systematic effects on touch accuracy. (2) From each touch, we identify users biometrically by analyzing their fingerprints. Our prototype Fiberio integrates fingerprint scanning and a display into the same flat surface, solving a long-standing problem in human-computer interaction: secure authentication on touchscreens. Sensing 3D input and authenticating users upon touch allows Fiberio to implement a variety of applications that traditionally require the bulky setups of current 3D NUI systems. (3) To demonstrate the versatility of 3D reconstruction on larger touch surfaces, we present a high-resolution pressure-sensitive floor that resolves the texture of objects upon touch. Using the same principles as before, our system GravitySpace analyzes all imprints and identifies users based on their shoe soles, detects furniture, and enables accurate touch input using feet. By classifying all imprints, GravitySpace detects the users' body parts that are in contact with the floor and then reconstructs their 3D body poses using inverse kinematics. GravitySpace thus enables a range of applications for future 3D NUI systems based on a flat sensor, such as smart rooms in future homes. We conclude this dissertation by projecting into the future of mobile devices. Focusing on the mobility aspect of our work, we explore how NUI devices may one day augment users directly in the form of implanted devices.
Interactive rendering techniques for focus+context visualization of 3D geovirtual environments
(2013)
This thesis introduces a collection of new real-time rendering techniques and applications for focus+context visualization of interactive 3D geovirtual environments such as virtual 3D city and landscape models. These environments are generally characterized by a large number of objects and are of high complexity with respect to geometry and textures. For these reasons, their interactive 3D rendering represents a major challenge. Their 3D depiction implies a number of weaknesses such as occlusions, cluttered image contents, and partial screen-space usage. To overcome these limitations and, thus, to facilitate the effective communication of geo-information, principles of focus+context visualization can be used for the design of real-time 3D rendering techniques for 3D geovirtual environments (see Figure). In general, detailed views of a 3D geovirtual environment are combined seamlessly with abstracted views of the context within a single image. To perform the real-time image synthesis required for interactive visualization, dedicated parallel processors (GPUs) for rasterization of computer graphics primitives are used. For this purpose, the design and implementation of appropriate data structures and rendering pipelines are necessary. The contribution of this work comprises the following five real-time rendering methods: • The rendering technique for 3D generalization lenses enables the combination of different 3D city geometries (e.g., generalized versions of a 3D city model) in a single image in real time. The method is based on a generalized and fragment-precise clipping approach, which uses a compressible, raster-based data structure. It enables the combination of detailed views in the focus area with the representation of abstracted variants in the context area. • The rendering technique for the interactive visualization of dynamic raster data in 3D geovirtual environments facilitates the rendering of 2D surface lenses. It enables a flexible combination of different raster layers (e.g., aerial images or videos) using projective texturing for decoupling image and geometry data. Thus, various overlapping and nested 2D surface lenses of different contents can be visualized interactively. • The interactive rendering technique for image-based deformation of 3D geovirtual environments enables the real-time image synthesis of non-planar projections, such as cylindrical and spherical projections, as well as multi-focal 3D fisheye-lenses and the combination of planar and non-planar projections. • The rendering technique for view-dependent multi-perspective views of 3D geovirtual environments, based on the application of global deformations to the 3D scene geometry, can be used for synthesizing interactive panorama maps to combine detailed views close to the camera (focus) with abstract views in the background (context). This approach reduces occlusions, increases the usage the available screen space, and reduces the overload of image contents. • The object-based and image-based rendering techniques for highlighting objects and focus areas inside and outside the view frustum facilitate preattentive perception. The concepts and implementations of interactive image synthesis for focus+context visualization and their selected applications enable a more effective communication of spatial information, and provide building blocks for design and development of new applications and systems in the field of 3D geovirtual environments.
The field of machine learning studies algorithms that infer predictive models from data. Predictive models are applicable for many practical tasks such as spam filtering, face and handwritten digit recognition, and personalized product recommendation. In general, they are used to predict a target label for a given data instance. In order to make an informed decision about the deployment of a predictive model, it is crucial to know the model’s approximate performance. To evaluate performance, a set of labeled test instances is required that is drawn from the distribution the model will be exposed to at application time. In many practical scenarios, unlabeled test instances are readily available, but the process of labeling them can be a time- and cost-intensive task and may involve a human expert. This thesis addresses the problem of evaluating a given predictive model accurately with minimal labeling effort. We study an active model evaluation process that selects certain instances of the data according to an instrumental sampling distribution and queries their labels. We derive sampling distributions that minimize estimation error with respect to different performance measures such as error rate, mean squared error, and F-measures. An analysis of the distribution that governs the estimator leads to confidence intervals, which indicate how precise the error estimation is. Labeling costs may vary across different instances depending on certain characteristics of the data. For instance, documents differ in their length, comprehensibility, and technical requirements; these attributes affect the time a human labeler needs to judge relevance or to assign topics. To address this, the sampling distribution is extended to incorporate instance-specific costs. We empirically study conditions under which the active evaluation processes are more accurate than a standard estimate that draws equally many instances from the test distribution. We also address the problem of comparing the risks of two predictive models. The standard approach would be to draw instances according to the test distribution, label the selected instances, and apply statistical tests to identify significant differences. Drawing instances according to an instrumental distribution affects the power of a statistical test. We derive a sampling procedure that maximizes test power when used to select instances, and thereby minimizes the likelihood of choosing the inferior model. Furthermore, we investigate the task of comparing several alternative models; the objective of an evaluation could be to rank the models according to the risk that they incur or to identify the model with lowest risk. An experimental study shows that the active procedure leads to higher test power than the standard test in many application domains. Finally, we study the problem of evaluating the performance of ranking functions, which are used for example for web search. In practice, ranking performance is estimated by applying a given ranking model to a representative set of test queries and manually assessing the relevance of all retrieved items for each query. We apply the concepts of active evaluation and active comparison to ranking functions and derive optimal sampling distributions for the commonly used performance measures Discounted Cumulative Gain and Expected Reciprocal Rank. Experiments on web search engine data illustrate significant reductions in labeling costs.
This talk will describe My Digital Life (TU100), a distance learning module that introduces computer science through immediate engagement with ubiquitous computing (ubicomp). This talk will describe some of the principles and concepts we have adopted for this modern computing introduction: the idea of the ‘informed digital citizen’; engagement through narrative; playful pedagogy; making the power of ubicomp available to novices; setting technical skills in real contexts. It will also trace how the pedagogy is informed by experiences and research in Computer Science education.
A survey has been carried out in the Computer Science (CS) department at the University of Baghdad to investigate the attitudes of CS students in a female dominant environment, showing the differences between male and female students in different academic years. We also compare the attitudes of the freshman students of two different cultures (University of Baghdad, Iraq, and the University of Potsdam).
Scientific writing is an important skill for computer science and computer engineering professionals. In this paper we present a writing concept across the curriculum program directed towards scientific writing. The program is built around a hierarchy of learning outcomes. The hierarchy is constructed through analyzing the learning outcomes in relation to competencies that are needed to fulfill them.
Nowadays, model-driven engineering (MDE) promises to ease software development by decreasing the inherent complexity of classical software development. In order to deliver on this promise, MDE increases the level of abstraction and automation, through a consideration of domain-specific models (DSMs) and model operations (e.g. model transformations or code generations). DSMs conform to domain-specific modeling languages (DSMLs), which increase the level of abstraction, and model operations are first-class entities of software development because they increase the level of automation. Nevertheless, MDE has to deal with at least two new dimensions of complexity, which are basically caused by the increased linguistic and technological heterogeneity. The first dimension of complexity is setting up an MDE environment, an activity comprised of the implementation or selection of DSMLs and model operations. Setting up an MDE environment is both time-consuming and error-prone because of the implementation or adaptation of model operations. The second dimension of complexity is concerned with applying MDE for actual software development. Applying MDE is challenging because a collection of DSMs, which conform to potentially heterogeneous DSMLs, are required to completely specify a complex software system. A single DSML can only be used to describe a specific aspect of a software system at a certain level of abstraction and from a certain perspective. Additionally, DSMs are usually not independent but instead have inherent interdependencies, reflecting (partial) similar aspects of a software system at different levels of abstraction or from different perspectives. A subset of these dependencies are applications of various model operations, which are necessary to keep the degree of automation high. This becomes even worse when addressing the first dimension of complexity. Due to continuous changes, all kinds of dependencies, including the applications of model operations, must also be managed continuously. This comprises maintaining the existence of these dependencies and the appropriate (re-)application of model operations. The contribution of this thesis is an approach that combines traceability and model management to address the aforementioned challenges of configuring and applying MDE for software development. The approach is considered as a traceability approach because it supports capturing and automatically maintaining dependencies between DSMs. The approach is considered as a model management approach because it supports managing the automated (re-)application of heterogeneous model operations. In addition, the approach is considered as a comprehensive model management. Since the decomposition of model operations is encouraged to alleviate the first dimension of complexity, the subsequent composition of model operations is required to counteract their fragmentation. A significant portion of this thesis concerns itself with providing a method for the specification of decoupled yet still highly cohesive complex compositions of heterogeneous model operations. The approach supports two different kinds of compositions - data-flow compositions and context compositions. Data-flow composition is used to define a network of heterogeneous model operations coupled by sharing input and output DSMs alone. Context composition is related to a concept used in declarative model transformation approaches to compose individual model transformation rules (units) at any level of detail. In this thesis, context composition provides the ability to use a collection of dependencies as context for the composition of other dependencies, including model operations. In addition, the actual implementation of model operations, which are going to be composed, do not need to implement any composition concerns. The approach is realized by means of a formalism called an executable and dynamic hierarchical megamodel, based on the original idea of megamodels. This formalism supports specifying compositions of dependencies (traceability and model operations). On top of this formalism, traceability is realized by means of a localization concept, and model management by means of an execution concept.
Virtual 3D city and landscape models are the main subject investigated in this thesis. They digitally represent urban space and have many applications in different domains, e.g., simulation, cadastral management, and city planning. Visualization is an elementary component of these applications. Photo-realistic visualization with an increasingly high degree of detail leads to fundamental problems for comprehensible visualization. A large number of highly detailed and textured objects within a virtual 3D city model may create visual noise and overload the users with information. Objects are subject to perspective foreshortening and may be occluded or not displayed in a meaningful way, as they are too small. In this thesis we present abstraction techniques that automatically process virtual 3D city and landscape models to derive abstracted representations. These have a reduced degree of detail, while essential characteristics are preserved. After introducing definitions for model, scale, and multi-scale representations, we discuss the fundamentals of map generalization as well as techniques for 3D generalization. The first presented technique is a cell-based generalization of virtual 3D city models. It creates abstract representations that have a highly reduced level of detail while maintaining essential structures, e.g., the infrastructure network, landmark buildings, and free spaces. The technique automatically partitions the input virtual 3D city model into cells based on the infrastructure network. The single building models contained in each cell are aggregated to abstracted cell blocks. Using weighted infrastructure elements, cell blocks can be computed on different hierarchical levels, storing the hierarchy relation between the cell blocks. Furthermore, we identify initial landmark buildings within a cell by comparing the properties of individual buildings with the aggregated properties of the cell. For each block, the identified landmark building models are subtracted using Boolean operations and integrated in a photo-realistic way. Finally, for the interactive 3D visualization we discuss the creation of the virtual 3D geometry and their appearance styling through colors, labeling, and transparency. We demonstrate the technique with example data sets. Additionally, we discuss applications of generalization lenses and transitions between abstract representations. The second technique is a real-time-rendering technique for geometric enhancement of landmark objects within a virtual 3D city model. Depending on the virtual camera distance, landmark objects are scaled to ensure their visibility within a specific distance interval while deforming their environment. First, in a preprocessing step a landmark hierarchy is computed, this is then used to derive distance intervals for the interactive rendering. At runtime, using the virtual camera distance, a scaling factor is computed and applied to each landmark. The scaling factor is interpolated smoothly at the interval boundaries using cubic Bézier splines. Non-landmark geometry that is near landmark objects is deformed with respect to a limited number of landmarks. We demonstrate the technique by applying it to a highly detailed virtual 3D city model and a generalized 3D city model. In addition we discuss an adaptation of the technique for non-linear projections and mobile devices. The third technique is a real-time rendering technique to create abstract 3D isocontour visualization of virtual 3D terrain models. The virtual 3D terrain model is visualized as a layered or stepped relief. The technique works without preprocessing and, as it is implemented using programmable graphics hardware, can be integrated with minimal changes into common terrain rendering techniques. Consequently, the computation is done in the rendering pipeline for each vertex, primitive, i.e., triangle, and fragment. For each vertex, the height is quantized to the nearest isovalue. For each triangle, the vertex configuration with respect to their isovalues is determined first. Using the configuration, the triangle is then subdivided. The subdivision forms a partial step geometry aligned with the triangle. For each fragment, the surface appearance is determined, e.g., depending on the surface texture, shading, and height-color-mapping. Flexible usage of the technique is demonstrated with applications from focus+context visualization, out-of-core terrain rendering, and information visualization. This thesis presents components for the creation of abstract representations of virtual 3D city and landscape models. Re-using visual language from cartography, the techniques enable users to build on their experience with maps when interpreting these representations. Simultaneously, characteristics of 3D geovirtual environments are taken into account by addressing and discussing, e.g., continuous scale, interaction, and perspective.
In the early days of computer graphics, research was mainly driven by the goal to create realistic synthetic imagery. By contrast, non-photorealistic computer graphics, established as its own branch of computer graphics in the early 1990s, is mainly motivated by concepts and principles found in traditional art forms, such as painting, illustration, and graphic design, and it investigates concepts and techniques that abstract from reality using expressive, stylized, or illustrative rendering techniques. This thesis focuses on the artistic stylization of two-dimensional content and presents several novel automatic techniques for the creation of simplified stylistic illustrations from color images, video, and 3D renderings. Primary innovation of these novel techniques is that they utilize the smooth structure tensor as a simple and efficient way to obtain information about the local structure of an image. More specifically, this thesis contributes to knowledge in this field in the following ways. First, a comprehensive review of the structure tensor is provided. In particular, different methods for integrating the minor eigenvector field of the smoothed structure tensor are developed, and the superiority of the smoothed structure tensor over the popular edge tangent flow is demonstrated. Second, separable implementations of the popular bilateral and difference of Gaussians filters that adapt to the local structure are presented. These filters avoid artifacts while being computationally highly efficient. Taken together, both provide an effective way to create a cartoon-style effect. Third, a generalization of the Kuwahara filter is presented that avoids artifacts by adapting the shape, scale, and orientation of the filter to the local structure. This causes directional image features to be better preserved and emphasized, resulting in overall sharper edges and a more feature-abiding painterly effect. In addition to the single-scale variant, a multi-scale variant is presented, which is capable of performing a highly aggressive abstraction. Fourth, a technique that builds upon the idea of combining flow-guided smoothing with shock filtering is presented, allowing for an aggressive exaggeration and an emphasis of directional image features. All presented techniques are suitable for temporally coherent per-frame filtering of video or dynamic 3D renderings, without requiring expensive extra processing, such as optical flow. Moreover, they can be efficiently implemented to process content in real-time on a GPU.
In many applications one is faced with the problem of inferring some functional relation between input and output variables from given data. Consider, for instance, the task of email spam filtering where one seeks to find a model which automatically assigns new, previously unseen emails to class spam or non-spam. Building such a predictive model based on observed training inputs (e.g., emails) with corresponding outputs (e.g., spam labels) is a major goal of machine learning. Many learning methods assume that these training data are governed by the same distribution as the test data which the predictive model will be exposed to at application time. That assumption is violated when the test data are generated in response to the presence of a predictive model. This becomes apparent, for instance, in the above example of email spam filtering. Here, email service providers employ spam filters and spam senders engineer campaign templates such as to achieve a high rate of successful deliveries despite any filters. Most of the existing work casts such situations as learning robust models which are unsusceptible against small changes of the data generation process. The models are constructed under the worst-case assumption that these changes are performed such to produce the highest possible adverse effect on the performance of the predictive model. However, this approach is not capable to realistically model the true dependency between the model-building process and the process of generating future data. We therefore establish the concept of prediction games: We model the interaction between a learner, who builds the predictive model, and a data generator, who controls the process of data generation, as an one-shot game. The game-theoretic framework enables us to explicitly model the players' interests, their possible actions, their level of knowledge about each other, and the order at which they decide for an action. We model the players' interests as minimizing their own cost function which both depend on both players' actions. The learner's action is to choose the model parameters and the data generator's action is to perturbate the training data which reflects the modification of the data generation process with respect to the past data. We extensively study three instances of prediction games which differ regarding the order in which the players decide for their action. We first assume that both player choose their actions simultaneously, that is, without the knowledge of their opponent's decision. We identify conditions under which this Nash prediction game has a meaningful solution, that is, a unique Nash equilibrium, and derive algorithms that find the equilibrial prediction model. As a second case, we consider a data generator who is potentially fully informed about the move of the learner. This setting establishes a Stackelberg competition. We derive a relaxed optimization criterion to determine the solution of this game and show that this Stackelberg prediction game generalizes existing prediction models. Finally, we study the setting where the learner observes the data generator's action, that is, the (unlabeled) test data, before building the predictive model. As the test data and the training data may be governed by differing probability distributions, this scenario reduces to learning under covariate shift. We derive a new integrated as well as a two-stage method to account for this data set shift. In case studies on email spam filtering we empirically explore properties of all derived models as well as several existing baseline methods. We show that spam filters resulting from the Nash prediction game as well as the Stackelberg prediction game in the majority of cases outperform other existing baseline methods.
Business process models are used within a range of organizational initiatives, where every stakeholder has a unique perspective on a process and demands the respective model. As a consequence, multiple process models capturing the very same business process coexist. Keeping such models in sync is a challenge within an ever changing business environment: once a process is changed, all its models have to be updated. Due to a large number of models and their complex relations, model maintenance becomes error-prone and expensive. Against this background, business process model abstraction emerged as an operation reducing the number of stored process models and facilitating model management. Business process model abstraction is an operation preserving essential process properties and leaving out insignificant details in order to retain information relevant for a particular purpose. Process model abstraction has been addressed by several researchers. The focus of their studies has been on particular use cases and model transformations supporting these use cases. This thesis systematically approaches the problem of business process model abstraction shaping the outcome into a framework. We investigate the current industry demand in abstraction summarizing it in a catalog of business process model abstraction use cases. The thesis focuses on one prominent use case where the user demands a model with coarse-grained activities and overall process ordering constraints. We develop model transformations that support this use case starting with the transformations based on process model structure analysis. Further, abstraction methods considering the semantics of process model elements are investigated. First, we suggest how semantically related activities can be discovered in process models-a barely researched challenge. The thesis validates the designed abstraction methods against sets of industrial process models and discusses the method implementation aspects. Second, we develop a novel model transformation, which combined with the related activity discovery allows flexible non-hierarchical abstraction. In this way this thesis advocates novel model transformations that facilitate business process model management and provides the foundations for innovative tool support.
The constantly growing capacity of reconfigurable devices allows simultaneous execution of complex applications on those devices. The mere diversity of applications deems it impossible to design an interconnection network matching the requirements of every possible application perfectly, leading to suboptimal performance in many cases. However, the architecture of the interconnection network is not the only aspect affecting performance of communication. The resource manager places applications on the device and therefore influences latency between communicating partners and overall network load. Communication protocols affect performance by introducing data and processing overhead putting higher load on the network and increasing resource demand. Approaching communication holistically not only considers the architecture of the interconnect, but communication-aware resource management, communication protocols and resource usage just as well. Incorporation of different parts of a reconfigurable system during design- and runtime and optimizing them with respect to communication demand results in more resource efficient communication. Extensive evaluation shows enhanced performance and flexibility, if communication on reconfigurable devices is regarded in a holistic fashion.
Parsability approaches of several grammar formalisms generating also non-context-free languages are explored. Chomsky grammars, Lindenmayer systems, grammars with controlled derivations, and grammar systems are treated. Formal properties of these mechanisms are investigated, when they are used as language acceptors. Furthermore, cooperating distributed grammar systems are restricted so that efficient deterministic parsing without backtracking becomes possible. For this class of grammar systems, the parsing algorithm is presented and the feature of leftmost derivations is investigated in detail.
Biology has made great progress in identifying and measuring the building blocks of life. The availability of high-throughput methods in molecular biology has dramatically accelerated the growth of biological knowledge for various organisms. The advancements in genomic, proteomic and metabolomic technologies allow for constructing complex models of biological systems. An increasing number of biological repositories is available on the web, incorporating thousands of biochemical reactions and genetic regulations. Systems Biology is a recent research trend in life science, which fosters a systemic view on biology. In Systems Biology one is interested in integrating the knowledge from all these different sources into models that capture the interaction of these entities. By studying these models one wants to understand the emerging properties of the whole system, such as robustness. However, both measurements as well as biological networks are prone to considerable incompleteness, heterogeneity and mutual inconsistency, which makes it highly non-trivial to draw biologically meaningful conclusions in an automated way. Therefore, we want to promote Answer Set Programming (ASP) as a tool for discrete modeling in Systems Biology. ASP is a declarative problem solving paradigm, in which a problem is encoded as a logic program such that its answer sets represent solutions to the problem. ASP has intrinsic features to cope with incompleteness, offers a rich modeling language and highly efficient solving technology. We present ASP solutions, for the analysis of genetic regulatory networks, determining consistency with observed measurements and identifying minimal causes for inconsistency. We extend this approach for computing minimal repairs on model and data that restore consistency. This method allows for predicting unobserved data even in case of inconsistency. Further, we present an ASP approach to metabolic network expansion. This approach exploits the easy characterization of reachability in ASP and its various reasoning methods, to explore the biosynthetic capabilities of metabolic reaction networks and generate hypotheses for extending the network. Finally, we present the BioASP library, a Python library which encapsulates our ASP solutions into the imperative programming paradigm. The library allows for an easy integration of ASP solution into system rich environments, as they exist in Systems Biology.
Most of the microelectronic circuits fabricated today are synchronous, i.e. they are driven by one or several clock signals. Synchronous circuit design faces several fundamental challenges such as high-speed clock distribution, integration of multiple cores operating at different clock rates, reduction of power consumption and dealing with voltage, temperature, manufacturing and runtime variations. Asynchronous or clockless design plays a key role in alleviating these challenges, however the design and test of asynchronous circuits is much more difficult in comparison to their synchronous counterparts. A driving force for a widespread use of asynchronous technology is the availability of mature EDA (Electronic Design Automation) tools which provide an entire automated design flow starting from an HDL (Hardware Description Language) specification yielding the final circuit layout. Even though there was much progress in developing such EDA tools for asynchronous circuit design during the last two decades, the maturity level as well as the acceptance of them is still not comparable with tools for synchronous circuit design. In particular, logic synthesis (which implies the application of Boolean minimisation techniques) for the entire system's control path can significantly improve the efficiency of the resulting asynchronous implementation, e.g. in terms of chip area and performance. However, logic synthesis, in particular for asynchronous circuits, suffers from complexity problems. Signal Transitions Graphs (STGs) are labelled Petri nets which are a widely used to specify the interface behaviour of speed independent (SI) circuits - a robust subclass of asynchronous circuits. STG decomposition is a promising approach to tackle complexity problems like state space explosion in logic synthesis of SI circuits. The (structural) decomposition of STGs is guided by a partition of the output signals and generates a usually much smaller component STG for each partition member, i.e. a component STG with a much smaller state space than the initial specification. However, decomposition can result in component STGs that in isolation have so-called irreducible CSC conflicts (i.e. these components are not SI synthesisable anymore) even if the specification has none of them. A new approach is presented to avoid such conflicts by introducing internal communication between the components. So far, STG decompositions are guided by the finest output partitions, i.e. one output per component. However, this might not yield optimal circuit implementations. Efficient heuristics are presented to determine coarser partitions leading to improved circuits in terms of chip area. For the new algorithms correctness proofs are given and their implementations are incorporated into the decomposition tool DESIJ. The presented techniques are successfully applied to some benchmarks - including 'real-life' specifications arising in the context of control resynthesis - which delivered promising results.
Service-oriented Architectures (SOA) facilitate the provision and orchestration of business services to enable a faster adoption to changing business demands. Web Services provide a technical foundation to implement this paradigm on the basis of XML-messaging. However, the enhanced flexibility of message-based systems comes along with new threats and risks. To face these issues, a variety of security mechanisms and approaches is supported by the Web Service specifications. The usage of these security mechanisms and protocols is configured by stating security requirements in security policies. However, security policy languages for SOA are complex and difficult to create due to the expressiveness of these languages. To facilitate and simplify the creation of security policies, this thesis presents a model-driven approach that enables the generation of complex security policies on the basis of simple security intentions. SOA architects can specify these intentions in system design models and are not required to deal with complex technical security concepts. The approach introduced in this thesis enables the enhancement of any system design modelling languages – for example FMC or BPMN – with security modelling elements. The syntax, semantics, and notion of these elements is defined by our security modelling language SecureSOA. The metamodel of this language provides extension points to enable the integration into system design modelling languages. In particular, this thesis demonstrates the enhancement of FMC block diagrams with SecureSOA. To enable the model-driven generation of security policies, a domain-independent policy model is introduced in this thesis. This model provides an abstraction layer for security policies. Mappings are used to perform the transformation from our model to security policy languages. However, expert knowledge is required to generate instances of this model on the basis of simple security intentions. Appropriate security mechanisms, protocols and options must be chosen and combined to fulfil these security intentions. In this thesis, a formalised system of security patterns is used to represent this knowledge and to enable an automated transformation process. Moreover, a domain-specific language is introduced to state security patterns in an accessible way. On the basis of this language, a system of security configuration patterns is provided to transform security intentions related to data protection and identity management. The formal semantics of the security pattern language enable the verification of the transformation process introduced in this thesis and prove the correctness of the pattern application. Finally, our SOA Security LAB is presented that demonstrates the application of our model-driven approach to facilitate a dynamic creation, configuration, and execution of secure Web Service-based composed applications.
Structuring process models
(2012)
One can fairly adopt the ideas of Donald E. Knuth to conclude that process modeling is both a science and an art. Process modeling does have an aesthetic sense. Similar to composing an opera or writing a novel, process modeling is carried out by humans who undergo creative practices when engineering a process model. Therefore, the very same process can be modeled in a myriad number of ways. Once modeled, processes can be analyzed by employing scientific methods. Usually, process models are formalized as directed graphs, with nodes representing tasks and decisions, and directed arcs describing temporal constraints between the nodes. Common process definition languages, such as Business Process Model and Notation (BPMN) and Event-driven Process Chain (EPC) allow process analysts to define models with arbitrary complex topologies. The absence of structural constraints supports creativity and productivity, as there is no need to force ideas into a limited amount of available structural patterns. Nevertheless, it is often preferable that models follow certain structural rules. A well-known structural property of process models is (well-)structuredness. A process model is (well-)structured if and only if every node with multiple outgoing arcs (a split) has a corresponding node with multiple incoming arcs (a join), and vice versa, such that the set of nodes between the split and the join induces a single-entry-single-exit (SESE) region; otherwise the process model is unstructured. The motivations for well-structured process models are manifold: (i) Well-structured process models are easier to layout for visual representation as their formalizations are planar graphs. (ii) Well-structured process models are easier to comprehend by humans. (iii) Well-structured process models tend to have fewer errors than unstructured ones and it is less probable to introduce new errors when modifying a well-structured process model. (iv) Well-structured process models are better suited for analysis with many existing formal techniques applicable only for well-structured process models. (v) Well-structured process models are better suited for efficient execution and optimization, e.g., when discovering independent regions of a process model that can be executed concurrently. Consequently, there are process modeling languages that encourage well-structured modeling, e.g., Business Process Execution Language (BPEL) and ADEPT. However, the well-structured process modeling implies some limitations: (i) There exist processes that cannot be formalized as well-structured process models. (ii) There exist processes that when formalized as well-structured process models require a considerable duplication of modeling constructs. Rather than expecting well-structured modeling from start, we advocate for the absence of structural constraints when modeling. Afterwards, automated methods can suggest, upon request and whenever possible, alternative formalizations that are "better" structured, preferably well-structured. In this thesis, we study the problem of automatically transforming process models into equivalent well-structured models. The developed transformations are performed under a strong notion of behavioral equivalence which preserves concurrency. The findings are implemented in a tool, which is publicly available.
Answer Set Programming (ASP) is an emerging paradigm for declarative programming, in which a computational problem is specified by a logic program such that particular models, called answer sets, match solutions. ASP faces a growing range of applications, demanding for high-performance tools able to solve complex problems. ASP integrates ideas from a variety of neighboring fields. In particular, automated techniques to search for answer sets are inspired by Boolean Satisfiability (SAT) solving approaches. While the latter have firm proof-theoretic foundations, ASP lacks formal frameworks for characterizing and comparing solving methods. Furthermore, sophisticated search patterns of modern SAT solvers, successfully applied in areas like, e.g., model checking and verification, are not yet established in ASP solving. We address these deficiencies by, for one, providing proof-theoretic frameworks that allow for characterizing, comparing, and analyzing approaches to answer set computation. For another, we devise modern ASP solving algorithms that integrate and extend state-of-the-art techniques for Boolean constraint solving. We thus contribute to the understanding of existing ASP solving approaches and their interconnections as well as to their enhancement by incorporating sophisticated search patterns. The central idea of our approach is to identify atomic as well as composite constituents of a propositional logic program with Boolean variables. This enables us to describe fundamental inference steps, and to selectively combine them in proof-theoretic characterizations of various ASP solving methods. In particular, we show that different concepts of case analyses applied by existing ASP solvers implicate mutual exponential separations regarding their best-case complexities. We also develop a generic proof-theoretic framework amenable to language extensions, and we point out that exponential separations can likewise be obtained due to case analyses on them. We further exploit fundamental inference steps to derive Boolean constraints characterizing answer sets. They enable the conception of ASP solving algorithms including search patterns of modern SAT solvers, while also allowing for direct technology transfers between the areas of ASP and SAT solving. Beyond the search for one answer set of a logic program, we address the enumeration of answer sets and their projections to a subvocabulary, respectively. The algorithms we develop enable repetition-free enumeration in polynomial space without being intrusive, i.e., they do not necessitate any modifications of computations before an answer set is found. Our approach to ASP solving is implemented in clasp, a state-of-the-art Boolean constraint solver that has successfully participated in recent solver competitions. Although we do here not address the implementation techniques of clasp or all of its features, we present the principles of its success in the context of ASP solving.
The modeling and evaluation calculus FMC-QE, the Fundamental Modeling Concepts for Quanti-tative Evaluation [1], extends the Fundamental Modeling Concepts (FMC) for performance modeling and prediction. In this new methodology, the hierarchical service requests are in the main focus, because they are the origin of every service provisioning process. Similar to physics, these service requests are a tuple of value and unit, which enables hierarchical service request transformations at the hierarchical borders and therefore the hierarchical modeling. Through reducing the model complexity of the models by decomposing the system in different hierarchical views, the distinction between operational and control states and the calculation of the performance values on the assumption of the steady state, FMC-QE has a scalable applica-bility on complex systems. According to FMC, the system is modeled in a 3-dimensional hierarchical representation space, where system performance parameters are described in three arbitrarily fine-grained hierarchi-cal bipartite diagrams. The hierarchical service request structures are modeled in Entity Relationship Diagrams. The static server structures, divided into logical and real servers, are de-scribed as Block Diagrams. The dynamic behavior and the control structures are specified as Petri Nets, more precisely Colored Time Augmented Petri Nets. From the structures and pa-rameters of the performance model, a hierarchical set of equations is derived. The calculation of the performance values is done on the assumption of stationary processes and is based on fundamental laws of the performance analysis: Little's Law and the Forced Traffic Flow Law. Little's Law is used within the different hierarchical levels (horizontal) and the Forced Traffic Flow Law is the key to the dependencies among the hierarchical levels (vertical). This calculation is suitable for complex models and allows a fast (re-)calculation of different performance scenarios in order to support development and configuration decisions. Within the Research Group Zorn at the Hasso Plattner Institute, the work is embedded in a broader research in the development of FMC-QE. While this work is concentrated on the theoretical background, description and definition of the methodology as well as the extension and validation of the applicability, other topics are in the development of an FMC-QE modeling and evaluation tool and the usage of FMC-QE in the design of an adaptive transport layer in order to fulfill Quality of Service and Service Level Agreements in volatile service based environments. This thesis contains a state-of-the-art, the description of FMC-QE as well as extensions of FMC-QE in representative general models and case studies. In the state-of-the-art part of the thesis in chapter 2, an overview on existing Queueing Theory and Time Augmented Petri Net models and other quantitative modeling and evaluation languages and methodologies is given. Also other hierarchical quantitative modeling frameworks will be considered. The description of FMC-QE in chapter 3 consists of a summary of the foundations of FMC-QE, basic definitions, the graphical notations, the FMC-QE Calculus and the modeling of open queueing networks as an introductory example. The extensions of FMC-QE in chapter 4 consist of the integration of the summation method in order to support the handling of closed networks and the modeling of multiclass and semaphore scenarios. Furthermore, FMC-QE is compared to other performance modeling and evaluation approaches. In the case study part in chapter 5, proof-of-concept examples, like the modeling of a service based search portal, a service based SAP NetWeaver application and the Axis2 Web service framework will be provided. Finally, conclusions are given by a summary of contributions and an outlook on future work in chapter 6. [1] Werner Zorn. FMC-QE - A New Approach in Quantitative Modeling. In Hamid R. Arabnia, editor, Procee-dings of the International Conference on Modeling, Simulation and Visualization Methods (MSV 2007) within WorldComp ’07, pages 280 – 287, Las Vegas, NV, USA, June 2007. CSREA Press. ISBN 1-60132-029-9.
The programmable network envisioned in the 1990s within standardization and research for the Intelligent Network is currently coming into reality using IPbased Next Generation Networks (NGN) and applying Service-Oriented Architecture (SOA) principles for service creation, execution, and hosting. SOA is the foundation for both next-generation telecommunications and middleware architectures, which are rapidly converging on top of commodity transport services. Services such as triple/quadruple play, multimedia messaging, and presence are enabled by the emerging service-oriented IPMultimedia Subsystem (IMS), and allow telecommunications service providers to maintain, if not improve, their position in the marketplace. SOA becomes the de facto standard in next-generation middleware systems as the system model of choice to interconnect service consumers and providers within and between enterprises. We leverage previous research activities in overlay networking technologies along with recent advances in network abstraction, service exposure, and service creation to develop a paradigm for a service environment providing converged Internet and Telecommunications services that we call Service Broker. Such a Service Broker provides mechanisms to combine and mediate between different service paradigms from the two domains Internet/WWW and telecommunications. Furthermore, it enables the composition of services across these domains and is capable of defining and applying temporal constraints during creation and execution time. By adding network-awareness into the service fabric, such a Service Broker may also act as a next generation network-to-service element allowing the composition of crossdomain and cross-layer network and service resources. The contribution of this research is threefold: first, we analyze and classify principles and technologies from Information Technologies (IT) and telecommunications to identify and discuss issues allowing cross-domain composition in a converging service layer. Second, we discuss service composition methods allowing the creation of converged services on an abstract level; in particular, we present a formalized method for model-checking of such compositions. Finally, we propose a Service Broker architecture converging Internet and Telecom services. This environment enables cross-domain feature interaction in services through formalized obligation policies acting as constraints during service discovery, creation, and execution time.
The exponential expanding of the numbers of web sites and Internet users makes WWW the most important global information resource. From information publishing and electronic commerce to entertainment and social networking, the Web allows an inexpensive and efficient access to the services provided by individuals and institutions. The basic units for distributing these services are the web sites scattered throughout the world. However, the extreme fragility of web services and content, the high competence between similar services supplied by different sites, and the wide geographic distributions of the web users drive the urgent requirement from the web managers to track and understand the usage interest of their web customers. This thesis, "X-tracking the Usage Interest on Web Sites", aims to fulfill this requirement. "X" stands two meanings: one is that the usage interest differs from various web sites, and the other is that usage interest is depicted from multi aspects: internal and external, structural and conceptual, objective and subjective. "Tracking" shows that our concentration is on locating and measuring the differences and changes among usage patterns. This thesis presents the methodologies on discovering usage interest on three kinds of web sites: the public information portal site, e-learning site that provides kinds of streaming lectures and social site that supplies the public discussions on IT issues. On different sites, we concentrate on different issues related with mining usage interest. The educational information portal sites were the first implementation scenarios on discovering usage patterns and optimizing the organization of web services. In such cases, the usage patterns are modeled as frequent page sets, navigation paths, navigation structures or graphs. However, a necessary requirement is to rebuild the individual behaviors from usage history. We give a systematic study on how to rebuild individual behaviors. Besides, this thesis shows a new strategy on building content clusters based on pair browsing retrieved from usage logs. The difference between such clusters and the original web structure displays the distance between the destinations from usage side and the expectations from design side. Moreover, we study the problem on tracking the changes of usage patterns in their life cycles. The changes are described from internal side integrating conceptual and structure features, and from external side for the physical features; and described from local side measuring the difference between two time spans, and global side showing the change tendency along the life cycle. A platform, Web-Cares, is developed to discover the usage interest, to measure the difference between usage interest and site expectation and to track the changes of usage patterns. E-learning site provides the teaching materials such as slides, recorded lecture videos and exercise sheets. We focus on discovering the learning interest on streaming lectures, such as real medias, mp4 and flash clips. Compared to the information portal site, the usage on streaming lectures encapsulates the variables such as viewing time and actions during learning processes. The learning interest is discovered in the form of answering 6 questions, which covers finding the relations between pieces of lectures and the preference among different forms of lectures. We prefer on detecting the changes of learning interest on the same course from different semesters. The differences on the content and structure between two courses leverage the changes on the learning interest. We give an algorithm on measuring the difference on learning interest integrated with similarity comparison between courses. A search engine, TASK-Moniminer, is created to help the teacher query the learning interest on their streaming lectures on tele-TASK site. Social site acts as an online community attracting web users to discuss the common topics and share their interesting information. Compared to the public information portal site and e-learning web site, the rich interactions among users and web content bring the wider range of content quality, on the other hand, provide more possibilities to express and model usage interest. We propose a framework on finding and recommending high reputation articles in a social site. We observed that the reputation is classified into global and local categories; the quality of the articles having high reputation is related with the content features. Based on these observations, our framework is implemented firstly by finding the articles having global or local reputation, and secondly clustering articles based on their content relations, and then the articles are selected and recommended from each cluster based on their reputation ranks.
Companies develop process models to explicitly describe their business operations. In the same time, business operations, business processes, must adhere to various types of compliance requirements. Regulations, e.g., Sarbanes Oxley Act of 2002, internal policies, best practices are just a few sources of compliance requirements. In some cases, non-adherence to compliance requirements makes the organization subject to legal punishment. In other cases, non-adherence to compliance leads to loss of competitive advantage and thus loss of market share. Unlike the classical domain-independent behavioral correctness of business processes, compliance requirements are domain-specific. Moreover, compliance requirements change over time. New requirements might appear due to change in laws and adoption of new policies. Compliance requirements are offered or enforced by different entities that have different objectives behind these requirements. Finally, compliance requirements might affect different aspects of business processes, e.g., control flow and data flow. As a result, it is infeasible to hard-code compliance checks in tools. Rather, a repeatable process of modeling compliance rules and checking them against business processes automatically is needed. This thesis provides a formal approach to support process design-time compliance checking. Using visual patterns, it is possible to model compliance requirements concerning control flow, data flow and conditional flow rules. Each pattern is mapped into a temporal logic formula. The thesis addresses the problem of consistency checking among various compliance requirements, as they might stem from divergent sources. Also, the thesis contributes to automatically check compliance requirements against process models using model checking. We show that extra domain knowledge, other than expressed in compliance rules, is needed to reach correct decisions. In case of violations, we are able to provide a useful feedback to the user. The feedback is in the form of parts of the process model whose execution causes the violation. In some cases, our approach is capable of providing automated remedy of the violation.
Background: The development of bioinformatics databases, algorithms, and tools throughout the last years has lead to a highly distributedworld of bioinformatics services. Without adequatemanagement and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions. Results: In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined to complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI’s model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developermay either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations. Conclusion: We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting. Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consequent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web
This thesis presents methods for automated synthesis of flexible chip multiprocessor systems from parallel programs targeted at FPGAs to exploit both task-level parallelism and architecture customization. Automated synthesis is necessitated by the complexity of the design space. A detailed description of the design space is provided in order to determine which parameters should be modeled to facilitate automated synthesis by optimizing a cost function, the emphasis being placed on inclusive modeling of parameters from application, architectural and physical subspaces, as well as their joint coverage in order to avoid pre-constraining the design space. Given a parallel program and a set of an IP library, the automated synthesis problem is to simultaneously (i) select processors (ii) map and schedule tasks to them, and (iii) select one or several networks for inter-task communications such that design constraints and optimization objectives are met. The research objective in this thesis is to find a suitable model for automated synthesis, and to evaluate methods of using the model for architectural optimizations. Our contributions are a holistic approach for the design of such systems, corresponding models to facilitate automated synthesis, evaluation of optimization methods using state of the art integer linear and answer set programming, as well as the development of synthesis heuristics to solve runtime challenges.
The difference-list technique is described in literature as effective method for extending lists to the right without using calls of append/3. There exist some proposals for automatic transformation of list programs into differencelist programs. However, we are interested in construction of difference-list programs by the programmer, avoiding the need of a transformation step. In [GG09] it was demonstrated, how left-recursive procedures with a dangling call of append/3 can be transformed into right-recursion using the unfolding technique. For simplification of writing difference-list programs using a new cons/2 procedure was introduced. In the present paper, we investigate how efficieny is influenced using cons/2. We measure the efficiency of procedures using accumulator technique, cons/2, DCG’s, and difference lists and compute the resulting speedup in respect to the simple procedure definition using append/3. Four Prolog systems were investigated and we found different behaviour concerning the speedup by difference lists. A result of our investigations is, that an often advice given in the literature for avoiding calls append/3 could not be confirmed in this strong formulation.
We introduce a simple approach extending the input language of Answer Set Programming (ASP) systems by multi-valued propositions. Our approach is implemented as a (prototypical) preprocessor translating logic programs with multi-valued propositions into logic programs with Boolean propositions only. Our translation is modular and heavily benefits from the expressive input language of ASP. The resulting approach, along with its implementation, allows for solving interesting constraint satisfaction problems in ASP, showing a good performance.
Preface
(2010)
The workshops on (constraint) logic programming (WLP) are the annual meeting of the Society of Logic Programming (GLP e.V.) and bring together researchers interested in logic programming, constraint programming, and related areas like databases, artificial intelligence and operations research. In this decade, previous workshops took place in Dresden (2008), Würzburg (2007), Vienna (2006), Ulm (2005), Potsdam (2004), Dresden (2002), Kiel (2001), and Würzburg (2000). Contributions to workshops deal with all theoretical, experimental, and application aspects of constraint programming (CP) and logic programming (LP), including foundations of constraint/ logic programming. Some of the special topics are constraint solving and optimization, extensions of functional logic programming, deductive databases, data mining, nonmonotonic reasoning, , interaction of CP/LP with other formalisms like agents, XML, JAVA, program analysis, program transformation, program verification, meta programming, parallelism and concurrency, answer set programming, implementation and software techniques (e.g., types, modularity, design patterns), applications (e.g., in production, environment, education, internet), constraint/logic programming for semantic web systems and applications, reasoning on the semantic web, data modelling for the web, semistructured data, and web query languages.
The workshops on (constraint) logic programming (WLP) are the annual meeting of the Society of Logic Programming (GLP e.V.) and bring together researchers interested in logic programming, constraint programming, and related areas like databases, artificial intelligence and operations research. The 23rd WLP was held in Potsdam at September 15 – 16, 2009. The topics of the presentations of WLP2009 were grouped into the major areas: Databases, Answer Set Programming, Theory and Practice of Logic Programming as well as Constraints and Constraint Handling Rules.
With the rise of electronic integration between organizations, the need for a precise specification of interaction behavior increases. Information systems, replacing interaction previously carried out by humans via phone, faxes and emails, require a precise specification for handling all possible situations. Such interaction behavior is described in process choreographies. Choreographies enumerate the roles involved, the allowed interactions, the message contents and the behavioral dependencies between interactions. Choreographies serve as interaction contract and are the starting point for adapting existing business processes and systems or for implementing new software components. As a thorough analysis and comparison of choreography modeling languages is missing in the literature, this thesis introduces a requirements framework for choreography languages and uses it for comparing current choreography languages. Language proposals for overcoming the limitations are given for choreography modeling on the conceptual and on the technical level. Using an interconnection modeling style, behavioral dependencies are defined on a per-role basis and different roles are interconnected using message flow. This thesis reveals a number of modeling "anti-patterns" for interconnection modeling, motivating further investigations on choreography languages following the interaction modeling style. Here, interactions are seen as atomic building blocks and the behavioral dependencies between them are defined globally. Two novel language proposals are put forward for this modeling style which have already influenced industrial standardization initiatives. While avoiding many of the pitfalls of interconnection modeling, new anomalies can arise in interaction models. A choreography might not be realizable, i.e. there does not exist a set of interacting roles that collectively realize the specified behavior. This thesis investigates different dimensions of realizability.
One of the main problems in machine learning is to train a predictive model from training data and to make predictions on test data. Most predictive models are constructed under the assumption that the training data is governed by the exact same distribution which the model will later be exposed to. In practice, control over the data collection process is often imperfect. A typical scenario is when labels are collected by questionnaires and one does not have access to the test population. For example, parts of the test population are underrepresented in the survey, out of reach, or do not return the questionnaire. In many applications training data from the test distribution are scarce because they are difficult to obtain or very expensive. Data from auxiliary sources drawn from similar distributions are often cheaply available. This thesis centers around learning under differing training and test distributions and covers several problem settings with different assumptions on the relationship between training and test distributions-including multi-task learning and learning under covariate shift and sample selection bias. Several new models are derived that directly characterize the divergence between training and test distributions, without the intermediate step of estimating training and test distributions separately. The integral part of these models are rescaling weights that match the rescaled or resampled training distribution to the test distribution. Integrated models are studied where only one optimization problem needs to be solved for learning under differing distributions. With a two-step approximation to the integrated models almost any supervised learning algorithm can be adopted to biased training data. In case studies on spam filtering, HIV therapy screening, targeted advertising, and other applications the performance of the new models is compared to state-of-the-art reference methods.
Many formal descriptions of DPLL-based SAT algorithms either do not include all essential proof techniques applied by modern SAT solvers or are bound to particular heuristics or data structures. This makes it difficult to analyze proof-theoretic properties or the search complexity of these algorithms. In this paper we try to improve this situation by developing a nondeterministic proof calculus that models the functioning of SAT algorithms based on the DPLL calculus with clause learning. This calculus is independent of implementation details yet precise enough to enable a formal analysis of realistic DPLL-based SAT algorithms.
This thesis addresses real-time rendering techniques for 3D information lenses based on the focus & context metaphor. It analyzes, conceives, implements, and reviews its applicability to objects and structures of virtual 3D city models. In contrast to digital terrain models, the application of focus & context visualization to virtual 3D city models is barely researched. However, the purposeful visualization of contextual data of is extreme importance for the interactive exploration and analysis of this field. Programmable hardware enables the implementation of new lens techniques, that allow the augmentation of the perceptive and cognitive quality of the visualization compared to classical perspective projections. A set of 3D information lenses is integrated into a 3D scene-graph system: • Occlusion lenses modify the appearance of virtual 3D city model objects to resolve their occlusion and consequently facilitate the navigation. • Best-view lenses display city model objects in a priority-based manner and mediate their meta information. Thus, they support exploration and navigation of virtual 3D city models. • Color and deformation lenses modify the appearance and geometry of 3D city models to facilitate their perception. The presented techniques for 3D information lenses and their application to virtual 3D city models clarify their potential for interactive visualization and form a base for further development.
Although educational content in electronic form is increasing dramatically, its usage in an educational environment is poor, mainly due to the fact that there is too much of (unreliable) redundant, and not relevant information. Finding appropriate answers is a rather difficult task being reliant on the user filtering of the pertinent information from the noise. Turning knowledge bases like the online tele-TASK archive into useful educational resources requires identifying correct, reliable, and "machine-understandable" information, as well as developing simple but efficient search tools with the ability to reason over this information. Our vision is to create an E-Librarian Service, which is able to retrieve multimedia resources from a knowledge base in a more efficient way than by browsing through an index, or by using a simple keyword search. In our E-Librarian Service, the user can enter his question in a very simple and human way; in natural language (NL). Our premise is that more pertinent results would be retrieved if the search engine understood the sense of the user's query. The returned results are then logical consequences of an inference rather than of keyword matchings. Our E-Librarian Service does not return the answer to the user's question, but it retrieves the most pertinent document(s), in which the user finds the answer to his/her question. Among all the documents that have some common information with the user query, our E-Librarian Service identifies the most pertinent match(es), keeping in mind that the user expects an exhaustive answer while preferring a concise answer with only little or no information overhead. Also, our E-Librarian Service always proposes a solution to the user, even if the system concludes that there is no exhaustive answer. Our E-Librarian Service was implemented prototypically in three different educational tools. A first prototype is CHESt (Computer History Expert System); it has a knowledge base with 300 multimedia clips that cover the main events in computer history. A second prototype is MatES (Mathematics Expert System); it has a knowledge base with 115 clips that cover the topic of fractions in mathematics for secondary school w.r.t. the official school programme. All clips were recorded mainly by pupils. The third and most advanced prototype is the "Lecture Butler's E-Librarain Service"; it has a Web service interface to respect a service oriented architecture (SOA), and was developed in the context of the Web-University project at the Hasso-Plattner-Institute (HPI). Two major experiments in an educational environment - at the Lycée Technique Esch/Alzette in Luxembourg - were made to test the pertinence and reliability of our E-Librarian Service as a complement to traditional courses. The first experiment (in 2005) was made with CHESt in different classes, and covered a single lesson. The second experiment (in 2006) covered a period of 6 weeks of intensive use of MatES in one class. There was no classical mathematics lesson where the teacher gave explanations, but the students had to learn in an autonomous and exploratory way. They had to ask questions to the E-Librarian Service just the way they would if there was a human teacher.
This paper describes the proof calculus LD for clausal propositional logic, which is a linearized form of the well-known DPLL calculus extended by clause learning. It is motivated by the demand to model how current SAT solvers built on clause learning are working, while abstracting from decision heuristics and implementation details. The calculus is proved sound and terminating. Further, it is shown that both the original DPLL calculus and the conflict-directed backtracking calculus with clause learning, as it is implemented in many current SAT solvers, are complete and proof-confluent instances of the LD calculus.
This work introduces novel internal and external memory algorithms for computing voxel skeletons of massive voxel objects with complex network-like architecture and for converting these voxel skeletons to piecewise linear geometry, that is triangle meshes and piecewise straight lines. The presented techniques help to tackle the challenge of visualizing and analyzing 3d images of increasing size and complexity, which are becoming more and more important in, for example, biological and medical research. Section 2.3.1 contributes to the theoretical foundations of thinning algorithms with a discussion of homotopic thinning in the grid cell model. The grid cell model explicitly represents a cell complex built of faces, edges, and vertices shared between voxels. A characterization of pairs of cells to be deleted is much simpler than characterizations of simple voxels were before. The grid cell model resolves topologically unclear voxel configurations at junctions and locked voxel configurations causing, for example, interior voxels in sets of non-simple voxels. A general conclusion is that the grid cell model is superior to indecomposable voxels for algorithms that need detailed control of topology. Section 2.3.2 introduces a noise-insensitive measure based on the geodesic distance along the boundary to compute two-dimensional skeletons. The measure is able to retain thin object structures if they are geometrically important while ignoring noise on the object's boundary. This combination of properties is not known of other measures. The measure is also used to guide erosion in a thinning process from the boundary towards lines centered within plate-like structures. Geodesic distance based quantities seem to be well suited to robustly identify one- and two-dimensional skeletons. Chapter 6 applies the method to visualization of bone micro-architecture. Chapter 3 describes a novel geometry generation scheme for representing voxel skeletons, which retracts voxel skeletons to piecewise linear geometry per dual cube. The generated triangle meshes and graphs provide a link to geometry processing and efficient rendering of voxel skeletons. The scheme creates non-closed surfaces with boundaries, which contain fewer triangles than a representation of voxel skeletons using closed surfaces like small cubes or iso-surfaces. A conclusion is that thinking specifically about voxel skeleton configurations instead of generic voxel configurations helps to deal with the topological implications. The geometry generation is one foundation of the applications presented in Chapter 6. Chapter 5 presents a novel external memory algorithm for distance ordered homotopic thinning. The presented method extends known algorithms for computing chamfer distance transformations and thinning to execute I/O-efficiently when input is larger than the available main memory. The applied block-wise decomposition schemes are quite simple. Yet it was necessary to carefully analyze effects of block boundaries to devise globally correct external memory variants of known algorithms. In general, doing so is superior to naive block-wise processing ignoring boundary effects. Chapter 6 applies the algorithms in a novel method based on confocal microscopy for quantitative study of micro-vascular networks in the field of microcirculation.
An increasing number of applications requires user interfaces that facilitate the handling of large geodata sets. Using virtual 3D city models, complex geospatial information can be communicated visually in an intuitive way. Therefore, real-time visualization of virtual 3D city models represents a key functionality for interactive exploration, presentation, analysis, and manipulation of geospatial data. This thesis concentrates on the development and implementation of concepts and techniques for real-time city model visualization. It discusses rendering algorithms as well as complementary modeling concepts and interaction techniques. Particularly, the work introduces a new real-time rendering technique to handle city models of high complexity concerning texture size and number of textures. Such models are difficult to handle by current technology, primarily due to two problems: - Limited texture memory: The amount of simultaneously usable texture data is limited by the memory of the graphics hardware. - Limited number of textures: Using several thousand different textures simultaneously causes significant performance problems due to texture switch operations during rendering. The multiresolution texture atlases approach, introduced in this thesis, overcomes both problems. During rendering, it permanently maintains a small set of textures that are sufficient for the current view and the screen resolution available. The efficiency of multiresolution texture atlases is evaluated in performance tests. To summarize, the results demonstrate that the following goals have been achieved: - Real-time rendering becomes possible for 3D scenes whose amount of texture data exceeds the main memory capacity. - Overhead due to texture switches is kept permanently low, so that the number of different textures has no significant effect on the rendering frame rate. Furthermore, this thesis introduces two new approaches for real-time city model visualization that use textures as core visualization elements: - An approach for visualization of thematic information. - An approach for illustrative visualization of 3D city models. Both techniques demonstrate that multiresolution texture atlases provide a basic functionality for the development of new applications and systems in the domain of city model visualization.
The innovation of information techniques has changed many aspects of our life. In health care field, we can obtain, manage and communicate high-quality large volumetric image data by computer integrated devices, to support medical care. In this dissertation I propose several promising methods that could assist physicians in processing, observing and communicating the image data. They are included in my three research aspects: telemedicine integration, medical image visualization and image segmentation. And these methods are also demonstrated by the demo software that I developed. One of my research point focuses on medical information storage standard in telemedicine, for example DICOM, which is the predominant standard for the storage and communication of medical images. I propose a novel 3D image data storage method, which was lacking in current DICOM standard. I also created a mechanism to make use of the non-standard or private DICOM files. In this thesis I present several rendering techniques on medical image visualization to offer different display manners, both 2D and 3D, for example, cut through data volume in arbitrary degree, rendering the surface shell of the data, and rendering the semi-transparent volume of the data. A hybrid segmentation approach, designed for semi-automated segmentation of radiological image, such as CT, MRI, etc, is proposed in this thesis to get the organ or interested area from the image. This approach takes advantage of the region-based method and boundary-based methods. Three steps compose the hybrid approach: the first step gets coarse segmentation by fuzzy affinity and generates homogeneity operator; the second step divides the image by Voronoi Diagram and reclassifies the regions by the operator to refine segmentation from the previous step; the third step handles vague boundary by level set model. Topics for future research are mentioned in the end, including new supplement for DICOM standard for segmentation information storage, visualization of multimodal image information, and improvement of the segmentation approach to higher dimension.
Three quantum cryptographic protocols of multiuser quantum networks with embedded authentication, allowing quantum key distribution or quantum direct communication, are discussed in this work. The security of the protocols against different types of attacks is analysed with a focus on various impersonation attacks and the man-in-the-middle attack. On the basis of the security analyses several improvements are suggested and implemented in order to adjust the investigated vulnerabilities. Furthermore, the impact of the eavesdropping test procedure on impersonation attacks is outlined. The framework of a general eavesdropping test is proposed to provide additional protection against security risks in impersonation attacks.
Answer Set Programming (ASP) emerged in the late 1990s as a new logic programming paradigm, having its roots in nonmonotonic reasoning, deductive databases, and logic programming with negation as failure. The basic idea of ASP is to represent a computational problem as a logic program whose answer sets correspond to solutions, and then to use an answer set solver for finding answer sets of the program. ASP is particularly suited for solving NP-complete search problems. Among these, we find applications to product configuration, diagnosis, and graph-theoretical problems, e.g. finding Hamiltonian cycles. On different lines of ASP research, many extensions of the basic formalism have been proposed. The most intensively studied one is the modelling of preferences in ASP. They constitute a natural and effective way of selecting preferred solutions among a plethora of solutions for a problem. For example, preferences have been successfully used for timetabling, auctioning, and product configuration. In this thesis, we concentrate on preferences within answer set programming. Among several formalisms and semantics for preference handling in ASP, we concentrate on ordered logic programs with the underlying D-, W-, and B-semantics. In this setting, preferences are defined among rules of a logic program. They select preferred answer sets among (standard) answer sets of the underlying logic program. Up to now, those preferred answer sets have been computed either via a compilation method or by meta-interpretation. Hence, the question comes up, whether and how preferences can be integrated into an existing ASP solver. To solve this question, we develop an operational graph-based framework for the computation of answer sets of logic programs. Then, we integrate preferences into this operational approach. We empirically observe that our integrative approach performs in most cases better than the compilation method or meta-interpretation. Another research issue in ASP are optimization methods that remove redundancies, as also found in database query optimizers. For these purposes, the rather recently suggested notion of strong equivalence for ASP can be used. If a program is strongly equivalent to a subprogram of itself, then one can always use the subprogram instead of the original program, a technique which serves as an effective optimization method. Up to now, strong equivalence has not been considered for logic programs with preferences. In this thesis, we tackle this issue and generalize the notion of strong equivalence to ordered logic programs. We give necessary and sufficient conditions for the strong equivalence of two ordered logic programs. Furthermore, we provide program transformations for ordered logic programs and show in how far preferences can be simplified. Finally, we present two new applications for preferences within answer set programming. First, we define new procedures for group decision making, which we apply to the problem of scheduling a group meeting. As a second new application, we reconstruct a linguistic problem appearing in German dialects within ASP. Regarding linguistic studies, there is an ongoing debate about how unique the rule systems of language are in human cognition. The reconstruction of grammatical regularities with tools from computer science has consequences for this debate: if grammars can be modelled this way, then they share core properties with other non-linguistic rule systems.
Advances in biotechnologies rapidly increase the number of molecules of a cell which can be observed simultaneously. This includes expression levels of thousands or ten-thousands of genes as well as concentration levels of metabolites or proteins. Such Profile data, observed at different times or at different experimental conditions (e.g., heat or dry stress), show how the biological experiment is reflected on the molecular level. This information is helpful to understand the molecular behaviour and to identify molecules or combination of molecules that characterise specific biological condition (e.g., disease). This work shows the potentials of component extraction algorithms to identify the major factors which influenced the observed data. This can be the expected experimental factors such as the time or temperature as well as unexpected factors such as technical artefacts or even unknown biological behaviour. Extracting components means to reduce the very high-dimensional data to a small set of new variables termed components. Each component is a combination of all original variables. The classical approach for that purpose is the principal component analysis (PCA). It is shown that, in contrast to PCA which maximises the variance only, modern approaches such as independent component analysis (ICA) are more suitable for analysing molecular data. The condition of independence between components of ICA fits more naturally our assumption of individual (independent) factors which influence the data. This higher potential of ICA is demonstrated by a crossing experiment of the model plant Arabidopsis thaliana (Thale Cress). The experimental factors could be well identified and, in addition, ICA could even detect a technical artefact. However, in continuously observations such as in time experiments, the data show, in general, a nonlinear distribution. To analyse such nonlinear data, a nonlinear extension of PCA is used. This nonlinear PCA (NLPCA) is based on a neural network algorithm. The algorithm is adapted to be applicable to incomplete molecular data sets. Thus, it provides also the ability to estimate the missing data. The potential of nonlinear PCA to identify nonlinear factors is demonstrated by a cold stress experiment of Arabidopsis thaliana. The results of component analysis can be used to build a molecular network model. Since it includes functional dependencies it is termed functional network. Applied to the cold stress data, it is shown that functional networks are appropriate to visualise biological processes and thereby reveals molecular dynamics.
This thesis discusses challenges in IT security education, points out a gap between e-learning and practical education, and presents a work to fill the gap. E-learning is a flexible and personalized alternative to traditional education. Nonetheless, existing e-learning systems for IT security education have difficulties in delivering hands-on experience because of the lack of proximity. Laboratory environments and practical exercises are indispensable instruction tools to IT security education, but security education in conventional computer laboratories poses particular problems such as immobility as well as high creation and maintenance costs. Hence, there is a need to effectively transform security laboratories and practical exercises into e-learning forms. In this thesis, we introduce the Tele-Lab IT-Security architecture that allows students not only to learn IT security principles, but also to gain hands-on security experience by exercises in an online laboratory environment. In this architecture, virtual machines are used to provide safe user work environments instead of real computers. Thus, traditional laboratory environments can be cloned onto the Internet by software, which increases accessibility to laboratory resources and greatly reduces investment and maintenance costs. Under the Tele-Lab IT-Security framework, a set of technical solutions is also proposed to provide effective functionalities, reliability, security, and performance. The virtual machines with appropriate resource allocation, software installation, and system configurations are used to build lightweight security laboratories on a hosting computer. Reliability and availability of laboratory platforms are covered by a virtual machine management framework. This management framework provides necessary monitoring and administration services to detect and recover critical failures of virtual machines at run time. Considering the risk that virtual machines can be misused for compromising production networks, we present a security management solution to prevent the misuse of laboratory resources by security isolation at the system and network levels. This work is an attempt to bridge the gap between e-learning/tele-teaching and practical IT security education. It is not to substitute conventional teaching in laboratories but to add practical features to e-learning. This thesis demonstrates the possibility to implement hands-on security laboratories on the Internet reliably, securely, and economically.
With increasing number of applications in Internet and mobile environments, distributed software systems are demanded to be more powerful and flexible, especially in terms of dynamism and security. This dissertation describes my work concerning three aspects: dynamic reconfiguration of component software, security control on middleware applications, and web services dynamic composition. Firstly, I proposed a technology named Routing Based Workflow (RBW) to model the execution and management of collaborative components and realize temporary binding for component instances. The temporary binding means component instances are temporarily loaded into a created execution environment to execute their functions, and then are released to their repository after executions. The temporary binding allows to create an idle execution environment for all collaborative components, on which the change operations can be immediately carried out. The changes on execution environment will result in a new collaboration of all involved components, and also greatly simplifies the classical issues arising from dynamic changes, such as consistency preserving etc. To demonstrate the feasibility of RBW, I created a dynamic secure middleware system - the Smart Data Server Version 3.0 (SDS3). In SDS3, an open source implementation of CORBA is adopted and modified as the communication infrastructure, and three secure components managed by RBW, are created to enhance the security on the access of deployed applications. SDS3 offers multi-level security control on its applications from strategy control to application-specific detail control. For the management by RBW, the strategy control of SDS3 applications could be dynamically changed by reorganizing the collaboration of the three secure components. In addition, I created the Dynamic Services Composer (DSC) based on Apache open source projects, Apache Axis and WSIF. In DSC, RBW is employed to model the interaction and collaboration of web services and to enable the dynamic changes on the flow structure of web services. Finally, overall performance tests were made to evaluate the efficiency of the developed RBW and SDS3. The results demonstrated that temporary binding of component instances makes slight impacts on the execution efficiency of components, and the blackout time arising from dynamic changes can be extremely reduced in any applications.
The goal of a Brain-Computer Interface (BCI) consists of the development of a unidirectional interface between a human and a computer to allow control of a device only via brain signals. While the BCI systems of almost all other groups require the user to be trained over several weeks or even months, the group of Prof. Dr. Klaus-Robert Müller in Berlin and Potsdam, which I belong to, was one of the first research groups in this field which used machine learning techniques on a large scale. The adaptivity of the processing system to the individual brain patterns of the subject confers huge advantages for the user. Thus BCI research is considered a hot topic in machine learning and computer science. It requires interdisciplinary cooperation between disparate fields such as neuroscience, since only by combining machine learning and signal processing techniques based on neurophysiological knowledge will the largest progress be made. In this work I particularly deal with my part of this project, which lies mainly in the area of computer science. I have considered the following three main points: <b>Establishing a performance measure based on information theory:</b> I have critically illuminated the assumptions of Shannon's information transfer rate for application in a BCI context. By establishing suitable coding strategies I was able to show that this theoretical measure approximates quite well to what is practically achieveable. <b>Transfer and development of suitable signal processing and machine learning techniques:</b> One substantial component of my work was to develop several machine learning and signal processing algorithms to improve the efficiency of a BCI. Based on the neurophysiological knowledge that several independent EEG features can be observed for some mental states, I have developed a method for combining different and maybe independent features which improved performance. In some cases the performance of the combination algorithm outperforms the best single performance by more than 50 %. Furthermore, I have theoretically and practically addressed via the development of suitable algorithms the question of the optimal number of classes which should be used for a BCI. It transpired that with BCI performances reported so far, three or four different mental states are optimal. For another extension I have combined ideas from signal processing with those of machine learning since a high gain can be achieved if the temporal filtering, i.e., the choice of frequency bands, is automatically adapted to each subject individually. <b>Implementation of the Berlin brain computer interface and realization of suitable experiments:</b> Finally a further substantial component of my work was to realize an online BCI system which includes the developed methods, but is also flexible enough to allow the simple realization of new algorithms and ideas. So far, bitrates of up to 40 bits per minute have been achieved with this system by absolutely untrained users which, compared to results of other groups, is highly successful.
Modern biological analysis techniques supply scientists with various forms of data. One category of such data are the so called "expression data". These data indicate the quantities of biochemical compounds present in tissue samples. Recently, expression data can be generated at a high speed. This leads in turn to amounts of data no longer analysable by classical statistical techniques. Systems biology is the new field that focuses on the modelling of this information. At present, various methods are used for this purpose. One superordinate class of these methods is machine learning. Methods of this kind had, until recently, predominantly been used for classification and prediction tasks. This neglected a powerful secondary benefit: the ability to induce interpretable models. Obtaining such models from data has become a key issue within Systems biology. Numerous approaches have been proposed and intensively discussed. This thesis focuses on the examination and exploitation of one basic technique: decision trees. The concept of comparing sets of decision trees is developed. This method offers the possibility of identifying significant thresholds in continuous or discrete valued attributes through their corresponding set of decision trees. Finding significant thresholds in attributes is a means of identifying states in living organisms. Knowing about states is an invaluable clue to the understanding of dynamic processes in organisms. Applied to metabolite concentration data, the proposed method was able to identify states which were not found with conventional techniques for threshold extraction. A second approach exploits the structure of sets of decision trees for the discovery of combinatorial dependencies between attributes. Previous work on this issue has focused either on expensive computational methods or the interpretation of single decision trees a very limited exploitation of the data. This has led to incomplete or unstable results. That is why a new method is developed that uses sets of decision trees to overcome these limitations. Both the introduced methods are available as software tools. They can be applied consecutively or separately. That way they make up a package of analytical tools that usefully supplement existing methods. By means of these tools, the newly introduced methods were able to confirm existing knowledge and to suggest interesting and new relationships between metabolites.
Quantified Boolean formulas (QBFs) play an important role in theoretical computer science. QBF extends propositional logic in such a way that many advanced forms of reasoning can be easily formulated and evaluated. In this dissertation we present our ZQSAT, which is an algorithm for evaluating quantified Boolean formulas. ZQSAT is based on ZBDD: Zero-Suppressed Binary Decision Diagram , which is a variant of BDD, and an adopted version of the DPLL algorithm. It has been implemented in C using the CUDD: Colorado University Decision Diagram package. The capability of ZBDDs in storing sets of subsets efficiently enabled us to store the clauses of a QBF very compactly and let us to embed the notion of memoization to the DPLL algorithm. These points led us to implement the search algorithm in such a way that we could store and reuse the results of all previously solved subformulas with a little overheads. ZQSAT can solve some sets of standard QBF benchmark problems (known to be hard for DPLL based algorithms) faster than the best existing solvers. In addition to prenex-CNF, ZQSAT accepts prenex-NNF formulas. We show and prove how this capability can be exponentially beneficial.
This thesis is concerned with the solution of the blind source separation problem (BSS). The BSS problem occurs frequently in various scientific and technical applications. In essence, it consists in separating meaningful underlying components out of a mixture of a multitude of superimposed signals. In the recent research literature there are two related approaches to the BSS problem: The first is known as Independent Component Analysis (ICA), where the goal is to transform the data such that the components become as independent as possible. The second is based on the notion of diagonality of certain characteristic matrices derived from the data. Here the goal is to transform the matrices such that they become as diagonal as possible. In this thesis we study the latter method of approximate joint diagonalization (AJD) to achieve a solution of the BSS problem. After an introduction to the general setting, the thesis provides an overview on particular choices for the set of target matrices that can be used for BSS by joint diagonalization. As the main contribution of the thesis, new algorithms for approximate joint diagonalization of several matrices with non-orthogonal transformations are developed. These newly developed algorithms will be tested on synthetic benchmark datasets and compared to other previous diagonalization algorithms. Applications of the BSS methods to biomedical signal processing are discussed and exemplified with real-life data sets of multi-channel biomagnetic recordings.
Independent component analysis (ICA) is a tool for statistical data analysis and signal processing that is able to decompose multivariate signals into their underlying source components. Although the classical ICA model is highly useful, there are many real-world applications that require powerful extensions of ICA. This thesis presents new methods that extend the functionality of ICA: (1) reliability and grouping of independent components with noise injection, (2) robust and overcomplete ICA with inlier detection, and (3) nonlinear ICA with kernel methods.
In recent years, there has been a dramatic increase in available compute capacities. However, these “Grid resources” are rarely accessible in a continuous stream, but rather appear scattered across various machine types, platforms and operating systems, which are coupled by networks of fluctuating bandwidth. It becomes increasingly difficult for scientists to exploit available resources for their applications. We believe that intelligent, self-governing applications should be able to select resources in a dynamic and heterogeneous environment: Migrating applications determine a resource when old capacities are used up. Spawning simulations launch algorithms on external machines to speed up the main execution. Applications are restarted as soon as a failure is detected. All these actions can be taken without human interaction. A distributed compute environment possesses an intrinsic unreliability. Any application that interacts with such an environment must be able to cope with its failing components: deteriorating networks, crashing machines, failing software. We construct a reliable service infrastructure by endowing a service environment with a peer-to-peer topology. This “Grid Peer Services” infrastructure accommodates high-level services like migration and spawning, as well as fundamental services for application launching, file transfer and resource selection. It utilizes existing Grid technology wherever possible to accomplish its tasks. An Application Information Server acts as a generic information registry to all participants in a service environment. The service environment that we developed, allows applications e.g. to send a relocation requests to a migration server. The server selects a new computer based on the transmitted resource requirements. It transfers the application's checkpoint and binary to the new host and resumes the simulation. Although the Grid's underlying resource substrate is not continuous, we achieve persistent computations on Grids by relocating the application. We show with our real-world examples that a traditional genome analysis program can be easily modified to perform self-determined migrations in this service environment.
In order to face the rapidly increasing need for computational resources of various scientific and engineering applications one has to think of new ways to make more efficient use of the worlds current computational resources. In this respect, the growing speed of wide area networks made a new kind of distributed computing possible: Metacomputing or (distributed) Grid computing. This is a rather new and uncharted field in computational science. The rapidly increasing speed of networks even outperforms the average increase of processor speed: Processor speeds double on average each 18 month whereas network bandwidths double every 9 months. Due to this development of local and wide area networks Grid computing will certainly play a key role in the future of parallel computing. This type of distributed computing, however, distinguishes from the traditional parallel computing in many ways since it has to deal with many problems not occurring in classical parallel computing. Those problems are for example heterogeneity, authentication and slow networks to mention only a few. Some of those problems, e.g. the allocation of distributed resources along with the providing of information about these resources to the application have been already attacked by the Globus software. Unfortunately, as far as we know, hardly any application or middle-ware software takes advantage of this information, since most parallelizing algorithms for finite differencing codes are implicitly designed for single supercomputer or cluster execution. We show that although it is possible to apply classical parallelizing algorithms in a Grid environment, in most cases the observed efficiency of the executed code is very poor. In this work we are closing this gap. In our thesis, we will - show that an execution of classical parallel codes in Grid environments is possible but very slow - analyze this situation of bad performance, nail down bottlenecks in communication, remove unnecessary overhead and other reasons for low performance - develop new and advanced algorithms for parallelisation that are aware of a Grid environment in order to generelize the traditional parallelization schemes - implement and test these new methods, replace and compare with the classical ones - introduce dynamic strategies that automatically adapt the running code to the nature of the underlying Grid environment. The higher the performance one can achieve for a single application by manual tuning for a Grid environment, the lower the chance that those changes are widely applicable to other programs. In our analysis as well as in our implementation we tried to keep the balance between high performance and generality. None of our changes directly affect code on the application level which makes our algorithms applicable to a whole class of real world applications. The implementation of our work is done within the Cactus framework using the Globus toolkit, since we think that these are the most reliable and advanced programming frameworks for supporting computations in Grid environments. On the other hand, however, we tried to be as general as possible, i.e. all methods and algorithms discussed in this thesis are independent of Cactus or Globus.