Refine
Has Fulltext
- yes (85) (remove)
Year of publication
Document Type
- Doctoral Thesis (85) (remove)
Keywords
- Maschinelles Lernen (7)
- Antwortmengenprogrammierung (6)
- Machine Learning (5)
- Modellierung (4)
- answer set programming (4)
- Ontologie (3)
- Semantic Web (3)
- machine learning (3)
- Algorithmen (2)
- Algorithms (2)
Institute
- Institut für Informatik und Computational Science (85) (remove)
Die Projektierung und Abwicklung sowie die statische und dynamische Analyse von Geschäftsprozessen im Bereich des Verwaltens und Regierens auf kommunaler, Länder- wie auch Bundesebene mit Hilfe von Informations- und Kommunikationstechniken beschäftigen Politiker und Strategen für Informationstechnologie ebenso wie die Öffentlichkeit seit Langem.
Der hieraus entstandene Begriff E-Government wurde in der Folge aus den unterschiedlichsten technischen, politischen und semantischen Blickrichtungen beleuchtet.
Die vorliegende Arbeit konzentriert sich dabei auf zwei Schwerpunktthemen:
• Das erste Schwerpunktthema behandelt den Entwurf eines hierarchischen Architekturmodells, für welches sieben hierarchische Schichten identifiziert werden können. Diese erscheinen notwendig, aber auch hinreichend, um den allgemeinen Fall zu beschreiben.
Den Hintergrund hierfür liefert die langjährige Prozess- und Verwaltungserfahrung als Leiter der EDV-Abteilung der Stadtverwaltung Landshut, eine kreisfreie Stadt mit rund 69.000 Einwohnern im Nordosten von München. Sie steht als Repräsentant für viele Verwaltungsvorgänge in der Bundesrepublik Deutschland und ist dennoch als Analyseobjekt in der Gesamtkomplexität und Prozessquantität überschaubar.
Somit können aus der Analyse sämtlicher Kernabläufe statische und dynamische Strukturen extrahiert und abstrakt modelliert werden.
Die Schwerpunkte liegen in der Darstellung der vorhandenen Bedienabläufe in einer Kommune. Die Transformation der Bedienanforderung in einem hierarchischen System, die Darstellung der Kontroll- und der Operationszustände in allen Schichten wie auch die Strategie der Fehlererkennung und Fehlerbehebung schaffen eine transparente Basis für umfassende Restrukturierungen und Optimierungen.
Für die Modellierung wurde FMC-eCS eingesetzt, eine am Hasso-Plattner-Institut für Softwaresystemtechnik GmbH (HPI) im Fachgebiet Kommunikationssysteme entwickelte Methodik zur Modellierung zustandsdiskreter Systeme unter Berücksichtigung möglicher Inkonsistenzen (Betreuer: Prof. Dr.-Ing. Werner Zorn [ZW07a, ZW07b]).
• Das zweite Schwerpunktthema widmet sich der quantitativen Modellierung und Optimierung von E-Government-Bediensystemen, welche am Beispiel des Bürgerbüros der Stadt Landshut im Zeitraum 2008 bis 2015 durchgeführt wurden. Dies erfolgt auf Basis einer kontinuierlichen Betriebsdatenerfassung mit aufwendiger Vorverarbeitung zur Extrahierung mathematisch beschreibbarer Wahrscheinlichkeitsverteilungen.
Der hieraus entwickelte Dienstplan wurde hinsichtlich der erzielbaren Optimierungen im dauerhaften Echteinsatz verifiziert.
[ZW07a] Zorn, Werner: «FMC-QE A New Approach in Quantitative Modeling», Vortrag anlässlich: MSV'07- The 2007 International Conference on Modeling, Simulation and Visualization Methods WorldComp2007, Las Vegas, 28.6.2007.
[ZW07b] Zorn, Werner: «FMC-QE, A New Approach in Quantitative Modeling», Veröffentlichung, Hasso-Plattner-Institut für Softwaresystemtechnik an der Universität Potsdam, 28.6.2007.
In this thesis we introduce the concept of the degree of formality. It is directed against a dualistic point of view, which only distinguishes between formal and informal proofs. This dualistic attitude does not respect the differences between the argumentations classified as informal and it is unproductive because the individual potential of the respective argumentation styles cannot be appreciated and remains untapped.
This thesis has two parts. In the first of them we analyse the concept of the degree of formality (including a discussion about the respective benefits for each degree) while in the second we demonstrate its usefulness in three case studies. In the first case study we will repair Haskell B. Curry's view of mathematics, which incidentally is of great importance in the first part of this thesis, in light of the different degrees of formality. In the second case study we delineate how awareness of the different degrees of formality can be used to help students to learn how to prove. Third, we will show how the advantages of proofs of different degrees of formality can be combined by the development of so called tactics having a medium degree of formality. Together the three case studies show that the degrees of formality provide a convincing solution to the problem of untapped potential.
Physical computing covers the design and realization of interactive objects and installations and allows learners to develop concrete, tangible products of the real world, which arise from their imagination. This can be used in computer science education to provide learners with interesting and motivating access to the different topic areas of the subject in constructionist and creative learning environments. However, if at all, physical computing has so far mostly been taught in afternoon clubs or other extracurricular settings. Thus, for the majority of students so far there are no opportunities to design and create their own interactive objects in regular school lessons.
Despite its increasing popularity also for schools, the topic has not yet been clearly and sufficiently characterized in the context of computer science education. The aim of this doctoral thesis therefore is to clarify physical computing from the perspective of computer science education and to adequately prepare the topic both content-wise and methodologically for secondary school teaching. For this purpose, teaching examples, activities, materials and guidelines for classroom use are developed, implemented and evaluated in schools.
In the theoretical part of the thesis, first the topic is examined from a technical point of view. A structured literature analysis shows that basic concepts used in physical computing can be derived from embedded systems, which are the core of a large field of different application areas and disciplines. Typical methods of physical computing in professional settings are analyzed and, from an educational perspective, elements suitable for computer science teaching in secondary schools are extracted, e. g. tinkering and prototyping. The investigation and classification of suitable tools for school teaching show that microcontrollers and mini computers, often with extensions that greatly facilitate the handling of additional components, are particularly attractive tools for secondary education. Considering the perspectives of science, teachers, students and society, in addition to general design principles, exemplary teaching approaches for school education and suitable learning materials are developed and the design, production and evaluation of a physical computing construction kit suitable for teaching is described.
In the practical part of this thesis, with “My Interactive Garden”, an exemplary approach to integrate physical computing in computer science teaching is tested and evaluated in different courses and refined based on the findings in a design-based research approach. In a series of workshops on physical computing, which is based on a concept for constructionist professional development that is developed specifically for this purpose, teachers are empowered and encouraged to develop and conduct physical computing lessons suitable for their particular classroom settings. Based on their in-class experiences, a process model of physical computing teaching is derived. Interviews with those teachers illustrate that benefits of physical computing, including the tangibility of crafted objects and creativity in the classroom, outweigh possible drawbacks like longer preparation times, technical difficulties or difficult assessment. Hurdles in the classroom are identified and possible solutions discussed.
Empirical investigations in the different settings reveal that “My Interactive Garden” and physical computing in general have a positive impact, among others, on learner motivation, fun and interest in class and perceived competencies.
Finally, the results from all evaluations are combined to evaluate the design principles for physical computing teaching and to provide a perspective on the development of decision-making aids for physical computing activities in school education.
Answer Set Programming (ASP) is a declarative problem solving approach, combining a rich yet simple modeling language with high-performance solving capabilities. Although this has already resulted in various applications, certain aspects of such applications are more naturally modeled using variables over finite domains, for accounting for resources, fine timings, coordinates, or functions. Our goal is thus to extend ASP with constraints over integers while preserving its declarative nature. This allows for fast prototyping and elaboration tolerant problem descriptions of resource related applications. The resulting paradigm is called Constraint Answer Set Programming (CASP).
We present three different approaches for solving CASP problems. The first one, a lazy, modular approach combines an ASP solver with an external system for handling constraints. This approach has the advantage that two state of the art technologies work hand in hand to solve the problem, each concentrating on its part of the problem. The drawback is that inter-constraint dependencies cannot be communicated back to the ASP solver, impeding its learning algorithm. The second approach translates all constraints to ASP. Using the appropriate encoding techniques, this results in a very fast, monolithic system. Unfortunately, due to the large, explicit representation of constraints and variables, translation techniques are restricted to small and mid-sized domains. The third approach merges the lazy and the translational approach, combining the strength of both while removing their weaknesses. To this end, we enhance the dedicated learning techniques of an ASP solver with the inferences of the translating approach in a lazy way. That is, the important knowledge is only made explicit when needed.
By using state of the art techniques from neighboring fields, we provide ways to tackle real world, industrial size problems. By extending CASP to reactive solving, we open up new application areas such as online planning with continuous domains and durations.
Das Thema der vorliegenden Arbeit ist die semantische Suche im Kontext heutiger Informationsmanagementsysteme. Zu diesen Systemen zählen Intranets, Web 3.0-Anwendungen sowie viele Webportale, die Informationen in heterogenen Formaten und Strukturen beinhalten. Auf diesen befinden sich einerseits Daten in strukturierter Form und andererseits Dokumente, die inhaltlich mit diesen Daten in Beziehung stehen. Diese Dokumente sind jedoch in der Regel nur teilweise strukturiert oder vollständig unstrukturiert. So beschreiben beispielsweise Reiseportale durch strukturierte Daten den Zeitraum, das Reiseziel, den Preis einer Reise und geben in unstrukturierter Form weitere Informationen, wie Beschreibungen zum Hotel, Zielort, Ausflugsziele an.
Der Fokus heutiger semantischer Suchmaschinen liegt auf dem Finden von Wissen entweder in strukturierter Form, auch Faktensuche genannt, oder in semi- bzw. unstrukturierter Form, was üblicherweise als semantische Dokumentensuche bezeichnet wird. Einige wenige Suchmaschinen versuchen die Lücke zwischen diesen beiden Ansätzen zu schließen. Diese durchsuchen zwar gleichzeitig strukturierte sowie unstrukturierte Daten, werten diese jedoch entweder weitgehend voneinander unabhängig aus oder schränken die Suchmöglichkeiten stark ein, indem sie beispielsweise nur bestimmte Fragemuster unterstützen. Hierdurch werden die im System verfügbaren Informationen nicht ausgeschöpft und gleichzeitig unterbunden, dass Zusammenhänge zwischen einzelnen Inhalten der jeweiligen Informationssysteme und sich ergänzende Informationen den Benutzer erreichen.
Um diese Lücke zu schließen, wurde in der vorliegenden Arbeit ein neuer hybrider semantischer Suchansatz entwickelt und untersucht, der strukturierte und semi- bzw. unstrukturierte Inhalte während des gesamten Suchprozesses kombiniert. Durch diesen Ansatz werden nicht nur sowohl Fakten als auch Dokumente gefunden, es werden auch Zusammenhänge, die zwischen den unterschiedlich strukturierten Daten bestehen, in jeder Phase der Suche genutzt und fließen in die Suchergebnisse mit ein. Liegt die Antwort zu einer Suchanfrage nicht vollständig strukturiert, in Form von Fakten, oder unstrukturiert, in Form von Dokumenten vor, so liefert dieser Ansatz eine Kombination der beiden. Die Berücksichtigung von unterschiedlich Inhalten während des gesamten Suchprozesses stellt jedoch besondere Herausforderungen an die Suchmaschine. Diese muss in der Lage sein, Fakten und Dokumente in Abhängigkeit voneinander zu durchsuchen, sie zu kombinieren sowie die unterschiedlich strukturierten Ergebnisse in eine geeignete Rangordnung zu bringen. Weiterhin darf die Komplexität der Daten nicht an die Endnutzer weitergereicht werden. Die Darstellung der Inhalte muss vielmehr sowohl bei der Anfragestellung als auch bei der Darbietung der Ergebnisse verständlich und leicht interpretierbar sein.
Die zentrale Fragestellung der Arbeit ist, ob ein hybrider Ansatz auf einer vorgegebenen Datenbasis die Suchanfragen besser beantworten kann als die semantische Dokumentensuche und die Faktensuche für sich genommen, bzw. als eine Suche die diese Ansätze im Rahmen des Suchprozesses nicht kombiniert. Die durchgeführten Evaluierungen aus System- und aus Benutzersicht zeigen, dass die im Rahmen der Arbeit entwickelte hybride semantische Suchlösung durch die Kombination von strukturierten und unstrukturierten Inhalten im Suchprozess bessere Antworten liefert als die oben genannten Verfahren und somit Vorteile gegenüber bisherigen Ansätzen bietet. Eine Befragung von Benutzern macht deutlich, dass die hybride semantische Suche als verständlich empfunden und für heterogen strukturierte Datenmengen bevorzugt wird.
Contemporary multi-core processors are parallel systems that also provide shared memory for programs running on them. Both the increasing number of cores in so-called many-core systems and the still growing computational power of the cores demand for memory systems that are able to deliver high bandwidths. Caches are essential components to satisfy this requirement. Nevertheless, hardware-based cache coherence in many-core chips faces practical limits to provide both coherence and high memory bandwidths. In addition, a shift away from global coherence can be observed. As a result, alternative architectures and suitable programming models need to be investigated.
This thesis focuses on fast communication for non-cache-coherent many-core architectures. Experiments are conducted on the Single-Chip Cloud Computer (SCC), a non-cache-coherent many-core processor with 48 mesh-connected cores. Although originally designed for message passing, the results of this thesis show that shared memory can be efficiently used for one-sided communication on this kind of architecture. One-sided communication enables data exchanges between processes where the receiver is not required to know the details of the performed communication. In the notion of the Message Passing Interface (MPI) standard, this type of communication allows to access memory of remote processes. In order to support this communication scheme on non-cache-coherent architectures, both an efficient process synchronization and a communication scheme with software-managed cache coherence are designed and investigated.
The process synchronization realizes the concept of the general active target synchronization scheme from the MPI standard. An existing classification of implementation approaches is extended and used to identify an appropriate class for the non-cache-coherent shared memory platform. Based on this classification, existing implementations are surveyed in order to find beneficial concepts, which are then used to design a lightweight synchronization protocol for the SCC that uses shared memory and uncached memory accesses. The proposed scheme is not prone to process skew and also enables direct communication as soon as both communication partners are ready. Experimental results show very good scaling properties and up to five times lower synchronization latency compared to a tuned message-based MPI implementation for the SCC.
For the communication, SCOSCo, a shared memory approach with software-managed cache coherence, is presented. According requirements for the coherence that fulfill MPI's separate memory model are formulated, and a lightweight implementation exploiting SCC hard- and software features is developed. Despite a discovered malfunction in the SCC's memory subsystem, the experimental evaluation of the design reveals up to five times better bandwidths and nearly four times lower latencies in micro-benchmarks compared to the SCC-tuned but message-based MPI library. For application benchmarks, like a parallel 3D fast Fourier transform, the runtime share of communication can be reduced by a factor of up to five. In addition, this thesis postulates beneficial hardware concepts that would support software-managed coherence for one-sided communication on future non-cache-coherent architectures where coherence might be only available in local subdomains but not on a global processor level.
Although it has become common practice to build applications based on the reuse of existing components or services, technical complexity and semantic challenges constitute barriers to ensuring a successful and wide reuse of components and services. In the geospatial application domain, the barriers are self-evident due to heterogeneous geographic data, a lack of interoperability and complex analysis processes.
Constructing workflows manually and discovering proper services and data that match user intents and preferences is difficult and time-consuming especially for users who are not trained in software development. Furthermore, considering the multi-objective nature of environmental modeling for the assessment of climate change impacts and the various types of geospatial data (e.g., formats, scales, and georeferencing systems) increases the complexity challenges.
Automatic service composition approaches that provide semantics-based assistance in the process of workflow design have proven to be a solution to overcome these challenges and have become a frequent demand especially by end users who are not IT experts. In this light, the major contributions of this thesis are:
(i) Simplification of service reuse and workflow design of applications for climate impact analysis by following the eXtreme Model-Driven Development (XMDD) paradigm.
(ii) Design of a semantic domain model for climate impact analysis applications that comprises specifically designed services, ontologies that provide domain-specific vocabulary for referring to types and services, and the input/output annotation of the services using the terms defined in the ontologies.
(iii) Application of a constraint-driven method for the automatic composition of workflows for analyzing the impacts of sea-level rise. The application scenario demonstrates the impact of domain modeling decisions on the results and the performance of the synthesis algorithm.
Services that operate over the Internet are under constant threat of being exposed to fraudulent use. Maintaining good user experience for legitimate users often requires the classification of entities as malicious or legitimate in order to initiate countermeasures. As an example, inbound email spam filters decide for spam or non-spam. They can base their decision on both the content of each email as well as on features that summarize prior emails received from the sending server. In general, discriminative classification methods learn to distinguish positive from negative entities. Each decision for a label may be based on features of the entity and related entities. When labels of related entities have strong interdependencies---as can be assumed e.g. for emails being delivered by the same user---classification decisions should not be made independently and dependencies should be modeled in the decision function. This thesis addresses the formulation of discriminative classification problems that are tailored for the specific demands of the following three Internet security applications. Theoretical and algorithmic solutions are devised to protect an email service against flooding of user inboxes, to mitigate abusive usage of outbound email servers, and to protect web servers against distributed denial of service attacks.
In the application of filtering an inbound email stream for unsolicited emails, utilizing features that go beyond each individual email's content can be valuable. Information about each sending mail server can be aggregated over time and may help in identifying unwanted emails. However, while this information will be available to the deployed email filter, some parts of the training data that are compiled by third party providers may not contain this information. The missing features have to be estimated at training time in order to learn a classification model. In this thesis an algorithm is derived that learns a decision function that integrates over a distribution of values for each missing entry. The distribution of missing values is a free parameter that is optimized to learn an optimal decision function.
The outbound stream of emails of an email service provider can be separated by the customer IDs that ask for delivery. All emails that are sent by the same ID in the same period of time are related, both in content and in label. Hijacked customer accounts may send batches of unsolicited emails to other email providers, which in turn might blacklist the sender's email servers after detection of incoming spam emails. The risk of being blocked from further delivery depends on the rate of outgoing unwanted emails and the duration of high spam sending rates. An optimization problem is developed that minimizes the expected cost for the email provider by learning a decision function that assigns a limit on the sending rate to customers based on the each customer's email stream.
Identifying attacking IPs during HTTP-level DDoS attacks allows to block those IPs from further accessing the web servers. DDoS attacks are usually carried out by infected clients that are members of the same botnet and show similar traffic patterns. HTTP-level attacks aim at exhausting one or more resources of the web server infrastructure, such as CPU time. If the joint set of attackers cannot increase resource usage close to the maximum capacity, no effect will be experienced by legitimate users of hosted web sites. However, if the additional load raises the computational burden towards the critical range, user experience will degrade until service may be unavailable altogether. As the loss of missing one attacker depends on block decisions for other attackers---if most other attackers are detected, not blocking one client will likely not be harmful---a structured output model has to be learned. In this thesis an algorithm is developed that learns a structured prediction decoder that searches the space of label assignments, guided by a policy.
Each model is evaluated on real-world data and is compared to reference methods. The results show that modeling each classification problem according to the specific demands of the task improves performance over solutions that do not consider the constraints inherent to an application.
Personal fabrication tools, such as 3D printers, are on the way of enabling a future in which non-technical users will be able to create custom objects. However, while the hardware is there, the current interaction model behind existing design tools is not suitable for non-technical users. Today, 3D printers are operated by fabricating the object in one go, which tends to take overnight due to the slow 3D printing technology. Consequently, the current interaction model requires users to think carefully before printing as every mistake may imply another overnight print. Planning every step ahead, however, is not feasible for non-technical users as they lack the experience to reason about the consequences of their design decisions.
In this dissertation, we propose changing the interaction model around personal fabrication tools to better serve this user group. We draw inspiration from personal computing and argue that the evolution of personal fabrication may resemble the evolution of personal computing: Computing started with machines that executed a program in one go before returning the result to the user. By decreasing the interaction unit to single requests, turn-taking systems such as the command line evolved, which provided users with feedback after every input. Finally, with the introduction of direct-manipulation interfaces, users continuously interacted with a program receiving feedback about every action in real-time. In this dissertation, we explore whether these interaction concepts can be applied to personal fabrication as well.
We start with fabricating an object in one go and investigate how to tighten the feedback-cycle on an object-level: We contribute a method called low-fidelity fabrication, which saves up to 90% fabrication time by creating objects as fast low-fidelity previews, which are sufficient to evaluate key design aspects. Depending on what is currently being tested, we propose different conversions that enable users to focus on different parts: faBrickator allows for a modular design in the early stages of prototyping; when users move on WirePrint allows quickly testing an object's shape, while Platener allows testing an object's technical function. We present an interactive editor for each technique and explain the underlying conversion algorithms.
By interacting on smaller units, such as a single element of an object, we explore what it means to transition from systems that fabricate objects in one go to turn-taking systems. We start with a 2D system called constructable: Users draw with a laser pointer onto the workpiece inside a laser cutter. The drawing is captured with an overhead camera. As soon as the the user finishes drawing an element, such as a line, the constructable system beautifies the path and cuts it--resulting in physical output after every editing step. We extend constructable towards 3D editing by developing a novel laser-cutting technique for 3D objects called LaserOrigami that works by heating up the workpiece with the defocused laser until the material becomes compliant and bends down under gravity. While constructable and LaserOrigami allow for fast physical feedback, the interaction is still best described as turn-taking since it consists of two discrete steps: users first create an input and afterwards the system provides physical output.
By decreasing the interaction unit even further to a single feature, we can achieve real-time physical feedback: Input by the user and output by the fabrication device are so tightly coupled that no visible lag exists. This allows us to explore what it means to transition from turn-taking interfaces, which only allow exploring one option at a time, to direct manipulation interfaces with real-time physical feedback, which allow users to explore the entire space of options continuously with a single interaction. We present a system called FormFab, which allows for such direct control. FormFab is based on the same principle as LaserOrigami: It uses a workpiece that when warmed up becomes compliant and can be reshaped. However, FormFab achieves the reshaping not based on gravity, but through a pneumatic system that users can control interactively. As users interact, they see the shape change in real-time.
We conclude this dissertation by extrapolating the current evolution into a future in which large numbers of people use the new technology to create objects. We see two additional challenges on the horizon: sustainability and intellectual property. We investigate sustainability by demonstrating how to print less and instead patch physical objects. We explore questions around intellectual property with a system called Scotty that transfers objects without creating duplicates, thereby preserving the designer's copyright.
Computer Security deals with the detection and mitigation of threats to computer networks, data, and computing hardware. This
thesis addresses the following two computer security problems: email spam campaign and malware detection.
Email spam campaigns can easily be generated using popular dissemination tools by specifying simple grammars that serve as message templates. A grammar is disseminated to nodes of a bot net, the nodes create messages by instantiating the grammar at random. Email spam campaigns can encompass huge data volumes and therefore pose a threat to the stability of the infrastructure of email service providers that have to store them. Malware -software that serves a malicious purpose- is affecting web servers, client computers via active content, and client computers through executable files. Without the help of malware detection systems it would be easy for malware creators to collect sensitive information or to infiltrate computers.
The detection of threats -such as email-spam messages, phishing messages, or malware- is an adversarial and therefore intrinsically
difficult problem. Threats vary greatly and evolve over time. The detection of threats based on manually-designed rules is therefore
difficult and requires a constant engineering effort. Machine-learning is a research area that revolves around the analysis of data and the discovery of patterns that describe aspects of the data. Discriminative learning methods extract prediction models from data that are optimized to predict a target attribute as accurately as possible. Machine-learning methods hold the promise of automatically identifying patterns that robustly and accurately detect threats. This thesis focuses on the design and analysis of discriminative learning methods for the two computer-security problems under investigation: email-campaign and malware detection.
The first part of this thesis addresses email-campaign detection. We focus on regular expressions as a syntactic framework, because regular expressions are intuitively comprehensible by security engineers and administrators, and they can be applied as a detection mechanism in an extremely efficient manner. In this setting, a prediction model is provided with exemplary messages from an email-spam campaign. The prediction model has to generate a regular expression that reveals the syntactic pattern that underlies the entire campaign, and that a security engineers finds comprehensible and feels confident enough to use the expression to blacklist further messages at the email server. We model this problem as two-stage learning problem with structured input and output spaces which can be solved using standard cutting plane methods. Therefore we develop an appropriate loss function, and derive a decoder for the resulting optimization problem.
The second part of this thesis deals with the problem of predicting whether a given JavaScript or PHP file is malicious or benign. Recent malware analysis techniques use static or dynamic features, or both. In fully dynamic analysis, the software or script is executed and observed for malicious behavior in a sandbox environment. By contrast, static analysis is based on features that can be extracted directly from the program file. In order to bypass static detection mechanisms, code obfuscation techniques are used to spread a malicious program file in many different syntactic variants. Deobfuscating the code before applying a static classifier can be subjected to mostly static code analysis and can overcome the problem of obfuscated malicious code, but on the other hand increases the computational costs of malware detection by an order of magnitude. In this thesis we present a cascaded architecture in which a classifier first performs a static analysis of the original code and -based on the outcome of this first classification step- the code may be deobfuscated and classified again. We explore several types of features including token $n$-grams, orthogonal sparse bigrams, subroutine-hashings, and syntax-tree features and study the robustness of detection methods and feature types against the evolution of malware over time. The developed tool scans very large file collections quickly and accurately.
Each model is evaluated on real-world data and compared to reference methods. Our approach of inferring regular expressions to filter emails belonging to an email spam campaigns leads to models with a high true-positive rate at a very low false-positive rate that is an order of magnitude lower than that of a commercial content-based filter. Our presented system -REx-SVMshort- is being used by a commercial email service provider and complements content-based and IP-address based filtering.
Our cascaded malware detection system is evaluated on a high-quality data set of almost 400,000 conspicuous PHP files and a collection of more than 1,00,000 JavaScript files. From our case study we can conclude that our system can quickly and accurately process large data collections at a low false-positive rate.
Geospatial data has become a natural part of a growing number of information systems and services in the economy, society, and people's personal lives. In particular, virtual 3D city and landscape models constitute valuable information sources within a wide variety of applications such as urban planning, navigation, tourist information, and disaster management. Today, these models are often visualized in detail to provide realistic imagery. However, a photorealistic rendering does not automatically lead to high image quality, with respect to an effective information transfer, which requires important or prioritized information to be interactively highlighted in a context-dependent manner.
Approaches in non-photorealistic renderings particularly consider a user's task and camera perspective when attempting optimal expression, recognition, and communication of important or prioritized information. However, the design and implementation of non-photorealistic rendering techniques for 3D geospatial data pose a number of challenges, especially when inherently complex geometry, appearance, and thematic data must be processed interactively. Hence, a promising technical foundation is established by the programmable and parallel computing architecture of graphics processing units.
This thesis proposes non-photorealistic rendering techniques that enable both the computation and selection of the abstraction level of 3D geospatial model contents according to user interaction and dynamically changing thematic information. To achieve this goal, the techniques integrate with hardware-accelerated rendering pipelines using shader technologies of graphics processing units for real-time image synthesis. The techniques employ principles of artistic rendering, cartographic generalization, and 3D semiotics—unlike photorealistic rendering—to synthesize illustrative renditions of geospatial feature type entities such as water surfaces, buildings, and infrastructure networks. In addition, this thesis contributes a generic system that enables to integrate different graphic styles—photorealistic and non-photorealistic—and provide their seamless transition according to user tasks, camera view, and image resolution.
Evaluations of the proposed techniques have demonstrated their significance to the field of geospatial information visualization including topics such as spatial perception, cognition, and mapping. In addition, the applications in illustrative and focus+context visualization have reflected their potential impact on optimizing the information transfer regarding factors such as cognitive load, integration of non-realistic information, visualization of uncertainty, and visualization on small displays.
The main objective of this dissertation is to analyse prerequisites, expectations, apprehensions, and attitudes of students studying computer science, who are willing to gain a bachelor degree. The research will also investigate in the students’ learning style according to the Felder-Silverman model. These investigations fall in the attempt to make an impact on reducing the “dropout”/shrinkage rate among students, and to suggest a better learning environment.
The first investigation starts with a survey that has been made at the computer science department at the University of Baghdad to investigate the attitudes of computer science students in an environment dominated by women, showing the differences in attitudes between male and female students in different study years. Students are accepted to university studies via a centrally controlled admission procedure depending mainly on their final score at school. This leads to a high percentage of students studying subjects they do not want. Our analysis shows that 75% of the female students do not regret studying computer science although it was not their first choice. And according to statistics over previous years, women manage to succeed in their study and often graduate on top of their class. We finish with a comparison of attitudes between the freshman students of two different cultures and two different university enrolment procedures (University of Baghdad, in Iraq, and the University of Potsdam, in Germany) both with opposite gender majority.
The second step of investigation took place at the department of computer science at the University of Potsdam in Germany and analyzes the learning styles of students studying the three major fields of study offered by the department (computer science, business informatics, and computer science teaching). Investigating the differences in learning styles between the students of those study fields who usually take some joint courses is important to be aware of which changes are necessary to be adopted in the teaching methods to address those different students. It was a two stage study using two questionnaires; the main one is based on the Index of Learning Styles Questionnaire of B. A. Solomon and R. M. Felder, and the second questionnaire was an investigation on the students’ attitudes towards the findings of their personal first questionnaire. Our analysis shows differences in the preferences of learning style between male and female students of the different study fields, as well as differences between students with the different specialties (computer science, business informatics, and computer science teaching).
The third investigation looks closely into the difficulties, issues, apprehensions and expectations of freshman students studying computer science. The study took place at the computer science department at the University of Potsdam with a volunteer sample of students. The goal is to determine and discuss the difficulties and issues that they are facing in their study that may lead them to think in dropping-out, changing the study field, or changing the university. The research continued with the same sample of students (with business informatics students being the majority) through more than three semesters. Difficulties and issues during the study were documented, as well as students’ attitudes, apprehensions, and expectations. Some of the professors and lecturers opinions and solutions to some students’ problems were also documented. Many participants had apprehensions and difficulties, especially towards informatics subjects. Some business informatics participants began to think of changing the university, in particular when they reached their third semester, others thought about changing their field of study. Till the end of this research, most of the participants continued in their studies (the study they have started with or the new study they have changed to) without leaving the higher education system.
Boolean constraint solving technology has made tremendous progress over the last decade, leading to industrial-strength solvers, for example, in the areas of answer set programming (ASP), the constraint satisfaction problem (CSP), propositional satisfiability (SAT) and satisfiability of quantified Boolean formulas (QBF). However, in all these areas, there exist multiple solving strategies that work well on different applications; no strategy dominates all other strategies. Therefore, no individual solver shows robust state-of-the-art performance in all kinds of applications. Additionally, the question arises how to choose a well-performing solving strategy for a given application; this is a challenging question even for solver and domain experts. One way to address this issue is the use of portfolio solvers, that is, a set of different solvers or solver configurations. We present three new automatic portfolio methods: (i) automatic construction of parallel portfolio solvers (ACPP) via algorithm configuration,(ii) solving the $NP$-hard problem of finding effective algorithm schedules with Answer Set Programming (aspeed), and (iii) a flexible algorithm selection framework (claspfolio2) allowing for fair comparison of different selection approaches. All three methods show improved performance and robustness in comparison to individual solvers on heterogeneous instance sets from many different applications. Since parallel solvers are important to effectively solve hard problems on parallel computation systems (e.g., multi-core processors), we extend all three approaches to be effectively applicable in parallel settings. We conducted extensive experimental studies different instance sets from ASP, CSP, MAXSAT, Operation Research (OR), SAT and QBF that indicate an improvement in the state-of-the-art solving heterogeneous instance sets. Last but not least, from our experimental studies, we deduce practical advice regarding the question when to apply which of our methods.
Deciphering the functioning of biological networks is one of the central tasks in systems biology. In particular, signal transduction networks are crucial for the understanding of the cellular response to external and internal perturbations. Importantly, in order to cope with the complexity of these networks, mathematical and computational modeling is required. We propose a computational modeling framework in order to achieve more robust discoveries in the context of logical signaling networks. More precisely, we focus on modeling the response of logical signaling networks by means of automated reasoning using Answer Set Programming (ASP). ASP provides a declarative language for modeling various knowledge representation and reasoning problems. Moreover, available ASP solvers provide several reasoning modes for assessing the multitude of answer sets. Therefore, leveraging its rich modeling language and its highly efficient solving capacities, we use ASP to address three challenging problems in the context of logical signaling networks: learning of (Boolean) logical networks, experimental design, and identification of intervention strategies. Overall, the contribution of this thesis is three-fold. Firstly, we introduce a mathematical framework for characterizing and reasoning on the response of logical signaling networks. Secondly, we contribute to a growing list of successful applications of ASP in systems biology. Thirdly, we present a software providing a complete pipeline for automated reasoning on the response of logical signaling networks.
The objective and motivation behind this research is to provide applications with easy-to-use interfaces to communities of deaf and functionally illiterate users, which enables them to work without any human assistance. Although recent years have witnessed technological advancements, the availability of technology does not ensure accessibility to information and communication technologies (ICT). Extensive use of text from menus to document contents means that deaf or functionally illiterate can not access services implemented on most computer software. Consequently, most existing computer applications pose an accessibility barrier to those who are unable to read fluently. Online technologies intended for such groups should be developed in continuous partnership with primary users and include a thorough investigation into their limitations, requirements and usability barriers. In this research, I investigated existing tools in voice, web and other multimedia technologies to identify learning gaps and explored ways to enhance the information literacy for deaf and functionally illiterate users. I worked on the development of user-centered interfaces to increase the capabilities of deaf and low literacy users by enhancing lexical resources and by evaluating several multimedia interfaces for them. The interface of the platform-independent Italian Sign Language (LIS) Dictionary has been developed to enhance the lexical resources for deaf users. The Sign Language Dictionary accepts Italian lemmas as input and provides their representation in the Italian Sign Language as output. The Sign Language dictionary has 3082 signs as set of Avatar animations in which each sign is linked to a corresponding Italian lemma. I integrated the LIS lexical resources with MultiWordNet (MWN) database to form the first LIS MultiWordNet(LMWN). LMWN contains information about lexical relations between words, semantic relations between lexical concepts (synsets), correspondences between Italian and sign language lexical concepts and semantic fields (domains). The approach enhances the deaf users’ understanding of written Italian language and shows that a relatively small set of lexicon can cover a significant portion of MWN. Integration of LIS signs with MWN made it useful tool for computational linguistics and natural language processing. The rule-based translation process from written Italian text to LIS has been transformed into service-oriented system. The translation process is composed of various modules including parser, semantic interpreter, generator, and spatial allocation planner. This translation procedure has been implemented in the Java Application Building Center (jABC), which is a framework for extreme model driven design (XMDD). The XMDD approach focuses on bringing software development closer to conceptual design, so that the functionality of a software solution could be understood by someone who is unfamiliar with programming concepts. The transformation addresses the heterogeneity challenge and enhances the re-usability of the system. For enhancing the e-participation of functionally illiterate users, two detailed studies were conducted in the Republic of Rwanda. In the first study, the traditional (textual) interface was compared with the virtual character-based interactive interface. The study helped to identify usability barriers and users evaluated these interfaces according to three fundamental areas of usability, i.e. effectiveness, efficiency and satisfaction. In another study, we developed four different interfaces to analyze the usability and effects of online assistance (consistent help) for functionally illiterate users and compared different help modes including textual, vocal and virtual character on the performance of semi-literate users. In our newly designed interfaces the instructions were automatically translated in Swahili language. All the interfaces were evaluated on the basis of task accomplishment, time consumption, System Usability Scale (SUS) rating and number of times the help was acquired. The results show that the performance of semi-literate users improved significantly when using the online assistance. The dissertation thus introduces a new development approach in which virtual characters are used as additional support for barely literate or naturally challenged users. Such components enhanced the application utility by offering a variety of services like translating contents in local language, providing additional vocal information, and performing automatic translation from text to sign language. Obviously, there is no such thing as one design solution that fits for all in the underlying domain. Context sensitivity, literacy and mental abilities are key factors on which I concentrated and the results emphasize that computer interfaces must be based on a thoughtful definition of target groups, purposes and objectives.
Learning a model for the relationship between the attributes and the annotated labels of data examples serves two purposes. Firstly, it enables the prediction of the label for examples without annotation. Secondly, the parameters of the model can provide useful insights into the structure of the data. If the data has an inherent partitioned structure, it is natural to mirror this structure in the model. Such mixture models predict by combining the individual predictions generated by the mixture components which correspond to the partitions in the data. Often the partitioned structure is latent, and has to be inferred when learning the mixture model. Directly evaluating the accuracy of the inferred partition structure is, in many cases, impossible because the ground truth cannot be obtained for comparison. However it can be assessed indirectly by measuring the prediction accuracy of the mixture model that arises from it. This thesis addresses the interplay between the improvement of predictive accuracy by uncovering latent cluster structure in data, and further addresses the validation of the estimated structure by measuring the accuracy of the resulting predictive model. In the application of filtering unsolicited emails, the emails in the training set are latently clustered into advertisement campaigns. Uncovering this latent structure allows filtering of future emails with very low false positive rates. In order to model the cluster structure, a Bayesian clustering model for dependent binary features is developed in this thesis. Knowing the clustering of emails into campaigns can also aid in uncovering which emails have been sent on behalf of the same network of captured hosts, so-called botnets. This association of emails to networks is another layer of latent clustering. Uncovering this latent structure allows service providers to further increase the accuracy of email filtering and to effectively defend against distributed denial-of-service attacks. To this end, a discriminative clustering model is derived in this thesis that is based on the graph of observed emails. The partitionings inferred using this model are evaluated through their capacity to predict the campaigns of new emails. Furthermore, when classifying the content of emails, statistical information about the sending server can be valuable. Learning a model that is able to make use of it requires training data that includes server statistics. In order to also use training data where the server statistics are missing, a model that is a mixture over potentially all substitutions thereof is developed. Another application is to predict the navigation behavior of the users of a website. Here, there is no a priori partitioning of the users into clusters, but to understand different usage scenarios and design different layouts for them, imposing a partitioning is necessary. The presented approach simultaneously optimizes the discriminative as well as the predictive power of the clusters. Each model is evaluated on real-world data and compared to baseline methods. The results show that explicitly modeling the assumptions about the latent cluster structure leads to improved predictions compared to the baselines. It is beneficial to incorporate a small number of hyperparameters that can be tuned to yield the best predictions in cases where the prediction accuracy can not be optimized directly.
This thesis presents novel ideas and research findings for the Web of Data – a global data space spanning many so-called Linked Open Data sources. Linked Open Data adheres to a set of simple principles to allow easy access and reuse for data published on the Web. Linked Open Data is by now an established concept and many (mostly academic) publishers adopted the principles building a powerful web of structured knowledge available to everybody. However, so far, Linked Open Data does not yet play a significant role among common web technologies that currently facilitate a high-standard Web experience. In this work, we thoroughly discuss the state-of-the-art for Linked Open Data and highlight several shortcomings – some of them we tackle in the main part of this work. First, we propose a novel type of data source meta-information, namely the topics of a dataset. This information could be published with dataset descriptions and support a variety of use cases, such as data source exploration and selection. For the topic retrieval, we present an approach coined Annotated Pattern Percolation (APP), which we evaluate with respect to topics extracted from Wikipedia portals. Second, we contribute to entity linking research by presenting an optimization model for joint entity linking, showing its hardness, and proposing three heuristics implemented in the LINked Data Alignment (LINDA) system. Our first solution can exploit multi-core machines, whereas the second and third approach are designed to run in a distributed shared-nothing environment. We discuss and evaluate the properties of our approaches leading to recommendations which algorithm to use in a specific scenario. The distributed algorithms are among the first of their kind, i.e., approaches for joint entity linking in a distributed fashion. Also, we illustrate that we can tackle the entity linking problem on the very large scale with data comprising more than 100 millions of entity representations from very many sources. Finally, we approach a sub-problem of entity linking, namely the alignment of concepts. We again target a method that looks at the data in its entirety and does not neglect existing relations. Also, this concept alignment method shall execute very fast to serve as a preprocessing for further computations. Our approach, called Holistic Concept Matching (HCM), achieves the required speed through grouping the input by comparing so-called knowledge representations. Within the groups, we perform complex similarity computations, relation conclusions, and detect semantic contradictions. The quality of our result is again evaluated on a large and heterogeneous dataset from the real Web. In summary, this work contributes a set of techniques for enhancing the current state of the Web of Data. All approaches have been tested on large and heterogeneous real-world input.
Cloud computing is a model for enabling on-demand access to a shared pool of computing resources. With virtually limitless on-demand resources, a cloud environment enables the hosted Internet application to quickly cope when there is an increase in the workload. However, the overhead of provisioning resources exposes the Internet application to periods of under-provisioning and performance degradation. Moreover, the performance interference, due to the consolidation in the cloud environment, complicates the performance management of the Internet applications. In this dissertation, we propose two approaches to mitigate the impact of the resources provisioning overhead. The first approach employs control theory to scale resources vertically and cope fast with workload. This approach assumes that the provider has knowledge and control over the platform running in the virtual machines (VMs), which limits it to Platform as a Service (PaaS) and Software as a Service (SaaS) providers. The second approach is a customer-side one that deals with the horizontal scalability in an Infrastructure as a Service (IaaS) model. It addresses the trade-off problem between cost and performance with a multi-goal optimization solution. This approach finds the scale thresholds that achieve the highest performance with the lowest increase in the cost. Moreover, the second approach employs a proposed time series forecasting algorithm to scale the application proactively and avoid under-utilization periods. Furthermore, to mitigate the interference impact on the Internet application performance, we developed a system which finds and eliminates the VMs suffering from performance interference. The developed system is a light-weight solution which does not imply provider involvement. To evaluate our approaches and the designed algorithms at large-scale level, we developed a simulator called (ScaleSim). In the simulator, we implemented scalability components acting as the scalability components of Amazon EC2. The current scalability implementation in Amazon EC2 is used as a reference point for evaluating the improvement in the scalable application performance. ScaleSim is fed with realistic models of the RUBiS benchmark extracted from the real environment. The workload is generated from the access logs of the 1998 world cup website. The results show that optimizing the scalability thresholds and adopting proactive scalability can mitigate 88% of the resources provisioning overhead impact with only a 9% increase in the cost.
This thesis proposes a privacy protection framework for the controlled distribution and use of personal private data. The framework is based on the idea that privacy policies can be set directly by the data owner and can be automatically enforced against the data user. Data privacy continues to be a very important topic, as our dependency on electronic communication maintains its current growth, and private data is shared between multiple devices, users and locations. The growing amount and the ubiquitous availability of personal private data increases the likelihood of data misuse. Early privacy protection techniques, such as anonymous email and payment systems have focused on data avoidance and anonymous use of services. They did not take into account that data sharing cannot be avoided when people participate in electronic communication scenarios that involve social interactions. This leads to a situation where data is shared widely and uncontrollably and in most cases the data owner has no control over further distribution and use of personal private data. Previous efforts to integrate privacy awareness into data processing workflows have focused on the extension of existing access control frameworks with privacy aware functions or have analysed specific individual problems such as the expressiveness of policy languages. So far, very few implementations of integrated privacy protection mechanisms exist and can be studied to prove their effectiveness for privacy protection. Second level issues that stem from practical application of the implemented mechanisms, such as usability, life-time data management and changes in trustworthiness have received very little attention so far, mainly because they require actual implementations to be studied. Most existing privacy protection schemes silently assume that it is the privilege of the data user to define the contract under which personal private data is released. Such an approach simplifies policy management and policy enforcement for the data user, but leaves the data owner with a binary decision to submit or withhold his or her personal data based on the provided policy. We wanted to empower the data owner to express his or her privacy preferences through privacy policies that follow the so-called Owner-Retained Access Control (ORAC) model. ORAC has been proposed by McCollum, et al. as an alternate access control mechanism that leaves the authority over access decisions by the originator of the data. The data owner is given control over the release policy for his or her personal data, and he or she can set permissions or restrictions according to individually perceived trust values. Such a policy needs to be expressed in a coherent way and must allow the deterministic policy evaluation by different entities. The privacy policy also needs to be communicated from the data owner to the data user, so that it can be enforced. Data and policy are stored together as a Protected Data Object that follows the Sticky Policy paradigm as defined by Mont, et al. and others. We developed a unique policy combination approach that takes usability aspects for the creation and maintenance of policies into consideration. Our privacy policy consists of three parts: A Default Policy provides basic privacy protection if no specific rules have been entered by the data owner. An Owner Policy part allows the customisation of the default policy by the data owner. And a so-called Safety Policy guarantees that the data owner cannot specify disadvantageous policies, which, for example, exclude him or her from further access to the private data. The combined evaluation of these three policy-parts yields the necessary access decision. The automatic enforcement of privacy policies in our protection framework is supported by a reference monitor implementation. We started our work with the development of a client-side protection mechanism that allows the enforcement of data-use restrictions after private data has been released to the data user. The client-side enforcement component for data-use policies is based on a modified Java Security Framework. Privacy policies are translated into corresponding Java permissions that can be automatically enforced by the Java Security Manager. When we later extended our work to implement server-side protection mechanisms, we found several drawbacks for the privacy enforcement through the Java Security Framework. We solved this problem by extending our reference monitor design to use Aspect-Oriented Programming (AOP) and the Java Reflection API to intercept data accesses in existing applications and provide a way to enforce data owner-defined privacy policies for business applications.
3D from 2D touch
(2013)
While interaction with computers used to be dominated by mice and keyboards, new types of sensors now allow users to interact through touch, speech, or using their whole body in 3D space. These new interaction modalities are often referred to as "natural user interfaces" or "NUIs." While 2D NUIs have experienced major success on billions of mobile touch devices sold, 3D NUI systems have so far been unable to deliver a mobile form factor, mainly due to their use of cameras. The fact that cameras require a certain distance from the capture volume has prevented 3D NUI systems from reaching the flat form factor mobile users expect. In this dissertation, we address this issue by sensing 3D input using flat 2D sensors. The systems we present observe the input from 3D objects as 2D imprints upon physical contact. By sampling these imprints at very high resolutions, we obtain the objects' textures. In some cases, a texture uniquely identifies a biometric feature, such as the user's fingerprint. In other cases, an imprint stems from the user's clothing, such as when walking on multitouch floors. By analyzing from which part of the 3D object the 2D imprint results, we reconstruct the object's pose in 3D space. While our main contribution is a general approach to sensing 3D input on 2D sensors upon physical contact, we also demonstrate three applications of our approach. (1) We present high-accuracy touch devices that allow users to reliably touch targets that are a third of the size of those on current touch devices. We show that different users and 3D finger poses systematically affect touch sensing, which current devices perceive as random input noise. We introduce a model for touch that compensates for this systematic effect by deriving the 3D finger pose and the user's identity from each touch imprint. We then investigate this systematic effect in detail and explore how users conceptually touch targets. Our findings indicate that users aim by aligning visual features of their fingers with the target. We present a visual model for touch input that eliminates virtually all systematic effects on touch accuracy. (2) From each touch, we identify users biometrically by analyzing their fingerprints. Our prototype Fiberio integrates fingerprint scanning and a display into the same flat surface, solving a long-standing problem in human-computer interaction: secure authentication on touchscreens. Sensing 3D input and authenticating users upon touch allows Fiberio to implement a variety of applications that traditionally require the bulky setups of current 3D NUI systems. (3) To demonstrate the versatility of 3D reconstruction on larger touch surfaces, we present a high-resolution pressure-sensitive floor that resolves the texture of objects upon touch. Using the same principles as before, our system GravitySpace analyzes all imprints and identifies users based on their shoe soles, detects furniture, and enables accurate touch input using feet. By classifying all imprints, GravitySpace detects the users' body parts that are in contact with the floor and then reconstructs their 3D body poses using inverse kinematics. GravitySpace thus enables a range of applications for future 3D NUI systems based on a flat sensor, such as smart rooms in future homes. We conclude this dissertation by projecting into the future of mobile devices. Focusing on the mobility aspect of our work, we explore how NUI devices may one day augment users directly in the form of implanted devices.