004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Doctoral Thesis (150) (remove)
Is part of the Bibliography
- yes (150) (remove)
Keywords
- Maschinelles Lernen (8)
- machine learning (8)
- Machine Learning (7)
- Antwortmengenprogrammierung (5)
- Geschäftsprozessmanagement (4)
- Prozessmodellierung (4)
- Visualisierung (4)
- answer set programming (4)
- 3D visualization (3)
- Bildverarbeitung (3)
Institute
- Institut für Informatik und Computational Science (80)
- Hasso-Plattner-Institut für Digital Engineering GmbH (37)
- Hasso-Plattner-Institut für Digital Engineering gGmbH (36)
- Digital Engineering Fakultät (6)
- Extern (4)
- Institut für Umweltwissenschaften und Geographie (3)
- Wirtschaftswissenschaften (3)
- Department Linguistik (1)
- Interdisziplinäres Zentrum für Kognitive Studien (1)
- Mathematisch-Naturwissenschaftliche Fakultät (1)
With recent advances in the area of information extraction, automatically extracting structured information from a vast amount of unstructured textual data becomes an important task, which is infeasible for humans to capture all information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components in texts, are usually the subjects of structured information from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks, which are named entity recognition, named entity linking, and relation extraction.
These three tasks build up an entire pipeline of a named entity mining system, where each of them has its challenges and can be employed for further applications. As a fundamental task in the natural language processing domain, studies on named entity recognition have a long history, and many existing approaches produce reliable results. The task is aiming to extract mentions of named entities in text and identify their types. Named entity linking recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task, which is to extract semantic relations between named entities, e.g., the ownership relation between two companies.
In this thesis, we review the state-of-the-art of named entity mining domain in detail, including valuable features, techniques, evaluation methodologies, and so on. Furthermore, we present two of our approaches that focus on the named entity linking and relation extraction tasks separately.
To solve the named entity linking task, we propose the entity linking technique, BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques, both in terms of quality and efficiency.
For the task of relation extraction, we focus on extracting a specific group of difficult relation types, business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and perform complex analytics, such as predicting risk or valuating companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution for the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
This thesis is concerned with the solution of the blind source separation problem (BSS). The BSS problem occurs frequently in various scientific and technical applications. In essence, it consists in separating meaningful underlying components out of a mixture of a multitude of superimposed signals. In the recent research literature there are two related approaches to the BSS problem: The first is known as Independent Component Analysis (ICA), where the goal is to transform the data such that the components become as independent as possible. The second is based on the notion of diagonality of certain characteristic matrices derived from the data. Here the goal is to transform the matrices such that they become as diagonal as possible. In this thesis we study the latter method of approximate joint diagonalization (AJD) to achieve a solution of the BSS problem. After an introduction to the general setting, the thesis provides an overview on particular choices for the set of target matrices that can be used for BSS by joint diagonalization. As the main contribution of the thesis, new algorithms for approximate joint diagonalization of several matrices with non-orthogonal transformations are developed. These newly developed algorithms will be tested on synthetic benchmark datasets and compared to other previous diagonalization algorithms. Applications of the BSS methods to biomedical signal processing are discussed and exemplified with real-life data sets of multi-channel biomagnetic recordings.
Recently, due to an increasing demand on functionality and flexibility, beforehand isolated systems have become interconnected to gain powerful adaptive Systems of Systems (SoS) solutions with an overall robust, flexible and emergent behavior. The adaptive SoS comprises a variety of different system types ranging from small embedded to adaptive cyber-physical systems. On the one hand, each system is independent, follows a local strategy and optimizes its behavior to reach its goals. On the other hand, systems must cooperate with each other to enrich the overall functionality to jointly perform on the SoS level reaching global goals, which cannot be satisfied by one system alone. Due to difficulties of local and global behavior optimizations conflicts may arise between systems that have to be solved by the adaptive SoS.
This thesis proposes a modeling language that facilitates the description of an adaptive SoS by considering the adaptation capabilities in form of feedback loops as first class entities. Moreover, this thesis adopts the Models@runtime approach to integrate the available knowledge in the systems as runtime models into the modeled adaptation logic. Furthermore, the modeling language focuses on the description of system interactions within the adaptive SoS to reason about individual system functionality and how it emerges via collaborations to an overall joint SoS behavior. Therefore, the modeling language approach enables the specification of local adaptive system behavior, the integration of knowledge in form of runtime models and the joint interactions via collaboration to place the available adaptive behavior in an overall layered, adaptive SoS architecture.
Beside the modeling language, this thesis proposes analysis rules to investigate the modeled adaptive SoS, which enables the detection of architectural patterns as well as design flaws and pinpoints to possible system threats. Moreover, a simulation framework is presented, which allows the direct execution of the modeled SoS architecture. Therefore, the analysis rules and the simulation framework can be used to verify the interplay between systems as well as the modeled adaptation effects within the SoS. This thesis realizes the proposed concepts of the modeling language by mapping them to a state of the art standard from the automotive domain and thus, showing their applicability to actual systems. Finally, the modeling language approach is evaluated by remodeling up to date research scenarios from different domains, which demonstrates that the modeling language concepts are powerful enough to cope with a broad range of existing research problems.
This thesis presents methods, techniques and tools for developing three-dimensional representations of tactical intelligence assessments. Techniques from GIScience are combined with crime mapping methods. The range of methods applied in this study provides spatio-temporal GIS analysis as well as 3D geovisualisation and GIS programming. The work presents methods to enhance digital three-dimensional city models with application specific thematic information. This information facilitates further geovisual analysis, for instance, estimations of urban risks exposure. Specific methods and workflows are developed to facilitate the integration of spatio-temporal crime scene analysis results into 3D tactical intelligence assessments. Analysis comprises hotspot identification with kernel-density-estimation techniques (KDE), LISA-based verification of KDE hotspots as well as geospatial hotspot area characterisation and repeat victimisation analysis. To visualise the findings of such extensive geospatial analysis, three-dimensional geovirtual environments are created. Workflows are developed to integrate analysis results into these environments and to combine them with additional geospatial data. The resulting 3D visualisations allow for an efficient communication of complex findings of geospatial crime scene analysis.
3D point clouds are a universal and discrete digital representation of three-dimensional objects and environments. For geospatial applications, 3D point clouds have become a fundamental type of raw data acquired and generated using various methods and techniques. In particular, 3D point clouds serve as raw data for creating digital twins of the built environment.
This thesis concentrates on the research and development of concepts, methods, and techniques for preprocessing, semantically enriching, analyzing, and visualizing 3D point clouds for applications around transport infrastructure. It introduces a collection of preprocessing techniques that aim to harmonize raw 3D point cloud data, such as point density reduction and scan profile detection. Metrics such as, e.g., local density, verticality, and planarity are calculated for later use. One of the key contributions tackles the problem of analyzing and deriving semantic information in 3D point clouds. Three different approaches are investigated: a geometric analysis, a machine learning approach operating on synthetically generated 2D images, and a machine learning approach operating on 3D point clouds without intermediate representation.
In the first application case, 2D image classification is applied and evaluated for mobile mapping data focusing on road networks to derive road marking vector data. The second application case investigates how 3D point clouds can be merged with ground-penetrating radar data for a combined visualization and to automatically identify atypical areas in the data. For example, the approach detects pavement regions with developing potholes. The third application case explores the combination of a 3D environment based on 3D point clouds with panoramic imagery to improve visual representation and the detection of 3D objects such as traffic signs.
The presented methods were implemented and tested based on software frameworks for 3D point clouds and 3D visualization. In particular, modules for metric computation, classification procedures, and visualization techniques were integrated into a modular pipeline-based C++ research framework for geospatial data processing, extended by Python machine learning scripts. All visualization and analysis techniques scale to large real-world datasets such as road networks of entire cities or railroad networks.
The thesis shows that some use cases allow taking advantage of established image vision methods to analyze images rendered from mobile mapping data efficiently. The two presented semantic classification methods working directly on 3D point clouds are use case independent and show similar overall accuracy when compared to each other. While the geometry-based method requires less computation time, the machine learning-based method supports arbitrary semantic classes but requires training the network with ground truth data. Both methods can be used in combination to gradually build this ground truth with manual corrections via a respective annotation tool.
This thesis contributes results for IT system engineering of applications, systems, and services that require spatial digital twins of transport infrastructure such as road networks and railroad networks based on 3D point clouds as raw data. It demonstrates the feasibility of fully automated data flows that map captured 3D point clouds to semantically classified models. This provides a key component for seamlessly integrated spatial digital twins in IT solutions that require up-to-date, object-based, and semantically enriched information about the built environment.
Most of the microelectronic circuits fabricated today are synchronous, i.e. they are driven by one or several clock signals. Synchronous circuit design faces several fundamental challenges such as high-speed clock distribution, integration of multiple cores operating at different clock rates, reduction of power consumption and dealing with voltage, temperature, manufacturing and runtime variations. Asynchronous or clockless design plays a key role in alleviating these challenges, however the design and test of asynchronous circuits is much more difficult in comparison to their synchronous counterparts. A driving force for a widespread use of asynchronous technology is the availability of mature EDA (Electronic Design Automation) tools which provide an entire automated design flow starting from an HDL (Hardware Description Language) specification yielding the final circuit layout. Even though there was much progress in developing such EDA tools for asynchronous circuit design during the last two decades, the maturity level as well as the acceptance of them is still not comparable with tools for synchronous circuit design. In particular, logic synthesis (which implies the application of Boolean minimisation techniques) for the entire system's control path can significantly improve the efficiency of the resulting asynchronous implementation, e.g. in terms of chip area and performance. However, logic synthesis, in particular for asynchronous circuits, suffers from complexity problems. Signal Transitions Graphs (STGs) are labelled Petri nets which are a widely used to specify the interface behaviour of speed independent (SI) circuits - a robust subclass of asynchronous circuits. STG decomposition is a promising approach to tackle complexity problems like state space explosion in logic synthesis of SI circuits. The (structural) decomposition of STGs is guided by a partition of the output signals and generates a usually much smaller component STG for each partition member, i.e. a component STG with a much smaller state space than the initial specification. However, decomposition can result in component STGs that in isolation have so-called irreducible CSC conflicts (i.e. these components are not SI synthesisable anymore) even if the specification has none of them. A new approach is presented to avoid such conflicts by introducing internal communication between the components. So far, STG decompositions are guided by the finest output partitions, i.e. one output per component. However, this might not yield optimal circuit implementations. Efficient heuristics are presented to determine coarser partitions leading to improved circuits in terms of chip area. For the new algorithms correctness proofs are given and their implementations are incorporated into the decomposition tool DESIJ. The presented techniques are successfully applied to some benchmarks - including 'real-life' specifications arising in the context of control resynthesis - which delivered promising results.
Die stetige Weiterentwicklung von VR-Systemen bietet neue Möglichkeiten der Interaktion mit virtuellen Objekten im dreidimensionalen Raum, stellt Entwickelnde von VRAnwendungen aber auch vor neue Herausforderungen. Selektions- und Manipulationstechniken müssen unter Berücksichtigung des Anwendungsszenarios, der Zielgruppe und der zur Verfügung stehenden Ein- und Ausgabegeräte ausgewählt werden. Diese Arbeit leistet einen Beitrag dazu, die Auswahl von passenden Interaktionstechniken zu unterstützen. Hierfür wurde eine repräsentative Menge von Selektions- und Manipulationstechniken untersucht und, unter Berücksichtigung existierender Klassifikationssysteme, eine Taxonomie entwickelt, die die Analyse der Techniken hinsichtlich interaktionsrelevanter Eigenschaften ermöglicht. Auf Basis dieser Taxonomie wurden Techniken ausgewählt, die in einer explorativen Studie verglichen wurden, um Rückschlüsse auf die Dimensionen der Taxonomie zu ziehen und neue Indizien für Vor- und Nachteile der Techniken in spezifischen Anwendungsszenarien zu generieren. Die Ergebnisse der Arbeit münden in eine Webanwendung, die Entwickelnde von VR-Anwendungen gezielt dabei unterstützt, passende Selektions- und Manipulationstechniken für ein Anwendungsszenario auszuwählen, indem Techniken auf Basis der Taxonomie gefiltert und unter Verwendung der Resultate aus der Studie sortiert werden können.
Intuitive Modelle der Informatik sind gedankliche Vorstellungen über informatische Konzepte, die mit subjektiver Gewissheit verbunden sind. Menschen verwenden sie, wenn sie die Arbeitsweise von Computerprogrammen nachvollziehen oder anderen erklären, die logische Korrektheit eines Programms prüfen oder in einem kreativen Prozess selbst Programme entwickeln. Intuitive Modelle können auf verschiedene Weise repräsentiert und kommuniziert werden, etwa verbal-abstrakt, durch ablauf- oder strukturorientierte Abbildungen und Filme oder konkrete Beispiele. Diskutiert werden in dieser Arbeit grundlegende intuitive Modelle für folgende inhaltliche Aspekte einer Programmausführung: Allokation von Aktivität bei einer Programmausführung, Benennung von Entitäten, Daten, Funktionen, Verarbeitung, Kontrollstrukturen zur Steuerung von Programmläufen, Rekursion, Klassen und Objekte. Mit Hilfe eines Systems von Online-Spielen, der Python Visual Sandbox, werden die psychische Realität verschiedener intuitiver Modelle bei Programmieranfängern nachgewiesen und fehlerhafte Anwendungen (Fehlvorstellungen) identifiziert.
Business Process Management (BPM) emerged as a means to control, analyse, and optimise business operations. Conceptual models are of central importance for BPM. Most prominently, process models define the behaviour that is performed to achieve a business value. In essence, a process model is a mapping of properties of the original business process to the model, created for a purpose. Different modelling purposes, therefore, result in different models of a business process. Against this background, the misalignment of process models often observed in the field of BPM is no surprise. Even if the same business scenario is considered, models created for strategic decision making differ in content significantly from models created for process automation. Despite their differences, process models that refer to the same business process should be consistent, i.e., free of contradictions. Apparently, there is a trade-off between strictness of a notion of consistency and appropriateness of process models serving different purposes. Existing work on consistency analysis builds upon behaviour equivalences and hierarchical refinements between process models. Hence, these approaches are computationally hard and do not offer the flexibility to gradually relax consistency requirements towards a certain setting. This thesis presents a framework for the analysis of behaviour consistency that takes a fundamentally different approach. As a first step, an alignment between corresponding elements of related process models is constructed. Then, this thesis conducts behavioural analysis grounded on a relational abstraction of the behaviour of a process model, its behavioural profile. Different variants of these profiles are proposed, along with efficient computation techniques for a broad class of process models. Using behavioural profiles, consistency of an alignment between process models is judged by different notions and measures. The consistency measures are also adjusted to assess conformance of process logs that capture the observed execution of a process. Further, this thesis proposes various complementary techniques to support consistency management. It elaborates on how to implement consistent change propagation between process models, addresses the exploration of behavioural commonalities and differences, and proposes a model synthesis for behavioural profiles.