004 Datenverarbeitung; Informatik
Refine
Has Fulltext
- yes (3)
Year of publication
- 2017 (3) (remove)
Document Type
- Monograph/Edited Volume (1)
- Doctoral Thesis (1)
- Postprint (1)
Language
- English (3)
Is part of the Bibliography
- yes (3)
Keywords
- Debugging (1)
- Gewinnung benannter Entitäten (1)
- Informationsextraktion (1)
- Interpreter (1)
- Programmiererlebnis (1)
- Python (1)
- Ruby (1)
- Smalltalk (1)
- code generation (1)
- debugging (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering GmbH (3) (remove)
Squimera
(2017)
Software development tools that work and behave consistently across different programming languages are helpful for developers, because they do not have to familiarize themselves with new tooling whenever they decide to use a new language. Also, being able to combine multiple programming languages in a program increases reusability, as developers do not have to recreate software frameworks and libraries in the language they develop in and can reuse existing software instead.
However, developers often have a broad choice with regard to tools, some of which are designed for only one specific programming language. Various Integrated Development Environments have support for multiple languages, but are usually unable to provide a consistent programming experience due to different features of language runtimes. Furthermore, common mechanisms that allow reuse of software written in other languages usually use the operating system or a network connection as the abstract layer. Tools, however, often cannot support such indirections well and are therefore less useful in debugging scenarios for example.
In this report, we present a novel approach that aims to improve the programming experience with regard to working with multiple high-level programming languages. As part of this approach, we reuse the tools of a Smalltalk programming environment for other languages and build a multi-language virtual execution environment which is able to provide the same runtime capabilities for all languages.
The prototype system Squimera is an implementation of our approach and demonstrates that it is possible to reuse development tools, so that they behave in the same way across all supported programming languages. In addition, it provides convenient means to reuse and even mix software libraries and frameworks written in different languages without breaking the debugging experience.
With recent advances in the area of information extraction, automatically extracting structured information from a vast amount of unstructured textual data becomes an important task, which is infeasible for humans to capture all information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components in texts, are usually the subjects of structured information from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks, which are named entity recognition, named entity linking, and relation extraction.
These three tasks build up an entire pipeline of a named entity mining system, where each of them has its challenges and can be employed for further applications. As a fundamental task in the natural language processing domain, studies on named entity recognition have a long history, and many existing approaches produce reliable results. The task is aiming to extract mentions of named entities in text and identify their types. Named entity linking recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task, which is to extract semantic relations between named entities, e.g., the ownership relation between two companies.
In this thesis, we review the state-of-the-art of named entity mining domain in detail, including valuable features, techniques, evaluation methodologies, and so on. Furthermore, we present two of our approaches that focus on the named entity linking and relation extraction tasks separately.
To solve the named entity linking task, we propose the entity linking technique, BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques, both in terms of quality and efficiency.
For the task of relation extraction, we focus on extracting a specific group of difficult relation types, business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and perform complex analytics, such as predicting risk or valuating companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution for the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
Advanced mechatronic systems have to integrate existing technologies from mechanical, electrical and software engineering. They must be able to adapt their structure and behavior at runtime by reconfiguration to react flexibly to changes in the environment. Therefore, a tight integration of structural and behavioral models of the different domains is required. This integration results in complex reconfigurable hybrid systems, the execution logic of which cannot be addressed directly with existing standard modeling, simulation, and code-generation techniques. We present in this paper how our component-based approach for reconfigurable mechatronic systems, M ECHATRONIC UML, efficiently handles the complex interplay of discrete behavior and continuous behavior in a modular manner. In addition, its extension to even more flexible reconfiguration cases is presented.