004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Article (148)
- Doctoral Thesis (122)
- Monograph/Edited Volume (119)
- Postprint (49)
- Conference Proceeding (22)
- Other (5)
- Part of a Book (2)
- Bachelor Thesis (1)
- Habilitation Thesis (1)
- Master's Thesis (1)
Language
- English (471) (remove)
Is part of the Bibliography
- yes (471) (remove)
Keywords
- machine learning (18)
- answer set programming (12)
- Hasso-Plattner-Institut (9)
- cloud computing (9)
- maschinelles Lernen (9)
- Cloud Computing (8)
- Forschungskolleg (8)
- Forschungsprojekte (8)
- Future SOC Lab (8)
- Hasso Plattner Institute (8)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (162)
- Institut für Informatik und Computational Science (117)
- Hasso-Plattner-Institut für Digital Engineering GmbH (102)
- Fachgruppe Betriebswirtschaftslehre (25)
- Mathematisch-Naturwissenschaftliche Fakultät (23)
- Wirtschaftswissenschaften (17)
- Extern (13)
- Institut für Physik und Astronomie (8)
- Digital Engineering Fakultät (7)
- Institut für Mathematik (6)
- Department Erziehungswissenschaft (4)
- Department Linguistik (4)
- Institut für Biochemie und Biologie (4)
- Institut für Umweltwissenschaften und Geographie (3)
- Department Sport- und Gesundheitswissenschaften (2)
- Sozialwissenschaften (2)
- Fachgruppe Politik- & Verwaltungswissenschaft (1)
- Fachgruppe Soziologie (1)
- Institut für Geowissenschaften (1)
- Potsdam Transfer - Zentrum für Gründung, Innovation, Wissens- und Technologietransfer (1)
- Theodor-Fontane-Archiv (1)
- Öffentliches Recht (1)
With recent advances in the area of information extraction, automatically extracting structured information from a vast amount of unstructured textual data becomes an important task, which is infeasible for humans to capture all information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components in texts, are usually the subjects of structured information from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks, which are named entity recognition, named entity linking, and relation extraction.
These three tasks build up an entire pipeline of a named entity mining system, where each of them has its challenges and can be employed for further applications. As a fundamental task in the natural language processing domain, studies on named entity recognition have a long history, and many existing approaches produce reliable results. The task is aiming to extract mentions of named entities in text and identify their types. Named entity linking recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task, which is to extract semantic relations between named entities, e.g., the ownership relation between two companies.
In this thesis, we review the state-of-the-art of named entity mining domain in detail, including valuable features, techniques, evaluation methodologies, and so on. Furthermore, we present two of our approaches that focus on the named entity linking and relation extraction tasks separately.
To solve the named entity linking task, we propose the entity linking technique, BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques, both in terms of quality and efficiency.
For the task of relation extraction, we focus on extracting a specific group of difficult relation types, business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and perform complex analytics, such as predicting risk or valuating companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution for the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
This thesis is concerned with the solution of the blind source separation problem (BSS). The BSS problem occurs frequently in various scientific and technical applications. In essence, it consists in separating meaningful underlying components out of a mixture of a multitude of superimposed signals. In the recent research literature there are two related approaches to the BSS problem: The first is known as Independent Component Analysis (ICA), where the goal is to transform the data such that the components become as independent as possible. The second is based on the notion of diagonality of certain characteristic matrices derived from the data. Here the goal is to transform the matrices such that they become as diagonal as possible. In this thesis we study the latter method of approximate joint diagonalization (AJD) to achieve a solution of the BSS problem. After an introduction to the general setting, the thesis provides an overview on particular choices for the set of target matrices that can be used for BSS by joint diagonalization. As the main contribution of the thesis, new algorithms for approximate joint diagonalization of several matrices with non-orthogonal transformations are developed. These newly developed algorithms will be tested on synthetic benchmark datasets and compared to other previous diagonalization algorithms. Applications of the BSS methods to biomedical signal processing are discussed and exemplified with real-life data sets of multi-channel biomagnetic recordings.
The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
This technical report presents results of research projects executed in 2017. Selected projects have presented their results on April 25th and November 15th 2017 at the Future SOC Lab Day events.
Motivation:
Constraint-based modeling approaches allow the estimation of maximal in vivo enzyme catalytic rates that can serve as proxies for enzyme turnover numbers. Yet, genome-scale flux profiling remains a challenge in deploying these approaches to catalogue proxies for enzyme catalytic rates across organisms.
Results:
Here, we formulate a constraint-based approach, termed NIDLE-flux, to estimate fluxes at a genome-scale level by using the principle of efficient usage of expressed enzymes. Using proteomics data from Escherichia coli, we show that the fluxes estimated by NIDLE-flux and the existing approaches are in excellent qualitative agreement (Pearson correlation > 0.9). We also find that the maximal in vivo catalytic rates estimated by NIDLE-flux exhibits a Pearson correlation of 0.74 with in vitro enzyme turnover numbers. However, NIDLE-flux results in a 1.4-fold increase in the size of the estimated maximal in vivo catalytic rates in comparison to the contenders. Integration of the maximum in vivo catalytic rates with publically available proteomics and metabolomics data provide a better match to fluxes estimated by NIDLE-flux. Therefore, NIDLE-flux facilitates more effective usage of proteomics data to estimate proxies for kcatomes.
An increasing demand on functionality and flexibility leads to an integration of beforehand isolated system solutions building a so-called System of Systems (SoS). Furthermore, the overall SoS should be adaptive to react on changing requirements and environmental conditions. Due SoS are composed of different independent systems that may join or leave the overall SoS at arbitrary point in times, the SoS structure varies during the systems lifetime and the overall SoS behavior emerges from the capabilities of the contained subsystems. In such complex system ensembles new demands of understanding the interaction among subsystems, the coupling of shared system knowledge and the influence of local adaptation strategies to the overall resulting system behavior arise. In this report, we formulate research questions with the focus of modeling interactions between system parts inside a SoS. Furthermore, we define our notion of important system types and terms by retrieving the current state of the art from literature. Having a common understanding of SoS, we discuss a set of typical SoS characteristics and derive general requirements for a collaboration modeling language. Additionally, we retrieve a broad spectrum of real scenarios and frameworks from literature and discuss how these scenarios cope with different characteristics of SoS. Finally, we discuss the state of the art for existing modeling languages that cope with collaborations for different system types such as SoS.
Recently, due to an increasing demand on functionality and flexibility, beforehand isolated systems have become interconnected to gain powerful adaptive Systems of Systems (SoS) solutions with an overall robust, flexible and emergent behavior. The adaptive SoS comprises a variety of different system types ranging from small embedded to adaptive cyber-physical systems. On the one hand, each system is independent, follows a local strategy and optimizes its behavior to reach its goals. On the other hand, systems must cooperate with each other to enrich the overall functionality to jointly perform on the SoS level reaching global goals, which cannot be satisfied by one system alone. Due to difficulties of local and global behavior optimizations conflicts may arise between systems that have to be solved by the adaptive SoS.
This thesis proposes a modeling language that facilitates the description of an adaptive SoS by considering the adaptation capabilities in form of feedback loops as first class entities. Moreover, this thesis adopts the Models@runtime approach to integrate the available knowledge in the systems as runtime models into the modeled adaptation logic. Furthermore, the modeling language focuses on the description of system interactions within the adaptive SoS to reason about individual system functionality and how it emerges via collaborations to an overall joint SoS behavior. Therefore, the modeling language approach enables the specification of local adaptive system behavior, the integration of knowledge in form of runtime models and the joint interactions via collaboration to place the available adaptive behavior in an overall layered, adaptive SoS architecture.
Beside the modeling language, this thesis proposes analysis rules to investigate the modeled adaptive SoS, which enables the detection of architectural patterns as well as design flaws and pinpoints to possible system threats. Moreover, a simulation framework is presented, which allows the direct execution of the modeled SoS architecture. Therefore, the analysis rules and the simulation framework can be used to verify the interplay between systems as well as the modeled adaptation effects within the SoS. This thesis realizes the proposed concepts of the modeling language by mapping them to a state of the art standard from the automotive domain and thus, showing their applicability to actual systems. Finally, the modeling language approach is evaluated by remodeling up to date research scenarios from different domains, which demonstrates that the modeling language concepts are powerful enough to cope with a broad range of existing research problems.
While the role of and consequences of being a bystander to face-to-face bullying has received some attention in the literature, to date, little is known about the effects of being a bystander to cyberbullying. It is also unknown how empathy might impact the negative consequences associated with being a bystander of cyberbullying. The present study focused on examining the longitudinal association between bystander of cyberbullying depression, and anxiety, and the moderating role of empathy in the relationship between bystander of cyberbullying and subsequent depression and anxiety. There were 1,090 adolescents (M-age = 12.19; 50% female) from the United States included at Time 1, and they completed questionnaires on empathy, cyberbullying roles (bystander, perpetrator, victim), depression, and anxiety. One year later, at Time 2, 1,067 adolescents (M-age = 13.76; 51% female) completed questionnaires on depression and anxiety. Results revealed a positive association between bystander of cyberbullying and depression and anxiety. Further, empathy moderated the positive relationship between bystander of cyberbullying and depression, but not for anxiety. Implications for intervention and prevention programs are discussed.
While the role of and consequences of being a bystander to face-to-face bullying has received some attention in the literature, to date, little is known about the effects of being a bystander to cyberbullying. It is also unknown how empathy might impact the negative consequences associated with being a bystander of cyberbullying. The present study focused on examining the longitudinal association between bystander of cyberbullying depression, and anxiety, and the moderating role of empathy in the relationship between bystander of cyberbullying and subsequent depression and anxiety. There were 1,090 adolescents (M-age = 12.19; 50% female) from the United States included at Time 1, and they completed questionnaires on empathy, cyberbullying roles (bystander, perpetrator, victim), depression, and anxiety. One year later, at Time 2, 1,067 adolescents (M-age = 13.76; 51% female) completed questionnaires on depression and anxiety. Results revealed a positive association between bystander of cyberbullying and depression and anxiety. Further, empathy moderated the positive relationship between bystander of cyberbullying and depression, but not for anxiety. Implications for intervention and prevention programs are discussed.
This thesis presents methods, techniques and tools for developing three-dimensional representations of tactical intelligence assessments. Techniques from GIScience are combined with crime mapping methods. The range of methods applied in this study provides spatio-temporal GIS analysis as well as 3D geovisualisation and GIS programming. The work presents methods to enhance digital three-dimensional city models with application specific thematic information. This information facilitates further geovisual analysis, for instance, estimations of urban risks exposure. Specific methods and workflows are developed to facilitate the integration of spatio-temporal crime scene analysis results into 3D tactical intelligence assessments. Analysis comprises hotspot identification with kernel-density-estimation techniques (KDE), LISA-based verification of KDE hotspots as well as geospatial hotspot area characterisation and repeat victimisation analysis. To visualise the findings of such extensive geospatial analysis, three-dimensional geovirtual environments are created. Workflows are developed to integrate analysis results into these environments and to combine them with additional geospatial data. The resulting 3D visualisations allow for an efficient communication of complex findings of geospatial crime scene analysis.