publish.UP Search

2 results

Representation and curation of knowledge graphs with embeddings (2022)

Jain, Nitisha

Knowledge graphs are structured repositories of knowledge that store facts about the general world or a particular domain in terms of entities and their relationships. Owing to the heterogeneity of use cases that are served by them, there arises a need for the automated construction of domain-specific knowledge graphs from texts. While there have been many research efforts towards open information extraction for automated knowledge graph construction, these techniques do not perform well in domain-specific settings. Furthermore, regardless of whether they are constructed automatically from specific texts or based on real-world facts that are constantly evolving, all knowledge graphs inherently suffer from incompleteness as well as errors in the information they hold. This thesis investigates the challenges encountered during knowledge graph construction and proposes techniques for their curation (a.k.a. refinement) including the correction of semantic ambiguities and the completion of missing facts. Firstly, we leverage existing approaches for the automatic construction of a knowledge graph in the art domain with open information extraction techniques and analyse their limitations. In particular, we focus on the challenging task of named entity recognition for artwork titles and show empirical evidence of performance improvement with our proposed solution for the generation of annotated training data. Towards the curation of existing knowledge graphs, we identify the issue of polysemous relations that represent different semantics based on the context. Having concrete semantics for relations is important for downstream applications (e.g. question answering) that are supported by knowledge graphs. Therefore, we define the novel task of finding fine-grained relation semantics in knowledge graphs and propose FineGReS, a data-driven technique that discovers potential sub-relations with fine-grained meaning from existing polysemous relations.
We leverage knowledge representation learning methods that generate low-dimensional vectors (or embeddings) for knowledge graphs to capture their semantics and structure. The efficacy and utility of the proposed technique are demonstrated by comparing it with several baselines on the entity classification use case. Further, we explore the semantic representations in knowledge graph embedding models. In the past decade, these models have shown state-of-the-art results for the task of link prediction in the context of knowledge graph completion. In view of the popularity and widespread application of the embedding techniques not only for link prediction but also for different semantic tasks, this thesis presents a critical analysis of the embeddings by quantitatively measuring their semantic capabilities. We investigate and discuss the reasons for the shortcomings of embeddings in terms of the characteristics of the underlying knowledge graph datasets and the training techniques used by popular models. Following up on this, we propose ReasonKGE, a novel method for generating semantically enriched knowledge graph embeddings by taking into account the semantics of the facts that are encapsulated by an ontology accompanying the knowledge graph. With a targeted, reasoning-based method for generating negative samples during the training of the models, ReasonKGE is able to not only enhance the link prediction performance, but also reduce the number of semantically inconsistent predictions made by the resultant embeddings, thus improving the quality of knowledge graphs.
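
The idea of drawing negative samples that conflict with an ontology can be illustrated with a small sketch. The code below is not the ReasonKGE algorithm; it assumes a toy ontology consisting only of range constraints on relations, and every entity, type, relation, and function name in it is hypothetical. It corrupts the tail of a triple with entities whose type violates the relation's range, so that each negative is semantically inconsistent by construction.

```python
import random

# Hypothetical toy data, invented for illustration only.
ENTITY_TYPES = {
    "Monet": "Person",
    "Renoir": "Person",
    "Paris": "City",
    "Giverny": "City",
}
# Assumed ontology: each relation declares the type its tail must have.
RELATION_RANGE = {"bornIn": "City", "influencedBy": "Person"}


def inconsistent_negatives(triple, k, rng):
    """Return up to k corrupted triples whose tail violates the relation's range."""
    head, relation, _tail = triple
    expected_type = RELATION_RANGE[relation]
    # Candidate tails are entities of the wrong type (excluding the head itself).
    candidates = sorted(
        entity
        for entity, entity_type in ENTITY_TYPES.items()
        if entity_type != expected_type and entity != head
    )
    rng.shuffle(candidates)
    return [(head, relation, entity) for entity in candidates[:k]]


negatives = inconsistent_negatives(
    ("Monet", "bornIn", "Paris"), k=2, rng=random.Random(0)
)
```

In this toy setting the only wrong-typed candidate for `bornIn` is another person, so a single negative is returned even though two were requested. A real system would rely on an ontology reasoner rather than a hand-written range table.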

Knowledge base construction with machine learning methods (2021)

Loster, Michael

Modern knowledge bases contain and organize knowledge from many different topic areas. Apart from specific entity information, they also store information about their relationships amongst each other. Combining this information results in a knowledge graph that can be particularly helpful in cases where relationships are of central importance. Among other applications, modern risk assessment in the financial sector can benefit from the inherent network structure of such knowledge graphs by assessing the consequences and risks of certain events, such as corporate insolvencies or fraudulent behavior, based on the underlying network structure. As public knowledge bases often do not contain the necessary information for the analysis of such scenarios, the need arises to create and maintain dedicated domain-specific knowledge bases. This thesis investigates the process of creating domain-specific knowledge bases from structured and unstructured data sources. In particular, it addresses the topics of named entity recognition (NER), duplicate detection, and knowledge validation, which represent essential steps in the construction of knowledge bases. As such, we present a novel method for duplicate detection based on a Siamese neural network that is able to learn a dataset-specific similarity measure which is used to identify duplicates. Using the specialized network architecture, we design and implement a knowledge transfer between two deduplication networks, which leads to significant performance improvements and a reduction of required training data. Furthermore, we propose a named entity recognition approach that is able to identify company names by integrating external knowledge in the form of dictionaries into the training process of a conditional random field classifier. In this context, we study the effects of different dictionaries on the performance of the NER classifier.
We show that both the inclusion of domain knowledge and the generation and use of alias names result in significant performance improvements. For the validation of knowledge represented in a knowledge base, we introduce Colt, a framework for knowledge validation based on the interactive quality assessment of logical rules. In its most expressive implementation, we combine Gaussian processes with neural networks to create Colt-GP, an interactive algorithm for learning rule models. Unlike other approaches, Colt-GP uses knowledge graph embeddings and user feedback to cope with data quality issues of knowledge bases. The learned rule model can be used to conditionally apply a rule and assess its quality. Finally, we present CurEx, a prototypical system for building domain-specific knowledge bases from structured and unstructured data sources. Its modular design is based on scalable technologies, which, in addition to processing large datasets, ensures that the modules can be easily exchanged or extended. CurEx offers multiple user interfaces, each tailored to the individual needs of a specific user group, and is fully compatible with the Colt framework, which can be used as part of the system. We conduct a wide range of experiments with different datasets to determine the strengths and weaknesses of the proposed methods. To ensure the validity of our results, we compare the proposed methods with competing approaches.
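
As a rough illustration of how a dictionary with alias names might feed into a sequence classifier such as a conditional random field, the sketch below computes per-token features that include a dictionary-match flag. It is not the thesis's implementation: the alias rule (dropping a trailing legal form), the list of legal forms, the feature names, and the company name are all assumptions made for this example, and no CRF is trained here.

```python
# Hypothetical list of legal-form suffixes used to derive aliases.
LEGAL_FORMS = {"GmbH", "AG", "Inc.", "Ltd."}


def generate_aliases(name):
    """Return the name plus an alias without its trailing legal form, if any."""
    tokens = name.split()
    aliases = {name}
    if len(tokens) > 1 and tokens[-1] in LEGAL_FORMS:
        aliases.add(" ".join(tokens[:-1]))
    return aliases


def build_dictionary(names):
    """Collect all names and their aliases into one lookup set."""
    dictionary = set()
    for name in names:
        dictionary |= generate_aliases(name)
    return dictionary


def token_features(tokens, dictionary, max_len=4):
    """Build one feature dict per token, flagging tokens inside a dictionary match."""
    in_dict = [False] * len(tokens)
    for start in range(len(tokens)):
        # Try every span of up to max_len tokens beginning at start.
        for end in range(start + 1, min(len(tokens), start + max_len) + 1):
            if " ".join(tokens[start:end]) in dictionary:
                for i in range(start, end):
                    in_dict[i] = True
    return [
        {"word.lower": tok.lower(), "is_title": tok.istitle(), "in_company_dict": flag}
        for tok, flag in zip(tokens, in_dict)
    ]


dictionary = build_dictionary(["Acme Widgets GmbH"])
features = token_features("Shares of Acme Widgets rose".split(), dictionary)
flags = [f["in_company_dict"] for f in features]
```

Here the sentence mentions the company only by its shortened alias, which the plain dictionary entry would miss. Feature dictionaries of this shape are a common input format for CRF toolkits, though the exact interface depends on the library used.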
