publish.UP Search


3 search hits


Effective and efficient similarity search in databases (2013)

Lange, Dustin

Given a large set of records in a database and a query record, similarity search aims to find all records sufficiently similar to the query record. To solve this problem, two main aspects need to be considered: First, to perform effective search, the set of relevant records is defined using a similarity measure. Second, an efficient access method is to be found that performs only a few database accesses and comparisons using the similarity measure. This thesis addresses both aspects with an emphasis on the latter. In the first part of this thesis, a frequency-aware similarity measure is introduced. Compared record pairs are partitioned according to frequencies of attribute values. For each partition, a different similarity measure is created: machine learning techniques combine a set of base similarity measures into an overall similarity measure. After that, a similarity index for string attributes is proposed, the State Set Index (SSI), which is based on a trie (prefix tree) that is interpreted as a nondeterministic finite automaton. For processing range queries, the notion of query plans is introduced in this thesis to describe which similarity indexes to access and which thresholds to apply. The query result should be as complete as possible under some cost threshold. Two query planning variants are introduced: (1) Static planning selects a plan at compile time that is used for all queries. (2) Query-specific planning selects a different plan for each query. For answering top-k queries, the Bulk Sorted Access Algorithm (BSA) is introduced, which retrieves large chunks of records from the similarity indexes using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one attribute and are thus promising candidates. The described components form a complete similarity search system.
Based on prototypical implementations, this thesis shows comparative evaluation results for all proposed approaches on different real-world data sets, one of which is a large person data set from a German credit rating agency.
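
To make the idea of a trie-based string similarity index more concrete, the following minimal sketch answers an edit-distance range query by walking a prefix tree and pruning branches that can no longer match. It is not the State Set Index from the thesis: it uses the textbook technique of carrying one Levenshtein dynamic-programming row per trie node, and the names `TrieNode`, `build_trie`, and `range_query`, as well as the sample surnames, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a prefix tree; `word` is set only where a stored string ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None


def build_trie(words: List[str]) -> TrieNode:
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def range_query(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) for all stored words within max_dist edits of query."""
    results: List[Tuple[str, int]] = []
    # Row for the empty prefix: distance to query[:i] is i insertions.
    first_row = list(range(len(query) + 1))

    def visit(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the Levenshtein table by one row for the prefix ending in ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # insertion
                           prev_row[i] + 1,         # deletion
                           prev_row[i - 1] + cost)) # match or substitution
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # The row minimum never decreases with depth, so pruning here is safe.
        if min(row) <= max_dist:
            for c, child in node.children.items():
                visit(child, c, row)

    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    for c, child in root.children.items():
        visit(child, c, first_row)
    return sorted(results)


# Hypothetical sample data, not taken from the thesis.
index = build_trie(["meier", "meyer", "maier", "mueller", "schmidt"])
print(range_query(index, "meier", 1))
```

Because shared prefixes are evaluated only once and hopeless branches are cut early, a structure like this typically needs far fewer comparisons than scanning every record, which is the efficiency concern the abstract emphasizes.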

Ex-situ priors: A Bayesian hierarchical framework for defining informative prior distributions in hydrogeology (2019)

Cucchi, Karma ; Hesse, Falk ; Kawa, Nura ; Wang, Changhong ; Rubin, Yoram

Stochastic modeling is a common practice for modeling uncertainty in hydrogeology. In stochastic modeling, aquifer properties are characterized by their probability density functions (PDFs). The Bayesian approach for inverse modeling is often used to assimilate information from field measurements collected at a site into properties’ posterior PDFs. This necessitates the definition of a prior PDF, characterizing the knowledge of hydrological properties before undertaking any investigation at the site, and usually coming from previous studies at similar sites. In this paper, we introduce a Bayesian hierarchical algorithm capable of assimilating various kinds of information, such as point measurements, bounds, and moments, into a single, informative PDF that we call the ex-situ prior. This informative PDF summarizes the ex-situ information available about a hydrogeological parameter at a site of interest, which can then be used as a prior PDF in future studies at the site. We demonstrate the behavior of the algorithm on several synthetic case studies, compare it to other methods described in the literature, and illustrate the approach by applying it to a public open-access hydrogeological dataset.
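
As a rough illustration of how an informative prior enters a Bayesian update, the sketch below combines a normal prior with on-site measurements through the standard conjugate normal-normal formula with known measurement variance. This is not the hierarchical algorithm of the paper: the function name `normal_posterior` and all numbers are hypothetical, and the prior is simply assumed to summarize knowledge from other sites.

```python
from typing import List, Tuple


def normal_posterior(prior_mean: float, prior_var: float,
                     observations: List[float],
                     noise_var: float) -> Tuple[float, float]:
    """Posterior mean and variance of a normal mean with known noise variance."""
    n = len(observations)
    if n == 0:
        # Without site data the posterior is just the prior.
        return prior_mean, prior_var
    # Precisions (inverse variances) add up under conjugate updating.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical values: a prior on a log-transformed aquifer property assumed to
# come from similar sites, updated with three made-up on-site measurements.
mean, var = normal_posterior(prior_mean=-4.0, prior_var=1.0,
                             observations=[-3.2, -3.5, -3.0], noise_var=0.5)
print(mean, var)
```

The tighter the prior variance, the more the posterior is pulled toward knowledge from other sites, which is why constructing that prior carefully matters when on-site data are scarce.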

Teaching Data Management (2015)

Grillenberger, Andreas ; Romeike, Ralf

Data management is a central topic in computer science as well as in computer science education. In recent years, this topic has changed tremendously, as its impact on daily life has become increasingly visible. Nowadays, everyone not only needs to manage data of various kinds, but also continuously generates large amounts of data. In addition, Big Data and data analysis are intensively discussed in public dialogue because of their influence on society. Understanding such discussions and participating in them requires fundamental knowledge of data management. In particular, being aware of the threats that accompany the ability to analyze large amounts of data in nearly real-time is becoming increasingly important. This raises the question of which key competencies are necessary for daily dealings with data and data management. In this paper, we will first point out the importance of data management and of Big Data in daily life. On this basis, we will analyze which key competencies in data management everyone needs in order to handle data properly in daily life. Afterwards, we will discuss the impact of these changes in data management on computer science education and in particular database education.
