Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem with many applications in data management and knowledge discovery. Existing discovery algorithms are either brute-force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known GORDIAN algorithm and "Apriori-based" algorithms are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution, HCA-GORDIAN, combines the advantages of GORDIAN and our new algorithm HCA, and it significantly outperforms all previous work in many situations.
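To make the notion of a unique column combination concrete, the following is a minimal sketch of a generic level-wise ("Apriori-style") search for minimal unique column combinations. It is not the HCA or GORDIAN algorithm from the paper and includes none of the statistics-based pruning mentioned above; the function names `is_unique` and `minimal_unique_column_combinations` and the sample table are assumptions made purely for illustration.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no two rows agree on all columns in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_unique_column_combinations(rows, columns):
    """Illustrative bottom-up search (hypothetical helper, not HCA).

    Starts with single columns and grows only the non-unique
    combinations, so every reported combination is minimal.
    """
    result = []
    candidates = [(c,) for c in sorted(columns)]
    while candidates:
        non_unique = []
        for cand in candidates:
            if is_unique(rows, cand):
                result.append(cand)
            else:
                non_unique.append(cand)
        non_unique_set = set(non_unique)
        next_candidates = []
        # Apriori-style join: merge two non-unique sets that share a prefix.
        for a, b in combinations(non_unique, 2):
            if a[:-1] == b[:-1]:
                merged = a + (b[-1],)
                # Prune: if any subset is already unique, merged is not minimal.
                subsets = combinations(merged, len(merged) - 1)
                if all(s in non_unique_set for s in subsets):
                    next_candidates.append(merged)
        candidates = next_candidates
    return result


# Hypothetical sample table, invented for this sketch.
rows = [
    {"id": 1, "first": "Ann", "last": "Lee", "city": "Berlin"},
    {"id": 2, "first": "Ann", "last": "Kim", "city": "Potsdam"},
    {"id": 3, "first": "Bob", "last": "Lee", "city": "Potsdam"},
    {"id": 4, "first": "Bob", "last": "Kim", "city": "Berlin"},
]
result = minimal_unique_column_combinations(rows, ["id", "first", "last", "city"])
```

In this toy table, `id` is unique on its own, while `first`, `last`, and `city` each repeat values and only become unique in pairs. Each uniqueness check scans the whole table, which hints at why a naive search of this kind scales poorly to large datasets.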