In continuation of a successful series of events, the 4th Many-core Applications Research Community (MARC) symposium took place at the HPI in Potsdam on December 8 and 9, 2011. Over 60 researchers from different fields presented their work on many-core hardware architectures, their programming models, and the resulting research questions for the upcoming generation of heterogeneous parallel systems.
Duplicate detection is the task of identifying all groups of records within a data set that each represent the same real-world entity. This task is difficult because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might be so large that a pairwise comparison of all records is infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare record pairs only within each partition. One well-known approach of this kind is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that share a varying window size and advancement. The general intuition behind such adaptive windows is that there might be regions of high similarity, suggesting a larger window size, and regions of lower similarity, suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
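To make the baseline concrete, the following is a minimal sketch of the original fixed-window SNM as described above, not of the adaptive variants the abstract proposes. The function names, the use of Python's difflib.SequenceMatcher as the similarity measure, the sorting key, and the threshold value are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Illustrative similarity measure (an assumption, not from the source)."""
    return SequenceMatcher(None, a, b).ratio()


def sorted_neighborhood(records, key=str.lower, window=3, threshold=0.8):
    """Fixed-window Sorted Neighborhood Method (baseline sketch).

    Sorts the records by `key`, then compares each record only with the
    next `window - 1` records in sorted order. Returns the candidate
    duplicate pairs and the number of comparisons performed.
    """
    ordered = sorted(records, key=key)
    pairs = []
    comparisons = 0
    for i, record in enumerate(ordered):
        # Records at positions i+1 .. i+window-1 share a window with record i.
        for other in ordered[i + 1 : i + window]:
            comparisons += 1
            if similarity(record, other) >= threshold:
                pairs.append((record, other))
    return pairs, comparisons


if __name__ == "__main__":
    names = ["john smith", "mary jones", "jon smith", "alan turing", "marie jones"]
    found, count = sorted_neighborhood(names, window=2)
    print(found)
    print(count, "comparisons instead of", len(names) * (len(names) - 1) // 2)
```

With a window of 2, each record is compared only with its immediate successor in sorted order, so five records need 4 comparisons rather than the 10 required for all pairs. An adaptive variant would, under the intuition stated above, change `window` as it moves through regions of higher or lower similarity.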