• Treffer 3 von 4
Zurück zur Trefferliste

Experience: Enhancing address matching with geocoding and similarity measure selection

  • Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure per each address attribute that will finally help the classifier to achieveGiven a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure per each address attribute that will finally help the classifier to achieve the best f-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.zeige mehrzeige weniger

Metadaten exportieren

Weitere Dienste

Suche bei Google Scholar Statistik - Anzahl der Zugriffe auf das Dokument
Metadaten
Verfasserangaben:Ioannis KoumarelasORCiDGND, Axel Kroschk, Clifford Mosley, Felix NaumannORCiDGND
DOI:https://doi.org/10.1145/3232852
ISSN:1936-1955
Titel des übergeordneten Werks (Englisch):Journal of Data and Information Quality
Verlag:Association for Computing Machinery
Verlagsort:New York
Publikationstyp:Wissenschaftlicher Artikel
Sprache:Englisch
Jahr der Erstveröffentlichung:2018
Erscheinungsjahr:2018
Datum der Freischaltung:07.10.2021
Freies Schlagwort / Tag:Address matching; address normalization; address parsing; conditional functional dependencies; duplicate detection; geocoding; geographic information systems; random forest; record linkage; similarity measures
Band:10
Ausgabe:2
Seitenanzahl:16
Erste Seite:1
Letzte Seite:16
Organisationseinheiten:An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH
DDC-Klassifikation:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 000 Informatik, Informationswissenschaft, allgemeine Werke
Peer Review:Referiert
Verstanden ✔
Diese Webseite verwendet technisch erforderliche Session-Cookies. Durch die weitere Nutzung der Webseite stimmen Sie diesem zu. Unsere Datenschutzerklärung finden Sie hier.