Frost: a platform for benchmarking and exploring data matching results

  • "Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue due to it. Duplicates, that is, multiple but different representations of the same real-world entities, are among the main reasons for poor data quality, so finding and configuring the right deduplication solution is essential. Existing data matching benchmarks focus on the quality of matching results and neglect other important factors, such as business requirements. Additionally, they often do not support the exploration of data matching results. To address this gap between the mere counting of record pairs and a comprehensive means to evaluate data matching solutions, we present the Frost platform. It combines existing benchmarks, established quality metrics, cost and effort metrics, and exploration techniques, making it the first platform to allow systematic exploration to understand matching results. Frost is implemented and published in the open-source application Snowman, which includes the visual exploration of matching results, as shown in Figure 1.

Metadata
Author details: Martin Graf, Lukas Laskowski, Florian Papsdorf, Florian Sold, Roland Gremmelspacher, Felix Naumann, Fabian Panse
DOI: https://doi.org/10.14778/3554821.3554823
ISSN: 2150-8097
Title of parent work (English): Proceedings of the VLDB Endowment
Publisher: Association for Computing Machinery
Place of publishing: New York
Publication type: Article
Language: English
Date of first publication: 2022/08/01
Publication year: 2022
Release date: 2024/08/12
Volume: 15
Issue: 12
Number of pages: 14
First page: 3292
Last page: 3305
Organizational units: Digital Engineering Fakultät / Hasso-Plattner-Institut für Digital Engineering GmbH
DDC classification: 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 000 Computer science, information and general works
Peer review: Refereed
Publishing method: Open Access / Hybrid Open-Access
License: CC BY-NC-ND 4.0 International (Attribution, NonCommercial, NoDerivatives)