TY  - JOUR
A1  - Koumarelas, Ioannis
A1  - Kroschk, Axel
A1  - Mosley, Clifford
A1  - Naumann, Felix
T1  - Experience: Enhancing address matching with geocoding and similarity measure selection
JF  - Journal of Data and Information Quality
N2  - Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, to select the best similarity measure per each address attribute that will finally help the classifier to achieve the best f-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case.
KW  - Address matching
KW  - record linkage
KW  - duplicate detection
KW  - similarity measures
KW  - conditional functional dependencies
KW  - address normalization
KW  - address parsing
KW  - geocoding
KW  - geographic information systems
KW  - random forest
Y1  - 2018
U6  - https://doi.org/10.1145/3232852
SN  - 1936-1955
VL  - 10
IS  - 2
SP  - 1
EP  - 16
PB  - Association for Computing Machinery
CY  - New York
ER  - 

TY  - JOUR
A1  - Hellwig, Niels
A1  - Tatti, Dylan
A1  - Sartori, Giacomo
A1  - Anschlag, Kerstin
A1  - Graefe, Ulfert
A1  - Egli, Markus
A1  - Gobat, Jean-Michel
A1  - Broll, Gabriele
T1  - Modeling spatial patterns of humus forms in montane and subalpine forests
BT  - implications of local variability for upscaling
JF  - Sustainability
N2  - Humus forms are a distinctive morphological indicator of soil organic matter decomposition. The spatial distribution of humus forms depends on environmental factors such as topography, climate and vegetation. In montane and subalpine forests, environmental influences show a high spatial heterogeneity, which is reflected by a high spatial variability of humus forms. This study aims at examining spatial patterns of humus forms and their dependence on the spatial scale in a high mountain forest environment (Val di Sole/Val di Rabbi, Trentino, Italian Alps). On the basis of the distributions of environmental covariates across the study area, we described humus forms at the local scale (six sampling sites), slope scale (60 sampling sites) and landscape scale (30 additional sampling sites). The local variability of humus forms was analyzed with regard to the ground cover type. At the slope and landscape scale, spatial patterns of humus forms were modeled applying random forests and ordinary kriging of the model residuals. The results indicate that the occurrence of the humus form classes Mull, Mullmoder, Moder, Amphi and Eroded Moder generally depends on the topographical position. Local-scale patterns are mostly related to micro-topography (local accumulation and erosion sites) and ground cover, whereas slope-scale patterns are mainly connected with slope exposure and elevation. Patterns at the landscape scale show a rather irregular distribution, as spatial models at this scale do not account for local to slope-scale variations of humus forms. Moreover, models at the slope scale perform distinctly better than at the landscape scale. In conclusion, the results of this study highlight that landscape-scale predictions of humus forms should be accompanied by local- and slope-scale studies in order to enhance the general understanding of humus form patterns.
KW  - soil organic matter decomposition
KW  - spatial modeling
KW  - random forest
KW  - multi-scale analysis
KW  - forest soils
KW  - Italian Alps
Y1  - 2018
U6  - https://doi.org/10.3390/su11010048
SN  - 2071-1050
VL  - 11
IS  - 1
PB  - MDPI
CY  - Basel
ER  - 