TY - BOOK A1 - Draisbach, Uwe A1 - Naumann, Felix A1 - Szott, Sascha A1 - Wonneberg, Oliver T1 - Adaptive windows for duplicate detection N2 - Duplicate detection is the task of identifying all groups of records within a data set that each represent the same real-world entity. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records and (ii) data sets might have a high volume making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known approach of this kind is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons). N2 - Duplicate detection describes finding multiple records that represent the same real-world object. This task is not trivial because (i) the records may differ slightly, so that similarity measures are needed for a pairwise comparison, and (ii) the data volume makes a complete pairwise comparison impossible. To solve the second problem, various algorithms exist that partition the data and perform comparisons only within the partitions. 
One of these algorithms is the Sorted Neighborhood Method (SNM), which sorts the data by a key and then slides a window over the sorted data. Comparisons are performed only within this window. We describe several variations of the Sorted Neighborhood Method that are based on varying window sizes. These approaches rest on the intuition that the sorted records contain regions of higher and lower similarity, for which correspondingly larger or smaller window sizes are appropriate. We describe and evaluate several adaptation strategies, some of which are provably better than the original Sorted Neighborhood Method in terms of efficiency (same result with fewer comparisons). T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 49 KW - Duplicate Detection KW - Data Quality KW - Data Integration KW - Information Systems Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-53007 SN - 978-3-86956-143-1 SN - 1613-5652 SN - 2191-1665 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK A1 - Abedjan, Ziawasch A1 - Naumann, Felix T1 - Advancing the discovery of unique column combinations N2 - Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known GORDIAN algorithm and "Apriori-based" algorithms are compared and analyzed for further optimization. 
We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution, HCA-GORDIAN, combines the advantages of GORDIAN and our new algorithm HCA, and it significantly outperforms all previous work in many situations. N2 - Unique column combinations are column combinations of a database table that contain only unique values. Finding unique column combinations plays an important role both in fundamental research on information systems and in application areas such as data management and knowledge discovery from data. Existing algorithms that address this problem are either brute force or require too much main memory. These algorithms can therefore be applied only to small data sets. In this work, the well-known GORDIAN algorithm and Apriori-based algorithms are analyzed for the purpose of further optimization. We improve the Apriori algorithms through efficient candidate generation and heuristic-based candidate filters. A hybrid solution, HCA-GORDIAN, combines the advantages of GORDIAN and our new algorithm HCA and outperforms previous algorithms in terms of efficiency in many situations. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 51 KW - Apriori KW - unique KW - functional dependency KW - key discovery KW - data profiling Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-53564 SN - 978-3-86956-148-6 SN - 1613-5652 SN - 2191-1665 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - JOUR A1 - Abramowski, Attila A1 - Acero, F. A1 - Aharonian, Felix A. A1 - Benkhali, Faical Ait A1 - Akhperjanian, A. G. 
A1 - Angüner, Ekrem Oǧuzhan A1 - Anton, Gisela A1 - Balenderan, Shangkari A1 - Balzer, Arnim A1 - Barnacka, Anna A1 - Becherini, Yvonne A1 - Tjus, J. Becker A1 - Bernlöhr, K. A1 - Birsin, E. A1 - Bissaldi, E. A1 - Biteau, Jonathan A1 - Boisson, Catherine A1 - Bolmont, J. A1 - Bordas, Pol A1 - Brucker, J. A1 - Brun, Francois A1 - Brun, Pierre A1 - Bulik, Tomasz A1 - Carrigan, Svenja A1 - Casanova, Sabrina A1 - Cerruti, M. A1 - Chadwick, Paula M. A1 - Chalme-Calvet, R. A1 - Chaves, Ryan C. G. A1 - Cheesebrough, A. A1 - Chretien, M. A1 - Colafrancesco, Sergio A1 - Cologna, Gabriele A1 - Conrad, Jan A1 - Couturier, C. A1 - Dalton, M. A1 - Daniel, M. K. A1 - Davids, I. D. A1 - Degrange, B. A1 - Deil, C. A1 - deWilt, P. A1 - Dickinson, H. J. A1 - Djannati-Ataï, A. A1 - Domainko, W. A1 - Drury, L. O'C. A1 - Dubus, G. A1 - Dutson, K. A1 - Dyks, J. A1 - Dyrda, M. A1 - Edwards, T. A1 - Egberts, Kathrin A1 - Eger, P. A1 - Espigat, P. A1 - Farnier, C. A1 - Fegan, S. A1 - Feinstein, F. A1 - Fernandes, M. V. A1 - Fernandez, D. A1 - Fiasson, A. A1 - Fontaine, G. A1 - Foerster, A. A1 - Fuessling, M. A1 - Gajdus, M. A1 - Gallant, Y. A. A1 - Garrigoux, T. A1 - Gast, H. A1 - Giebels, B. A1 - Glicenstein, J. F. A1 - Goering, D. A1 - Grondin, M. -H. A1 - Grudzinska, M. A1 - Haeffner, S. A1 - Hague, J. D. A1 - Hahn, J. A1 - Harris, J. A1 - Heinzelmann, G. A1 - Henri, G. A1 - Hermann, G. A1 - Hervet, O. A1 - Hillert, A. A1 - Hinton, James Anthony A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Holler, Markus A1 - Horns, D. A1 - Jacholkowska, A. A1 - Jahn, C. A1 - Jamrozy, M. A1 - Janiak, M. A1 - Jankowsky, F. A1 - Jung, I. A1 - Kastendieck, M. A. A1 - Katarzynski, K. A1 - Katz, U. A1 - Kaufmann, S. A1 - Khelifi, B. A1 - Kieffer, M. A1 - Klepser, S. A1 - Klochkov, D. A1 - Kluzniak, W. A1 - Kneiske, T. A1 - Kolitzus, D. A1 - Komin, Nu. A1 - Kosack, K. A1 - Krakau, S. A1 - Krayzel, F. A1 - Krueger, P. P. A1 - Laffon, H. A1 - Lamanna, G. A1 - Lefaucheur, J. A1 - Lemoine-Goumard, M. 
A1 - Lenain, J-P. A1 - Lennarz, D. A1 - Lohse, T. A1 - Lopatin, A. A1 - Lu, C-C. A1 - Marandon, V. A1 - Marcowith, Alexandre A1 - Marx, R. A1 - Maurin, G. A1 - Maxted, N. A1 - Mayer, M. A1 - McComb, T. J. L. A1 - Medina, M. C. A1 - Mehault, J. A1 - Menzler, U. A1 - Meyer, M. A1 - Moderski, R. A1 - Mohamed, M. A1 - Moulin, Emmanuel A1 - Murach, T. A1 - Naumann, C. L. A1 - de Naurois, M. A1 - Nedbal, D. A1 - Niemiec, J. A1 - Nolan, S. J. A1 - Oakes, L. A1 - Ohm, S. A1 - Wilhelmi, E. de Ona A1 - Opitz, B. A1 - Ostrowski, M. A1 - Oya, I. A1 - Panter, M. A1 - Parsons, R. D. A1 - Arribas, M. Paz A1 - Pekeur, N. W. A1 - Pelletier, G. A1 - Perez, J. A1 - Petrucci, P-O. A1 - Peyaud, B. A1 - Pita, S. A1 - Poon, H. A1 - Puehlhofer, G. A1 - Punch, M. A1 - Quirrenbach, A. A1 - Raab, S. A1 - Raue, M. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - de los Reyes, R. A1 - Rieger, F. A1 - Rob, L. A1 - Rosier-Lees, S. A1 - Rowell, G. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Sahakian, V. A1 - Sanchez, David M. A1 - Santangelo, Andrea A1 - Schlickeiser, R. A1 - Schuessler, F. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schwemmer, S. A1 - Sol, H. A1 - Spengler, G. A1 - Spiess, F. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Stinzing, F. A1 - Stycz, K. A1 - Sushch, Iurii A1 - Szostek, A. A1 - Tavernet, J-P. A1 - Terrier, R. A1 - Tluczykont, M. A1 - Trichard, C. A1 - Valerius, K. A1 - van Eldik, C. A1 - Vasileiadis, G. A1 - Venter, C. A1 - Viana, A. A1 - Vincent, P. A1 - Voelk, H. J. A1 - Volpe, F. A1 - Vorster, M. A1 - Wagner, S. J. A1 - Wagner, P. A1 - Ward, M. A1 - Weidinger, M. A1 - Weitzel, Q. A1 - White, R. A1 - Wierzcholska, A. A1 - Willmann, P. A1 - Woernlein, A. A1 - Wouters, D. A1 - Zacharias, M. A1 - Zajczyk, A. A1 - Zdziarski, A. A. A1 - Zech, Alraune A1 - Zechlin, H-S. 
T1 - Constraints on axionlike particles with HESS from the irregularity of the PKS 2155-304 energy spectrum JF - Physical review : D, Particles, fields, gravitation, and cosmology N2 - Axionlike particles (ALPs) are hypothetical light (sub-eV) bosons predicted in some extensions of the Standard Model of particle physics. In astrophysical environments comprising high-energy gamma rays and turbulent magnetic fields, the existence of ALPs can modify the energy spectrum of the gamma rays for a sufficiently large coupling between ALPs and photons. This modification would take the form of an irregular behavior of the energy spectrum in a limited energy range. Data from the H.E.S.S. observations of the distant BL Lac object PKS 2155-304 (z = 0.116) are used to derive upper limits at the 95% C.L. on the strength of the ALP coupling to photons, $g_{\gamma a} < 2.1 \times 10^{-11}$ GeV$^{-1}$ for an ALP mass between 15 and 60 neV. The results depend on assumptions on the magnetic field around the source, which are chosen conservatively. The derived constraints apply to both light pseudoscalar and scalar bosons that couple to the electromagnetic field. Y1 - 2013 U6 - https://doi.org/10.1103/PhysRevD.88.102003 SN - 1550-7998 SN - 1550-2368 VL - 88 IS - 10 PB - American Physical Society CY - College Park ER - TY - BOOK A1 - Bauckmann, Jana A1 - Abedjan, Ziawasch A1 - Leser, Ulf A1 - Müller, Heiko A1 - Naumann, Felix T1 - Covering or complete? : Discovering conditional inclusion dependencies N2 - Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In recent years, conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. 
In this paper we focus on conditional inclusion dependencies (CINDs). We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case. N2 - Data dependencies (such as integrity constraints) are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In recent years, conditional dependencies have been introduced to analyze and improve the quality of data. A conditional dependency is a dependency with a limited scope, which is defined by conditions over one or more attributes. In this report we consider conditional inclusion dependencies (CINDs). We generalize the definition of CINDs by distinguishing covering and completeness conditions. We present a use case for such CINDs that demonstrates their usefulness for solving complex data quality problems. In addition, we define quality measures for conditions based on sensitivity and precision. We present efficient algorithms that find covering and completeness conditions within given thresholds. Our algorithms not only choose the values of the conditions but also find the condition attributes automatically. 
Finally, we show that our approach efficiently delivers meaningful and helpful results for the presented use case. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 62 KW - Data Dependency KW - Conditional Inclusion Dependency KW - Metadata Discovery KW - Linked Open Data KW - Link Discovery KW - Association Rule Mining Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-62089 SN - 978-3-86956-212-4 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - GEN A1 - Loster, Michael A1 - Naumann, Felix A1 - Ehmueller, Jan A1 - Feldmann, Benjamin T1 - CurEx BT - a system for extracting, curating, and exploring domain-specific knowledge graphs from text T2 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management N2 - The integration of diverse structured and unstructured information sources into a unified, domain-specific knowledge base is an important task in many areas. A well-maintained knowledge base enables data analysis in complex scenarios, such as risk analysis in the financial sector or investigating large data leaks, such as the Paradise or Panama papers. Both the creation of such knowledge bases and their continuous maintenance and curation involve many complex tasks and considerable manual effort. With CurEx, we present a modular system that allows structured and unstructured data sources to be integrated into a domain-specific knowledge base. 
In particular, we (i) enable the incremental improvement of each individual integration component; (ii) enable the selective generation of multiple knowledge graphs from the information contained in the knowledge base; and (iii) provide two distinct user interfaces tailored to the needs of data engineers and end users, respectively. The former has curation capabilities and controls the integration process, whereas the latter focuses on the exploration of the generated knowledge graph. Y1 - 2018 SN - 978-1-4503-6014-2 U6 - https://doi.org/10.1145/3269206.3269229 SP - 1883 EP - 1886 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Koßmann, Jan A1 - Papenbrock, Thorsten A1 - Naumann, Felix T1 - Data dependencies for query optimization BT - a survey JF - The VLDB journal : the international journal on very large data bases / publ. on behalf of the VLDB Endowment N2 - Effective query optimization is a core feature of any database management system. While most query optimization techniques make use of simple metadata, such as cardinalities and other basic statistics, other optimization techniques are based on more advanced metadata including data dependencies, such as functional, uniqueness, order, or inclusion dependencies. This survey provides an overview, intuitive descriptions, and classifications of query optimization and execution strategies that are enabled by data dependencies. We consider the most popular types of data dependencies and focus on optimization strategies that target the optimization of relational database queries. The survey helps database vendors identify optimization opportunities and helps DBMS researchers find related work and open research questions. 
KW - Query optimization KW - Query execution KW - Data dependencies KW - Data profiling KW - Unique column combinations KW - Functional dependencies KW - Order dependencies KW - Inclusion dependencies KW - Relational data KW - SQL Y1 - 2021 U6 - https://doi.org/10.1007/s00778-021-00676-3 SN - 1066-8888 SN - 0949-877X VL - 31 IS - 1 SP - 1 EP - 22 PB - Springer CY - Berlin ; Heidelberg ; New York ER - TY - JOUR A1 - Hameed, Mazhar A1 - Naumann, Felix T1 - Data Preparation BT - a survey of commercial tools JF - SIGMOD record N2 - Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. The act of obtaining information from raw data relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven applications. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With increasing data volume and its messy nature, the demand for prepared data increases day by day.
To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools, and (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques. KW - data quality KW - data cleaning KW - data wrangling Y1 - 2020 U6 - https://doi.org/10.1145/3444831.3444835 SN - 0163-5808 SN - 1943-5835 VL - 49 IS - 3 SP - 18 EP - 29 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Koumarelas, Ioannis A1 - Jiang, Lan A1 - Naumann, Felix T1 - Data preparation for duplicate detection JF - Journal of data and information quality : (JDIQ) N2 - Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal. However, the particular error of duplicate records, where multiple records refer to the same entity, is usually eliminated independently with specialized techniques. Our work is the first to bring these two areas together by applying data preparation operations under a systematic approach prior to performing duplicate detection.
Our process workflow can be summarized as follows: It begins with the user providing as input a sample of the gold standard, the actual dataset, and optionally some constraints to domain-specific data preparations, such as address normalization. The preparation selection operates in two consecutive phases. First, to vastly reduce the search space of ineffective data preparations, decisions are made based on the improvement or worsening of pair similarities. Second, using the remaining data preparations an iterative leave-one-out classification process removes preparations one by one and determines the redundant preparations based on the achieved area under the precision-recall curve (AUC-PR). Using this workflow, we manage to improve the results of duplicate detection up to 19% in AUC-PR. KW - data preparation KW - data wrangling KW - record linkage KW - duplicate detection KW - similarity measures Y1 - 2020 U6 - https://doi.org/10.1145/3377878 SN - 1936-1955 SN - 1936-1963 VL - 12 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - BOOK A1 - Abedjan, Ziawasch A1 - Golab, Lukasz A1 - Naumann, Felix A1 - Papenbrock, Thorsten T1 - Data Profiling T3 - Synthesis lectures on data management, 52 Y1 - 2019 SN - 978-1-68173-446-0 PB - Morgan & Claypool Publishers CY - San Rafael ER - TY - JOUR A1 - Actis, M. A1 - Agnetta, G. A1 - Aharonian, Felix A. A1 - Akhperjanian, A. G. A1 - Aleksic, J. A1 - Aliu, E. A1 - Allan, D. A1 - Allekotte, I. A1 - Antico, F. A1 - Antonelli, L. A. A1 - Antoranz, P. A1 - Aravantinos, A. A1 - Arlen, T. A1 - Arnaldi, H. A1 - Artmann, S. A1 - Asano, K. A1 - Asorey, H. G. A1 - Baehr, J. A1 - Bais, A. A1 - Baixeras, C. A1 - Bajtlik, S. A1 - Balis, D. A1 - Bamba, A. A1 - Barbier, C. A1 - Barcelo, M. A1 - Barnacka, Anna A1 - Barnstedt, Jürgen A1 - de Almeida, U. Barres A1 - Barrio, J. A. A1 - Basso, S. A1 - Bastieri, D. A1 - Bauer, C. A1 - Becerra Gonzalez, J. A1 - Becherini, Yvonne A1 - Bechtol, K. C. A1 - Becker, J. 
A1 - Beckmann, Volker A1 - Bednarek, W. A1 - Behera, B. A1 - Beilicke, M. A1 - Belluso, M. A1 - Benallou, M. A1 - Benbow, W. A1 - Berdugo, J. A1 - Berger, K. A1 - Bernardino, T. A1 - Bernlöhr, K. A1 - Biland, A. A1 - Billotta, S. A1 - Bird, T. A1 - Birsin, E. A1 - Bissaldi, E. A1 - Blake, S. A1 - Blanch Bigas, O. A1 - Bobkov, A. A. A1 - Bogacz, L. A1 - Bogdan, M. A1 - Boisson, Catherine A1 - Boix Gargallo, J. A1 - Bolmont, J. A1 - Bonanno, G. A1 - Bonardi, A. A1 - Bonev, T. A1 - Borkowski, Janett A1 - Botner, O. A1 - Bottani, A. A1 - Bourgeat, M. A1 - Boutonnet, C. A1 - Bouvier, A. A1 - Brau-Nogue, S. A1 - Braun, I. A1 - Bretz, T. A1 - Briggs, M. S. A1 - Brun, Pierre A1 - Brunetti, L. A1 - Buckley, H. A1 - Bugaev, V. A1 - Buehler, R. A1 - Bulik, Tomasz A1 - Busetto, G. A1 - Buson, S. A1 - Byrum, K. A1 - Cailles, M. A1 - Cameron, R. A. A1 - Canestrari, R. A1 - Cantu, S. A1 - Carmona, E. A1 - Carosi, A. A1 - Carr, John A1 - Carton, P. H. A1 - Casiraghi, M. A1 - Castarede, H. A1 - Catalano, O. A1 - Cavazzani, S. A1 - Cazaux, S. A1 - Cerruti, B. A1 - Cerruti, M. A1 - Chadwick, M. A1 - Chiang, J. A1 - Chikawa, M. A1 - Cieslar, M. A1 - Ciesielska, M. A1 - Cillis, A. N. A1 - Clerc, C. A1 - Colin, P. A1 - Colome, J. A1 - Compin, M. A1 - Conconi, P. A1 - Connaughton, V. A1 - Conrad, Jan A1 - Contreras, J. L. A1 - Coppi, P. A1 - Corlier, M. A1 - Corona, P. A1 - Corpace, O. A1 - Corti, D. A1 - Cortina, J. A1 - Costantini, H. A1 - Cotter, G. A1 - Courty, B. A1 - Couturier, S. A1 - Covino, S. A1 - Croston, J. A1 - Cusumano, G. A1 - Daniel, M. K. A1 - Dazzi, F. A1 - Deangelis, A. A1 - de Cea del Pozo, E. A1 - Dal Pino, E. M. de Gouveia A1 - de Jager, O. A1 - de la Calle Perez, I. A1 - De La Vega, G. A1 - De Lotto, B. A1 - de Naurois, M. A1 - Wilhelmi, E. de Ona A1 - de Souza, V. A1 - Decerprit, B. A1 - Deil, C. A1 - Delagnes, E. A1 - Deleglise, G. A1 - Delgado, C. A1 - Dettlaff, T. A1 - Di Paolo, A. A1 - Di Pierro, F. A1 - Diaz, C. A1 - Dick, J. A1 - Dickinson, H. A1 - Digel, S. 
W. A1 - Dimitrov, D. A1 - Disset, G. A1 - Djannati-Ataï, A. A1 - Doert, M. A1 - Domainko, W. A1 - Dorner, D. A1 - Doro, M. A1 - Dournaux, J. -L. A1 - Dravins, D. A1 - Drury, L. A1 - Dubois, F. A1 - Dubois, R. A1 - Dubus, G. A1 - Dufour, C. A1 - Durand, D. A1 - Dyks, J. A1 - Dyrda, M. A1 - Edy, E. A1 - Egberts, Kathrin A1 - Eleftheriadis, C. A1 - Elles, S. A1 - Emmanoulopoulos, D. A1 - Enomoto, R. A1 - Ernenwein, J. -P. A1 - Errando, M. A1 - Etchegoyen, A. A1 - Falcone, A. D. A1 - Farakos, K. A1 - Farnier, C. A1 - Federici, S. A1 - Feinstein, F. A1 - Ferenc, D. A1 - Fillin-Martino, E. A1 - Fink, D. A1 - Finley, C. A1 - Finley, J. P. A1 - Firpo, R. A1 - Florin, D. A1 - Foehr, C. A1 - Fokitis, E. A1 - Font, Ll. A1 - Fontaine, G. A1 - Fontana, A. A1 - Foerster, A. A1 - Fortson, L. A1 - Fouque, N. A1 - Fransson, C. A1 - Fraser, G. W. A1 - Fresnillo, L. A1 - Fruck, C. A1 - Fujita, Y. A1 - Fukazawa, Y. A1 - Funk, S. A1 - Gaebele, W. A1 - Gabici, S. A1 - Gadola, A. A1 - Galante, N. A1 - Gallant, Y. A1 - Garcia, B. A1 - Garcia Lopez, R. J. A1 - Garrido, D. A1 - Garrido, L. A1 - Gascon, D. A1 - Gasq, C. A1 - Gaug, M. A1 - Gaweda, J. A1 - Geffroy, N. A1 - Ghag, C. A1 - Ghedina, A. A1 - Ghigo, M. A1 - Gianakaki, E. A1 - Giarrusso, S. A1 - Giavitto, G. A1 - Giebels, B. A1 - Giro, E. A1 - Giubilato, P. A1 - Glanzman, T. A1 - Glicenstein, J. -F. A1 - Gochna, M. A1 - Golev, V. A1 - Gomez Berisso, M. A1 - Gonzalez, A. A1 - Gonzalez, F. A1 - Granena, F. A1 - Graciani, R. A1 - Granot, J. A1 - Gredig, R. A1 - Green, A. A1 - Greenshaw, T. A1 - Grimm, O. A1 - Grube, J. A1 - Grudzinska, M. A1 - Grygorczuk, J. A1 - Guarino, V. A1 - Guglielmi, L. A1 - Guilloux, F. A1 - Gunji, S. A1 - Gyuk, G. A1 - Hadasch, D. A1 - Haefner, D. A1 - Hagiwara, R. A1 - Hahn, J. A1 - Hallgren, A. A1 - Hara, S. A1 - Hardcastle, M. J. A1 - Hassan, T. A1 - Haubold, T. A1 - Hauser, M. A1 - Hayashida, M. A1 - Heller, R. A1 - Henri, G. A1 - Hermann, G. A1 - Herrero, A. A1 - Hinton, James Anthony A1 - Hoffmann, D. 
A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Horns, D. A1 - Hrupec, D. A1 - Huan, H. A1 - Huber, B. A1 - Huet, J. -M. A1 - Hughes, G. A1 - Hultquist, K. A1 - Humensky, T. B. A1 - Huppert, J. -F. A1 - Ibarra, A. A1 - Illa, J. M. A1 - Ingjald, J. A1 - Inoue, S. A1 - Inoue, Y. A1 - Ioka, K. A1 - Jablonski, C. A1 - Jacholkowska, A. A1 - Janiak, M. A1 - Jean, P. A1 - Jensen, H. A1 - Jogler, T. A1 - Jung, I. A1 - Kaaret, P. A1 - Kabuki, S. A1 - Kakuwa, J. A1 - Kalkuhl, C. A1 - Kankanyan, R. A1 - Kapala, M. A1 - Karastergiou, A. A1 - Karczewski, M. A1 - Karkar, S. A1 - Karlsson, N. A1 - Kasperek, J. A1 - Katagiri, H. A1 - Katarzynski, K. A1 - Kawanaka, N. A1 - Kedziora, B. A1 - Kendziorra, E. A1 - Khelifi, B. A1 - Kieda, D. A1 - Kifune, T. A1 - Kihm, T. A1 - Klepser, S. A1 - Kluzniak, W. A1 - Knapp, J. A1 - Knappy, A. R. A1 - Kneiske, T. A1 - Knoedlseder, J. A1 - Koeck, F. A1 - Kodani, K. A1 - Kohri, K. A1 - Kokkotas, K. A1 - Komin, N. A1 - Konopelko, A. A1 - Kosack, K. A1 - Kossakowski, R. A1 - Kostka, P. A1 - Kotula, J. A1 - Kowal, G. A1 - Koziol, J. A1 - Kraehenbuehl, T. A1 - Krause, J. A1 - Krawczynski, H. A1 - Krennrich, F. A1 - Kretzschmann, A. A1 - Kubo, H. A1 - Kudryavtsev, V. A. A1 - Kushida, J. A1 - La Barbera, N. A1 - La Parola, V. A1 - La Rosa, G. A1 - Lopez, A. A1 - Lamanna, G. A1 - Laporte, P. A1 - Lavalley, C. A1 - Le Flour, T. A1 - Le Padellec, A. A1 - Lenain, J. -P. A1 - Lessio, L. A1 - Lieunard, B. A1 - Lindfors, E. A1 - Liolios, A. A1 - Lohse, T. A1 - Lombardi, S. A1 - Lopatin, A. A1 - Lorenz, E. A1 - Lubinski, P. A1 - Luz, O. A1 - Lyard, E. A1 - Maccarone, M. C. A1 - Maccarone, T. A1 - Maier, G. A1 - Majumdar, P. A1 - Maltezos, S. A1 - Malkiewicz, P. A1 - Mana, C. A1 - Manalaysay, A. A1 - Maneva, G. A1 - Mangano, A. A1 - Manigot, P. A1 - Marin, J. A1 - Mariotti, M. A1 - Markoff, S. A1 - Martinez, G. A1 - Martinez, M. A1 - Mastichiadis, A. A1 - Matsumoto, H. A1 - Mattiazzo, S. A1 - Mazin, D. A1 - McComb, T. J. L. A1 - McCubbin, N. A1 - McHardy, I. 
A1 - Medina, C. A1 - Melkumyan, D. A1 - Mendes, A. A1 - Mertsch, P. A1 - Meucci, M. A1 - Michalowski, J. A1 - Micolon, P. A1 - Mineo, T. A1 - Mirabal, N. A1 - Mirabel, F. A1 - Miranda, J. M. A1 - Mirzoyan, R. A1 - Mizuno, T. A1 - Moal, B. A1 - Moderski, R. A1 - Molinari, E. A1 - Monteiro, I. A1 - Moralejo, A. A1 - Morello, C. A1 - Mori, K. A1 - Motta, G. A1 - Mottez, F. A1 - Moulin, Emmanuel A1 - Mukherjee, R. A1 - Munar, P. A1 - Muraishi, H. A1 - Murase, K. A1 - Murphy, A. Stj. A1 - Nagataki, S. A1 - Naito, T. A1 - Nakamori, T. A1 - Nakayama, K. A1 - Naumann, C. L. A1 - Naumann, D. A1 - Nayman, P. A1 - Nedbal, D. A1 - Niedzwiecki, A. A1 - Niemiec, J. A1 - Nikolaidis, A. A1 - Nishijima, K. A1 - Nolan, S. J. A1 - Nowak, N. A1 - O'Brien, P. T. A1 - Ochoa, I. A1 - Ohira, Y. A1 - Ohishi, M. A1 - Ohka, H. A1 - Okumura, A. A1 - Olivetto, C. A1 - Ong, R. A. A1 - Orito, R. A1 - Orr, M. A1 - Osborne, J. P. A1 - Ostrowski, M. A1 - Otero, L. A1 - Otte, A. N. A1 - Ovcharov, E. A1 - Oya, I. A1 - Ozieblo, A. A1 - Paiano, S. A1 - Pallota, J. A1 - Panazol, J. L. A1 - Paneque, D. A1 - Panter, M. A1 - Paoletti, R. A1 - Papyan, G. A1 - Paredes, J. M. A1 - Pareschi, G. A1 - Parsons, R. D. A1 - Arribas, M. Paz A1 - Pedaletti, G. A1 - Pepato, A. A1 - Persic, M. A1 - Petrucci, P. O. A1 - Peyaud, B. A1 - Piechocki, W. A1 - Pita, S. A1 - Pivato, G. A1 - Platos, L. A1 - Platzer, R. A1 - Pogosyan, L. A1 - Pohl, Martin A1 - Pojmanski, G. A1 - Ponz, J. D. A1 - Potter, W. A1 - Prandini, E. A1 - Preece, R. A1 - Prokoph, H. A1 - Puehlhofer, G. A1 - Punch, M. A1 - Quel, E. A1 - Quirrenbach, A. A1 - Rajda, P. A1 - Rando, R. A1 - Rataj, M. A1 - Raue, M. A1 - Reimann, C. A1 - Reimann, O. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - Renner, S. A1 - Reymond, J. -M. A1 - Rhode, W. A1 - Ribo, M. A1 - Ribordy, M. A1 - Rico, J. A1 - Rieger, F. A1 - Ringegni, P. A1 - Ripken, J. A1 - Ristori, P. A1 - Rivoire, S. A1 - Rob, L. A1 - Rodriguez, S. A1 - Roeser, U. A1 - Romano, Patrizia A1 - Romero, G. E. 
A1 - Rosier-Lees, S. A1 - Rovero, A. C. A1 - Roy, F. A1 - Royer, S. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Ruppel, J. A1 - Russo, F. A1 - Ryde, F. A1 - Sacco, B. A1 - Saggion, A. A1 - Sahakian, V. A1 - Saito, K. A1 - Saito, T. A1 - Sakaki, N. A1 - Salazar, E. A1 - Salini, A. A1 - Sanchez, F. A1 - Sanchez Conde, M. A. A1 - Santangelo, Andrea A1 - Santos, E. M. A1 - Sanuy, A. A1 - Sapozhnikov, L. A1 - Sarkar, S. A1 - Scalzotto, V. A1 - Scapin, V. A1 - Scarcioffolo, M. A1 - Schanz, T. A1 - Schlenstedt, S. A1 - Schlickeiser, R. A1 - Schmidt, T. A1 - Schmoll, J. A1 - Schroedter, M. A1 - Schultz, C. A1 - Schultze, J. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schweizer, T. A1 - Seiradakis, J. A1 - Selmane, S. A1 - Seweryn, K. A1 - Shayduk, M. A1 - Shellard, R. C. A1 - Shibata, T. A1 - Sikora, M. A1 - Silk, J. A1 - Sillanpaa, A. A1 - Sitarek, J. A1 - Skole, C. A1 - Smith, N. A1 - Sobczynska, D. A1 - Sofo Haro, M. A1 - Sol, H. A1 - Spanier, F. A1 - Spiga, D. A1 - Spyrou, S. A1 - Stamatescu, V. A1 - Stamerra, A. A1 - Starling, R. L. C. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Steiner, S. A1 - Stergioulas, N. A1 - Sternberger, R. A1 - Stinzing, F. A1 - Stodulski, M. A1 - Straumann, U. A1 - Suarez, A. A1 - Suchenek, M. A1 - Sugawara, R. A1 - Sulanke, K. H. A1 - Sun, S. A1 - Supanitsky, A. D. A1 - Sutcliffe, P. A1 - Szanecki, M. A1 - Szepieniec, T. A1 - Szostek, A. A1 - Szymkowiak, A. A1 - Tagliaferri, G. A1 - Tajima, H. A1 - Takahashi, H. A1 - Takahashi, K. A1 - Takalo, L. A1 - Takami, H. A1 - Talbot, R. G. A1 - Tam, P. H. A1 - Tanaka, M. A1 - Tanimori, T. A1 - Tavani, M. A1 - Tavernet, J. -P. A1 - Tchernin, C. A1 - Tejedor, L. A. A1 - Telezhinsky, Igor O. A1 - Temnikov, P. A1 - Tenzer, C. A1 - Terada, Y. A1 - Terrier, R. A1 - Teshima, M. A1 - Testa, V. A1 - Tibaldo, L. A1 - Tibolla, O. A1 - Tluczykont, M. A1 - Peixoto, C. J. Todero A1 - Tokanai, F. A1 - Tokarz, M. A1 - Toma, K. A1 - Torres, D. F. A1 - Tosti, G. A1 - Totani, T. 
A1 - Toussenel, F. A1 - Vallania, P. A1 - Vallejo, G. A1 - van der Walt, J. A1 - van Eldik, C. A1 - Vandenbroucke, J. A1 - Vankov, H. A1 - Vasileiadis, G. A1 - Vassiliev, V. V. A1 - Vegas, I. A1 - Venter, L. A1 - Vercellone, S. A1 - Veyssiere, C. A1 - Vialle, J. P. A1 - Videla, M. A1 - Vincent, P. A1 - Vink, J. A1 - Vlahakis, N. A1 - Vlahos, L. A1 - Vogler, P. A1 - Vollhardt, A. A1 - Volpe, F. A1 - Von Gunten, H. P. A1 - Vorobiov, S. A1 - Wagner, S. A1 - Wagner, R. M. A1 - Wagner, B. A1 - Wakely, S. P. A1 - Walter, P. A1 - Walter, R. A1 - Warwick, R. A1 - Wawer, P. A1 - Wawrzaszek, R. A1 - Webb, N. A1 - Wegner, P. A1 - Weinstein, A. A1 - Weitzel, Q. A1 - Welsing, R. A1 - Wetteskind, H. A1 - White, R. A1 - Wierzcholska, A. A1 - Wilkinson, M. I. A1 - Williams, D. A. A1 - Winde, M. A1 - Wischnewski, R. A1 - Wisniewski, L. A1 - Wolczko, A. A1 - Wood, M. A1 - Xiong, Q. A1 - Yamamoto, T. A1 - Yamaoka, K. A1 - Yamazaki, R. A1 - Yanagita, S. A1 - Yoffo, B. A1 - Yonetani, M. A1 - Yoshida, A. A1 - Yoshida, T. A1 - Yoshikoshi, T. A1 - Zabalza, V. A1 - Zagdanski, A. A1 - Zajczyk, A. A1 - Zdziarski, A. A1 - Zech, Alraune A1 - Zietara, K. A1 - Ziolkowski, P. A1 - Zitelli, V. A1 - Zychowski, P. T1 - Design concepts for the Cherenkov Telescope Array CTA: an advanced facility for ground-based high-energy gamma-ray astronomy JF - Experimental astronomy : an international journal on astronomical instrumentation and data analysis N2 - Ground-based gamma-ray astronomy has had a major breakthrough with the impressive results obtained using systems of imaging atmospheric Cherenkov telescopes. Ground-based gamma-ray astronomy has a huge potential in astrophysics, particle physics and cosmology. CTA is an international initiative to build the next-generation instrument, with a factor of 5-10 improvement in sensitivity in the 100 GeV-10 TeV range and the extension to energies well below 100 GeV and above 100 TeV. 
CTA will consist of two arrays (one in the north, one in the south) for full sky coverage and will be operated as an open observatory. The design of CTA is based on currently available technology. This document reports on the status and presents the major design concepts of CTA. KW - Ground based gamma ray astronomy KW - Next generation Cherenkov telescopes KW - Design concepts Y1 - 2011 U6 - https://doi.org/10.1007/s10686-011-9247-0 SN - 0922-6435 SN - 1572-9508 VL - 32 IS - 3 SP - 193 EP - 316 PB - Springer CY - Dordrecht ER - TY - JOUR A1 - Vitagliano, Gerardo A1 - Jiang, Lan A1 - Naumann, Felix T1 - Detecting layout templates in complex multiregion files JF - Proceedings of the VLDB Endowment N2 - Spreadsheets are among the most commonly used file formats for data management, distribution, and analysis. Their widespread employment makes it easy to gather large collections of data, but their flexible canvas-based structure makes automated analysis difficult without heavy preparation. One of the common problems that practitioners face is the presence of multiple, independent regions in a single spreadsheet, possibly separated by repeated empty cells. We define such files as "multiregion" files. In collections of various spreadsheets, we can observe that some share the same layout. We present the Mondrian approach to automatically identify layout templates across multiple files and systematically extract the corresponding regions. Our approach is composed of three phases: first, each file is rendered as an image and inspected for elements that could form regions; then, using a clustering algorithm, the identified elements are grouped to form regions; finally, every file layout is represented as a graph and compared with others to find layout templates. We compare our method to state-of-the-art table recognition algorithms on two corpora of real-world enterprise spreadsheets. 
Our approach shows the best performance in detecting reliable region boundaries within each file and can correctly identify recurring layouts across files. Y1 - 2022 U6 - https://doi.org/10.14778/3494124.3494145 SN - 2150-8097 VL - 15 IS - 3 SP - 646 EP - 658 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Caruccio, Loredana A1 - Deufemia, Vincenzo A1 - Naumann, Felix A1 - Polese, Giuseppe T1 - Discovering relaxed functional dependencies based on multi-attribute dominance JF - IEEE transactions on knowledge and data engineering N2 - With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are often of poor quality, due to inconsistencies, errors, and so forth. One way to check the quality of data is to infer functional dependencies (fds). However, in many modern applications it might be necessary to extract properties and relationships that are not captured through fds, due to the necessity to admit exceptions, or to consider similarity rather than equality of data values. Relaxed fds (rfds) have been introduced to meet these needs, but their discovery from data adds further complexity to an already complex problem, also due to the necessity of specifying similarity and validity thresholds. We propose Domino, a new discovery algorithm for rfds that exploits the concept of dominance in order to derive similarity thresholds of attribute values while inferring rfds. An experimental evaluation on real datasets demonstrates the discovery performance and the effectiveness of the proposed algorithm. 
KW - Complexity theory KW - Approximation algorithms KW - Big Data KW - Distributed databases KW - Semantics KW - Lakes KW - Functional dependencies KW - data profiling KW - data cleansing Y1 - 2020 U6 - https://doi.org/10.1109/TKDE.2020.2967722 SN - 1041-4347 SN - 1558-2191 VL - 33 IS - 9 SP - 3212 EP - 3228 PB - Institute of Electrical and Electronics Engineers CY - New York, NY ER - TY - JOUR A1 - Berti-Equille, Laure A1 - Harmouch, Nazar A1 - Naumann, Felix A1 - Novelli, Noel A1 - Saravanan, Thirumuruganathan T1 - Discovery of genuine functional dependencies from relational data with missing values JF - Proceedings of the VLDB Endowment N2 - Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs cannot be detected precisely because of missing values, while some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs. Y1 - 2018 U6 - https://doi.org/10.14778/3204028.3204032 SN - 2150-8097 VL - 11 IS - 8 SP - 880 EP - 892 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Abramowski, Attila A1 - Acero, F. A1 - Aharonian, Felix A. A1 - Akhperjanian, A. G. 
A1 - Angüner, Ekrem Oǧuzhan A1 - Anton, Gisela A1 - Balenderan, Shangkari A1 - Balzer, Arnim A1 - Barnacka, Anna A1 - Becherini, Yvonne A1 - Tjus, J. Becker A1 - Bernlöhr, K. A1 - Birsin, E. A1 - Bissaldi, E. A1 - Biteau, Jonathan A1 - Boisson, Catherine A1 - Bolmont, J. A1 - Bordas, Pol A1 - Brucker, J. A1 - Brun, Francois A1 - Brun, Pierre A1 - Bulik, Tomasz A1 - Carrigan, Svenja A1 - Casanova, Sabrina A1 - Cerruti, M. A1 - Chadwick, Paula M. A1 - Chalme-Calvet, R. A1 - Chaves, Ryan C. G. A1 - Cheesebrough, A. A1 - Chretien, M. A1 - Colafrancesco, Sergio A1 - Cologna, Gabriele A1 - Conrad, Jan A1 - Couturier, C. A1 - Dalton, M. A1 - Daniel, M. K. A1 - Davids, I. D. A1 - Degrange, B. A1 - Deil, C. A1 - deWilt, P. A1 - Dickinson, H. J. A1 - Djannati-Ataï, A. A1 - Domainko, W. A1 - Drury, L. O'C. A1 - Dubus, G. A1 - Dutson, K. A1 - Dyks, J. A1 - Dyrda, M. A1 - Edwards, T. A1 - Egberts, Kathrin A1 - Eger, P. A1 - Espigat, P. A1 - Farnier, C. A1 - Fegan, S. A1 - Feinstein, F. A1 - Fernandes, M. V. A1 - Fernandez, D. A1 - Fiasson, A. A1 - Fontaine, G. A1 - Foerster, A. A1 - Fuessling, M. A1 - Gajdus, M. A1 - Gallant, Y. A. A1 - Garrigoux, T. A1 - Gast, H. A1 - Giebels, B. A1 - Glicenstein, J. F. A1 - Goering, D. A1 - Grondin, M. -H. A1 - Grudzinska, M. A1 - Haeffner, S. A1 - Hague, J. D. A1 - Hahn, J. A1 - Harris, J. A1 - Heinzelmann, G. A1 - Henri, G. A1 - Hermann, G. A1 - Hervet, O. A1 - Hillert, A. A1 - Hinton, James Anthony A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Holler, Markus A1 - Horns, D. A1 - Jacholkowska, A. A1 - Jahn, C. A1 - Jamrozy, M. A1 - Janiak, M. A1 - Jankowsky, F. A1 - Jung, I. A1 - Kastendieck, M. A. A1 - Katarzynski, K. A1 - Katz, U. A1 - Kaufmann, S. A1 - Khelifi, B. A1 - Kieffer, M. A1 - Klepser, S. A1 - Klochkov, D. A1 - Kluzniak, W. A1 - Kneiske, T. A1 - Kolitzus, D. A1 - Komin, Nu. A1 - Kosack, K. A1 - Krakau, S. A1 - Krayzel, F. A1 - Krueger, P. P. A1 - Laffon, H. A1 - Lamanna, G. A1 - Lefaucheur, J. A1 - Lemoine-Goumard, M. A1 - Lenain, J. 
-P. A1 - Lennarz, D. A1 - Lohse, T. A1 - Lopatin, A. A1 - Lu, C. -C. A1 - Marandon, V. A1 - Marcowith, Alexandre A1 - Maxted, N. A1 - Mayer, M. A1 - McComb, T. J. L. A1 - Medina, M. C. A1 - Mehault, J. A1 - Menzler, U. A1 - Meyer, M. A1 - Moderski, R. A1 - Mohamed, M. A1 - Moulin, Emmanuel A1 - Murach, T. A1 - Naumann, C. L. A1 - de Naurois, M. A1 - Nedbal, D. A1 - Niemiec, J. A1 - Nolan, S. J. A1 - Oakes, L. A1 - Ohm, S. A1 - Wilhelmi, E. de Ona A1 - Opitz, B. A1 - Ostrowski, M. A1 - Oya, I. A1 - Panter, M. A1 - Parsons, R. D. A1 - Arribas, M. Paz A1 - Pekeur, N. W. A1 - Pelletier, G. A1 - Perez, J. A1 - Petrucci, P. -O. A1 - Peyaud, B. A1 - Pita, S. A1 - Poon, H. A1 - Punch, M. A1 - Quirrenbach, A. A1 - Raab, S. A1 - Raue, M. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - de los Reyes, R. A1 - Rieger, F. A1 - Rob, L. A1 - Rosier-Lees, S. A1 - Rowell, G. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Sahakian, V. A1 - Sanchez, David M. A1 - Santangelo, Andrea A1 - Schlickeiser, R. A1 - Schuessler, F. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schwemmer, S. A1 - Sol, H. A1 - Spengler, G. A1 - Spiess, F. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Stinzing, F. A1 - Stycz, K. A1 - Sushch, Iurii A1 - Szostek, A. A1 - Tavernet, J. -P. A1 - Terrier, R. A1 - Tluczykont, M. A1 - Trichard, C. A1 - Valerius, K. A1 - van Eldik, C. A1 - Vasileiadis, G. A1 - Venter, C. A1 - Viana, A. A1 - Vincent, P. A1 - Voelk, H. J. A1 - Volpe, F. A1 - Vorster, M. A1 - Wagner, S. J. A1 - Wagner, P. A1 - Ward, M. A1 - Weidinger, M. A1 - White, R. A1 - Wierzcholska, A. A1 - Willmann, P. A1 - Woernlein, A. A1 - Wouters, D. A1 - Zacharias, M. A1 - Zajczyk, A. A1 - Zdziarski, A. A. A1 - Zech, Alraune A1 - Zechlin, H. -S. 
T1 - Discovery of high and very high-energy emission from the BL Lacertae object SHBL J001355.9-185406 JF - Astronomy and astrophysics : an international weekly journal N2 - The detection of the high-frequency peaked BL Lac object (HBL) SHBL J001355.9-185406 (z = 0.095) at high (HE; 100 MeV < E < 300 GeV) and very high-energy (VHE; E > 100 GeV) with the Fermi Large Area Telescope (LAT) and the High Energy Stereoscopic System (H.E.S.S.) is reported. Dedicated observations were performed with the H.E.S.S. telescopes, leading to a detection at the 5.5 sigma significance level. The measured flux above 310 GeV is (8.3 ± 1.7(stat) ± 1.7(sys)) × 10^-13 photons cm^-2 s^-1 (about 0.6% of that of the Crab Nebula), and the power-law spectrum has a photon index of Gamma = 3.4 ± 0.5(stat) ± 0.2(sys). Using 3.5 years of publicly available Fermi-LAT data, a faint counterpart has been detected in the LAT data at the 5.5 sigma significance level, with an integrated flux above 300 MeV of (9.3 ± 3.4(stat) ± 0.8(sys)) × 10^-10 photons cm^-2 s^-1 and a photon index of Gamma = 1.96 ± 0.20(stat) ± 0.08(sys). X-ray observations with Swift-XRT allow the synchrotron peak energy in νF(ν) representation to be located at approximately 1.0 keV. The broadband spectral energy distribution is modelled with a one-zone synchrotron self-Compton (SSC) model and the optical data by a black-body emission describing the thermal emission of the host galaxy. The derived parameters are typical of HBLs detected at VHE, with a particle-dominated jet. KW - BL Lacertae objects: individual: SHBL J001355.9-185406 KW - gamma rays: general Y1 - 2013 U6 - https://doi.org/10.1051/0004-6361/201220996 SN - 0004-6361 VL - 554 PB - EDP Sciences CY - Les Ulis ER - TY - JOUR A1 - Abramowski, Attila A1 - Acero, F. A1 - Aharonian, Felix A. A1 - Benkhali, Faical Ait A1 - Akhperjanian, A. G. 
A1 - Angüner, Ekrem Oǧuzhan A1 - Anton, Gisela A1 - Balenderan, Shangkari A1 - Balzer, Arnim A1 - Barnacka, Anna A1 - Becherini, Yvonne A1 - Tjus, J. Becker A1 - Bernlöhr, K. A1 - Birsin, E. A1 - Bissaldi, E. A1 - Biteau, Jonathan A1 - Boettcher, Markus A1 - Boisson, Catherine A1 - Bolmont, J. A1 - Bordas, Pol A1 - Brucker, J. A1 - Brun, Francois A1 - Brun, Pierre A1 - Bulik, Tomasz A1 - Carrigan, Svenja A1 - Casanova, Sabrina A1 - Cerruti, M. A1 - Chadwick, Paula M. A1 - Chalme-Calvet, R. A1 - Chaves, Ryan C. G. A1 - Cheesebrough, A. A1 - Chretien, M. A1 - Clapson, A. C. A1 - Colafrancesco, Sergio A1 - Cologna, Gabriele A1 - Conrad, Jan A1 - Couturier, C. A1 - Cui, Y. A1 - Dalton, M. A1 - Daniel, M. K. A1 - Davids, I. D. A1 - Degrange, B. A1 - Deil, C. A1 - deWilt, P. A1 - Dickinson, H. J. A1 - Djannati-Ataï, A. A1 - Domainko, W. A1 - Dubus, G. A1 - Dutson, K. A1 - Dyks, J. A1 - Dyrda, M. A1 - Edwards, T. A1 - Egberts, Kathrin A1 - Eger, P. A1 - Espigat, P. A1 - Farnier, C. A1 - Fegan, S. A1 - Feinstein, F. A1 - Fernandes, M. V. A1 - Fernandez, D. A1 - Fiasson, A. A1 - Fontaine, G. A1 - Foerster, A. A1 - Fuessling, M. A1 - Gajdus, M. A1 - Gallant, Y. A. A1 - Garrigoux, T. A1 - Giavitto, G. A1 - Giebels, B. A1 - Glicenstein, J. F. A1 - Grondin, M. -H. A1 - Grudzinska, M. A1 - Haeffner, S. A1 - Hahn, J. A1 - Harris, J. A1 - Heinzelmann, G. A1 - Henri, G. A1 - Hermann, G. A1 - Hervet, O. A1 - Hillert, A. A1 - Hinton, James Anthony A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Holler, Markus A1 - Horns, D. A1 - Jacholkowska, A. A1 - Jahn, C. A1 - Jamrozy, Marek A1 - Janiak, M. A1 - Jankowsky, F. A1 - Jung, I. A1 - Kastendieck, M. A. A1 - Katarzynski, Krzysztof A1 - Katz, Uli A1 - Kaufmann, S. A1 - Khelifi, B. A1 - Kieffer, M. A1 - Klepser, S. A1 - Klochkov, D. A1 - Kluzniak, W. A1 - Kneiske, T. A1 - Kolitzus, D. A1 - Komin, Nu. A1 - Kosack, K. A1 - Krakau, S. A1 - Krayzel, F. A1 - Krueger, P. P. A1 - Laffon, H. A1 - Lamanna, G. A1 - Lefaucheur, J. A1 - Lemiere, A. 
A1 - Lemoine-Goumard, M. A1 - Lenain, J. -P. A1 - Lennarz, D. A1 - Lohse, T. A1 - Lopatin, A. A1 - Lu, C. -C. A1 - Marandon, V. A1 - Marcowith, Alexandre A1 - Marx, R. A1 - Maurin, G. A1 - Maxted, N. A1 - Mayer, M. A1 - McComb, T. J. L. A1 - Mehault, J. A1 - Meintjes, P. J. A1 - Menzler, U. A1 - Meyer, Manuel A1 - Moderski, R. A1 - Mohamed, M. A1 - Moulin, Emmanuel A1 - Murach, T. A1 - Naumann, C. L. A1 - de Naurois, M. A1 - Niemiec, J. A1 - Nolan, S. J. A1 - Oakes, L. A1 - Ohm, S. A1 - Wilhelmi, E. de Ona A1 - Opitz, B. A1 - Ostrowski, M. A1 - Oya, I. A1 - Panter, M. A1 - Parsons, R. D. A1 - Arribas, M. Paz A1 - Pekeur, N. W. A1 - Pelletier, G. A1 - Perez, J. A1 - Petrucci, P. -O. A1 - Peyaud, B. A1 - Pita, S. A1 - Poon, H. A1 - Puehlhofer, G. A1 - Punch, M. A1 - Quirrenbach, A. A1 - Raab, S. A1 - Raue, M. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - de los Reyes, R. A1 - Rieger, F. A1 - Rob, L. A1 - Romoli, C. A1 - Rosier-Lees, S. A1 - Rowell, G. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Sahakian, V. A1 - Sanchez, David M. A1 - Santangelo, Andrea A1 - Schlickeiser, R. A1 - Schuessler, F. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schwemmer, S. A1 - Sol, H. A1 - Spengler, G. A1 - Spies, F. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Stinzing, F. A1 - Stycz, K. A1 - Sushch, Iurii A1 - Szostek, A. A1 - Tavernet, J. -P. A1 - Tavernier, T. A1 - Taylor, A. M. A1 - Terrier, R. A1 - Tluczykont, M. A1 - Trichard, C. A1 - Valerius, K. A1 - van Eldik, Christopher A1 - van Soelen, B. A1 - Vasileiadis, G. A1 - Venter, C. A1 - Viana, A. A1 - Vincent, P. A1 - Voelk, H. J. A1 - Volpe, F. A1 - Vorster, M. A1 - Vuillaume, T. A1 - Wagner, S. J. A1 - Wagner, P. A1 - Ward, M. A1 - Weidinger, M. A1 - Weitzel, Q. A1 - White, R. A1 - Wierzcholska, A. A1 - Willmann, P. A1 - Woernlein, A. A1 - Wouters, D. A1 - Zabalza, V. A1 - Zacharias, M. A1 - Zajczyk, A. A1 - Zdziarski, A. A. A1 - Zech, Alraune A1 - Zechlin, H. -S. 
T1 - Discovery of the VHE gamma-ray source HESS J1832-093 in the vicinity of SNR G22.7-0.2 JF - Monthly notices of the Royal Astronomical Society KW - astroparticle physics KW - ISM: individual objects: HESS J1832-093 KW - ISM: individual objects: SNR G22.7-0.2 KW - gamma-rays: general Y1 - 2015 U6 - https://doi.org/10.1093/mnras/stu2148 SN - 0035-8711 SN - 1365-2966 VL - 446 IS - 2 SP - 1163 EP - 1169 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Abramowski, Attila A1 - Acero, F. A1 - Aharonian, Felix A. A1 - Benkhali, Faical Ait A1 - Akhperjanian, A. G. A1 - Angüner, Ekrem Oǧuzhan A1 - Anton, Gisela A1 - Balenderan, Shangkari A1 - Balzer, Arnim A1 - Barnacka, Anna A1 - Becherini, Yvonne A1 - Tjus, J. Becker A1 - Bernlöhr, K. A1 - Birsin, E. A1 - Bissaldi, E. A1 - Biteau, Jonathan A1 - Boettcher, Markus A1 - Boisson, Catherine A1 - Bolmont, J. A1 - Bordas, Pol A1 - Brucker, J. A1 - Brun, Francois A1 - Brun, Pierre A1 - Bulik, Tomasz A1 - Carrigan, Svenja A1 - Casanova, Sabrina A1 - Cerruti, M. A1 - Chadwick, Paula M. A1 - Chalme-Calvet, R. A1 - Chaves, Ryan C. G. A1 - Cheesebrough, A. A1 - Chretien, M. A1 - Colafrancesco, Sergio A1 - Cologna, Gabriele A1 - Conrad, Jan A1 - Couturier, C. A1 - Dalton, M. A1 - Daniel, M. K. A1 - Davids, I. D. A1 - Degrange, B. A1 - Deil, C. A1 - deWilt, P. A1 - Dickinson, H. J. A1 - Djannati-Ataï, A. A1 - Domainko, W. A1 - Drury, L. O'C. A1 - Dubus, G. A1 - Dutson, K. A1 - Dyks, J. A1 - Dyrda, M. A1 - Edwards, T. A1 - Egberts, Kathrin A1 - Eger, P. A1 - Espigat, P. A1 - Farnier, C. A1 - Fegan, S. A1 - Feinstein, F. A1 - Fernandes, M. V. A1 - Fernandez, D. A1 - Fiasson, A. A1 - Fontaine, G. A1 - Foerster, A. A1 - Fuessling, M. A1 - Gajdus, M. A1 - Gallant, Y. A. A1 - Garrigoux, T. A1 - Giebels, B. A1 - Glicenstein, J. F. A1 - Grondin, M. -H. A1 - Grudzinska, M. A1 - Haeffner, S. A1 - Hague, J. D. A1 - Hahn, J. A1 - Harris, J. A1 - Heinzelmann, G. A1 - Henri, G. A1 - Hermann, G. A1 - Hervet, O. A1 - Hillert, A. 
A1 - Hinton, James Anthony A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Holler, M. A1 - Horns, D. A1 - Jacholkowska, A. A1 - Jahn, C. A1 - Jamrozy, M. A1 - Janiak, M. A1 - Jankowsky, F. A1 - Jung, I. A1 - Kastendieck, M. A. A1 - Katarzynski, K. A1 - Katz, U. A1 - Kaufmann, S. A1 - Khelifi, B. A1 - Kieffer, M. A1 - Klepser, S. A1 - Klochkov, D. A1 - Kluzniak, W. A1 - Kneiske, T. A1 - Kolitzus, D. A1 - Komin, Nu. A1 - Kosack, K. A1 - Krakau, S. A1 - Krayzel, F. A1 - Krueger, P. P. A1 - Laffon, H. A1 - Lamanna, G. A1 - Lefaucheur, J. A1 - Lemoine-Goumard, M. A1 - Lenain, J. -P. A1 - Lennarz, D. A1 - Lohse, T. A1 - Lopatin, A. A1 - Lu, C. -C. A1 - Marandon, V. A1 - Marcowith, Alexandre A1 - Marx, R. A1 - Maurin, G. A1 - Maxted, N. A1 - Mayer, M. A1 - McComb, T. J. L. A1 - Medina, M. C. A1 - Mehault, J. A1 - Menzler, U. A1 - Meyer, M. A1 - Moderski, R. A1 - Mohamed, M. A1 - Moulin, Emmanuel A1 - Murach, T. A1 - Naumann, C. L. A1 - de Naurois, M. A1 - Nedbal, D. A1 - Niemiec, J. A1 - Nolan, S. J. A1 - Oakes, L. A1 - Ohm, S. A1 - Wilhelmi, E. de Ona A1 - Opitz, B. A1 - Ostrowski, M. A1 - Oya, I. A1 - Panter, M. A1 - Parsons, R. D. A1 - Arribas, M. Paz A1 - Pekeur, N. W. A1 - Pelletier, G. A1 - Perez, J. A1 - Petrucci, P. -O. A1 - Peyaud, B. A1 - Pita, S. A1 - Poon, H. A1 - Puehlhofer, G. A1 - Punch, M. A1 - Quirrenbach, A. A1 - Raab, S. A1 - Raue, M. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - de los Reyes, R. A1 - Rieger, F. A1 - Rob, L. A1 - Rosier-Lees, S. A1 - Rowell, G. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Sahakian, V. A1 - Sanchez, David M. A1 - Santangelo, Andrea A1 - Schlickeiser, R. A1 - Schuessler, F. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schwemmer, S. A1 - Sol, H. A1 - Spengler, G. A1 - Spies, F. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Stinzing, F. A1 - Stycz, K. A1 - Sushch, Iurii A1 - Szostek, A. A1 - Tavernet, J. -P. A1 - Terrier, R. A1 - Tluczykont, M. A1 - Trichard, C. A1 - Valerius, K. 
A1 - van Eldik, C. A1 - Vasileiadis, G. A1 - Venter, C. A1 - Viana, A. A1 - Vincent, P. A1 - Voelk, H. J. A1 - Volpe, F. A1 - Vorster, M. A1 - Wagner, S. J. A1 - Wagner, P. A1 - Ward, M. A1 - Weidinger, M. A1 - Weitzel, Q. A1 - White, R. A1 - Wierzcholska, A. A1 - Willmann, P. A1 - Woernlein, A. A1 - Wouters, D. A1 - Zacharias, M. A1 - Zajczyk, A. A1 - Zdziarski, A. A. A1 - Zech, Alraune A1 - Zechlin, H. -S. T1 - Discovery of very high energy gamma-ray emission from the BL Lacertae object PKS0301-243 with HESS JF - Astronomy & Astrophysics N2 - The active galactic nucleus PKS 0301-243 (z = 0.266) is a high-synchrotron-peaked BL Lac object that is detected at high energies (HE, 100 MeV < E < 100 GeV) by Fermi/LAT. This paper reports on the discovery of PKS 0301-243 at very high energies (E > 100 GeV) by the High Energy Stereoscopic System (H.E.S.S.) from observations between September 2009 and December 2011 for a total live time of 34.9 h. Gamma rays above 200 GeV are detected at a significance of 9.4 sigma. A hint of variability at the 2.5 sigma level is found. An integral flux I(E > 200 GeV) = (3.3 ± 1.1(stat) ± 0.7(syst)) × 10^-12 ph cm^-2 s^-1 and a photon index Gamma = 4.6 ± 0.7(stat) ± 0.2(syst) are measured. Multi-wavelength light curves in HE, X-ray and optical bands show strong variability, and a minimal variability timescale of eight days is estimated from the optical light curve. A single-zone leptonic synchrotron self-Compton scenario satisfactorily reproduces the multi-wavelength data. In this model, the emitting region is out of equipartition and the jet is particle dominated. Because of its high redshift compared to other sources observed at TeV energies, the very high energy emission from PKS 0301-243 is attenuated by the extragalactic background light (EBL) and the measured spectrum is used to derive an upper limit on the opacity of the EBL. 
KW - galaxies: active KW - BL Lacertae objects: general KW - BL Lacertae objects: individual: PKS 0301-243 KW - gamma rays: galaxies KW - radiation mechanisms: non-thermal Y1 - 2013 U6 - https://doi.org/10.1051/0004-6361/201321639 SN - 0004-6361 SN - 1432-0746 VL - 559 PB - EDP Sciences S A CY - Les Ulis Cedex A ER - TY - BOOK A1 - Bauckmann, Jana A1 - Leser, Ulf A1 - Naumann, Felix T1 - Efficient and exact computation of inclusion dependencies for data integration N2 - Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. The discovery of such relationships is difficult, because in principle for each pair of attributes in the database each pair of data values must be compared. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present Spider, an algorithm that efficiently finds all INDs in a given relational database. It leverages the sorting facilities of DBMS but performs the actual comparisons outside of the database to save computation. Spider analyzes very large databases up to an order of magnitude faster than previous approaches. We also evaluate in detail the effectiveness of several heuristics to reduce the number of necessary comparisons. Furthermore, we generalize Spider to find composite INDs covering multiple attributes, and partial INDs, which are true INDs for all but a certain number of values. This last type is particularly relevant when integrating dirty data, as is often the case in the life sciences domain, which is our driving motivation. 
T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 34 KW - Metadatenentdeckung KW - Metadatenqualität KW - Schemaentdeckung KW - Datenanalyse KW - Datenintegration KW - metadata discovery KW - metadata quality KW - schema discovery KW - data profiling KW - data integration Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41396 SN - 978-3-86956-048-9 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - JOUR A1 - Schirmer, Philipp A1 - Papenbrock, Thorsten A1 - Koumarelas, Ioannis A1 - Naumann, Felix T1 - Efficient discovery of matching dependencies JF - ACM transactions on database systems : TODS N2 - Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs) matching similar rather than same elements. As their discovery is very difficult, existing profiling algorithms either find only small subsets of all MDs or are limited in scope to small datasets. We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on such results that are actually required by downstream use cases. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs. 
Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB. KW - matching dependencies KW - functional dependencies KW - dependency discovery KW - data profiling KW - data matching KW - entity resolution KW - similarity measures Y1 - 2020 U6 - https://doi.org/10.1145/3392778 SN - 0362-5915 SN - 1557-4644 VL - 45 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Koumarelas, Ioannis A1 - Kroschk, Axel A1 - Mosley, Clifford A1 - Naumann, Felix T1 - Experience: Enhancing address matching with geocoding and similarity measure selection JF - Journal of Data and Information Quality N2 - Given a query record, record matching is the problem of finding database records that represent the same real-world object. In the easiest scenario, a database record is completely identical to the query. However, in most cases, problems do arise, for instance, as a result of data errors or data integrated from multiple sources or received from restrictive form fields. These problems are usually difficult, because they require a variety of actions, including field segmentation, decoding of values, and similarity comparisons, each requiring some domain knowledge. In this article, we study the problem of matching records that contain address information, including attributes such as Street-address and City. To facilitate this matching process, we propose a domain-specific procedure to, first, enrich each record with a more complete representation of the address information through geocoding and reverse-geocoding and, second, select the best similarity measure for each address attribute, which will ultimately help the classifier achieve the best F-measure. We report on our experience in selecting geocoding services and discovering similarity measures for a concrete but common industry use-case. 
KW - Address matching KW - record linkage KW - duplicate detection KW - similarity measures KW - conditional functional dependencies KW - address normalization KW - address parsing KW - geocoding KW - geographic information systems KW - random forest Y1 - 2018 U6 - https://doi.org/10.1145/3232852 SN - 1936-1955 VL - 10 IS - 2 SP - 1 EP - 16 PB - Association for Computing Machinery CY - New York ER - TY - JOUR A1 - Hacker, Philipp A1 - Krestel, Ralf A1 - Grundmann, Stefan A1 - Naumann, Felix T1 - Explainable AI under contract and tort law BT - legal incentives and technical challenges JF - Artificial intelligence and law N2 - This paper shows that the law, in subtle ways, may set hitherto unrecognized incentives for the adoption of explainable machine learning applications. In doing so, we make two novel contributions. First, on the legal side, we show that to avoid liability, professional actors, such as doctors and managers, may soon be legally compelled to use explainable ML models. We argue that the importance of explainability reaches far beyond data protection law, and crucially influences questions of contractual and tort liability for the use of ML models. To this effect, we conduct two legal case studies, in medical and corporate merger applications of ML. As a second contribution, we discuss the (legally required) trade-off between accuracy and explainability and demonstrate the effect in a technical case study in the context of spam classification. KW - explainability KW - explainable AI KW - interpretable machine learning KW - contract law KW - tort law KW - explainability-accuracy trade-off KW - medical malpractice KW - corporate takeovers Y1 - 2020 U6 - https://doi.org/10.1007/s10506-020-09260-6 SN - 0924-8463 SN - 1572-8382 VL - 28 IS - 4 SP - 415 EP - 439 PB - Springer CY - Dordrecht ER -