TY - BOOK A1 - Draisbach, Uwe A1 - Naumann, Felix A1 - Szott, Sascha A1 - Wonneberg, Oliver T1 - Adaptive windows for duplicate detection N2 - Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity, respectively. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records and (ii) data sets might have a high volume making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known such approach is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaption strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons). N2 - Duplikaterkennung beschreibt das Auffinden von mehreren Datensätzen, die das gleiche Realwelt-Objekt repräsentieren. Diese Aufgabe ist nicht trivial, da sich (i) die Datensätze geringfügig unterscheiden können, so dass Ähnlichkeitsmaße für einen paarweisen Vergleich benötigt werden, und (ii) aufgrund der Datenmenge ein vollständiger, paarweiser Vergleich nicht möglich ist. Zur Lösung des zweiten Problems existieren verschiedene Algorithmen, die die Datenmenge partitionieren und nur noch innerhalb der Partitionen Vergleiche durchführen. 
Einer dieser Algorithmen ist die Sorted-Neighborhood-Methode (SNM), welche Daten anhand eines Schlüssels sortiert und dann ein Fenster über die sortierten Daten schiebt. Vergleiche werden nur innerhalb dieses Fensters durchgeführt. Wir beschreiben verschiedene Variationen der Sorted-Neighborhood-Methode, die auf variierenden Fenstergrößen basieren. Diese Ansätze basieren auf der Intuition, dass Bereiche mit größerer und geringerer Ähnlichkeiten innerhalb der sortierten Datensätze existieren, für die entsprechend größere bzw. kleinere Fenstergrößen sinnvoll sind. Wir beschreiben und evaluieren verschiedene Adaptierungs-Strategien, von denen nachweislich einige bezüglich Effizienz besser sind als die originale Sorted-Neighborhood-Methode (gleiches Ergebnis bei weniger Vergleichen). T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 49 KW - Informationssysteme KW - Datenqualität KW - Datenintegration KW - Duplikaterkennung KW - Duplicate Detection KW - Data Quality KW - Data Integration KW - Information Systems Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-53007 SN - 978-3-86956-143-1 SN - 1613-5652 SN - 2191-1665 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK A1 - Bauckmann, Jana A1 - Abedjan, Ziawasch A1 - Leser, Ulf A1 - Müller, Heiko A1 - Naumann, Felix T1 - Covering or complete? : Discovering conditional inclusion dependencies N2 - Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In the last years conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. In this paper we focus on conditional inclusion dependencies (CINDs). 
We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case. N2 - Datenabhängigkeiten (wie zum Beispiel Integritätsbedingungen), werden verwendet, um die Qualität eines Datenbankschemas zu erhöhen, um Anfragen zu optimieren und um Konsistenz in einer Datenbank sicherzustellen. In den letzten Jahren wurden bedingte Abhängigkeiten (conditional dependencies) vorgestellt, die die Qualität von Daten analysieren und verbessern sollen. Eine bedingte Abhängigkeit ist eine Abhängigkeit mit begrenztem Gültigkeitsbereich, der über Bedingungen auf einem oder mehreren Attributen definiert wird. In diesem Bericht betrachten wir bedingte Inklusionsabhängigkeiten (conditional inclusion dependencies; CINDs). Wir generalisieren die Definition von CINDs anhand der Unterscheidung von überdeckenden (covering) und vollständigen (completeness) Bedingungen. Wir stellen einen Anwendungsfall für solche CINDs vor, der den Nutzen von CINDs bei der Lösung komplexer Datenqualitätsprobleme aufzeigt. Darüber hinaus definieren wir Qualitätsmaße für Bedingungen basierend auf Sensitivität und Genauigkeit. Wir stellen effiziente Algorithmen vor, die überdeckende und vollständige Bedingungen innerhalb vorgegebener Schwellwerte finden. Unsere Algorithmen wählen nicht nur die Werte der Bedingungen, sondern finden auch die Bedingungsattribute automatisch. 
Abschließend zeigen wir, dass unser Ansatz effizient sinnvolle und hilfreiche Ergebnisse für den vorgestellten Anwendungsfall liefert. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 62 KW - Datenabhängigkeiten KW - Bedingte Inklusionsabhängigkeiten KW - Erkennen von Meta-Daten KW - Linked Open Data KW - Link-Entdeckung KW - Assoziationsregeln KW - Data Dependency KW - Conditional Inclusion Dependency KW - Metadata Discovery KW - Linked Open Data KW - Link Discovery KW - Association Rule Mining Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-62089 SN - 978-3-86956-212-4 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - JOUR A1 - Abramowski, Attila A1 - Acero, F. A1 - Aharonian, Felix A. A1 - Akhperjanian, A. G. A1 - Anton, Gisela A1 - Balzer, Arnim A1 - Barnacka, Anna A1 - de Almeida, U. Barres A1 - Becherini, Yvonne A1 - Becker, J. A1 - Behera, B. A1 - Bernlöhr, K. A1 - Birsin, E. A1 - Biteau, Jonathan A1 - Bochow, A. A1 - Boisson, Catherine A1 - Bolmont, J. A1 - Bordas, Pol A1 - Brucker, J. A1 - Brun, Francois A1 - Brun, Pierre A1 - Bulik, Tomasz A1 - Buesching, I. A1 - Carrigan, Svenja A1 - Casanova, Sabrina A1 - Cerruti, M. A1 - Chadwick, Paula M. A1 - Charbonnier, A. A1 - Chaves, Ryan C. G. A1 - Cheesebrough, A. A1 - Clapson, A. C. A1 - Coignet, G. A1 - Cologna, Gabriele A1 - Conrad, Jan A1 - Dalton, M. A1 - Daniel, M. K. A1 - Davids, I. D. A1 - Degrange, B. A1 - Deil, C. A1 - Dickinson, H. J. A1 - Djannati-Ataï, A. A1 - Domainko, W. A1 - Drury, L. O'C. A1 - Dubus, G. A1 - Dutson, K. A1 - Dyks, J. A1 - Dyrda, M. A1 - Egberts, Kathrin A1 - Eger, P. A1 - Espigat, P. A1 - Fallon, L. A1 - Farnier, C. A1 - Fegan, S. A1 - Feinstein, F. A1 - Fernandes, M. V. A1 - Fiasson, A. A1 - Fontaine, G. A1 - Foerster, A. A1 - Fuessling, M. A1 - Gallant, Y. A. A1 - Gast, H. A1 - Gerard, L. A1 - Gerbig, D. A1 - Giebels, B. A1 - Glicenstein, J. F. A1 - Glueck, B. A1 - Goret, P. 
A1 - Goering, D. A1 - Haeffner, S. A1 - Hague, J. D. A1 - Hampf, D. A1 - Hauser, M. A1 - Heinz, S. A1 - Heinzelmann, G. A1 - Henri, G. A1 - Hermann, G. A1 - Hinton, James Anthony A1 - Hoffmann, A. A1 - Hofmann, W. A1 - Hofverberg, P. A1 - Holler, M. A1 - Horns, D. A1 - Jacholkowska, A. A1 - de Jager, O. C. A1 - Jahn, C. A1 - Jamrozy, M. A1 - Jung, I. A1 - Kastendieck, M. A. A1 - Katarzynski, K. A1 - Katz, U. A1 - Kaufmann, S. A1 - Keogh, D. A1 - Khangulyan, D. A1 - Khelifi, B. A1 - Klochkov, D. A1 - Kluzniak, W. A1 - Kneiske, T. A1 - Komin, Nu. A1 - Kosack, K. A1 - Kossakowski, R. A1 - Laffon, H. A1 - Lamanna, G. A1 - Lennarz, D. A1 - Lohse, T. A1 - Lopatin, A. A1 - Lu, C. -C. A1 - Marandon, V. A1 - Marcowith, Alexandre A1 - Masbou, J. A1 - Maurin, D. A1 - Maxted, N. A1 - Mayer, M. A1 - McComb, T. J. L. A1 - Medina, M. C. A1 - Mehault, J. A1 - Moderski, R. A1 - Moulin, Emmanuel A1 - Naumann, C. L. A1 - Naumann-Godo, M. A1 - de Naurois, M. A1 - Nedbal, D. A1 - Nekrassov, D. A1 - Nguyen, N. A1 - Nicholas, B. A1 - Niemiec, J. A1 - Nolan, S. J. A1 - Ohm, S. A1 - Wilhelmi, E. de Ona A1 - Opitz, B. A1 - Ostrowski, M. A1 - Oya, I. A1 - Panter, M. A1 - Arribas, M. Paz A1 - Pedaletti, G. A1 - Pelletier, G. A1 - Petrucci, P. -O. A1 - Pita, S. A1 - Puehlhofer, G. A1 - Punch, M. A1 - Quirrenbach, A. A1 - Raue, M. A1 - Rayner, S. M. A1 - Reimer, A. A1 - Reimer, O. A1 - Renaud, M. A1 - de los Reyes, R. A1 - Rieger, F. A1 - Ripken, J. A1 - Rob, L. A1 - Rosier-Lees, S. A1 - Rowell, G. A1 - Rudak, B. A1 - Rulten, C. B. A1 - Ruppel, J. A1 - Sahakian, V. A1 - Sanchez, David M. A1 - Santangelo, Andrea A1 - Schlickeiser, R. A1 - Schoeck, F. M. A1 - Schulz, A. A1 - Schwanke, U. A1 - Schwarzburg, S. A1 - Schwemmer, S. A1 - Sheidaei, F. A1 - Skilton, J. L. A1 - Sol, H. A1 - Spengler, G. A1 - Stawarz, L. A1 - Steenkamp, R. A1 - Stegmann, Christian A1 - Stinzing, F. A1 - Stycz, K. A1 - Sushch, Iurii A1 - Szostek, A. A1 - Tavernet, J. -P. A1 - Terrier, R. A1 - Tluczykont, M. 
A1 - Valerius, K. A1 - van Eldik, C. A1 - Vasileiadis, G. A1 - Venter, C. A1 - Vialle, J. P. A1 - Viana, A. A1 - Vincent, P. A1 - Voelk, H. J. A1 - Volpe, F. A1 - Vorobiov, S. A1 - Vorster, M. A1 - Wagner, S. J. A1 - Ward, M. A1 - White, R. A1 - Wierzcholska, A. A1 - Zacharias, M. A1 - Zajczyk, A. A1 - Zdziarski, A. A. A1 - Zech, Alraune A1 - Zechlin, H. -S. A1 - Aleksic, J. A1 - Antonelli, L. A. A1 - Antoranz, P. A1 - Backes, Michael A1 - Barrio, J. A. A1 - Bastieri, D. A1 - Becerra Gonzalez, J. A1 - Bednarek, W. A1 - Berdyugin, A. A1 - Berger, K. A1 - Bernardini, E. A1 - Biland, A. A1 - Blanch Bigas, O. A1 - Bock, R. K. A1 - Boller, A. A1 - Bonnoli, G. A1 - Tridon, D. Borla A1 - Braun, I. A1 - Bretz, T. A1 - Canellas, A. A1 - Carmona, E. A1 - Carosi, A. A1 - Colin, P. A1 - Colombo, E. A1 - Contreras, J. L. A1 - Cortina, J. A1 - Cossio, L. A1 - Covino, S. A1 - Dazzi, F. A1 - De Angelis, A. A1 - De Cea del Pozo, E. A1 - De Lotto, B. A1 - Delgado Mendez, C. A1 - Diago Ortega, A. A1 - Doert, M. A1 - Dominguez, A. A1 - Prester, Dijana Dominis A1 - Dorner, D. A1 - Doro, M. A1 - Elsaesser, D. A1 - Ferenc, D. A1 - Fonseca, M. V. A1 - Font, L. A1 - Fruck, C. A1 - Garcia Lopez, R. J. A1 - Garczarczyk, M. A1 - Garrido, D. A1 - Giavitto, G. A1 - Godinovic, N. A1 - Hadasch, D. A1 - Haefner, D. A1 - Herrero, A. A1 - Hildebrand, D. A1 - Hoehne-Moench, D. A1 - Hose, J. A1 - Hrupec, D. A1 - Huber, B. A1 - Jogler, T. A1 - Klepser, S. A1 - Kraehenbuehl, T. A1 - Krause, J. A1 - La Barbera, A. A1 - Lelas, D. A1 - Leonardo, E. A1 - Lindfors, E. A1 - Lombardi, S. A1 - Lopez, M. A1 - Lorenz, E. A1 - Makariev, M. A1 - Maneva, G. A1 - Mankuzhiyil, N. A1 - Mannheim, K. A1 - Maraschi, L. A1 - Mariotti, M. A1 - Martinez, M. A1 - Mazin, D. A1 - Meucci, M. A1 - Miranda, J. M. A1 - Mirzoyan, R. A1 - Miyamoto, H. A1 - Moldon, J. A1 - Moralejo, A. A1 - Munar, P. A1 - Nieto, D. A1 - Nilsson, K. A1 - Orito, R. A1 - Oya, I. A1 - Paneque, D. A1 - Paoletti, R. A1 - Pardo, S. A1 - Paredes, J. M. 
A1 - Partini, S. A1 - Pasanen, M. A1 - Pauss, F. A1 - Perez-Torres, M. A. A1 - Persic, M. A1 - Peruzzo, L. A1 - Pilia, M. A1 - Pochon, J. A1 - Prada, F. A1 - Moroni, P. G. Prada A1 - Prandini, E. A1 - Puljak, I. A1 - Reichardt, I. A1 - Reinthal, R. A1 - Rhode, W. A1 - Ribo, M. A1 - Rico, J. A1 - Ruegamer, S. A1 - Saggion, A. A1 - Saito, K. A1 - Saito, T. Y. A1 - Salvati, M. A1 - Satalecka, K. A1 - Scalzotto, V. A1 - Scapin, V. A1 - Schultz, C. A1 - Schweizer, T. A1 - Shayduk, M. A1 - Shore, S. N. A1 - Sillanpaa, A. A1 - Sitarek, J. A1 - Sobczynska, D. A1 - Spanier, F. A1 - Spiro, S. A1 - Stamerra, A. A1 - Steinke, B. A1 - Storz, J. A1 - Strah, N. A1 - Suric, T. A1 - Takalo, L. A1 - Takami, H. A1 - Tavecchio, F. A1 - Temnikov, P. A1 - Terzic, T. A1 - Tescaro, D. A1 - Teshima, M. A1 - Thom, M. A1 - Tibolla, O. A1 - Torres, D. F. A1 - Treves, A. A1 - Vankov, H. A1 - Vogler, P. A1 - Wagner, R. M. A1 - Weitzel, Q. A1 - Zabalza, V. A1 - Zandanel, F. A1 - Zanin, R. A1 - Arlen, T. A1 - Aune, T. A1 - Beilicke, M. A1 - Benbow, W. A1 - Bouvier, A. A1 - Bradbury, S. M. A1 - Buckley, J. H. A1 - Bugaev, V. A1 - Byrum, K. A1 - Cannon, A. A1 - Cesarini, A. A1 - Ciupik, L. A1 - Connolly, M. P. A1 - Cui, W. A1 - Dickherber, R. A1 - Duke, C. A1 - Errando, M. A1 - Falcone, A. A1 - Finley, J. P. A1 - Finnegan, G. A1 - Fortson, L. A1 - Furniss, A. A1 - Galante, N. A1 - Gall, D. A1 - Godambe, S. A1 - Griffin, S. A1 - Grube, J. A1 - Gyuk, G. A1 - Hanna, D. A1 - Holder, J. A1 - Huan, H. A1 - Hui, C. M. A1 - Kaaret, P. A1 - Karlsson, N. A1 - Kertzman, M. A1 - Khassen, Y. A1 - Kieda, D. A1 - Krawczynski, H. A1 - Krennrich, F. A1 - Lang, M. J. A1 - LeBohec, S. A1 - Maier, G. A1 - McArthur, S. A1 - McCann, A. A1 - Moriarty, P. A1 - Mukherjee, R. A1 - Nunez, P. D. A1 - Ong, R. A. A1 - Orr, M. A1 - Otte, A. N. A1 - Park, N. A1 - Perkins, J. S. A1 - Pichel, A. A1 - Pohl, Martin A1 - Prokoph, H. A1 - Ragan, K. A1 - Reyes, L. C. A1 - Reynolds, P. T. A1 - Roache, E. A1 - Rose, H. J. A1 - Ruppel, J. 
A1 - Schroedter, M. A1 - Sembroski, G. H. A1 - Sentuerk, G. D. A1 - Telezhinsky, Igor O. A1 - Tesic, G. A1 - Theiling, M. A1 - Thibadeau, S. A1 - Varlotta, A. A1 - Vassiliev, V. V. A1 - Vivier, M. A1 - Wakely, S. P. A1 - Weekes, T. C. A1 - Williams, D. A. A1 - Zitzer, B. A1 - de Almeida, U. Barres A1 - Cara, M. A1 - Casadio, C. A1 - Cheung, C. C. A1 - McConville, W. A1 - Davies, F. A1 - Doi, A. A1 - Giovannini, G. A1 - Giroletti, M. A1 - Hada, K. A1 - Hardee, P. A1 - Harris, D. E. A1 - Junor, W. A1 - Kino, M. A1 - Lee, N. P. A1 - Ly, C. A1 - Madrid, J. A1 - Massaro, F. A1 - Mundell, C. G. A1 - Nagai, H. A1 - Perlman, E. S. A1 - Steele, I. A. A1 - Walker, R. C. A1 - Wood, D. L. T1 - The 2010 very high energy gamma-ray flare and 10 years of multi-wavelength observations of M 87 JF - The astrophysical journal : an international review of spectroscopy and astronomical physics N2 - The giant radio galaxy M 87 with its proximity (16 Mpc), famous jet, and very massive black hole ($(3\text{--}6) \times 10^{9}\,M_{\odot}$) provides a unique opportunity to investigate the origin of very high energy (VHE; E > 100 GeV) gamma-ray emission generated in relativistic outflows and the surroundings of supermassive black holes. M 87 has been established as a VHE gamma-ray emitter since 2006. The VHE gamma-ray emission displays strong variability on timescales as short as a day. In this paper, results from a joint VHE monitoring campaign on M 87 by the MAGIC and VERITAS instruments in 2010 are reported. During the campaign, a flare at VHE was detected triggering further observations at VHE (H.E.S.S.), X-rays (Chandra), and radio (43 GHz Very Long Baseline Array, VLBA). 
The excellent sampling of the VHE gamma-ray light curve enables one to derive a precise temporal characterization of the flare: the single, isolated flare is well described by a two-sided exponential function with significantly different flux rise and decay times of $\tau_{d}^{\mathrm{rise}} = (1.69 \pm 0.30)$ days and $\tau_{d}^{\mathrm{decay}} = (0.611 \pm 0.080)$ days, respectively. While the overall variability pattern of the 2010 flare appears somewhat different from that of previous VHE flares in 2005 and 2008, they share very similar timescales ($\sim$ day), peak fluxes ($\Phi_{>0.35\,\mathrm{TeV}} \simeq (1\text{--}3) \times 10^{-11}$ photons cm$^{-2}$ s$^{-1}$), and VHE spectra. VLBA radio observations of 43 GHz of the inner jet regions indicate no enhanced flux in 2010 in contrast to observations in 2008, where an increase of the radio flux of the innermost core regions coincided with a VHE flare. On the other hand, Chandra X-ray observations taken $\sim$3 days after the peak of the VHE gamma-ray emission reveal an enhanced flux from the core (flux increased by factor $\sim$2; variability timescale <2 days). The long-term (2001-2010) multi-wavelength (MWL) light curve of M 87, spanning from radio to VHE and including data from Hubble Space Telescope, Liverpool Telescope, Very Large Array, and European VLBI Network, is used to further investigate the origin of the VHE gamma-ray emission. No unique, common MWL signature of the three VHE flares has been identified. In the outer kiloparsec jet region, in particular in HST-1, no enhanced MWL activity was detected in 2008 and 2010, disfavoring it as the origin of the VHE flares during these years. Shortly after two of the three flares (2008 and 2010), the X-ray core was observed to be at a higher flux level than its characteristic range (determined from more than 60 monitoring observations: 2002-2009). In 2005, the strong flux dominance of HST-1 could have suppressed the detection of such a feature. 
Published models for VHE gamma-ray emission from M 87 are reviewed in the light of the new data. KW - galaxies: active KW - galaxies: individual (M 87) KW - galaxies: jets KW - galaxies: nuclei KW - gamma rays: galaxies KW - radiation mechanisms: non-thermal Y1 - 2012 U6 - https://doi.org/10.1088/0004-637X/746/2/151 SN - 0004-637X VL - 746 IS - 2 PB - IOP Publ. Ltd. CY - Bristol ER - TY - BOOK A1 - Albrecht, Alexander A1 - Naumann, Felix T1 - Understanding cryptic schemata in large extract-transform-load systems N2 - Extract-Transform-Load (ETL) tools are used for the creation, maintenance, and evolution of data warehouses, data marts, and operational data stores. ETL workflows populate those systems with data from various data sources by specifying and executing a DAG of transformations. Over time, hundreds of individual workflows evolve as new sources and new requirements are integrated into the system. The maintenance and evolution of large-scale ETL systems requires much time and manual effort. A key problem is to understand the meaning of unfamiliar attribute labels in source and target databases and ETL transformations. Hard-to-understand attribute labels lead to frustration and time spent to develop and understand ETL workflows. We present a schema decryption technique to support ETL developers in understanding cryptic schemata of sources, targets, and ETL transformations. For a given ETL system, our recommender-like approach leverages the large number of mapped attribute labels in existing ETL workflows to produce good and meaningful decryptions. In this way we are able to decrypt attribute labels consisting of a number of unfamiliar few-letter abbreviations, such as UNP_PEN_INT, which we can decrypt to UNPAID_PENALTY_INTEREST. We evaluate our schema decryption approach on three real-world repositories of ETL workflows and show that our approach is able to suggest high-quality decryptions for cryptic attribute labels in a given schema. 
N2 - Extract-Transform-Load (ETL) Tools werden häufig beim Erstellen, der Wartung und der Weiterentwicklung von Data Warehouses, Data Marts und operationalen Datenbanken verwendet. ETL Workflows befüllen diese Systeme mit Daten aus vielen unterschiedlichen Quellsystemen. Ein ETL Workflow besteht aus mehreren Transformationsschritten, die einen DAG-strukturierten Graphen bilden. Mit der Zeit entstehen hunderte individueller ETL Workflows, da neue Datenquellen integriert oder neue Anforderungen umgesetzt werden müssen. Die Wartung und Weiterentwicklung von großen ETL Systemen benötigt viel Zeit und manuelle Arbeit. Ein zentrales Problem ist dabei das Verständnis unbekannter Attributnamen in Quell- und Zieldatenbanken und ETL Transformationen. Schwer verständliche Attributnamen führen zu Frustration und hohen Zeitaufwänden bei der Entwicklung und dem Verständnis von ETL Workflows. Wir präsentieren eine Schema Decryption Technik, die ETL Entwicklern das Verständnis kryptischer Schemata in Quell- und Zieldatenbanken und ETL Transformationen erleichtert. Unser Ansatz berücksichtigt für ein gegebenes ETL System die Vielzahl verknüpfter Attributnamen in den existierenden ETL Workflows. So werden gute und aussagekräftige "Decryptions" gefunden und wir sind in der Lage, Attributnamen, die aus unbekannten Abkürzungen bestehen, zu "decrypten". So wird z.B. für den Attributnamen UNP_PEN_INT als Decryption UNPAID_PENALTY_INTEREST vorgeschlagen. Unser Schema Decryption Ansatz wurde für drei ETL-Repositories evaluiert und es zeigte sich, dass unser Ansatz qualitativ hochwertige Decryptions für kryptische Attributnamen vorschlägt. 
T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 60 KW - Extract-Transform-Load (ETL) KW - Data Warehouse KW - Datenintegration KW - Extract-Transform-Load (ETL) KW - Data Warehouse KW - Data Integration Y1 - 2012 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-61257 SN - 978-3-86956-201-8 PB - Universitätsverlag Potsdam CY - Potsdam ER -