TY - BOOK A1 - Meinel, Christoph A1 - Plattner, Hasso A1 - Döllner, Jürgen Roland Friedrich A1 - Weske, Mathias A1 - Polze, Andreas A1 - Hirschfeld, Robert A1 - Naumann, Felix A1 - Giese, Holger A1 - Baudisch, Patrick T1 - Proceedings of the 7th Ph.D. Retreat of the HPI Research School on Service-oriented Systems Engineering N2 - Design and Implementation of service-oriented architectures imposes a huge number of research questions from the fields of software engineering, system analysis and modeling, adaptability, and application integration. Component orientation and web services are two approaches for design and realization of complex web-based system. Both approaches allow for dynamic application adaptation as well as integration of enterprise application. Commonly used technologies, such as J2EE and .NET, form de facto standards for the realization of complex distributed systems. Evolution of component systems has lead to web services and service-based architectures. This has been manifested in a multitude of industry standards and initiatives such as XML, WSDL UDDI, SOAP, etc. All these achievements lead to a new and promising paradigm in IT systems engineering which proposes to design complex software solutions as collaboration of contractually defined software services. Service-Oriented Systems Engineering represents a symbiosis of best practices in object-orientation, component-based development, distributed computing, and business process management. It provides integration of business and IT concerns. The annual Ph.D. Retreat of the Research School provides each member the opportunity to present his/her current state of their research and to give an outline of a prospective Ph.D. thesis. Due to the interdisciplinary structure of the Research Scholl, this technical report covers a wide range of research topics. These include but are not limited to: Self-Adaptive Service-Oriented Systems, Operating System Support for Service-Oriented Systems, Architecture and Modeling of Service-Oriented Systems, Adaptive Process Management, Services Composition and Workflow Planning, Security Engineering of Service-Based IT Systems, Quantitative Analysis and Optimization of Service-Oriented Systems, Service-Oriented Systems in 3D Computer Graphics sowie Service-Oriented Geoinformatics. N2 - Der Entwurf und die Realisierung dienstbasierender Architekturen wirft eine Vielzahl von Forschungsfragestellungen aus den Gebieten der Softwaretechnik, der Systemmodellierung und -analyse, sowie der Adaptierbarkeit und Integration von Applikationen auf. Komponentenorientierung und WebServices sind zwei Ansätze für den effizienten Entwurf und die Realisierung komplexer Web-basierender Systeme. Sie ermöglichen die Reaktion auf wechselnde Anforderungen ebenso, wie die Integration großer komplexer Softwaresysteme. Heute übliche Technologien, wie J2EE und .NET, sind de facto Standards für die Entwicklung großer verteilter Systeme. Die Evolution solcher Komponentensysteme führt über WebServices zu dienstbasierenden Architekturen. Dies manifestiert sich in einer Vielzahl von Industriestandards und Initiativen wie XML, WSDL, UDDI, SOAP. All diese Schritte führen letztlich zu einem neuen, vielversprechenden Paradigma für IT Systeme, nach dem komplexe Softwarelösungen durch die Integration vertraglich vereinbarter Software-Dienste aufgebaut werden sollen. "Service-Oriented Systems Engineering" repräsentiert die Symbiose bewährter Praktiken aus den Gebieten der Objektorientierung, der Komponentenprogrammierung, des verteilten Rechnen sowie der Geschäftsprozesse und berücksichtigt auch die Integration von Geschäftsanliegen und Informationstechnologien. Die Klausurtagung des Forschungskollegs "Service-oriented Systems Engineering" findet einmal jährlich statt und bietet allen Kollegiaten die Möglichkeit den Stand ihrer aktuellen Forschung darzulegen. Bedingt durch die Querschnittstruktur des Kollegs deckt dieser Bericht ein große Bandbreite aktueller Forschungsthemen ab. Dazu zählen unter anderem Self-Adaptive Service-Oriented Systems, Operating System Support for Service-Oriented Systems, Architecture and Modeling of Service-Oriented Systems, Adaptive Process Management, Services Composition and Workflow Planning, Security Engineering of Service-Based IT Systems, Quantitative Analysis and Optimization of Service-Oriented Systems, Service-Oriented Systems in 3D Computer Graphics sowie Service-Oriented Geoinformatics. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 83 KW - Hasso-Plattner-Institut KW - Forschungskolleg KW - Klausurtagung KW - Service-oriented Systems Engineering KW - Hasso Plattner Institute KW - Research School KW - Ph.D. Retreat KW - Service-oriented Systems Engineering Y1 - 2014 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-63490 SN - 978-3-86956-273-5 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - JOUR A1 - Loster, Michael A1 - Koumarelas, Ioannis A1 - Naumann, Felix T1 - Knowledge transfer for entity resolution with siamese neural networks JF - ACM journal of data and information quality N2 - The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity-duplicates-into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise.
We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent. KW - Entity resolution KW - duplicate detection KW - transfer learning KW - neural KW - networks KW - metric learning KW - similarity learning KW - data quality Y1 - 2021 U6 - https://doi.org/10.1145/3410157 SN - 1936-1955 SN - 1936-1963 VL - 13 IS - 1 PB - Association for Computing Machinery CY - New York ER - TY - BOOK A1 - Abedjan, Ziawasch A1 - Naumann, Felix T1 - Advancing the discovery of unique column combinations N2 - Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the wellknown GORDIAN algorithm and "Apriori-based" algorithms are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution HCAGORDIAN combines the advantages of GORDIAN and our new algorithm HCA, and it significantly outperforms all previous work in many situations. N2 - Unique-Spaltenkombinationen sind Spaltenkombinationen einer Datenbanktabelle, die nur einzigartige Werte beinhalten. Das Finden von Unique-Spaltenkombinationen spielt sowohl eine wichtige Rolle im Bereich der Grundlagenforschung von Informationssystemen als auch in Anwendungsgebieten wie dem Datenmanagement und der Erkenntnisgewinnung aus Datenbeständen. Vorhandene Algorithmen, die dieses Problem angehen, sind entweder Brute-Force oder benötigen zu viel Hauptspeicher. Deshalb können diese Algorithmen nur auf kleine Datenmengen angewendet werden. In dieser Arbeit werden der bekannte GORDIAN-Algorithmus und Apriori-basierte Algorithmen zum Zwecke weiterer Optimierung analysiert. Wir verbessern die Apriori Algorithmen durch eine effiziente Kandidatengenerierung und Heuristikbasierten Kandidatenfilter. Eine Hybride Lösung, HCA-GORDIAN, kombiniert die Vorteile von GORDIAN und unserem neuen Algorithmus HCA, welche die bisherigen Algorithmen hinsichtlich der Effizienz in vielen Situationen übertrifft. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 51 KW - Apriori KW - eindeutig KW - funktionale Abhängigkeit KW - Schlüsselentdeckung KW - Data Profiling KW - apriori KW - unique KW - functional dependency KW - key discovery KW - data profiling Y1 - 2011 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-53564 SN - 978-3-86956-148-6 SN - 1613-5652 SN - 2191-1665 PB - Universitätsverlag Potsdam CY - Potsdam ER -