TY  - GEN
A1  - Benson, Lawrence
A1  - Makait, Hendrik
A1  - Rabl, Tilmann
T1  - Viper
BT  - An Efficient Hybrid PMem-DRAM Key-Value Store
T2  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät
N2  - Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
T3  - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 20 
KW  - memory
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-559664
SN  - 2150-8097
IS  - 9
ER  - 
TY  - JOUR
A1  - Benson, Lawrence
A1  - Makait, Hendrik
A1  - Rabl, Tilmann
T1  - Viper
BT  - An Efficient Hybrid PMem-DRAM Key-Value Store
JF  - Proceedings of the VLDB Endowment
N2  - Key-value stores (KVSs) have found wide application in modern software systems. For persistence, their data resides in slow secondary storage, which requires KVSs to employ various techniques to increase their read and write performance from and to the underlying medium. Emerging persistent memory (PMem) technologies offer data persistence at close-to-DRAM speed, making them a promising alternative to classical disk-based storage. However, simply drop-in replacing existing storage with PMem does not yield good results, as block-based access behaves differently in PMem than on disk and ignores PMem's byte addressability, layout, and unique performance characteristics. In this paper, we propose three PMem-specific access patterns and implement them in a hybrid PMem-DRAM KVS called Viper. We employ a DRAM-based hash index and a PMem-aware storage layout to utilize the random-write speed of DRAM and the efficient sequential-write performance of PMem. Our evaluation shows that Viper significantly outperforms existing KVSs for core KVS operations while providing full data persistence. Moreover, Viper outperforms existing PMem-only, hybrid, and disk-based KVSs by 4-18x for write workloads, while matching or surpassing their get performance.
KW  - memory
Y1  - 2021
U6  - https://doi.org/10.14778/3461535.3461543
SN  - 2150-8097
VL  - 14
IS  - 9
SP  - 1544
EP  - 1556
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - GEN
A1  - Benlian, Alexander
A1  - Wiener, Martin
A1  - Cram, W. Alec
A1  - Krasnova, Hanna
A1  - Maedche, Alexander
A1  - Mohlmann, Mareike
A1  - Recker, Jan
A1  - Remus, Ulrich
T1  - Algorithmic management
BT  - Bright and dark sides, practical implications, and research opportunities
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 174 
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-607112
SN  - 2363-7005
SN  - 1867-0202
SN  - 1867-5808
IS  - 6
ER  - 
TY  - JOUR
A1  - Benlian, Alexander
A1  - Wiener, Martin
A1  - Cram, W. Alec
A1  - Krasnova, Hanna
A1  - Maedche, Alexander
A1  - Mohlmann, Mareike
A1  - Recker, Jan
A1  - Remus, Ulrich
T1  - Algorithmic management
BT  - bright and dark sides, practical implications, and research opportunities
JF  - Business and information systems engineering
Y1  - 2022
U6  - https://doi.org/10.1007/s12599-022-00764-w
SN  - 2363-7005
SN  - 1867-0202
VL  - 64
IS  - 6
SP  - 825
EP  - 839
PB  - Springer Gabler
CY  - Wiesbaden
ER  - 
TY  - JOUR
A1  - Bender, Benedict
A1  - Körppen, Tim
T1  - Integriert statt isoliert
BT  - Technologien für die erfolgreiche Umsetzung von datengetriebenem Management
JF  - Digital business : cloud
N2  - Dass Daten und Analysen Innovationstreiber sind und nicht mehr nur einen Hygienefaktor darstellen, haben viele Unternehmen erkannt. Um Potenziale zu heben, müssen Daten zielführend integriert werden. Komplexe Systemlandschaften und isolierte Datenbestände erschweren dies. Technologien für die erfolgreiche Umsetzung von datengetriebenem Management müssen richtig eingesetzt werden.
N2  - The fact that data and analyses are innovation drivers and no longer just represent a hygiene factor is nowadays understood by many companies. An important step for the development of this hidden potential is the target-oriented utilization of the existing data stocks in one's own company. In doing so, many companies face the hurdle of complex system landscapes and isolated data stocks. This article provides an overview of solutions for analysis-oriented data integration and helps decision-makers to select a suitable technology for their own company.
KW  - data analytics
KW  - data requirements
KW  - software selection
Y1  - 2022
UR  - https://www.wiso-net.de/document/DBC__584ddfcbfbc5ff400cb2ffb0f31eba6e6903fb3d
SN  - 2510-344X
VL  - 26
IS  - 1
SP  - 26
EP  - 27
PB  - WIN-Verlag GmbH & Co. KG
CY  - Vaterstetten
ER  - 
TY  - BOOK
A1  - Bein, Leon
A1  - Braun, Tom
A1  - Daase, Björn
A1  - Emsbach, Elina
A1  - Matthes, Leon
A1  - Stiede, Maximilian
A1  - Taeumel, Marcel
A1  - Mattis, Toni
A1  - Ramson, Stefan
A1  - Rein, Patrick
A1  - Hirschfeld, Robert
A1  - Mönig, Jens
T1  - SandBlocks
BT  - Integration visueller und textueller Programmelemente in Live-Programmiersysteme
BT  - integration of visual and textual elements in live programming systems
N2  - Visuelle Programmiersprachen werden heutzutage zugunsten textueller Programmiersprachen nahezu nicht verwendet, obwohl visuelle Programmiersprachen einige Vorteile bieten. Diese reichen von der Vermeidung von Syntaxfehlern, über die Nutzung konkreter domänenspezifischer Notation bis hin zu besserer Lesbarkeit und Wartbarkeit des Programms. Trotzdem greifen professionelle Softwareentwickler nahezu ausschließlich auf textuelle Programmiersprachen zurück.

Damit Entwickler diese Vorteile visueller Programmiersprachen nutzen können, aber trotzdem nicht auf die ihnen bekannten textuellen Programmiersprachen verzichten müssen, gibt es die Idee, textuelle und visuelle Programmelemente gemeinsam in einer Programmiersprache nutzbar zu machen. Damit ist dem Entwickler überlassen, wann und wie er visuelle Elemente in seinem Programmcode verwendet.

Diese Arbeit stellt das SandBlocks-Framework vor, das diese gemeinsame Nutzung visueller und textueller Programmelemente ermöglicht. Neben einer Auswertung visueller Programmiersprachen zeigt es die technische Integration visueller Programmelemente in das Squeak/Smalltalk-System auf, gibt Einblicke in die Umsetzung und Verwendung in Live-Programmiersystemen und diskutiert ihre Verwendung in unterschiedlichen Domänen.
N2  - Nowadays, visual programming languages exist but are rarely used because textual languages dominate the field. Even though visual languages can offer many virtues, such as protection from syntax errors, concise notation for specific domains, and improved readability and maintainability of programs, professional software developers tend to employ only textual programming languages.

We propose an approach to combine both textual and visual elements in a shared programming system. Developers can rely on the familiar textual representation of source code but also leverage the programming experience with a visual language as needed.

This work presents the SandBlocks framework, which enables a joint experience of visual and textual programming elements. It discusses the virtues of visual languages and related work, describes a technical integration of visual elements into the Squeak/Smalltalk programming system, sketches potential workflows in live programming systems, and illustrates applications for several domains.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 132 
KW  - Programmieren
KW  - Benutzerinteraktion
KW  - visuelle Sprachen
KW  - Liveness
KW  - Smalltalk
KW  - programming
KW  - user interaction
KW  - visual languages
KW  - liveness
KW  - Smalltalk
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-439263
SN  - 978-3-86956-482-1
SN  - 1613-5652
SN  - 2191-1665
IS  - 132
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Beckmann, Tom
A1  - Hildebrand, Justus
A1  - Jaschek, Corinna
A1  - Krebs, Eva
A1  - Löser, Alexander
A1  - Taeumel, Marcel
A1  - Pape, Tobias
A1  - Fister, Lasse
A1  - Hirschfeld, Robert
T1  - The font engineering platform
T1  - Eine Plattform für Schriftarten
BT  - collaborative font creation in a self-supporting programming environment
BT  - kollaborative Schriftartgestaltung in einer selbsttragenden Programmierumgebung
N2  - Creating fonts is a complex task that requires expert knowledge in a variety of domains. Often, this knowledge is not held by a single person, but spread across a number of domain experts. A central concept needed for designing fonts is the glyph, an elemental symbol representing a readable character. Required domains include designing glyph shapes, engineering rules to combine glyphs for complex scripts and checking legibility. This process is most often iterative and requires communication in all directions. This report outlines a platform that aims to enhance the means of communication, describes our prototyping process, discusses complex font rendering and editing in a live environment and an approach to generate code based on a user’s live-edits.
N2  - Die Erstellung von Schriften ist eine komplexe Aufgabe, die Expertenwissen aus einer Vielzahl von Bereichen erfordert. Oftmals liegt dieses Wissen nicht bei einer einzigen Person, sondern bei einer Reihe von Fachleuten. Ein zentrales Konzept für die Gestaltung von Schriften ist der Glyph, ein elementares Symbol, das ein einzelnes lesbares Zeichen darstellt. Zu den erforderlichen Domänen gehören das Entwerfen der Glyphenformen, technische Regeln zur Kombination von Glyphen für komplexe Skripte und das Prüfen der Lesbarkeit. Dieser Prozess ist meist iterativ und erfordert ständige Kommunikation zwischen den Experten. Dieser Bericht skizziert eine Plattform, die darauf abzielt, die Kommunikationswege zu verbessern, beschreibt unseren Prototyping-Prozess, diskutiert komplexes Schriftrendering und -bearbeitung in einer Echtzeitumgebung und einen Ansatz zur Generierung von Code basierend auf direkter Manipulation durch einen Nutzer.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 128 
KW  - smalltalk
KW  - squeak
KW  - font rendering
KW  - font engineering
KW  - prototyping
KW  - Smalltalk
KW  - Squeak
KW  - Schriftrendering
KW  - Schriftartgestaltung
KW  - Prototyping
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-427487
SN  - 978-3-86956-464-7
SN  - 1613-5652
SN  - 2191-1665
IS  - 128
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Becker, Basil
A1  - Giese, Holger
A1  - Neumann, Stefan
T1  - Correct dynamic service-oriented architectures : modeling and compositional verification with dynamic collaborations
N2  - Service-oriented modeling employs collaborations to capture the coordination of multiple roles in the form of service contracts. In the case of dynamic collaborations, the roles may join and leave the collaboration at runtime, and therefore complex structural dynamics can result, which makes it very hard to ensure their correct and safe operation. We present in this paper our approach for modeling and verifying such dynamic collaborations. Modeling is supported using a well-defined subset of UML class diagrams, behavioral rules for the structural dynamics, and UML state machines for the role behavior. To also be able to verify the resulting service-oriented systems, we extended our former results for the automated verification of systems with structural dynamics [7, 8] and developed a compositional reasoning scheme, which enables the reuse of verification results. We outline our approach using the example of autonomous vehicles that use such dynamic collaborations via ad-hoc networking to coordinate and optimize their joint behavior.
N2  - Bei der Modellierung Service-orientierter Systeme werden Kollaborationen verwendet, um die Koordination mehrerer Rollen durch Service-Verträge zu beschreiben. Dynamische Kollaborationen erlauben ein Hinzufügen und Entfernen von Rollen zur Kollaboration zur Laufzeit, wodurch eine komplexe strukturelle Dynamik entstehen kann. Die automatische Analyse service-orientierter Systeme wird durch diese erheblich erschwert. In dieser Arbeit stellen wir einen Ansatz zur Modellierung und Verifikation solcher dynamischer Kollaborationen vor. Eine spezielle Untermenge der UML ermöglicht die Modellierung, wobei Klassendiagramme, Verhaltensregeln für die strukturelle Dynamik und UML Zustandsdiagramme für das Verhalten der Rollen verwendet werden. Um die Verifikation der so modellierten service-orientierten Systeme zu ermöglichen, erweiterten wir unsere früheren Ergebnisse zur Verifikation von Systemen mit struktureller Dynamik [7,8] und entwickelten einen kompositionalen Verifikationsansatz. Der entwickelte Verifikationsansatz erlaubt es Ergebnisse wiederzuverwenden. Die entwickelten Techniken werden anhand autonomer Fahrzeuge, die dynamische Kollaborationen über ad-hoc Netzwerke zur Koordination und Optimierung ihres gemeinsamen Verhaltens nutzen, exemplarisch vorgestellt.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 29 
Y1  - 2009
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-30473
SN  - 978-3-940793-91-1
ER  - 
TY  - BOOK
A1  - Becker, Basil
A1  - Giese, Holger
T1  - Cyber-physical systems with dynamic structure : towards modeling and verification of inductive invariants
N2  - Cyber-physical systems achieve sophisticated system behavior exploring the tight interconnection of physical coupling present in classical engineering systems and information technology based coupling. Particularly challenging are systems where these cyber-physical systems are formed ad hoc according to the specific local topology, the available networking capabilities, and the goals and constraints of the subsystems captured by the information processing part. In this paper we present a formalism that permits modeling the sketched class of cyber-physical systems. The ad hoc formation of tightly coupled subsystems of arbitrary size is specified using a UML-based graph transformation system approach. Differential equations are employed to define the resulting tightly coupled behavior. Together, both form hybrid graph transformation systems where the graph transformation rules define the discrete steps where the topology or modes may change, while the differential equations capture the continuous behavior in between such discrete changes. In addition, we demonstrate that automated analysis techniques known for timed graph transformation systems for inductive invariants can be extended to also cover the hybrid case for an expressive case of hybrid models where the formed tightly coupled subsystems are restricted to smaller local networks.
N2  - Cyber-physical Systeme erzielen ihr ausgefeiltes Systemverhalten durch die enge Verschränkung von physikalischer Kopplung, wie sie in Systemen der klassischen Ingenieurs-Disziplinen vorkommt, und der Kopplung durch Informationstechnologie. Eine besondere Herausforderung stellen in diesem Zusammenhang Systeme dar, die durch die spontane Vernetzung einzelner Cyber-Physical-Systeme entsprechend der lokalen, topologischen Gegebenheiten, verfügbarer Netzwerkfähigkeiten und der Anforderungen und Beschränkungen der Teilsysteme, die durch den informationsverarbeitenden Teil vorgegeben sind, entstehen. In diesem Bericht stellen wir einen Formalismus vor, der die Modellierung der eingangs skizzierten Systeme erlaubt. Ein auf UML aufbauender Graph-Transformations-Ansatz wird genutzt, um die spontane Bildung eng kooperierender Teilsysteme beliebiger Größe zu spezifizieren. Differentialgleichungen beschreiben das kombinierte Verhalten auf physikalischer Ebene. In Kombination ergeben diese beiden Formalismen hybride Graph-Transformations-Systeme, in denen die Graph-Transformationen diskrete Schritte und die Differentialgleichungen das kontinuierliche, physikalische Verhalten des Systems beschreiben. Zusätzlich präsentieren wir die Erweiterung einer automatischen Analysetechnik zur Verifikation induktiver Invarianten, die bereits für zeitbehaftete Systeme bekannt ist, auf den ausdrucksstärkeren Fall der hybriden Modelle.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 64 
KW  - Cyber-Physical-Systeme
KW  - Verifikation
KW  - Modellierung
KW  - hybride Graph-Transformations-Systeme
KW  - Cyber-physical-systems
KW  - verification
KW  - modeling
KW  - hybrid graph-transformation-systems
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-62437
SN  - 978-3-86956-217-9
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Becker, Basil
T1  - Architectural modelling and verification of open service-oriented systems of systems
T1  - Architekturmodellierung und Verifikation von offenen und service-orientierten Systems of Systems
N2  - Systems of Systems (SoS) have received a lot of attention recently. In this thesis we will focus on SoS that are built atop the techniques of Service-Oriented Architectures and thus combine the benefits and challenges of both paradigms. For this thesis we will understand SoS as ensembles of single autonomous systems that are integrated into a larger system, the SoS. The interesting fact about these systems is that the previously isolated systems are still maintained, improved and developed on their own. Structural dynamics is an issue in SoS, as at every point in time systems can join and leave the ensemble. This and the fact that the cooperation among the constituent systems is not necessarily observable means that we will consider these systems as open systems. Of course, the system has a clear boundary at each point in time, but this can only be identified by halting the complete SoS. However, halting a system of that size is practically impossible. Often SoS are combinations of software systems and physical systems. Hence, a failure in the software system can have a serious physical impact, which easily makes an SoS of this kind a safety-critical system. The contribution of this thesis is a modelling approach that extends OMG's SoaML and basically relies on collaborations and roles as an abstraction layer above the components. This will allow us to describe SoS at an architectural level. We will also give a formal semantics for our modelling approach which employs hybrid graph-transformation systems. The modelling approach is accompanied by a modular verification scheme that will be able to cope with the complexity constraints implied by the SoS' structural dynamics and size. Building such autonomous systems as SoS without evolution at the architectural level --- i. e. adding and removing of components and services --- is inadequate. Therefore our approach directly supports the modelling and verification of evolution.
N2  - Systems of Systems (SoS) sind ein seit längerem bekanntes Konzept, das jedoch in letzter Zeit vermehrt Aufmerksamkeit erhielt. Das Hauptaugenmerk dieser Arbeit wird auf SoS liegen, die mit Hilfe von Techniken aus Service-Orientierten Architekturen erstellt werden. Somit vereinen die hier betrachteten SoS die Vorteile und Herausforderungen beider Paradigmen. SoS können definiert werden als Zusammenschlüsse einzelner, autonomer Systeme, die zu einem größeren System integriert werden. In diesem Zusammenhang interessant ist, dass die ehemals isolierten Systeme nach wie vor isoliert voneinander weiterentwickelt und gewartet werden. Des Weiteren kommt der Strukturdynamik innerhalb des SoS eine beachtliche Bedeutung zu, da jederzeit Systeme dem SoS beitreten und es verlassen können. Zusammen mit der Tatsache, dass die Kooperationen zwischen den konstituierenden Systemen nicht immer beobachtbar sind, führt dies dazu, dass wir diese Systeme als offene Systeme bezeichnen. Wobei das System natürlich jederzeit eine klar definierte Grenze besitzt, diese aber nur durch ein Anhalten des Systems zu bestimmen ist. Dies jedoch ist, von einer praktischen Perspektive aus betrachtet, unmöglich. Häufig stellen SoS eine Kombination aus Softwaresystemen und physikalischen Systemen dar mit der Folge, dass ein Fehler in der Software eines SoS schnell eine immense physikalische Wirkung entwickeln kann. Von daher fallen SoS leicht in die Klasse der sicherheitskritischen Systeme. In dieser Arbeit werden wir einen Modellierungsansatz vorstellen, der die Sprache SoaML der OMG erweitert. Die grundlegenden Konzepte dieses Ansatzes sind die Modellierung mit Kollaborationen und Rollen als Abstraktionsebene über Komponenten. Der vorgestellte Ansatz erlaubt es uns, SoS auf einer architekturellen Ebene zu betrachten. Die formale Semantik unseres Modellierungsansatzes ist durch hybride Graphtransformationssysteme gegeben.
Abgestimmt auf die Modellierung werden wir ebenfalls ein Verfahren zur Verifikation von SoS vorstellen, welches es trotz der inhärenten Komplexität von SoS ermöglicht, diese zu verifizieren. Die Modellierung und Verifikation von Evolution wird von unserem Ansatz direkt unterstützt.
KW  - Modellierung
KW  - Verifikation
KW  - Evolution
KW  - Systems of Systems
KW  - Service-orientierte Systeme
KW  - modelling
KW  - verification
KW  - evolution
KW  - systems of systems
KW  - service-oriented systems
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-70158
ER  - 
TY  - THES
A1  - Bazhenova, Ekaterina
T1  - Discovery of Decision Models Complementary to Process Models
T1  - Das Konstruieren von Entscheidungsmodellen als Ergänzung zu Prozessmodellen
N2  - Business process management is an acknowledged asset for running an organization in a productive and sustainable way. One of the most important aspects of business process management, occurring on a daily basis at all levels, is decision making. In recent years, a number of decision management frameworks have appeared in addition to existing business process management systems. More recently, Decision Model and Notation (DMN) was developed by the OMG consortium with the aim of complementing the widely used Business Process Model and Notation (BPMN). One of the reasons for the emergence of DMN is the increasing interest in the evolving paradigm known as the separation of concerns. This paradigm states that modeling decisions complementary to processes reduces process complexity by externalizing decision logic from process models and importing it into a dedicated decision model. Such an approach increases the agility of model design and execution. This provides organizations with the flexibility to adapt to the ever increasing rapid and dynamic changes in the business ecosystem. The research gap we identified is that the separation of concerns recommended by DMN prescribes the externalization of the decision logic of process models into one or more separate decision models, but it does not specify how this can be achieved.

The goal of this thesis is to overcome the presented gap by developing a framework for discovering decision models in a semi-automated way from information about existing process decision making. Thus, in this thesis we develop methodologies to extract decision models from: (1) control flow and data of process models that exist in enterprises; and (2) event logs recorded by enterprise information systems, encapsulating day-to-day operations. Furthermore, we provide an extension of the methodologies to discover decision models from event logs enriched with fuzziness, a tool dealing with partial knowledge of the process execution information. All the proposed techniques are implemented and evaluated in case studies using real-life and synthetic process models and event logs. The evaluation of these case studies shows that the proposed methodologies provide valid and accurate output decision models that can serve as blueprints for executing decisions complementary to process models. Thus, these methodologies have applicability in the real world and they can be used, for example, for compliance checks, among other uses, which could improve the organization's decision making and hence its overall performance.
N2  - Geschäftsprozessmanagement ist eine anerkannte Strategie, um Unternehmen produktiv und nachhaltig zu führen. Einer der wichtigsten Faktoren des Geschäftsprozessmanagements ist die Entscheidungsfindung – tagtäglich und auf allen Ebenen. In den letzten Jahren wurden – zusätzlich zu existierenden Geschäftsprozessmanagementsystemen – eine Reihe von Frameworks zum Entscheidungsmanagement entwickelt. Um die weit verbreitete Business Process Model and Notation (BPMN) zu ergänzen, hat das OMG-Konsortium kürzlich die Decision Model and Notation (DMN) entwickelt. Einer der Treiber für die Entwicklung der DMN ist das wachsende Interesse an dem aufstrebenden Paradigma der “Separation of Concerns” (Trennung der Sichtweisen). Dieses Prinzip besagt, dass die Prozesskomplexität reduziert wird, wenn Entscheidungen komplementär zu den Prozessen modelliert werden, indem die Entscheidungslogik von Prozessmodellen entkoppelt und in ein dediziertes Entscheidungsmodell aufgenommen wird. Solch ein Ansatz erhöht die Agilität von Modellentwurf und -ausführung und bietet Unternehmen so die Flexibilität, auf die stetig zunehmenden, rasanten Veränderungen in der Unternehmenswelt zu reagieren. Während die DMN die Trennung der Belange empfiehlt und die Entkopplung der Entscheidungslogik von den Prozessmodellen vorschreibt, gibt es bisher keine Spezifikation, wie dies erreicht werden kann. Diese Forschungslücke ist der Ausgangspunkt der vorliegenden Arbeit.
Das Ziel dieser Doktorarbeit ist es, die beschriebene Lücke zu füllen und ein Framework zur halbautomatischen Konstruktion von Entscheidungsmodellen zu entwickeln, basierend auf Informationen über existierende Prozessentscheidungsfindung. In dieser Arbeit werden die entwickelten Methoden zur Entkopplung von Entscheidungsmodellen dargestellt. Die Extraktion der Modelle basiert auf folgenden Eingaben: (1) Kontrollfluss und Daten aus Prozessmodellen, die in Unternehmen existieren; und (2) von Unternehmensinformationssystemen aufgezeichnete Ereignisprotokolle der Tagesgeschäfte. Außerdem stellen wir eine Erweiterung der Methode vor, die es ermöglicht, auch in von Unschärfe geprägten Ereignisprotokollen Entscheidungsmodelle zu entdecken. Hier wird mit Teilwissen über die Prozessausführung gearbeitet. Alle vorgestellten Techniken wurden implementiert und in Fallstudien evaluiert – basierend auf realen und künstlichen Prozessmodellen, sowie auf Ereignisprotokollen. Die Evaluierung der Fallstudien zeigt, dass die vorgeschlagenen Methoden valide und akkurate Entscheidungsmodelle produzieren, die als Blaupause für das Vollziehen von Entscheidungen dienen können und die Prozessmodelle ergänzen. Demnach sind die vorgestellten Methoden in der realen Welt anwendbar und können beispielsweise für Übereinstimmungskontrollen genutzt werden, was wiederum die Entscheidungsfindung in Unternehmen und somit deren Gesamtleistung verbessern kann.
KW  - business process management
KW  - decision management
KW  - process models
KW  - decision models
KW  - decision mining
KW  - Geschäftsprozessmanagement
KW  - Entscheidungsmanagement
KW  - Entscheidungsfindung
KW  - Entscheidungsmodelle
KW  - Prozessmodelle
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-410020
ER  - 
TY  - BOOK
A1  - Bauckmann, Jana
A1  - Leser, Ulf
A1  - Naumann, Felix
T1  - Efficient and exact computation of inclusion dependencies for data integration
N2  - Data obtained from foreign data sources often come with only superficial structural information, such as relation names and attribute names. Other types of metadata that are important for effective integration and meaningful querying of such data sets are missing. In particular, relationships among attributes, such as foreign keys, are crucial metadata for understanding the structure of an unknown database. The discovery of such relationships is difficult, because in principle for each pair of attributes in the database each pair of data values must be compared. A precondition for a foreign key is an inclusion dependency (IND) between the key and the foreign key attributes. We present Spider, an algorithm that efficiently finds all INDs in a given relational database. It leverages the sorting facilities of DBMS but performs the actual comparisons outside of the database to save computation. Spider analyzes very large databases up to an order of magnitude faster than previous approaches. We also evaluate in detail the effectiveness of several heuristics to reduce the number of necessary comparisons. Furthermore, we generalize Spider to find composite INDs covering multiple attributes, and partial INDs, which are true INDs for all but a certain number of values. This last type is particularly relevant when integrating dirty data as is often the case in the life sciences domain - our driving motivation.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 34 
KW  - Metadatenentdeckung
KW  - Metadatenqualität
KW  - Schemaentdeckung
KW  - Datenanalyse
KW  - Datenintegration
KW  - metadata discovery
KW  - metadata quality
KW  - schema discovery
KW  - data profiling
KW  - data integration
Y1  - 2010
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41396
SN  - 978-3-86956-048-9
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Bauckmann, Jana
A1  - Abedjan, Ziawasch
A1  - Leser, Ulf
A1  - Müller, Heiko
A1  - Naumann, Felix
T1  - Covering or complete? : Discovering conditional inclusion dependencies
N2  - Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database. In recent years, conditional dependencies have been introduced to analyze and improve data quality. In short, a conditional dependency is a dependency with a limited scope defined by conditions over one or more attributes. Only the matching part of the instance must adhere to the dependency. In this paper we focus on conditional inclusion dependencies (CINDs). We generalize the definition of CINDs, distinguishing covering and completeness conditions. We present a new use case for such CINDs showing their value for solving complex data quality tasks. Further, we define quality measures for conditions inspired by precision and recall. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. Our algorithms choose not only the condition values but also the condition attributes automatically. Finally, we show that our approach efficiently provides meaningful and helpful results for our use case.
N2  - Datenabhängigkeiten (wie zum Beispiel Integritätsbedingungen), werden verwendet, um die Qualität eines Datenbankschemas zu erhöhen, um Anfragen zu optimieren und um Konsistenz in einer Datenbank sicherzustellen. In den letzten Jahren wurden bedingte Abhängigkeiten (conditional dependencies) vorgestellt, die die Qualität von Daten analysieren und verbessern sollen. Eine bedingte Abhängigkeit ist eine Abhängigkeit mit begrenztem Gültigkeitsbereich, der über Bedingungen auf einem oder mehreren Attributen definiert wird. In diesem Bericht betrachten wir bedingte Inklusionsabhängigkeiten (conditional inclusion dependencies; CINDs). Wir generalisieren die Definition von CINDs anhand der Unterscheidung von überdeckenden (covering) und vollständigen (completeness) Bedingungen. Wir stellen einen Anwendungsfall für solche CINDs vor, der den Nutzen von CINDs bei der Lösung komplexer Datenqualitätsprobleme aufzeigt. Darüber hinaus definieren wir Qualitätsmaße für Bedingungen basierend auf Sensitivität und Genauigkeit. Wir stellen effiziente Algorithmen vor, die überdeckende und vollständige Bedingungen innerhalb vorgegebener Schwellwerte finden. Unsere Algorithmen wählen nicht nur die Werte der Bedingungen, sondern finden auch die Bedingungsattribute automatisch. Abschließend zeigen wir, dass unser Ansatz effizient sinnvolle und hilfreiche Ergebnisse für den vorgestellten Anwendungsfall liefert.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 62 
KW  - Datenabhängigkeiten
KW  - Bedingte Inklusionsabhängigkeiten
KW  - Erkennen von Meta-Daten
KW  - Linked Open Data
KW  - Link-Entdeckung
KW  - Assoziationsregeln
KW  - Data Dependency
KW  - Conditional Inclusion Dependency
KW  - Metadata Discovery
KW  - Linked Open Data
KW  - Link Discovery
KW  - Association Rule Mining
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-62089
SN  - 978-3-86956-212-4
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Bauckmann, Jana
T1  - Dependency discovery for data integration
T1  - Erkennen von Datenabhängigkeiten zur Datenintegration
N2  - Data integration aims to combine data of different sources and to provide users with a unified view on these data. This task is as challenging as it is valuable. In this thesis we propose algorithms for dependency discovery to provide necessary information for data integration. We focus on inclusion dependencies (INDs) in general and a special form named conditional inclusion dependencies (CINDs): (i) INDs enable the discovery of structure in a given schema. (ii) INDs and CINDs support the discovery of cross-references or links between schemas. An IND “A in B” simply states that all values of attribute A are included in the set of values of attribute B. We propose an algorithm that discovers all inclusion dependencies in a relational data source. The challenge of this task is the complexity of testing all attribute pairs and further of comparing all of each attribute pair's values. The complexity of existing approaches depends on the number of attribute pairs, while ours depends only on the number of attributes. Thus, our algorithm makes it possible to profile entirely unknown data sources with large schemas by discovering all INDs. Further, we provide an approach to extract foreign keys from the identified INDs. We extend our IND discovery algorithm to also find three special types of INDs: (i) Composite INDs, such as “AB in CD”, (ii) approximate INDs that allow a certain amount of values of A to be not included in B, and (iii) prefix and suffix INDs that represent special cross-references between schemas. Conditional inclusion dependencies are inclusion dependencies with a limited scope defined by conditions over several attributes. Only the matching part of the instance must adhere to the dependency. We generalize the definition of CINDs distinguishing covering and completeness conditions and define quality measures for conditions. We propose efficient algorithms that identify covering and completeness conditions conforming to given quality thresholds. 
The challenge for this task is twofold: (i) Which (and how many) attributes should be used for the conditions? (ii) Which attribute values should be chosen for the conditions? Previous approaches rely on pre-selected condition attributes or can only discover conditions applying to quality thresholds of 100%. Our approaches were motivated by two application domains: data integration in the life sciences and link discovery for linked open data. We show the efficiency and the benefits of our approaches for use cases in these domains.
N2  - Datenintegration hat das Ziel, Daten aus unterschiedlichen Quellen zu kombinieren und Nutzern eine einheitliche Sicht auf diese Daten zur Verfügung zu stellen. Diese Aufgabe ist gleichermaßen anspruchsvoll wie wertvoll. In dieser Dissertation werden Algorithmen zum Erkennen von Datenabhängigkeiten vorgestellt, die notwendige Informationen zur Datenintegration liefern. Der Schwerpunkt dieser Arbeit liegt auf Inklusionsabhängigkeiten (inclusion dependency, IND) im Allgemeinen und auf der speziellen Form der Bedingten Inklusionsabhängigkeiten (conditional inclusion dependency, CIND): (i) INDs ermöglichen das Finden von Strukturen in einem gegebenen Schema. (ii) INDs und CINDs unterstützen das Finden von Referenzen zwischen Datenquellen. Eine IND „A in B“ besagt, dass alle Werte des Attributs A in der Menge der Werte des Attributs B enthalten sind. Diese Arbeit liefert einen Algorithmus, der alle INDs in einer relationalen Datenquelle erkennt. Die Herausforderung dieser Aufgabe liegt in der Komplexität alle Attributpaare zu testen und dabei alle Werte dieser Attributpaare zu vergleichen. Die Komplexität bestehender Ansätze ist abhängig von der Anzahl der Attributpaare während der hier vorgestellte Ansatz lediglich von der Anzahl der Attribute abhängt. Damit ermöglicht der vorgestellte Algorithmus unbekannte Datenquellen mit großen Schemata zu untersuchen. Darüber hinaus wird der Algorithmus erweitert, um drei spezielle Formen von INDs zu finden, und ein Ansatz vorgestellt, der Fremdschlüssel aus den erkannten INDs filtert. Bedingte Inklusionsabhängigkeiten (CINDs) sind Inklusionsabhängigkeiten deren Geltungsbereich durch Bedingungen über bestimmten Attributen beschränkt ist. Nur der zutreffende Teil der Instanz muss der Inklusionsabhängigkeit genügen. Die Definition für CINDs wird in der vorliegenden Arbeit generalisiert durch die Unterscheidung von überdeckenden und vollständigen Bedingungen. Ferner werden Qualitätsmaße für Bedingungen definiert. 
Es werden effiziente Algorithmen vorgestellt, die überdeckende und vollständige Bedingungen mit gegebenen Qualitätsmaßen auffinden. Dabei erfolgt die Auswahl der verwendeten Attribute und Attributkombinationen sowie der Attributwerte automatisch. Bestehende Ansätze beruhen auf einer Vorauswahl von Attributen für die Bedingungen oder erkennen nur Bedingungen mit Schwellwerten von 100% für die Qualitätsmaße. Die Ansätze der vorliegenden Arbeit wurden durch zwei Anwendungsbereiche motiviert: Datenintegration in den Life Sciences und das Erkennen von Links in Linked Open Data. Die Effizienz und der Nutzen der vorgestellten Ansätze werden anhand von Anwendungsfällen in diesen Bereichen aufgezeigt.
KW  - Datenabhängigkeiten-Entdeckung
KW  - Datenintegration
KW  - Schema-Entdeckung
KW  - Link-Entdeckung
KW  - Inklusionsabhängigkeit
KW  - dependency discovery
KW  - data integration
KW  - schema discovery
KW  - link discovery
KW  - inclusion dependency
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-66645
ER  - 
TY  - BOOK
A1  - Bartz, Christian
A1  - Krestel, Ralf
T1  - Deep learning for computer vision in the art domain
BT  - proceedings of the master seminar on practical introduction to deep learning for computer vision, HPI WS 20/21
N2  - In recent years, computer vision algorithms based on machine learning have seen rapid development. In the past, research mostly focused on solving computer vision problems such as image classification or object detection on images displaying natural scenes. Nowadays other fields such as the field of cultural heritage, where an abundance of data is available, are also coming into the focus of research. In line with current research endeavours, we collaborated with the Getty Research Institute which provided us with a challenging dataset, containing images of paintings and drawings. In this technical report, we present the results of the seminar "Deep Learning for Computer Vision". In this seminar, students of the Hasso Plattner Institute evaluated state-of-the-art approaches for image classification, object detection and image recognition on the dataset of the Getty Research Institute. The main challenge when applying modern computer vision methods to the available data is the availability of annotated training data, as the dataset provided by the Getty Research Institute does not contain a sufficient amount of annotated samples for the training of deep neural networks. However, throughout the report we show that it is possible to achieve satisfactory to very good results, when using further publicly available datasets, such as the WikiArt dataset, for the training of machine learning models.
N2  - Methoden zur Anwendung von maschinellem Lernen für das maschinelle Sehen haben sich in den letzten Jahren stark weiterentwickelt. Dabei konzentrierte sich die Forschung hauptsächlich auf die Lösung von Problemen im Bereich der Bildklassifizierung, oder der Objekterkennung aus Bildern mit natürlichen Motiven. Mehr und mehr kommen zusätzlich auch andere Inhaltsbereiche, vor allem aus dem kulturellen Umfeld in den Fokus der Forschung. Kulturforschungsinstitute, wie das Getty Research Institute, besitzen eine Vielzahl von digitalisierten Dokumenten, die bisher noch nicht analysiert wurden. Im Rahmen einer Zusammenarbeit, überließ das Getty Research Institute uns einen Datensatz, bestehend aus Photos von Kunstwerken. In diesem technischen Bericht präsentieren wir die Ergebnisse des Masterseminars "Deep Learning for Computer Vision", in dem Studierende des Hasso-Plattner-Instituts den Stand der Kunst, bei der Anwendung von Bildklassifizierungs, Objekterkennungs und Image Retrieval Algorithmen evaluierten. Eine besondere Schwierigkeit war, dass es nicht möglich ist bestehende Verfahren direkt auf dem Datensatz anzuwenden, da keine, bzw. kaum Annotationen für das Training von Machine Learning Modellen verfügbar sind. In den einzelnen Teilen des Berichts zeigen wir jedoch, dass es möglich ist unter Zuhilfenahme von weiteren öffentlich verfügbaren Datensätzen, wie dem WikiArt Datensatz, zufriedenstellende bis sehr gute Ergebnisse für die einzelnen Analyseaufgaben zu erreichen.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 139 
KW  - computer vision
KW  - cultural heritage
KW  - art analysis
KW  - maschinelles Sehen
KW  - kulturelles Erbe
KW  - Kunstanalyse
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-512906
SN  - 978-3-86956-514-9
SN  - 1613-5652
SN  - 2191-1665
IS  - 139
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Bartz, Christian
T1  - Reducing the annotation burden: deep learning for optical character recognition using less manual annotations
N2  - Text is a ubiquitous entity in our world and daily life. We encounter it nearly everywhere in shops, on the street, or in our flats. Nowadays, more and more text is contained in digital images. These images are either taken using cameras, e.g., smartphone cameras, or taken using scanning devices such as document scanners. The sheer amount of available data, e.g., millions of images taken by Google Streetview, prohibits manual analysis and metadata extraction. Although much progress was made in the area of optical character recognition (OCR) for printed text in documents, broad areas of OCR are still not fully explored and hold many research challenges. With the mainstream usage of machine learning and especially deep learning, one of the most pressing problems is the availability and acquisition of annotated ground truth for the training of machine learning models because obtaining annotated training data using manual annotation mechanisms is time-consuming and costly. In this thesis, we address the question of how we can reduce the costs of acquiring ground truth annotations for the application of state-of-the-art machine learning methods to optical character recognition pipelines. To this end, we investigate how we can reduce the annotation cost by using only a fraction of the typically required ground truth annotations, e.g., for scene text recognition systems. We also investigate how we can use synthetic data to reduce the need for manual annotation work, e.g., in the area of document analysis for archival material. In the area of scene text recognition, we have developed a novel end-to-end scene text recognition system that can be trained using inexact supervision and shows competitive/state-of-the-art performance on standard benchmark datasets for scene text recognition. Our method consists of two independent neural networks, combined using spatial transformer networks. 
Both networks learn together to perform text localization and text recognition at the same time while only using annotations for the recognition task. We apply our model to end-to-end scene text recognition (meaning localization and recognition of words) and pure scene text recognition without any changes in the network architecture.

In the second part of this thesis, we introduce novel approaches for using and generating synthetic data to analyze handwriting in archival data. First, we propose a novel preprocessing method to determine whether a given document page contains any handwriting. We propose a novel data synthesis strategy to train a classification model and show that our data synthesis strategy is viable by evaluating the trained model on real images from an archive. Second, we introduce the new analysis task of handwriting classification. Handwriting classification entails classifying a given handwritten word image into classes such as date, word, or number. Such an analysis step allows us to select the best fitting recognition model for subsequent text recognition; it also allows us to reason about the semantic content of a given document page without the need for fine-grained text recognition and further analysis steps, such as Named Entity Recognition. We show that our proposed approaches work well when trained on synthetic data. Further, we propose a flexible metric learning approach to allow zero-shot classification of classes unseen during the network’s training. Last, we propose a novel data synthesis algorithm to train off-the-shelf pixel-wise semantic segmentation networks for documents. Our data synthesis pipeline is based on the well-known StyleGAN architecture and can synthesize realistic document images with their corresponding segmentation annotation without the need for any annotated data.
N2  - Text umgibt uns überall. Wir finden Text in allen Lebenslagen, z.B. in einem Geschäft, an Gebäuden, oder in unserer Wohnung. Viele dieser Textentitäten können heutzutage auch in digitalen Bildern gefunden werden, welche auf verschiedene Art und Weise erstellt werden können, z.B. mittels einer Kamera in einem Smartphone oder durch einen Dokumentenscanner. Die Anzahl verfügbarer digitaler Bilder, z.B. Millionen – wenn nicht Milliarden von Bildern – in Google Streetview, macht eine manuelle Analyse der Bilddaten unmöglich. Obwohl es im Gebiet der Optical Character Recognition (OCR) in den letzten Jahren viel Fortschritt gab, gibt es doch noch viele Bereiche, die noch nicht vollständig erforscht worden sind. Der immer zunehmende Einsatz von Methoden des maschinellen Lernens, insbesondere der Einsatz von Deep Learning Technologien, im Bereich der OCR, führt zu dem großen Problem der Verfügbarkeit von annotierten Trainingsdaten. Die Beschaffung annotierter Daten mittels manueller Annotation ist zeitintensiv und sehr teuer. In dieser Arbeit zeigen wir neue Wege und Verfahren auf, wie das Problem der Beschaffung annotierter Daten für die Anwendung von modernsten Deep Learning Verfahren im Bereich der OCR gelöst werden könnte. Hierbei zeigen wir neue Verfahren in zwei Unterbereichen der OCR. Einerseits untersuchen wir, wie wir die Annotationskosten reduzieren könnten, indem wir inexakte Annotationen benutzen um z.B. die Kosten der Annotation von echten Daten im Bereich der Texterkennung aus natürlichen Bildern zu reduzieren. Dieses System wird mittels weak supervision trainiert und erreicht Ergebnisse, die auf dem Stand der Technik bzw. darüber liegen. Unsere Methode basiert auf zwei unabhängigen neuronalen Netzwerken, die mittels eines Spatial Transformers verbunden werden. Beide Netzwerke werden zusammen trainiert und lernen zusammen, wie Text gefunden und gelesen werden kann. 
Dabei nutzen wir aber nur Annotationen und Supervision für das Lesen (recognition) des Textes, nicht für die Textfindung. Wir zeigen weiterhin, dass unser System für eine Mehrzahl von Aufgaben im Bereich der Texterkennung aus natürlichen Bildern genutzt werden kann, ohne Veränderungen im Netzwerk vornehmen zu müssen. Andererseits untersuchen wir, wie wir Verfahren zur Erstellung von synthetischen Daten benutzen können, um die Kosten und den Aufwand der manuellen Annotation zu verringern und zeigen Ergebnisse aus dem Bereich der Analyse von Handschrift in historischen Archivdokumenten. Zuerst präsentieren wir ein System zur Erkennung, ob ein Bild überhaupt Handschrift enthält. Hier schlagen wir eine neue Datengenerierungsmethode vor. Die generierten Daten werden zum Training eines Klassifizierungsmodells genutzt. Unsere experimentellen Ergebnisse belegen, dass unsere Idee auch auf echten Daten aus einem Archiv eingesetzt werden kann.

Als Zweites führen wir einen neuen Schritt in einer Dokumentenanalyseplattform ein: Handschriftklassifizierung. Hier ordnen wir Bilder einzelner handgeschriebener Wörter anhand ihrer visuellen Struktur in Klassen, wie Zahlen, Datumsangaben oder Wörter ein. Die Einführung dieses Analyseschrittes erlaubt es uns den besten Algorithmus für den nächsten Schritt, die eigentliche Handschrifterkennung, zu finden. Der Analyseschritt erlaubt es uns auch, bereits Aussagen über den semantischen Inhalt eines Dokumentes zu treffen, ohne weitere Analyseschritte, wie Named Entity Recognition, durchführen zu müssen. Wir zeigen, dass unser Ansatz sehr gut funktioniert, wenn er auf synthetischen Daten trainiert wird; wir zeigen weiterhin, dass unser Ansatz auch für zero-shot Klassifikation eingesetzt werden kann. Zum Schluss präsentieren wir ein neues Verfahren zur Generierung von Trainingsdaten für die pixelgenaue semantische Segmentierung in Bildern von Dokumenten. Unser Verfahren basiert auf der bekannten StyleGAN Architektur und ist in der Lage Bilder mit entsprechender Annotation automatisch zu generieren. Hierbei werden keine echten annotierten Daten benötigt und das Verfahren kann auf jeder Form von Dokumenten eingesetzt werden.
KW  - computer vision
KW  - optical character recognition
KW  - archive analysis
KW  - data synthesis
KW  - weak supervision
KW  - Archivanalyse
KW  - maschinelles Sehen
KW  - Datensynthese
KW  - Texterkennung
KW  - schwach überwachtes maschinelles Lernen
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-555407
ER  - 
TY  - JOUR
A1  - Barkowsky, Matthias
A1  - Giese, Holger
T1  - Hybrid search plan generation for generalized graph pattern matching
JF  - Journal of logical and algebraic methods in programming
N2  - In recent years, the increased interest in application areas such as social networks has resulted in a rising popularity of graph-based approaches for storing and processing large amounts of interconnected data. To extract useful information from the growing network structures, efficient querying techniques are required.
In this paper, we propose an approach for graph pattern matching that allows a uniform handling of arbitrary constraints over the query vertices. Our technique builds on a previously introduced matching algorithm, which takes concrete host graph information into account to dynamically adapt the employed search plan during query execution. The dynamic algorithm is combined with an existing static approach for search plan generation, resulting in a hybrid technique which we further extend by a more sophisticated handling of filtering effects caused by constraint checks. We evaluate the presented concepts empirically based on an implementation for our graph pattern matching tool, the Story Diagram Interpreter, with queries and data provided by the LDBC Social Network Benchmark. Our results suggest that the hybrid technique may improve search efficiency in several cases, and rarely reduces efficiency.
KW  - graph pattern matching
KW  - search plan generation
Y1  - 2020
U6  - https://doi.org/10.1016/j.jlamp.2020.100563
SN  - 2352-2208
VL  - 114
PB  - Elsevier
CY  - New York
ER  - 
TY  - BOOK
A1  - Barkowsky, Matthias
A1  - Giese, Holger
T1  - Modular and incremental global model management with extended generalized discrimination networks
T1  - Modulares und inkrementelles Globales Modellmanagement mit erweiterten Generalized Discrimination Networks
N2  - Complex projects developed under the model-driven engineering paradigm nowadays often involve several interrelated models, which are automatically processed via a multitude of model operations. Modular and incremental construction and execution of such networks of models and model operations are required to accommodate efficient development with potentially large-scale models. The underlying problem is also called Global Model Management.


In this report, we propose an approach to modular and incremental Global Model Management via an extension to the existing technique of Generalized Discrimination Networks (GDNs). In addition to further generalizing the notion of query operations employed in GDNs, we adapt the previously query-only mechanism to operations with side effects to integrate model transformation and model synchronization. We provide incremental algorithms for the execution of the resulting extended Generalized Discrimination Networks (eGDNs), as well as a prototypical implementation for a number of example eGDN operations.


Based on this prototypical implementation, we experiment with an application scenario from the software development domain to empirically evaluate our approach with respect to scalability and conceptually demonstrate its applicability in a typical scenario. Initial results confirm that the presented approach can indeed be employed to realize efficient Global Model Management in the considered scenario.
N2  - Komplexe Projekte, die unter dem Paradigma der modellgetriebenen Entwicklung entwickelt werden, nutzen heutzutage oft mehrere miteinander in Beziehung stehende Modelle, die durch eine Vielzahl von Modelloperationen automatisch verarbeitet werden. Die modulare und inkrementelle Konstruktion und Ausführung solcher Netzwerke von Modelloperationen ist eine Voraussetzung für effiziente Entwicklung mit potenziell sehr großen Modellen. Das zugrunde liegende Forschungsproblem heißt auch Globales Modellmanagement.

In diesem Bericht schlagen wir einen Ansatz für modulares und inkrementelles Globales Modellmanagement vor, der auf einer Erweiterung der existierenden Technik der Generalized Discrimination Networks (GDNs) basiert. Neben einer weiteren Verallgemeinerung des Konzepts der Anfrageoperationen in GDNs erweitern wir den zuvor rein lesenden Mechanismus auf Operationen mit Seiteneffekten, um Modelltransformationen und Modellsynchronisationen zu integrieren. Wir präsentieren inkrementelle Algorithmen für die Ausführung der resultierenden erweiterten GDNs (eGDNs) sowie eine prototypische Implementierung von Beispieloperationen für eGDNs.

Mithilfe dieser prototypischen Implementierung evaluieren wir unsere Lösung hinsichtlich ihrer Skalierbarkeit in einem Anwendungsszenario aus dem Bereich der Softwareentwicklung. Außerdem demonstrieren wir die Anwendbarkeit der entwickelten Technik konzeptionell anhand eines typischen Anwendungsszenarios. Unsere ersten Ergebnisse bestätigen, dass die Lösung genutzt werden kann, um effizientes Globales Modellmanagement im betrachteten Szenario zu realisieren.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 154 
KW  - global model management
KW  - generalized discrimination networks
KW  - globales Modellmanagement
KW  - Generalized Discrimination Networks
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-573965
SN  - 978-3-86956-555-2
SN  - 1613-5652
SN  - 2191-1665
IS  - 154
SP  - 63
EP  - 63
ER  - 
TY  - BOOK
A1  - Barkowsky, Matthias
A1  - Giese, Holger
T1  - Triple graph grammars for multi-version models
N2  - Like conventional software projects, projects in model-driven software engineering require adequate management of multiple versions of development artifacts, importantly allowing living with temporary inconsistencies. In the case of model-driven software engineering, employed versioning approaches also have to handle situations where different artifacts, that is, different models, are linked via automatic model transformations.

In this report, we propose a technique for jointly handling the transformation of multiple versions of a source model into corresponding versions of a target model, which enables the use of a more compact representation that may afford improved execution time of both the transformation and further analysis operations. Our approach is based on the well-known formalism of triple graph grammars and a previously introduced encoding of model version histories called multi-version models. In addition to showing the correctness of our approach with respect to the standard semantics of triple graph grammars, we conduct an empirical evaluation that demonstrates the potential benefit regarding execution time performance.
N2  - Ähnlich zu konventionellen Softwareprojekten erfordern Projekte im Bereich der modellgetriebenen Softwareentwicklung eine adäquate Verwaltung mehrerer Versionen von Entwicklungsartefakten. Eine solche Versionsverwaltung muss es insbesondere ermöglichen, zeitweise mit Inkonsistenzen zu leben. Im Fall der modellgetriebenen Softwareentwicklung muss ein verwendeter Ansatz zusätzlich mit Situationen umgehen können, in denen verschiedene Entwicklungsartefakte, das heißt verschiedene Modelle, durch automatische Modelltransformationen verknüpft sind.

In diesem Bericht schlagen wir eine Technik für die integrierte Transformation mehrerer Versionen eines Quellmodells in entsprechende Versionen eines Zielmodells vor. Dies ermöglicht die Verwendung einer kompakteren Repräsentation der Modelle, was zu verbesserten Laufzeiteigenschaften der Transformation und weiterführender Operationen führen kann. Unser Ansatz basiert auf dem bekannten Formalismus der Tripel-Graph-Grammatiken und einer in früheren Arbeiten eingeführten Kodierung von Versionshistorien von Modellen. Neben einem Beweis der Korrektheit des Ansatzes in Bezug auf die standardmäßige Semantik von Tripel-Graph-Grammatiken führen wir eine empirische Evaluierung durch, die den potenziellen Performancevorteil der Technik demonstriert.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 155 
KW  - triple graph grammars
KW  - multi-version models
KW  - Tripel-Graph-Grammatiken
KW  - Modelle mit mehreren Versionen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-573994
SN  - 978-3-86956-556-9
SN  - 1613-5652
SN  - 2191-1665
IS  - 155
SP  - 28
EP  - 28
ER  - 
TY  - THES
A1  - Bano, Dorina
T1  - Discovering data models from event logs
T1  - Entdecken von Datenmodellen aus Ereignisprotokollen
N2  - In the last two decades, process mining has developed from a niche discipline to a significant research area with considerable impact on academia and industry. Process mining enables organisations to identify the running business processes from historical execution data. The first requirement of any process mining technique is an event log, an artifact that represents concrete business process executions in the form of a sequence of events. These logs can be extracted from the organization's information systems and are used by process experts to retrieve deep insights from the organization's running processes. Considering the events pertaining to such logs, the process models can be automatically discovered and enhanced or annotated with performance-related information. Besides behavioral information, event logs contain domain specific data, albeit implicitly. However, such data are usually overlooked and, thus, not utilized to their full potential.

Within the process mining area, we address in this thesis the research gap of discovering, from event logs, the contextual information that cannot be captured by applying existing process mining techniques. Within this research gap, we identify four key problems and tackle them by looking at an event log from different angles. First, we address the problem of deriving an event log in the absence of a proper database access and domain knowledge. The second problem is related to the under-utilization of the implicit domain knowledge present in an event log that can increase the understandability of the discovered process model. Next, there is a lack of a holistic representation of the historical data manipulation at the process model level of abstraction. Last but not least, each process model presumes to be independent of other process models when discovered from an event log, thus, ignoring possible data dependencies between processes within an organization. 

For each of the problems mentioned above, this thesis proposes a dedicated method. The first method provides a solution to extract an event log only from the transactions performed on the database that are stored in the form of redo logs. The second method deals with discovering the underlying data model that is implicitly embedded in the event log, thus, complementing the discovered process model with important domain knowledge information. The third method captures, on the process model level, how the data affects the running process instances. Lastly, the fourth method is about the discovery of the relations between business processes (i.e., how they exchange data) from a set of event logs and explicitly representing such complex interdependencies in a business process architecture.

All the methods introduced in this thesis are implemented as a prototype and their feasibility is proven by being applied on real-life event logs.
N2  - In den letzten zwei Jahrzehnten hat sich Process Mining von einer Nischendisziplin zu einem bedeutenden Forschungsgebiet mit erheblichen Auswirkungen auf Wissenschaft und Industrie entwickelt. Process Mining ermöglicht es Unternehmen, die laufenden Geschäftsprozesse anhand historischer Ausführungsdaten zu identifizieren. Die erste Voraussetzung für jede Process-Mining-Technik ist ein Ereignisprotokoll (Event Log), ein Artefakt, das konkrete Geschäftsprozessausführungen in Form einer Abfolge von Ereignissen darstellt. Diese Protokolle (Logs) können aus den Informationssystemen der Unternehmen extrahiert werden und ermöglichen es Prozessexperten, tiefe Einblicke in die laufenden Unternehmensprozesse zu gewinnen. Unter Berücksichtigung der Abfolge der Ereignisse in diesen Protokollen (Logs) können Prozessmodelle automatisch entdeckt und mit leistungsbezogenen Informationen erweitert werden. Neben verhaltensbezogenen Informationen enthalten Ereignisprotokolle (Event Logs) auch domänenspezifische Daten, wenn auch nur implizit. Solche Daten werden jedoch in der Regel nicht in vollem Umfang genutzt. Diese Arbeit befasst sich im Bereich Process Mining mit der Forschungslücke der Extraktion von Kontextinformationen aus Ereignisprotokollen (Event Logs), die von bestehenden Process Mining-Techniken nicht erfasst werden.

Innerhalb dieser Forschungslücke identifizieren wir vier Schlüsselprobleme, bei denen wir die Ereignisprotokolle (Event Logs) aus verschiedenen Perspektiven betrachten. Zunächst befassen wir uns mit dem Problem der Erfassung eines Ereignisprotokolls (Event Logs) ohne hinreichenden Datenbankzugang. Das zweite Problem ist die unzureichende Nutzung des in Ereignisprotokollen (Event Logs) enthaltenen Domänenwissens, das zum besseren Verständnis der generierten Prozessmodelle beitragen kann. Außerdem mangelt es an einer ganzheitlichen Darstellung der historischen Datenmanipulation auf Prozessmodellebene. Nicht zuletzt werden Prozessmodelle häufig unabhängig
von anderen Prozessmodellen betrachtet, wenn sie aus Ereignisprotokollen (Event Logs) ermittelt wurden. Dadurch können mögliche Datenabhängigkeiten zwischen Prozessen innerhalb einer Organisation übersehen werden.

Für jedes der oben genannten Probleme schlägt diese Arbeit eine eigene Methode vor. Die erste Methode ermöglicht es, ein Ereignisprotokoll (Event Log) ausschließlich anhand der Historie der auf einer Datenbank durchgeführten Transaktionen zu extrahieren, die in Form von Redo-Logs gespeichert ist. Die zweite Methode befasst sich mit der Entdeckung des 

zugrundeliegenden Datenmodells, das implizit in dem jeweiligen Ereignisprotokoll (Event Log) eingebettet ist, und ergänzt so mit das entdeckte Prozessmodell mit wichtigen, domänenspezifischen Informationen. Bei der dritten Methode wird auf der Ebene des Prozess-
modells erfasst, wie sich die Daten auf die laufenden Prozessinstanzen auswirken. Die vierte Methode befasst sich schließlich mit der Entdeckung der Beziehungen zwischen Geschäftsprozessen (d.h. deren Datenaustausch) auf Basis der jeweiligen Ereignisprotokolle (Event Logs), sowie mit der expliziten Darstellung solcher komplexen Abhängigkeiten in einer Geschäftsprozessarchitektur.

 

Alle in dieser Arbeit vorgestellten Methoden sind als Prototyp implementiert und ihre Anwendbarkeit wird anhand ihrer Anwendung auf reale Ereignisprotokolle (Event Logs) nachgewiesen.
KW  - process mining
KW  - data models
KW  - business process architectures
KW  - Datenmodelle
KW  - Geschäftsprozessarchitekturen
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-585427
ER  -