TY - BOOK A1 - Smirnov, Sergey A1 - Reijers, Hajo A. A1 - Nugteren, Thijs A1 - Weske, Mathias T1 - Business process model abstraction : theory and practice N2 - Business process management aims at capturing, understanding, and improving work in organizations. The central artifacts are process models, which serve different purposes. Detailed process models are used to analyze concrete working procedures, while high-level models show, for instance, handovers between departments. To provide different views on process models, business process model abstraction has emerged. While several approaches have been proposed, a number of abstraction use case that are both relevant for industry and scientifically challenging are yet to be addressed. In this paper we systematically develop, classify, and consolidate different use cases for business process model abstraction. The reported work is based on a study with BPM users in the health insurance sector and validated with a BPM consultancy company and a large BPM vendor. The identified fifteen abstraction use cases reflect the industry demand. The related work on business process model abstraction is evaluated against the use cases, which leads to a research agenda. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 35 Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41782 SN - 978-3-86956-054-0 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - THES A1 - Brauer, Falk T1 - Extraktion und Identifikation von Entitäten in Textdaten im Umfeld der Enterprise Search T1 - Extraction and identification of entities in text data in the field of enterprise search N2 - Die automatische Informationsextraktion (IE) aus unstrukturierten Texten ermöglicht völlig neue Wege, auf relevante Informationen zuzugreifen und deren Inhalte zu analysieren, die weit über bisherige Verfahren zur Stichwort-basierten Dokumentsuche hinausgehen. Die Entwicklung von Programmen zur Extraktion von maschinenlesbaren Daten aus Texten erfordert jedoch nach wie vor die Entwicklung von domänenspezifischen Extraktionsprogrammen. Insbesondere im Bereich der Enterprise Search (der Informationssuche im Unternehmensumfeld), in dem eine große Menge von heterogenen Dokumenttypen existiert, ist es oft notwendig ad-hoc Programm-module zur Extraktion von geschäftsrelevanten Entitäten zu entwickeln, die mit generischen Modulen in monolithischen IE-Systemen kombiniert werden. Dieser Umstand ist insbesondere kritisch, da potentiell für jeden einzelnen Anwendungsfall ein von Grund auf neues IE-System entwickelt werden muss. Die vorliegende Dissertation untersucht die effiziente Entwicklung und Ausführung von IE-Systemen im Kontext der Enterprise Search und effektive Methoden zur Ausnutzung bekannter strukturierter Daten im Unternehmenskontext für die Extraktion und Identifikation von geschäftsrelevanten Entitäten in Doku-menten. Grundlage der Arbeit ist eine neuartige Plattform zur Komposition von IE-Systemen auf Basis der Beschreibung des Datenflusses zwischen generischen und anwendungsspezifischen IE-Modulen. Die Plattform unterstützt insbesondere die Entwicklung und Wiederverwendung von generischen IE-Modulen und zeichnet sich durch eine höhere Flexibilität und Ausdrucksmächtigkeit im Vergleich zu vorherigen Methoden aus. Ein in der Dissertation entwickeltes Verfahren zur Dokumentverarbeitung interpretiert den Daten-austausch zwischen IE-Modulen als Datenströme und ermöglicht damit eine weitgehende Parallelisierung von einzelnen Modulen. Die autonome Ausführung der Module führt zu einer wesentlichen Beschleu-nigung der Verarbeitung von Einzeldokumenten und verbesserten Antwortzeiten, z. B. für Extraktions-dienste. Bisherige Ansätze untersuchen lediglich die Steigerung des durchschnittlichen Dokumenten-durchsatzes durch verteilte Ausführung von Instanzen eines IE-Systems. Die Informationsextraktion im Kontext der Enterprise Search unterscheidet sich z. B. von der Extraktion aus dem World Wide Web dadurch, dass in der Regel strukturierte Referenzdaten z. B. in Form von Unternehmensdatenbanken oder Terminologien zur Verfügung stehen, die oft auch die Beziehungen von Entitäten beschreiben. Entitäten im Unternehmensumfeld haben weiterhin bestimmte Charakteristiken: Eine Klasse von relevanten Entitäten folgt bestimmten Bildungsvorschriften, die nicht immer bekannt sind, auf die aber mit Hilfe von bekannten Beispielentitäten geschlossen werden kann, so dass unbekannte Entitäten extrahiert werden können. Die Bezeichner der anderen Klasse von Entitäten haben eher umschreibenden Charakter. Die korrespondierenden Umschreibungen in Texten können variieren, wodurch eine Identifikation derartiger Entitäten oft erschwert wird. Zur effizienteren Entwicklung von IE-Systemen wird in der Dissertation ein Verfahren untersucht, das alleine anhand von Beispielentitäten effektive Reguläre Ausdrücke zur Extraktion von unbekannten Entitäten erlernt und damit den manuellen Aufwand in derartigen Anwendungsfällen minimiert. Verschiedene Generalisierungs- und Spezialisierungsheuristiken erkennen Muster auf verschiedenen Abstraktionsebenen und schaffen dadurch einen Ausgleich zwischen Genauigkeit und Vollständigkeit bei der Extraktion. Bekannte Regellernverfahren im Bereich der Informationsextraktion unterstützen die beschriebenen Problemstellungen nicht, sondern benötigen einen (annotierten) Dokumentenkorpus. Eine Methode zur Identifikation von Entitäten, die durch Graph-strukturierte Referenzdaten vordefiniert sind, wird als dritter Schwerpunkt untersucht. Es werden Verfahren konzipiert, welche über einen exakten Zeichenkettenvergleich zwischen Text und Referenzdatensatz hinausgehen und Teilübereinstimmungen und Beziehungen zwischen Entitäten zur Identifikation und Disambiguierung heranziehen. Das in der Arbeit vorgestellte Verfahren ist bisherigen Ansätzen hinsichtlich der Genauigkeit und Vollständigkeit bei der Identifikation überlegen. N2 - The automatic information extraction (IE) from unstructured texts enables new ways to access relevant information and analyze text contents, which goes beyond existing technologies for keyword-based search in document collections. However, the development of systems for extracting machine-readable data from text still requires the implementation of domain-specific extraction programs. In particular in the field of enterprise search (the retrieval of information in the enterprise settings), in which a large amount of heterogeneous document types exists, it is often necessary to develop ad-hoc program-modules and to combine them with generic program components to extract by business relevant entities. This is particularly critical, as potentially for each individual application a new IE system must be developed from scratch. In this work we examine efficient methods to develop and execute IE systems in the context of enterprise search and effective algorithms to exploit pre-existing structured data in the business context for the extraction and identification of business entities in documents. The basis of this work is a novel platform for composition of IE systems through the description of the data flow between generic and application-specific IE modules. The platform supports in particular the development and reuse of generic IE modules and is characterized by a higher flexibility as compared to previous methods. A technique developed in this work interprets the document processing as data stream between IE modules and thus enables an extensive parallelization of individual modules. The autonomous execution of each module allows for a significant runtime improvement for individual documents and thus improves response times, e.g. for extraction services. Previous parallelization approaches focused only on an improved throughput for large document collections, e.g., by leveraging distributed instances of an IE system. Information extraction in the context of enterprise search differs for instance from the extraction from the World Wide Web by the fact that usually a variety of structured reference data (corporate databases or terminologies) is available, which often describes the relationships among entities. Furthermore, entity names in a business environment usually follow special characteristics: On the one hand relevant entities such as product identifiers follow certain patterns that are not always known beforehand, but can be inferred using known sample entities, so that unknown entities can be extracted. On the other hand many designators have a more descriptive character (concatenation of descriptive words). The respective references in texts might differ due to the diversity of potential descriptions, often making the identification of such entities difficult. To address IE applications in the presence of available structured data, we study in this work the inference of effective regular expressions from given sample entities. Various generalization and specialization heuristics are used to identify patterns at different syntactic abstraction levels and thus generate regular expressions which promise both high recall and precision. Compared to previous rule learning techniques in the field of information extraction, our technique does not require any annotated document corpus. A method for the identification of entities that are predefined by graph structured reference data is examined as a third contribution. An algorithm is presented which goes beyond an exact string comparison between text and reference data set. It allows for an effective identification and disambiguation of potentially discovered entities by exploitation of approximate matching strategies. The method leverages further relationships among entities for identification and disambiguation. The method presented in this work is superior to previous approaches with regard to precision and recall. KW - Informationsextraktion KW - Enterprise Search KW - Parallele Datenverarbeitung KW - Grammatikalische Inferenz KW - Graph-basiertes Ranking KW - information extraction KW - enterprise search KW - multi core data processing KW - grammar inference KW - graph-based ranking Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-51409 ER - TY - CHAP A1 - Palix, Nicolas A1 - Lawall, Julia L. A1 - Thomas, Gaël A1 - Muller, Gilles T1 - How Often do Experts Make Mistakes? N2 - Large open-source software projects involve developers with a wide variety of backgrounds and expertise. Such software projects furthermore include many internal APIs that developers must understand and use properly. According to the intended purpose of these APIs, they are more or less frequently used, and used by developers with more or less expertise. In this paper, we study the impact of usage patterns and developer expertise on the rate of defects occurring in the use of internal APIs. For this preliminary study, we focus on memory management APIs in the Linux kernel, as the use of these has been shown to be highly error prone in previous work. We study defect rates and developer expertise, to consider e.g., whether widely used APIs are more defect prone because they are used by less experienced developers, or whether defects in widely used APIs are more likely to be fixed. KW - History of pattern occurrences KW - bug tracking KW - Herodotos KW - Coccinelle Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41327 ER - TY - CHAP ED - Adams, Bram ED - Haupt, Michael ED - Lohmann, Daniel T1 - Preface N2 - Aspect-oriented programming, component models, and design patterns are modern and actively evolving techniques for improving the modularization of complex software. In particular, these techniques hold great promise for the development of "systems infrastructure" software, e.g., application servers, middleware, virtual machines, compilers, operating systems, and other software that provides general services for higher-level applications. The developers of infrastructure software are faced with increasing demands from application programmers needing higher-level support for application development. Meeting these demands requires careful use of software modularization techniques, since infrastructural concerns are notoriously hard to modularize. Aspects, components, and patterns provide very different means to deal with infrastructure software, but despite their differences, they have much in common. For instance, component models try to free the developer from the need to deal directly with services like security or transactions. These are primary examples of crosscutting concerns, and modularizing such concerns are the main target of aspect-oriented languages. Similarly, design patterns like Visitor and Interceptor facilitate the clean modularization of otherwise tangled concerns. Building on the ACP4IS meetings at AOSD 2002-2009, this workshop aims to provide a highly interactive forum for researchers and developers to discuss the application of and relationships between aspects, components, and patterns within modern infrastructure software. The goal is to put aspects, components, and patterns into a common reference frame and to build connections between the software engineering and systems communities. KW - Aspektorientierte Softwareentwicklung KW - Systemsoftware KW - Middleware KW - Virtuelle Maschinen KW - Betriebssysteme KW - systems software KW - middleware KW - virtual machines KW - operating systems Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41338 ER - TY - CHAP A1 - Bynens, Maarten A1 - Van Landuyt, Dimitri A1 - Truyen, Eddy A1 - Joosen, Wouter T1 - Towards reusable aspects: the callback mismatch problem N2 - Because software development is increasingly expensive and timeconsuming, software reuse gains importance. Aspect-oriented software development modularizes crosscutting concerns which enables their systematic reuse. Literature provides a number of AOP patterns and best practices for developing reusable aspects based on compelling examples for concerns like tracing, transactions and persistence. However, such best practices are lacking for systematically reusing invasive aspects. In this paper, we present the ‘callback mismatch problem’. This problem arises in the context of abstraction mismatch, in which the aspect is required to issue a callback to the base application. As a consequence, the composition of invasive aspects is cumbersome to implement, difficult to maintain and impossible to reuse. We motivate this problem in a real-world example, show that it persists in the current state-of-the-art, and outline the need for advanced aspectual composition mechanisms to deal with this. KW - reusable aspects KW - invasive aspects KW - aspect adapter Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41347 ER - TY - CHAP A1 - Harrison, William T1 - Malleability, obliviousness and aspects for broadcast service attachment N2 - An important characteristic of Service-Oriented Architectures is that clients do not depend on the service implementation's internal assignment of methods to objects. It is perhaps the most important technical characteristic that differentiates them from more common object-oriented solutions. This characteristic makes clients and services malleable, allowing them to be rearranged at run-time as circumstances change. That improvement in malleability is impaired by requiring clients to direct service requests to particular services. Ideally, the clients are totally oblivious to the service structure, as they are to aspect structure in aspect-oriented software. Removing knowledge of a method implementation's location, whether in object or service, requires re-defining the boundary line between programming language and middleware, making clearer specification of dependence on protocols, and bringing the transaction-like concept of failure scopes into language semantics as well. This paper explores consequences and advantages of a transition from object-request brokering to service-request brokering, including the potential to improve our ability to write more parallel software. KW - service-oriented KW - aspect-oriented KW - programming language KW - middleware KW - concurrency Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41389 ER - TY - BOOK A1 - Geller, Felix A1 - Hirschfeld, Robert A1 - Bracha, Gilad T1 - Pattern Matching for an object-oriented and dynamically typed programming language N2 - Pattern matching is a well-established concept in the functional programming community. It provides the means for concisely identifying and destructuring values of interest. This enables a clean separation of data structures and respective functionality, as well as dispatching functionality based on more than a single value. Unfortunately, expressive pattern matching facilities are seldomly incorporated in present object-oriented programming languages. We present a seamless integration of pattern matching facilities in an object-oriented and dynamically typed programming language: Newspeak. We describe language extensions to improve the practicability and integrate our additions with the existing programming environment for Newspeak. This report is based on the first author’s master’s thesis. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 36 KW - Pattern Matching KW - Musterabgleich KW - Muster KW - Objekt-Orientiertes Programmieren KW - Dynamische Typ Systeme KW - Pattern Matching KW - Patterns KW - Object-Oriented Programming KW - Dynamic Type System Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-43035 SN - 978-3-86956-065-6 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK A1 - Lange, Dustin A1 - Böhm, Christoph A1 - Naumann, Felix T1 - Extracting structured information from Wikipedia articles to populate infoboxes N2 - Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes. With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values for independently extracting value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes. N2 - Ungefähr jeder dritte Wikipedia-Artikel enthält eine Infobox - eine Tabelle, die wichtige Fakten über das beschriebene Thema in Attribut-Wert-Form darstellt. Das Schema einer Infobox, d.h. die Attribute, die für ein Konzept verwendet werden können, wird durch ein Infobox-Template definiert. Häufig geben Autoren nicht für alle Template-Attribute Werte an, wodurch unvollständige Infoboxen entstehen. Mit iPopulator stellen wir ein System vor, welches automatisch Infoboxen von Wikipedia-Artikeln durch Extrahieren von Attributwerten aus dem Artikeltext befüllt. Im Unterschied zu früheren Arbeiten erkennt iPopulator die Struktur von Attributwerten und nutzt diese aus, um die einzelnen Bestandteile von Attributwerten unabhängig voneinander zu extrahieren. Wir haben iPopulator auf der gesamten Menge der Infobox-Templates getestet und analysieren detailliert die Effektivität. Wir erreichen beispielsweise für die Extraktion einen durchschnittlichen Precision-Wert von 91% für 1.727 verschiedene Infobox-Template-Attribute. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 38 KW - Informationsextraktion KW - Wikipedia KW - Linked Data KW - Information Extraction KW - Wikipedia KW - Linked Data Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-45714 SN - 978-3-86956-081-6 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK A1 - Wist, Dominic A1 - Schaefer, Mark A1 - Vogler, Walter A1 - Wollowski, Ralf T1 - STG decomposition : internal communication for SI implementability N2 - STG decomposition is a promising approach to tackle the complexity problems arising in logic synthesis of speed independent circuits, a robust asynchronous (i.e. clockless) circuit type. Unfortunately, STG decomposition can result in components that in isolation have irreducible CSC conflicts. Generalising earlier work, it is shown how to resolve such conflicts by introducing internal communication between the components via structural techniques only. N2 - STG-Dekomposition ist ein bewährter Ansatz zur Bewältigung der Komplexitätsprobleme bei der Logiksynthese von SI (speed independent) Schaltungen – ein robuster asynchroner (d.h. ohne Taktsignal arbeitender digitaler) Schaltungstyp. Allerdings können dabei Komponenten mit irreduziblen CSC-Konflikten entstehen. Durch Verallgemeinerung früherer Arbeiten wird gezeigt, wie solche Konflikte durch Einführung interner Kommunikation zwischen den Komponenten gelöst werden können, und zwar ausschließlich durch Verwendung an der Graphenstruktur ansetzender Verfahren. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 32 KW - Asynchrone Schaltung KW - Petrinetz KW - Signalflankengraph (SFG oder STG) KW - STG-Dekomposition KW - speed independent KW - CSC KW - Controller-Resynthese KW - Asynchronous circuit KW - petri net KW - signal transition graph KW - STG decomposition KW - speed independent KW - CSC KW - control resynthesis Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-40786 SN - 978-3-86956-037-3 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - BOOK ED - Adams, Bram ED - Haupt, Michael ED - Lohmann, Daniel T1 - Proceedings of the 9th Workshop on Aspects, Components, and Patterns for Infrastructure Software (ACP4IS '10) N2 - Aspect-oriented programming, component models, and design patterns are modern and actively evolving techniques for improving the modularization of complex software. In particular, these techniques hold great promise for the development of "systems infrastructure" software, e.g., application servers, middleware, virtual machines, compilers, operating systems, and other software that provides general services for higher-level applications. The developers of infrastructure software are faced with increasing demands from application programmers needing higher-level support for application development. Meeting these demands requires careful use of software modularization techniques, since infrastructural concerns are notoriously hard to modularize. Aspects, components, and patterns provide very different means to deal with infrastructure software, but despite their differences, they have much in common. For instance, component models try to free the developer from the need to deal directly with services like security or transactions. These are primary examples of crosscutting concerns, and modularizing such concerns are the main target of aspect-oriented languages. Similarly, design patterns like Visitor and Interceptor facilitate the clean modularization of otherwise tangled concerns. Building on the ACP4IS meetings at AOSD 2002-2009, this workshop aims to provide a highly interactive forum for researchers and developers to discuss the application of and relationships between aspects, components, and patterns within modern infrastructure software. The goal is to put aspects, components, and patterns into a common reference frame and to build connections between the software engineering and systems communities. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 33 KW - Aspektorientierte Softwareentwicklung KW - Systemsoftware KW - Middleware KW - Virtuelle Maschinen KW - Betriebssysteme KW - systems software KW - middleware KW - virtual machines KW - operating systems Y1 - 2010 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus-41221 SN - 978-3-86956-043-4 PB - Universitätsverlag Potsdam CY - Potsdam ER -