TY - THES
A1 - Hesse, Günter
T1 - A benchmark for enterprise stream processing architectures
T1 - Ein Benchmark für Architekturen zur Datenstromverarbeitung im Unternehmenskontext
N2 - Data stream processing systems (DSPSs) are a key enabler for integrating continuously generated data, such as sensor measurements, into enterprise applications. DSPSs allow continuous analysis of information from data streams, e.g., to monitor manufacturing processes and enable fast reactions to anomalous behavior. Moreover, DSPSs continuously filter, sample, and aggregate incoming streams of data, which reduces the data size, and thus data storage costs. The growing volumes of generated data have increased the demand for high-performance DSPSs, leading to a higher interest in these systems and to the development of new DSPSs. While having more DSPSs is favorable for users, as it allows them to choose the system that best satisfies their requirements, it also introduces the challenge of identifying the most suitable DSPS for current needs as well as future demands. Having a solution to this challenge is important because replacing a DSPS requires costly rewriting of applications if no abstraction layer is used for application development. However, quantifying performance differences between DSPSs is a difficult task. Existing benchmarks fail to integrate all core functionalities of DSPSs and lack tool support, which hinders objective result comparisons. Moreover, no current benchmark covers the combination of streaming data with existing structured business data, which is particularly relevant for companies. This thesis proposes a performance benchmark for enterprise stream processing called ESPBench. With enterprise stream processing, we refer to the combination of streaming and structured business data. Our benchmark design represents real-world scenarios and allows for an objective result comparison as well as scaling of data.
The defined benchmark query set covers all core functionalities of DSPSs. The benchmark toolkit automates the entire benchmark process and provides important features, such as query result validation and a configurable data ingestion rate. To validate ESPBench and to ease the use of the benchmark, we propose an example implementation of the ESPBench queries leveraging the Apache Beam software development kit (SDK). The Apache Beam SDK is an abstraction layer designed for developing stream processing applications that is applied in academia as well as enterprise contexts. It allows the defined applications to run on any of the supported DSPSs. The performance impact of Apache Beam is studied in this dissertation as well. The results show a significant influence that differs among DSPSs and stream processing applications. To validate ESPBench, we use the example implementation of the ESPBench queries developed using the Apache Beam SDK. We benchmark the implemented queries executed on three modern DSPSs: Apache Flink, Apache Spark Streaming, and Hazelcast Jet. The results of the study confirm that ESPBench and its toolkit function as intended. ESPBench is capable of quantifying performance characteristics of DSPSs and of unveiling differences among systems. The benchmark proposed in this thesis covers all requirements to be applied in enterprise stream processing settings, and thus represents an improvement over the current state of the art.
N2 - Data stream processing systems (DSPSs) are a key technology for integrating continuously generated data, such as sensor measurements, into enterprise applications. The continuous analysis of data streams enabled by DSPSs can be used to monitor production processes so that unwanted changes can be reacted to as promptly as possible. In addition, DSPSs filter, sample, and aggregate incoming data, which reduces the data size and thus any data storage costs. Growing data volumes have increased the demand for high-performance DSPSs in recent years, which led to the development of new DSPSs. While a wide selection of available systems is generally good for users, it also confronts potential users with the challenge of identifying the DSPS best suited to current and future requirements. Having a solution to this challenge is important because replacing a DSPS requires costly adaptation or redevelopment of the applications running on it if no abstraction layer was used for their development. Comparing DSPSs quantitatively, however, is a difficult task. Existing benchmarks do not cover all core functionalities of DSPSs and have no or insufficient tool support, which hinders an objective calculation of performance results. Furthermore, no benchmark includes the integration of streaming data and structured business data, a scenario that is particularly relevant for companies. This dissertation presents ESPBench, a new benchmark for stream processing scenarios in the enterprise context. The business context is represented by the combination of streaming data and business data. The design of ESPBench represents real-world scenarios, ensures the objective calculation of benchmark results, and allows scaling across data characteristics. The benchmark's toolkit provides important functionality, such as the automation of the complete benchmark process and the validation of query results for correctness. To validate ESPBench and to further simplify its use, we have published an example implementation of the queries. We carried out the implementation using the Apache Beam software development kit, which is used in industry and academia and makes it possible to run developed applications on all supported DSPSs. The performance impact of using Apache Beam is also examined in this work. Furthermore, we use the published example implementation of the queries to study three modern DSPSs with ESPBench: Apache Flink, Apache Spark Streaming, and Hazelcast Jet. The results of the study demonstrate that ESPBench and its toolkit work as intended. ESPBench makes it possible to quantify performance characteristics of DSPSs and to reveal differences between systems. The benchmark presented in this dissertation meets all requirements for use in stream processing scenarios in the enterprise context and thus represents an improvement over the current situation.
KW - stream processing
KW - performance
KW - benchmarking
KW - dsps
KW - espbench
KW - benchmark
KW - Performanz
KW - Datenstromverarbeitung
KW - Benchmark
Y1 - 2022
U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-566000
ER -
TY - JOUR
A1 - Röhrig, Bernd
A1 - Salzwedel, Annett
A1 - Linck-Eleftheriadis, Sigrid
A1 - Völler, Heinz
A1 - Nosper, Manfred
T1 - Outcome Based Center Comparisons in Inpatient Cardiac Rehabilitation: Results from the EVA-Reha® Cardiology Project
JF - Die Rehabilitation : Zeitschrift für Praxis und Forschung in der Rehabilitation
N2 - Background: So far, objective outcome quality has been neglected in center comparisons of inpatient cardiac rehabilitation (CR) because the overall success of CR is difficult to quantify. This article presents a multifactorial benchmark model that measures individual rehabilitation success.
Methods: In 21 rehabilitation centers, 5,123 patients were consecutively enrolled between 01/2010 and 12/2012 in the prospective multicenter registry EVA-Reha® Cardiology. Changes in 13 indicators in the areas of cardiovascular risk factors, physical performance, and subjective health during rehabilitation were evaluated according to levels of severity. Changes were only rated for patients who needed a medical intervention. Additionally, the changes had to be clinically relevant. Therefore, minimal important differences (MIDs) were predefined. Ratings were combined into a single score, the multiple outcome criterion (MEK). Results: The MEK was determined for all patients (71.7 ± 7.4 years, 76.9% men) and consisted of an average of 5.6 indicators. After risk adjustment for sociodemographic and clinical baseline parameters, the MEK was used for center ranking. In addition, individual indicator results were compared with the means of all study sites. Conclusion: With the method presented here, outcome quality can be quantified and outcome-based comparisons of providers can be made.
KW - outcome quality
KW - quality assurance
KW - cardiac rehabilitation
KW - benchmark
KW - profiling
Y1 - 2015
U6 - https://doi.org/10.1055/s-0034-1395556
SN - 0034-3536
SN - 1439-1309
VL - 54
IS - 1
SP - 45
EP - 52
PB - Thieme
CY - Stuttgart
ER -
TY - JOUR
A1 - Waitelonis, Jörg
A1 - Jürges, Henrik
A1 - Sack, Harald
T1 - Remixing entity linking evaluation datasets for focused benchmarking
JF - Semantic Web
N2 - In recent years, named entity linking (NEL) tools were primarily developed as general-purpose approaches, whereas today numerous tools focus on specific domains, such as the mapping of persons and organizations only, or the annotation of locations or events in microposts. However, the available benchmark datasets necessary for the evaluation of NEL tools do not reflect this trend toward specialization.
We have analyzed the evaluation process applied in the NEL benchmarking framework GERBIL [in: Proceedings of the 24th International Conference on World Wide Web (WWW’15), International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2015, pp. 1133–1143; Semantic Web 9(5) (2018), 605–625] and all its benchmark datasets. Based on these insights, we have extended the GERBIL framework to enable a more fine-grained evaluation and in-depth analysis of the available benchmark datasets with respect to different emphases. This paper presents the implementation of an adaptive filter for arbitrary entities and customized benchmark creation, as well as the automated determination of typical NEL benchmark dataset properties, such as the extent of content-related ambiguity and diversity. These properties are integrated on different levels, which also makes it possible to tailor new customized datasets from the existing ones by remixing documents based on desired emphases. In addition to a new system library for enriching provided NIF [in: International Semantic Web Conference (ISWC’13), Lecture Notes in Computer Science, Vol. 8219, Springer, Berlin, Heidelberg, 2013, pp. 98–113] datasets with statistical information, the paper presents best practices for dataset remixing and an in-depth analysis of the performance of entity linking systems on special-focus datasets.
KW - Entity Linking
KW - GERBIL
KW - evaluation
KW - benchmark
Y1 - 2019
U6 - https://doi.org/10.3233/SW-180334
SN - 1570-0844
SN - 2210-4968
VL - 10
IS - 2
SP - 385
EP - 412
PB - IOS Press
CY - Amsterdam
ER -