TY - THES A1 - Zuo, Zhe T1 - From unstructured to structured: Context-based named entity mining from text T1 - Von unstrukturiert zu strukturiert: Kontextbasierte Gewinnung benannter Entitäten von Text N2 - With recent advances in the area of information extraction, automatically extracting structured information from a vast amount of unstructured textual data has become an important task, since it is infeasible for humans to capture all this information manually. Named entities (e.g., persons, organizations, and locations), which are crucial components of texts, are usually the subjects of structured information extracted from textual documents. Therefore, the task of named entity mining receives much attention. It consists of three major subtasks: named entity recognition, named entity linking, and relation extraction. These three tasks build up the entire pipeline of a named entity mining system, where each of them has its own challenges and can be employed for further applications. As a fundamental task in the natural language processing domain, studies on named entity recognition have a long history, and many existing approaches produce reliable results. The task aims to extract mentions of named entities in text and to identify their types. Named entity linking has recently received much attention with the development of knowledge bases that contain rich information about entities. The goal is to disambiguate mentions of named entities and to link them to the corresponding entries in a knowledge base. Relation extraction, as the final step of named entity mining, is a highly challenging task that extracts semantic relations between named entities, e.g., the ownership relation between two companies. In this thesis, we review the state of the art of the named entity mining domain in detail, including valuable features, techniques, and evaluation methodologies. Furthermore, we present two of our approaches, which focus on the named entity linking and relation extraction tasks, respectively. To solve the named entity linking task, we propose the entity linking technique BEL, which operates on a textual range of relevant terms and aggregates decisions from an ensemble of simple classifiers. Each of the classifiers operates on a randomly sampled subset of the above range. In extensive experiments on hand-labeled and benchmark datasets, our approach outperformed state-of-the-art entity linking techniques in terms of both quality and efficiency. For the task of relation extraction, we focus on extracting a specific group of difficult relation types, namely business relations between companies. These relations can be used to gain valuable insight into the interactions between companies and to perform complex analytics, such as predicting risk or valuing companies. Our semi-supervised strategy can extract business relations between companies based on only a few user-provided seed company pairs. By doing so, we also provide a solution to the problem of determining the direction of asymmetric relations, such as the ownership_of relation. We improve the reliability of the extraction process by using a holistic pattern identification method, which classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relation by using as few as five labeled seed pairs.
N2 - Mit den jüngsten Fortschritten im Gebiet der Informationsextraktion wird die automatisierte Extraktion strukturierter Informationen aus einer unüberschaubaren Menge unstrukturierter Textdaten zu einer wichtigen Aufgabe, deren manuelle Ausführung unzumutbar ist. Benannte Entitäten (z.B. Personen, Organisationen oder Orte), essentielle Bestandteile in Texten, sind normalerweise der Gegenstand strukturierter Informationen aus Textdokumenten. Daher erhält die Aufgabe der Gewinnung benannter Entitäten viel Aufmerksamkeit. Sie besteht aus drei großen Unteraufgaben, nämlich der Erkennung benannter Entitäten, der Verbindung benannter Entitäten und der Extraktion von Beziehungen. Diese drei Aufgaben bilden zusammen den Grundprozess eines Systems zur Gewinnung benannter Entitäten, wobei jede ihre eigenen Herausforderungen hat und für weitere Anwendungen eingesetzt werden kann. Als ein fundamentaler Aspekt in der Verarbeitung natürlicher Sprache haben Studien zur Erkennung benannter Entitäten eine lange Geschichte, und viele bestehende Ansätze erbringen verlässliche Ergebnisse. Die Aufgabe zielt darauf ab, Nennungen benannter Entitäten zu extrahieren und ihre Typen zu bestimmen. Die Verbindung benannter Entitäten hat in letzter Zeit durch die Entwicklung von Wissensdatenbanken, welche reichhaltige Informationen über Entitäten enthalten, viel Aufmerksamkeit erhalten. Das Ziel ist es, Nennungen benannter Entitäten zu disambiguieren und diese mit den dazugehörigen Einträgen in einer Wissensdatenbank zu verknüpfen. Der letzte Schritt der Gewinnung benannter Entitäten, die Extraktion von Beziehungen, ist eine äußerst anspruchsvolle Aufgabe, nämlich die Extraktion semantischer Beziehungen zwischen Entitäten, z.B. der Eigentümerschaft zwischen zwei Firmen. In dieser Doktorarbeit arbeiten wir den aktuellen Stand der Wissenschaft in der Domäne der Gewinnung benannter Entitäten auf, unter anderem wertvolle Eigenschaften, Techniken und Evaluationsmethoden. Darüber hinaus präsentieren wir zwei Ansätze von uns, die ihren Fokus jeweils auf die Verbindung benannter Entitäten bzw. auf die Extraktion von Beziehungen legen. Um die Aufgabe der Verbindung benannter Entitäten zu lösen, schlagen wir hier die Verbindungstechnik BEL vor, welche auf einer textuellen Bandbreite relevanter Begriffe agiert und Entscheidungen einer Kombination einfacher Klassifizierer aggregiert. Jeder dieser Klassifizierer arbeitet auf einer zufällig ausgewählten Teilmenge der obigen Bandbreite. In umfangreichen Experimenten mit handannotierten sowie Vergleichsdatensätzen hat unser Ansatz andere Lösungen zur Verbindung benannter Entitäten, die auf dem Stand der aktuellen Technik beruhen, sowohl in Bezug auf Qualität als auch auf Effizienz geschlagen. Für die Aufgabe der Extraktion von Beziehungen fokussieren wir uns auf eine bestimmte Gruppe schwieriger Beziehungstypen, nämlich die Geschäftsbeziehungen zwischen Firmen. Diese Beziehungen können benutzt werden, um wertvolle Einblicke in das Zusammenspiel von Firmen zu erlangen und komplexe Analysen auszuführen, beispielsweise die Risikovorhersage oder Bewertung von Firmen. Unsere halbüberwachte Strategie kann Geschäftsbeziehungen zwischen Firmen anhand nur weniger nutzergegebener Start-Firmenpaare extrahieren. Dadurch bieten wir auch eine Lösung für das Problem der Richtungserkennung asymmetrischer Beziehungen, beispielsweise der Eigentumsbeziehung.
Wir verbessern die Verlässlichkeit des Extraktionsprozesses, indem wir eine holistische Musteridentifikationsmethode verwenden, welche die erstellten Extraktionsmuster klassifiziert. Unsere Experimente zeigen, dass wir mit nur fünf annotierten Startpaaren neue Entitätenpaare, die in der Zielbeziehung stehen, akkurat und verlässlich extrahieren können. KW - named entity mining KW - information extraction KW - natural language processing KW - Gewinnung benannter Entitäten KW - Informationsextraktion KW - maschinelle Verarbeitung natürlicher Sprache Y1 - 2017 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-412576 ER - TY - JOUR A1 - Ziegler, Joceline A1 - Pfitzner, Bjarne A1 - Schulz, Heinrich A1 - Saalbach, Axel A1 - Arnrich, Bert T1 - Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data JF - Sensors N2 - Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context. This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training. KW - federated learning KW - privacy and security KW - privacy attack KW - X-ray Y1 - 2022 U6 - https://doi.org/10.3390/s22145195 SN - 1424-8220 VL - 22 PB - MDPI CY - Basel, Schweiz ET - 14 ER - TY - GEN A1 - Ziegler, Joceline A1 - Pfitzner, Bjarne A1 - Schulz, Heinrich A1 - Saalbach, Axel A1 - Arnrich, Bert T1 - Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - Privacy regulations and the physical distribution of heterogeneous data are often primary concerns for the development of deep learning models in a medical context.
This paper evaluates the feasibility of differentially private federated learning for chest X-ray classification as a defense against data privacy attacks. To the best of our knowledge, we are the first to directly compare the impact of differentially private training on two different neural network architectures, DenseNet121 and ResNet50. Extending the federated learning environments previously analyzed in terms of privacy, we simulated a heterogeneous and imbalanced federated setting by distributing images from the public CheXpert and Mendeley chest X-ray datasets unevenly among 36 clients. Both non-private baseline models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 on the binary classification task of detecting the presence of a medical finding. We demonstrate that both model architectures are vulnerable to privacy violation by applying image reconstruction attacks to local model updates from individual clients. The attack was particularly successful during later training stages. To mitigate the risk of a privacy breach, we integrated Rényi differential privacy with a Gaussian noise mechanism into local model training. We evaluate model performance and attack vulnerability for privacy budgets ε ∈ {1, 3, 6, 10}. The DenseNet121 achieved the best utility-privacy trade-off with an AUC of 0.94 for ε = 6. Model performance deteriorated slightly for individual clients compared to the non-private baseline. The ResNet50 only reached an AUC of 0.76 in the same privacy setting. Its performance was inferior to that of the DenseNet121 for all considered privacy constraints, suggesting that the DenseNet121 architecture is more robust to differentially private training. T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 14 KW - federated learning KW - privacy and security KW - privacy attack KW - X-ray Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-581322 IS - 14 ER - TY - THES A1 - Zieger, Tobias T1 - Self-adaptive data quality BT - automating duplicate detection N2 - Carrying out business processes successfully is closely linked to the quality of the data inventory in an organization. Deficiencies in data quality lead to problems: Incorrect address data prevents (timely) shipments to customers. Erroneous orders lead to returns and thus to unnecessary effort. Wrong pricing forces companies to miss out on revenues or impairs customer satisfaction. If orders or customer records cannot be retrieved, complaint management takes longer. Due to erroneous inventories, too few or too many supplies might be reordered. A special problem with data quality, and the reason for many of the issues mentioned above, are duplicates in databases. Duplicates are different representations of the same real-world objects in a dataset. However, these representations differ from each other and are for that reason hard for a computer to match. Moreover, the number of comparisons required to find those duplicates grows with the square of the dataset size. To cleanse the data, these duplicates must be detected and removed. Duplicate detection is a very laborious process. To achieve satisfactory results, appropriate software must be created and configured (similarity measures, partitioning keys, thresholds, etc.). Both require much manual effort and experience.
This thesis addresses the automation of parameter selection for duplicate detection and presents several novel approaches that eliminate the need for human experience in parts of the duplicate detection process. A pre-processing step is introduced that analyzes the datasets in question and classifies their attributes semantically. Not only do these annotations help in understanding the respective datasets, but they also facilitate subsequent steps, for example, by selecting appropriate similarity measures or normalizing the data upfront. This approach works without schema information. Following that, we show a partitioning technique that greatly reduces the number of pair comparisons for the duplicate detection process. The approach automatically finds particularly suitable partitioning keys that simultaneously allow for effective and efficient duplicate retrieval. By means of a user study, we demonstrate that this technique finds partitioning keys that outperform expert suggestions and, additionally, does not need manual configuration. Furthermore, this approach can be applied independently of the attribute types. To measure the success of a duplicate detection process and to execute the described partitioning approach, a gold standard is required that provides information about the actual duplicates in a training dataset. This thesis presents a technique that uses existing duplicate detection results and crowdsourcing to create a near-gold standard that can be used for the purposes above. Another part of the thesis describes and evaluates strategies for reducing these crowdsourcing costs and for achieving a consensus with less effort. N2 - Die erfolgreiche Ausführung von Geschäftsprozessen ist eng an die Datenqualität der Datenbestände in einer Organisation geknüpft. Bestehen Mängel in der Datenqualität, kann es zu Problemen kommen: Unkorrekte Adressdaten verhindern, dass Kunden (rechtzeitig) beliefert werden. Fehlerhafte Bestellungen führen zu Reklamationen und somit zu unnötigem Aufwand. Falsche Preisauszeichnungen zwingen Unternehmen, auf Einnahmen zu verzichten, oder gefährden die Kundenzufriedenheit. Können Bestellungen oder Kundendaten nicht gefunden werden, verlängert sich die Abarbeitung von Beschwerden. Durch fehlerhafte Inventarisierung wird zu wenig oder zu viel Nachschub bestellt. Ein spezielles Datenqualitätsproblem und der Grund für viele der genannten Probleme sind Duplikate in Datenbanken. Duplikate sind verschiedene Repräsentationen derselben Realweltobjekte im Datenbestand. Allerdings unterscheiden sich diese Repräsentationen voneinander und sind so für den Computer nur schwer als zusammengehörig zu erkennen. Außerdem wächst die Anzahl der zur Aufdeckung der Duplikate benötigten Vergleiche quadratisch mit der Datensatzgröße. Zum Zwecke der Datenreinigung müssen diese Duplikate erkannt und beseitigt werden. Diese Duplikaterkennung ist ein sehr aufwändiger Prozess. Um gute Ergebnisse zu erzielen, ist die Erstellung von entsprechender Software und das Konfigurieren vieler Parameter (Ähnlichkeitsmaße, Partitionierungsschlüssel, Schwellwerte usw.) nötig. Beides erfordert viel manuellen Aufwand und Erfahrung. Diese Dissertation befasst sich mit dem Automatisieren der Parameterwahl für die Duplikaterkennung und stellt verschiedene neuartige Verfahren vor, durch die Teile des Duplikaterkennungsprozesses ohne menschliche Erfahrung gestaltet werden können.
Es wird ein Vorverarbeitungsschritt vorgestellt, der die betreffenden Datensätze analysiert und deren Attribute automatisch semantisch klassifiziert. Durch diese Annotationen wird nicht nur das Verständnis des Datensatzes verbessert, sondern es werden darüber hinaus die folgenden Schritte erleichtert, zum Beispiel können so geeignete Ähnlichkeitsmaße ausgewählt oder die Daten normalisiert werden. Dabei kommt der Ansatz ohne Schemainformationen aus. Anschließend wird ein Partitionierungsverfahren gezeigt, das die Anzahl der für die Duplikaterkennung benötigten Vergleiche stark reduziert. Das Verfahren findet automatisch besonders geeignete Partitionierungsschlüssel, die eine gleichzeitig effektive und effiziente Duplikatsuche ermöglichen. Anhand einer Nutzerstudie wird gezeigt, dass die so gefundenen Partitionierungsschlüssel Expertenvorschlägen überlegen sind und zudem keine menschliche Konfiguration benötigen. Außerdem lässt sich das Verfahren unabhängig von den Attributtypen anwenden. Zum Messen des Erfolges eines Duplikaterkennungsverfahrens und für das zuvor beschriebene Partitionierungsverfahren ist ein Goldstandard nötig, der Auskunft über die zu findenden Duplikate gibt. Die Dissertation stellt ein Verfahren vor, das anhand mehrerer vorhandener Duplikaterkennungsergebnisse und des Einsatzes von Crowdsourcing einen Nahezu-Goldstandard erzeugt, der für die beschriebenen Zwecke eingesetzt werden kann. Ein weiterer Teil der Arbeit beschreibt und evaluiert Strategien, wie die Kosten dieses Crowdsourcingeinsatzes reduziert werden können und mit geringerem Aufwand ein Konsens erreicht wird. KW - data quality KW - Datenqualität KW - Duplikaterkennung KW - duplicate detection KW - Machine Learning KW - Information Retrieval KW - Automatisierung KW - automation Y1 - 2017 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-410573 ER - TY - GEN A1 - Zhou, Lin A1 - Fischer, Eric A1 - Tunca, Can A1 - Brahms, Clemens Markus A1 - Ersoy, Cem A1 - Granacher, Urs A1 - Arnrich, Bert T1 - How We Found Our IMU BT - Guidelines to IMU Selection and a Comparison of Seven IMUs for Pervasive Healthcare Applications T2 - Postprints der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - Inertial measurement units (IMUs) are commonly used for localization or movement tracking in pervasive healthcare-related studies, and gait analysis is one of the most often studied topics using IMUs. The increasing variety of commercially available IMU devices offers convenience by combining the sensor modalities and simplifies the data collection procedures. However, selecting the most suitable IMU device for a certain use case is increasingly challenging. In this study, guidelines for IMU selection are proposed. In particular, seven IMUs were compared in terms of their specifications, data collection procedures, and raw data quality. Data collected from the IMUs were then analyzed by a gait analysis algorithm. The difference in accuracy of the calculated gait parameters between the IMUs could be used to retrace the issues in raw data, such as acceleration range or sensor calibration. Based on our algorithm, we were able to identify the best-suited IMUs for our needs. This study provides an overview of how to select the IMUs based on the area of study with concrete examples, and gives insights into the features of seven commercial IMUs using real data.
T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 2 KW - inertial measurement unit KW - pervasive healthcare KW - gait analysis KW - comparison of devices Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-481628 IS - 2 ER - TY - JOUR A1 - Zhou, Lin A1 - Fischer, Eric A1 - Tunca, Can A1 - Brahms, Clemens Markus A1 - Ersoy, Cem A1 - Granacher, Urs A1 - Arnrich, Bert T1 - How We Found Our IMU BT - Guidelines to IMU Selection and a Comparison of Seven IMUs for Pervasive Healthcare Applications JF - Sensors N2 - Inertial measurement units (IMUs) are commonly used for localization or movement tracking in pervasive healthcare-related studies, and gait analysis is one of the most often studied topics using IMUs. The increasing variety of commercially available IMU devices offers convenience by combining the sensor modalities and simplifies the data collection procedures. However, selecting the most suitable IMU device for a certain use case is increasingly challenging. In this study, guidelines for IMU selection are proposed. In particular, seven IMUs were compared in terms of their specifications, data collection procedures, and raw data quality. Data collected from the IMUs were then analyzed by a gait analysis algorithm. The difference in accuracy of the calculated gait parameters between the IMUs could be used to retrace the issues in raw data, such as acceleration range or sensor calibration. Based on our algorithm, we were able to identify the best-suited IMUs for our needs. This study provides an overview of how to select the IMUs based on the area of study with concrete examples, and gives insights into the features of seven commercial IMUs using real data. KW - inertial measurement unit KW - pervasive healthcare KW - gait analysis KW - comparison of devices Y1 - 2020 U6 - https://doi.org/10.3390/s20154090 SN - 1424-8220 VL - 20 IS - 15 PB - MDPI CY - Basel ER - TY - BOOK A1 - Zhang, Shuhao A1 - Plauth, Max A1 - Eberhardt, Felix A1 - Polze, Andreas A1 - Lehmann, Jens A1 - Sejdiu, Gezim A1 - Jabeen, Hajira A1 - Servadei, Lorenzo A1 - Möstl, Christian A1 - Bär, Florian A1 - Netzeband, André A1 - Schmidt, Rainer A1 - Knigge, Marlene A1 - Hecht, Sonja A1 - Prifti, Loina A1 - Krcmar, Helmut A1 - Sapegin, Andrey A1 - Jaeger, David A1 - Cheng, Feng A1 - Meinel, Christoph A1 - Friedrich, Tobias A1 - Rothenberger, Ralf A1 - Sutton, Andrew M. A1 - Sidorova, Julia A. A1 - Lundberg, Lars A1 - Rosander, Oliver A1 - Sköld, Lars A1 - Di Varano, Igor A1 - van der Walt, Estée A1 - Eloff, Jan H. P. A1 - Fabian, Benjamin A1 - Baumann, Annika A1 - Ermakova, Tatiana A1 - Kelkel, Stefan A1 - Choudhary, Yash A1 - Cooray, Thilini A1 - Rodríguez, Jorge A1 - Medina-Pérez, Miguel Angel A1 - Trejo, Luis A. A1 - Barrera-Animas, Ari Yair A1 - Monroy-Borja, Raúl A1 - López-Cuevas, Armando A1 - Ramírez-Márquez, José Emmanuel A1 - Grohmann, Maria A1 - Niederleithinger, Ernst A1 - Podapati, Sasidhar A1 - Schmidt, Christopher A1 - Huegle, Johannes A1 - de Oliveira, Roberto C. L. A1 - Soares, Fábio Mendes A1 - van Hoorn, André A1 - Neumer, Tamas A1 - Willnecker, Felix A1 - Wilhelm, Mathias A1 - Kuster, Bernhard ED - Meinel, Christoph ED - Polze, Andreas ED - Beins, Karsten ED - Strotmann, Rolf ED - Seibold, Ulrich ED - Rödszus, Kurt ED - Müller, Jürgen T1 - HPI Future SOC Lab – Proceedings 2017 T1 - HPI Future SOC Lab – Proceedings 2017 N2 - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. 
Its mission is to enable and promote exchange and interaction between the research community and the industry partners. The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from, but not limited to, the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies. This technical report presents the results of research projects executed in 2017. Selected projects presented their results on April 25th and November 15th, 2017, at the Future SOC Lab Day events. N2 - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie. Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2017 vorgestellt. Ausgewählte Projekte stellten ihre Ergebnisse am 25. April und 15. November 2017 im Rahmen der Future SOC Lab Tag Veranstaltungen vor. T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 130 KW - Future SOC Lab KW - research projects KW - multicore architectures KW - In-Memory technology KW - cloud computing KW - machine learning KW - artificial intelligence KW - Future SOC Lab KW - Forschungsprojekte KW - Multicore Architekturen KW - In-Memory Technologie KW - Cloud Computing KW - maschinelles Lernen KW - Künstliche Intelligenz Y1 - 2020 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-433100 SN - 978-3-86956-475-3 SN - 1613-5652 SN - 2191-1665 IS - 130 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - GEN A1 - Zenner, Alexander M. A1 - Böttinger, Erwin A1 - Konigorski, Stefan T1 - StudyMe BT - a new mobile app for user-centric N-of-1 trials T2 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät N2 - N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe, focusing on trial creation.
Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe, which features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and that the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic, science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives. T3 - Zweitveröffentlichungen der Universität Potsdam : Reihe der Digital Engineering Fakultät - 18 Y1 - 2022 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589763 IS - 18 ER - TY - JOUR A1 - Zenner, Alexander M. A1 - Böttinger, Erwin A1 - Konigorski, Stefan T1 - StudyMe BT - a new mobile app for user-centric N-of-1 trials JF - Trials N2 - N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, there is a gap in supporting non-experts in conducting their own user-centric trials. In this study, we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe, focusing on trial creation. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests, we developed StudyMe, which features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and that the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic, science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives. Y1 - 2022 U6 - https://doi.org/10.1186/s13063-022-06893-7 SN - 1745-6215 VL - 23 PB - BioMed Central CY - London ER - TY - JOUR A1 - Yousfi, Alaaeddine A1 - Hewelt, Marcin A1 - Bauer, Christine A1 - Weske, Mathias T1 - Toward uBPMN-Based patterns for modeling ubiquitous business processes JF - IEEE Transactions on Industrial Informatics N2 - Ubiquitous business processes are the new generation of processes that pervade the physical space and interact with their environments using a minimum of human involvement. Although they are now widely deployed in industry, their deployment is still ad hoc. They are implemented after an arbitrary modeling phase or no modeling phase at all. The absence of a solid modeling phase backing up the implementation generates many loopholes that are stressed in the literature. Here, we tackle the issue of modeling ubiquitous business processes. We propose patterns to represent the recent ubiquitous computing features.
These patterns are the outcome of an analysis we conducted in the field of human-computer interaction to examine how the features are actually deployed. The patterns' understandability, ease of use, usefulness, and completeness are examined via a user experiment. The results indicate that all four indexes trend positively. Hence, the patterns may become the backbone of ubiquitous business process modeling in industrial applications. KW - Ubiquitous business process KW - ubiquitous business process model and notation (uBPMN) KW - ubiquitous business process modeling KW - ubiquitous computing (ubicomp) Y1 - 2017 U6 - https://doi.org/10.1109/TII.2017.2777847 SN - 1551-3203 SN - 1941-0050 VL - 14 IS - 8 SP - 3358 EP - 3367 PB - Inst. of Electr. and Electronics Engineers CY - Piscataway ER - TY - JOUR A1 - Yousfi, Alaaeddine A1 - Batoulis, Kimon A1 - Weske, Mathias T1 - Achieving Business Process Improvement via Ubiquitous Decision-Aware Business Processes JF - ACM Transactions on Internet Technology N2 - Business process improvement is an endless challenge for many organizations. As long as there is a process, it must be improved. Nowadays, improvement initiatives are driven by professionals. This is no longer practical because people cannot process the enormous amounts of data in current business environments. Here, we introduce ubiquitous decision-aware business processes. They pervade the physical space, analyze the ever-changing environments, and make decisions accordingly. We explain how they can be built and used for improvement. Our approach can be a valuable improvement option to alleviate the workload of participants by helping them focus on the crucial rather than the menial tasks. KW - Business process improvement KW - ubiquitous decision-aware business process KW - ubiquitous decisions KW - context KW - uBPMN KW - DMN Y1 - 2019 U6 - https://doi.org/10.1145/3298986 SN - 1533-5399 SN - 1557-6051 VL - 19 IS - 1 PB - Association for Computing Machinery CY - New York ER - TY - THES A1 - Yang, Haojin T1 - Deep representation learning for multimedia data analysis Y1 - 2019 ER - TY - JOUR A1 - Wuttke, Matthias A1 - Li, Yong A1 - Li, Man A1 - Sieber, Karsten B. A1 - Feitosa, Mary F. A1 - Gorski, Mathias A1 - Tin, Adrienne A1 - Wang, Lihua A1 - Chu, Audrey Y. A1 - Hoppmann, Anselm A1 - Kirsten, Holger A1 - Giri, Ayush A1 - Chai, Jin-Fang A1 - Sveinbjornsson, Gardar A1 - Tayo, Bamidele O. A1 - Nutile, Teresa A1 - Fuchsberger, Christian A1 - Marten, Jonathan A1 - Cocca, Massimiliano A1 - Ghasemi, Sahar A1 - Xu, Yizhe A1 - Horn, Katrin A1 - Noce, Damia A1 - Van der Most, Peter J. A1 - Sedaghat, Sanaz A1 - Yu, Zhi A1 - Akiyama, Masato A1 - Afaq, Saima A1 - Ahluwalia, Tarunveer Singh A1 - Almgren, Peter A1 - Amin, Najaf A1 - Arnlov, Johan A1 - Bakker, Stephan J. L. A1 - Bansal, Nisha A1 - Baptista, Daniela A1 - Bergmann, Sven A1 - Biggs, Mary L. A1 - Biino, Ginevra A1 - Boehnke, Michael A1 - Boerwinkle, Eric A1 - Boissel, Mathilde A1 - Böttinger, Erwin A1 - Boutin, Thibaud S. A1 - Brenner, Hermann A1 - Brumat, Marco A1 - Burkhardt, Ralph A1 - Butterworth, Adam S. A1 - Campana, Eric A1 - Campbell, Archie A1 - Campbell, Harry A1 - Canouil, Mickael A1 - Carroll, Robert J. A1 - Catamo, Eulalia A1 - Chambers, John C. A1 - Chee, Miao-Ling A1 - Chee, Miao-Li A1 - Chen, Xu A1 - Cheng, Ching-Yu A1 - Cheng, Yurong A1 - Christensen, Kaare A1 - Cifkova, Renata A1 - Ciullo, Marina A1 - Concas, Maria Pina A1 - Cook, James P.
A1 - Coresh, Josef A1 - Corre, Tanguy A1 - Sala, Cinzia Felicita A1 - Cusi, Daniele A1 - Danesh, John A1 - Daw, E. Warwick A1 - De Borst, Martin H. A1 - De Grandi, Alessandro A1 - De Mutsert, Renee A1 - De Vries, Aiko P. J. A1 - Degenhardt, Frauke A1 - Delgado, Graciela A1 - Demirkan, Ayse A1 - Di Angelantonio, Emanuele A1 - Dittrich, Katalin A1 - Divers, Jasmin A1 - Dorajoo, Rajkumar A1 - Eckardt, Kai-Uwe A1 - Ehret, Georg A1 - Elliott, Paul A1 - Endlich, Karlhans A1 - Evans, Michele K. A1 - Felix, Janine F. A1 - Foo, Valencia Hui Xian A1 - Franco, Oscar H. A1 - Franke, Andre A1 - Freedman, Barry I. A1 - Freitag-Wolf, Sandra A1 - Friedlander, Yechiel A1 - Froguel, Philippe A1 - Gansevoort, Ron T. A1 - Gao, He A1 - Gasparini, Paolo A1 - Gaziano, J. Michael A1 - Giedraitis, Vilmantas A1 - Gieger, Christian A1 - Girotto, Giorgia A1 - Giulianini, Franco A1 - Gogele, Martin A1 - Gordon, Scott D. A1 - Gudbjartsson, Daniel F. A1 - Gudnason, Vilmundur A1 - Haller, Toomas A1 - Hamet, Pavel A1 - Harris, Tamara B. A1 - Hartman, Catharina A. A1 - Hayward, Caroline A1 - Hellwege, Jacklyn N. A1 - Heng, Chew-Kiat A1 - Hicks, Andrew A. A1 - Hofer, Edith A1 - Huang, Wei A1 - Hutri-Kahonen, Nina A1 - Hwang, Shih-Jen A1 - Ikram, M. Arfan A1 - Indridason, Olafur S. A1 - Ingelsson, Erik A1 - Ising, Marcus A1 - Jaddoe, Vincent W. V. A1 - Jakobsdottir, Johanna A1 - Jonas, Jost B. A1 - Joshi, Peter K. A1 - Josyula, Navya Shilpa A1 - Jung, Bettina A1 - Kahonen, Mika A1 - Kamatani, Yoichiro A1 - Kammerer, Candace M. A1 - Kanai, Masahiro A1 - Kastarinen, Mika A1 - Kerr, Shona M. A1 - Khor, Chiea-Chuen A1 - Kiess, Wieland A1 - Kleber, Marcus E. A1 - Koenig, Wolfgang A1 - Kooner, Jaspal S. A1 - Korner, Antje A1 - Kovacs, Peter A1 - Kraja, Aldi T. A1 - Krajcoviechova, Alena A1 - Kramer, Holly A1 - Kramer, Bernhard K. A1 - Kronenberg, Florian A1 - Kubo, Michiaki A1 - Kuhnel, Brigitte A1 - Kuokkanen, Mikko A1 - Kuusisto, Johanna A1 - La Bianca, Martina A1 - Laakso, Markku A1 - Lange, Leslie A. A1 - Langefeld, Carl D. A1 - Lee, Jeannette Jen-Mai A1 - Lehne, Benjamin A1 - Lehtimaki, Terho A1 - Lieb, Wolfgang A1 - Lim, Su-Chi A1 - Lind, Lars A1 - Lindgren, Cecilia M. A1 - Liu, Jun A1 - Liu, Jianjun A1 - Loeffler, Markus A1 - Loos, Ruth J. F. A1 - Lucae, Susanne A1 - Lukas, Mary Ann A1 - Lyytikainen, Leo-Pekka A1 - Magi, Reedik A1 - Magnusson, Patrik K. E. A1 - Mahajan, Anubha A1 - Martin, Nicholas G. A1 - Martins, Jade A1 - Marz, Winfried A1 - Mascalzoni, Deborah A1 - Matsuda, Koichi A1 - Meisinger, Christa A1 - Meitinger, Thomas A1 - Melander, Olle A1 - Metspalu, Andres A1 - Mikaelsdottir, Evgenia K. A1 - Milaneschi, Yuri A1 - Miliku, Kozeta A1 - Mishra, Pashupati P. A1 - Program, V. A. Million Veteran A1 - Mohlke, Karen L. A1 - Mononen, Nina A1 - Montgomery, Grant W. A1 - Mook-Kanamori, Dennis O. A1 - Mychaleckyj, Josyf C. A1 - Nadkarni, Girish N. A1 - Nalls, Mike A. A1 - Nauck, Matthias A1 - Nikus, Kjell A1 - Ning, Boting A1 - Nolte, Ilja M. A1 - Noordam, Raymond A1 - Olafsson, Isleifur A1 - Oldehinkel, Albertine J. A1 - Orho-Melander, Marju A1 - Ouwehand, Willem H. A1 - Padmanabhan, Sandosh A1 - Palmer, Nicholette D. A1 - Palsson, Runolfur A1 - Penninx, Brenda W. J. H. A1 - Perls, Thomas A1 - Perola, Markus A1 - Pirastu, Mario A1 - Pirastu, Nicola A1 - Pistis, Giorgio A1 - Podgornaia, Anna I. A1 - Polasek, Ozren A1 - Ponte, Belen A1 - Porteous, David J. A1 - Poulain, Tanja A1 - Pramstaller, Peter P. A1 - Preuss, Michael H. A1 - Prins, Bram P. A1 - Province, Michael A. A1 - Rabelink, Ton J. A1 - Raffield, Laura M. 
A1 - Raitakari, Olli T. A1 - Reilly, Dermot F. A1 - Rettig, Rainer A1 - Rheinberger, Myriam A1 - Rice, Kenneth M. A1 - Ridker, Paul M. A1 - Rivadeneira, Fernando A1 - Rizzi, Federica A1 - Roberts, David J. A1 - Robino, Antonietta A1 - Rossing, Peter A1 - Rudan, Igor A1 - Rueedi, Rico A1 - Ruggiero, Daniela A1 - Ryan, Kathleen A. A1 - Saba, Yasaman A1 - Sabanayagam, Charumathi A1 - Salomaa, Veikko A1 - Salvi, Erika A1 - Saum, Kai-Uwe A1 - Schmidt, Helena A1 - Schmidt, Reinhold A1 - Ben Schottker, A1 - Schulz, Christina-Alexandra A1 - Schupf, Nicole A1 - Shaffer, Christian M. A1 - Shi, Yuan A1 - Smith, Albert V. A1 - Smith, Blair H. A1 - Soranzo, Nicole A1 - Spracklen, Cassandra N. A1 - Strauch, Konstantin A1 - Stringham, Heather M. A1 - Stumvoll, Michael A1 - Svensson, Per O. A1 - Szymczak, Silke A1 - Tai, E-Shyong A1 - Tajuddin, Salman M. A1 - Tan, Nicholas Y. Q. A1 - Taylor, Kent D. A1 - Teren, Andrej A1 - Tham, Yih-Chung A1 - Thiery, Joachim A1 - Thio, Chris H. L. A1 - Thomsen, Hauke A1 - Thorleifsson, Gudmar A1 - Toniolo, Daniela A1 - Tonjes, Anke A1 - Tremblay, Johanne A1 - Tzoulaki, Ioanna A1 - Uitterlinden, Andre G. A1 - Vaccargiu, Simona A1 - Van Dam, Rob M. A1 - Van der Harst, Pim A1 - Van Duijn, Cornelia M. A1 - Edward, Digna R. Velez A1 - Verweij, Niek A1 - Vogelezang, Suzanne A1 - Volker, Uwe A1 - Vollenweider, Peter A1 - Waeber, Gerard A1 - Waldenberger, Melanie A1 - Wallentin, Lars A1 - Wang, Ya Xing A1 - Wang, Chaolong A1 - Waterworth, Dawn M. A1 - Bin Wei, Wen A1 - White, Harvey A1 - Whitfield, John B. A1 - Wild, Sarah H. A1 - Wilson, James F. A1 - Wojczynski, Mary K. A1 - Wong, Charlene A1 - Wong, Tien-Yin A1 - Xu, Liang A1 - Yang, Qiong A1 - Yasuda, Masayuki A1 - Yerges-Armstrong, Laura M. A1 - Zhang, Weihua A1 - Zonderman, Alan B. A1 - Rotter, Jerome I. A1 - Bochud, Murielle A1 - Psaty, Bruce M. A1 - Vitart, Veronique A1 - Wilson, James G. A1 - Dehghan, Abbas A1 - Parsa, Afshin A1 - Chasman, Daniel I. A1 - Ho, Kevin A1 - Morris, Andrew P. A1 - Devuyst, Olivier A1 - Akilesh, Shreeram A1 - Pendergrass, Sarah A. A1 - Sim, Xueling A1 - Boger, Carsten A. A1 - Okada, Yukinori A1 - Edwards, Todd L. A1 - Snieder, Harold A1 - Stefansson, Kari A1 - Hung, Adriana M. A1 - Heid, Iris M. A1 - Scholz, Markus A1 - Teumer, Alexander A1 - Kottgen, Anna A1 - Pattaro, Cristian T1 - A catalog of genetic loci associated with kidney function from analyses of a million individuals JF - Nature genetics N2 - Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through transancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these,147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. 
These results provide a comprehensive priority list of molecular targets for translational research. Y1 - 2019 U6 - https://doi.org/10.1038/s41588-019-0407-x SN - 1061-4036 SN - 1546-1718 VL - 51 IS - 6 SP - 957 EP - + PB - Nature Publ. Group CY - New York ER - TY - THES A1 - Wolf, Johannes T1 - Analysis and visualization of transport infrastructure based on large-scale geospatial mobile mapping data T1 - Analyse und Visualisierung von Verkehrsinfrastruktur basierend auf großen Mobile-Mapping-Datensätzen N2 - 3D point clouds are a universal and discrete digital representation of three-dimensional objects and environments. For geospatial applications, 3D point clouds have become a fundamental type of raw data acquired and generated using various methods and techniques. In particular, 3D point clouds serve as raw data for creating digital twins of the built environment. This thesis concentrates on the research and development of concepts, methods, and techniques for preprocessing, semantically enriching, analyzing, and visualizing 3D point clouds for applications around transport infrastructure. It introduces a collection of preprocessing techniques that aim to harmonize raw 3D point cloud data, such as point density reduction and scan profile detection. Metrics such as local density, verticality, and planarity are calculated for later use. One of the key contributions tackles the problem of analyzing and deriving semantic information in 3D point clouds. Three different approaches are investigated: a geometric analysis, a machine learning approach operating on synthetically generated 2D images, and a machine learning approach operating on 3D point clouds without intermediate representation. In the first application case, 2D image classification is applied and evaluated for mobile mapping data focusing on road networks to derive road marking vector data. The second application case investigates how 3D point clouds can be merged with ground-penetrating radar data for a combined visualization and to automatically identify atypical areas in the data. For example, the approach detects pavement regions with developing potholes. The third application case explores the combination of a 3D environment based on 3D point clouds with panoramic imagery to improve visual representation and the detection of 3D objects such as traffic signs. The presented methods were implemented and tested based on software frameworks for 3D point clouds and 3D visualization. In particular, modules for metric computation, classification procedures, and visualization techniques were integrated into a modular pipeline-based C++ research framework for geospatial data processing, extended by Python machine learning scripts. All visualization and analysis techniques scale to large real-world datasets such as road networks of entire cities or railroad networks. The thesis shows that some use cases allow taking advantage of established computer vision methods to efficiently analyze images rendered from mobile mapping data. The two presented semantic classification methods working directly on 3D point clouds are use-case independent and show similar overall accuracy when compared to each other. While the geometry-based method requires less computation time, the machine learning-based method supports arbitrary semantic classes but requires training the network with ground truth data. Both methods can be used in combination to gradually build this ground truth with manual corrections via a corresponding annotation tool.
This thesis contributes results for IT system engineering of applications, systems, and services that require spatial digital twins of transport infrastructure such as road networks and railroad networks based on 3D point clouds as raw data. It demonstrates the feasibility of fully automated data flows that map captured 3D point clouds to semantically classified models. This provides a key component for seamlessly integrated spatial digital twins in IT solutions that require up-to-date, object-based, and semantically enriched information about the built environment. N2 - 3D-Punktwolken sind eine universelle und diskrete digitale Darstellung von dreidimensionalen Objekten und Umgebungen. Für raumbezogene Anwendungen sind 3D-Punktwolken zu einer grundlegenden Form von Rohdaten geworden, die mit verschiedenen Methoden und Techniken erfasst und erzeugt werden. Insbesondere dienen 3D-Punktwolken als Rohdaten für die Erstellung digitaler Zwillinge der bebauten Umwelt. Diese Arbeit konzentriert sich auf die Erforschung und Entwicklung von Konzepten, Methoden und Techniken zur Vorverarbeitung, semantischen Anreicherung, Analyse und Visualisierung von 3D-Punktwolken für Anwendungen im Bereich der Verkehrsinfrastruktur. Es wird eine Sammlung von Vorverarbeitungstechniken vorgestellt, die auf die Harmonisierung von 3D-Punktwolken-Rohdaten abzielen, so z.B. die Reduzierung der Punktdichte und die Erkennung von Scanprofilen. Metriken wie bspw. die lokale Dichte, Vertikalität und Planarität werden zur späteren Verwendung berechnet. Einer der Hauptbeiträge befasst sich mit dem Problem der Analyse und Ableitung semantischer Informationen in 3D-Punktwolken. Es werden drei verschiedene Ansätze untersucht: Eine geometrische Analyse sowie zwei maschinelle Lernansätze, die auf synthetisch erzeugten 2D-Bildern, bzw. auf 3D-Punktwolken ohne Zwischenrepräsentation arbeiten. Im ersten Anwendungsfall wird die 2D-Bildklassifikation für Mobile-Mapping-Daten mit Fokus auf Straßennetze angewendet und evaluiert, um Vektordaten für Straßenmarkierungen abzuleiten. Im zweiten Anwendungsfall wird untersucht, wie 3D-Punktwolken mit Bodenradardaten für eine kombinierte Visualisierung und automatische Identifikation atypischer Bereiche in den Daten zusammengeführt werden können. Der Ansatz erkennt zum Beispiel Fahrbahnbereiche mit entstehenden Schlaglöchern. Der dritte Anwendungsfall untersucht die Kombination einer 3D-Umgebung auf Basis von 3D-Punktwolken mit Panoramabildern, um die visuelle Darstellung und die Erkennung von 3D-Objekten wie Verkehrszeichen zu verbessern. Die vorgestellten Methoden wurden auf Basis von Software-Frameworks für 3D-Punktwolken und 3D-Visualisierung implementiert und getestet. Insbesondere wurden Module für Metrikberechnungen, Klassifikationsverfahren und Visualisierungstechniken in ein modulares, pipelinebasiertes C++-Forschungsframework für die Geodatenverarbeitung integriert, das durch Python-Skripte für maschinelles Lernen erweitert wurde. Alle Visualisierungs- und Analysetechniken skalieren auf große reale Datensätze wie Straßennetze ganzer Städte oder Eisenbahnnetze. Die Arbeit zeigt, dass es in einigen Anwendungsfällen möglich ist, die Vorteile etablierter Bildverarbeitungsmethoden zu nutzen, um aus Mobile-Mapping-Daten gerenderte Bilder effizient zu analysieren. Die beiden vorgestellten semantischen Klassifikationsverfahren, die direkt auf 3D-Punktwolken arbeiten, sind anwendungsfallunabhängig und zeigen im Vergleich zueinander eine ähnliche Gesamtgenauigkeit. 
Während die geometriebasierte Methode weniger Rechenzeit benötigt, unterstützt die auf maschinellem Lernen basierende Methode beliebige semantische Klassen, erfordert aber das Trainieren des Netzwerks mit Ground-Truth-Daten. Beide Methoden können in Kombination verwendet werden, um diese Ground Truth mit manuellen Korrekturen über ein entsprechendes Annotationstool schrittweise aufzubauen. Diese Arbeit liefert Ergebnisse für das IT-System-Engineering von Anwendungen, Systemen und Diensten, die räumliche digitale Zwillinge von Verkehrsinfrastruktur wie Straßen- und Schienennetzen auf der Basis von 3D-Punktwolken als Rohdaten benötigen. Sie demonstriert die Machbarkeit von vollautomatisierten Datenflüssen, die erfasste 3D-Punktwolken auf semantisch klassifizierte Modelle abbilden. Dies stellt eine Schlüsselkomponente für nahtlos integrierte räumliche digitale Zwillinge in IT-Lösungen dar, die aktuelle, objektbasierte und semantisch angereicherte Informationen über die bebaute Umwelt benötigen. KW - 3D point cloud KW - geospatial data KW - mobile mapping KW - semantic classification KW - 3D visualization KW - 3D-Punktwolke KW - räumliche Geodaten KW - Mobile Mapping KW - semantische Klassifizierung KW - 3D-Visualisierung Y1 - 2021 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-536129 ER - TY - JOUR A1 - Wittig, Alice A1 - Miranda, Fabio Malcher A1 - Hölzer, Martin A1 - Altenburg, Tom A1 - Bartoszewicz, Jakub Maciej A1 - Beyvers, Sebastian A1 - Dieckmann, Marius Alfred A1 - Genske, Ulrich A1 - Giese, Sven Hans-Joachim A1 - Nowicka, Melania A1 - Richard, Hugues A1 - Schiebenhoefer, Henning A1 - Schmachtenberg, Anna-Juliane A1 - Sieben, Paul A1 - Tang, Ming A1 - Tembrockhaus, Julius A1 - Renard, Bernhard Y. A1 - Fuchs, Stephan T1 - CovRadar BT - continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance JF - Bioinformatics N2 - The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousand sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast. Y1 - 2022 U6 - https://doi.org/10.1093/bioinformatics/btac411 SN - 1367-4803 SN - 1367-4811 VL - 38 IS - 17 SP - 4223 EP - 4225 PB - Oxford Univ. Press CY - Oxford ER - TY - JOUR A1 - Wiemker, Veronika A1 - Bunova, Anna A1 - Neufeld, Maria A1 - Gornyi, Boris A1 - Yurasova, Elena A1 - Konigorski, Stefan A1 - Kalinina, Anna A1 - Kontsevaya, Anna A1 - Ferreira-Borges, Carina A1 - Probst, Charlotte T1 - Pilot study to evaluate usability and acceptability of the 'Animated Alcohol Assessment Tool' in Russian primary healthcare JF - Digital health N2 - Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. 
We developed the 'Animated Alcohol Assessment Tool' (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to 'many people' or 'everyone'. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond. KW - Alcohol use assessment KW - Alcohol Use Disorders Identification Test KW - screening tools KW - digital health KW - mobile applications KW - Russia KW - primary healthcare KW - usability KW - acceptability Y1 - 2022 U6 - https://doi.org/10.1177/20552076211074491 SN - 2055-2076 VL - 8 PB - Sage Publications CY - London ER - TY - GEN A1 - Welearegai, Gebrehiwet B. A1 - Schlueter, Max A1 - Hammer, Christian T1 - Static security evaluation of an industrial web application T2 - Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing N2 - JavaScript is the most popular programming language for web applications. Static analysis of JavaScript applications is highly challenging due to its dynamic language constructs and event-driven asynchronous executions, which also give rise to many security-related bugs. Several static analysis tools to detect such bugs exist; however, research has not yet reported much on the precision and scalability trade-off of these analyzers. As a further obstacle, JavaScript programs structured in Node.js modules need to be collected for analysis, but existing bundlers are either specific to their respective analysis tools or not particularly suitable for static analysis.
KW - JavaScript KW - WALA KW - SAFE KW - comparison Y1 - 2019 SN - 978-1-4503-5933-7 U6 - https://doi.org/10.1145/3297280.3297471 SP - 1952 EP - 1961 PB - Association for Computing Machinery CY - New York ER - TY - BOOK A1 - Weber, Benedikt T1 - Human pose estimation for decubitus prophylaxis T1 - Verwendung von Posenabschätzung zur Dekubitusprophylaxe N2 - Decubitus is one of the most relevant diseases in nursing and the most expensive to treat. It is caused by sustained pressure on tissue, so it particularly affects bed-bound patients. This work lays a foundation for pressure mattress-based decubitus prophylaxis by implementing a solution to the single-frame 2D Human Pose Estimation problem. For this, methods of Deep Learning are employed. Two approaches are examined: a coarse-to-fine Convolutional Neural Network for direct regression of joint coordinates and a U-Net for the derivation of probability distribution heatmaps. We conclude that training our models on a combined dataset of the publicly available Bodies at Rest and SLP data yields the best results. Furthermore, various preprocessing techniques are investigated, and a hyperparameter optimization is performed to discover an improved model architecture. Another finding indicates that the heatmap-based approach outperforms direct regression. This model achieves a mean per-joint position error of 9.11 cm for the Bodies at Rest data and 7.43 cm for the SLP data. We find that it generalizes well on data from mattresses other than those seen during training but has difficulties detecting the arms correctly. Additionally, we give a brief overview of the medical data annotation tool annoto, which we developed in the bachelor project, and conclude that the Scrum framework and agile practices enhanced our development workflow. N2 - Dekubitus ist eine der relevantesten Krankheiten in der Krankenpflege und die kostspieligste in der Behandlung. Sie wird durch anhaltenden Druck auf Gewebe verursacht, betrifft also insbesondere bettlägerige Patienten. Diese Arbeit legt eine Grundlage für druckmatratzenbasierte Dekubitusprophylaxe, indem eine Lösung für das Einzelbild-2D-Posenabschätzungsproblem implementiert wird. Dafür werden Methoden des tiefen Lernens verwendet. Zwei Ansätze, basierend auf einem grob-zu-fein arbeitenden gefalteten neuronalen Netzwerk zur direkten Regression der Gelenkkoordinaten bzw. auf einem U-Netzwerk zur Ableitung von Wahrscheinlichkeitsverteilungsbildern, werden untersucht. Wir schlussfolgern, dass das Training unserer Modelle auf einem kombinierten Datensatz, bestehend aus den frei verfügbaren Bodies at Rest und SLP Daten, die besten Ergebnisse liefert. Weiterhin werden diverse Vorverarbeitungsverfahren untersucht und eine Hyperparameteroptimierung zum Finden einer verbesserten Modellarchitektur durchgeführt. Der wahrscheinlichkeitsverteilungsbasierte Ansatz übertrifft die direkte Regression. Dieses Modell erreicht einen durchschnittlichen Pro-Gelenk-Positionsfehler von 9,11 cm auf den Bodies at Rest und von 7,43 cm auf den SLP Daten. Wir sehen, dass es gut auf Daten anderer als der im Training verwendeten Matratzen funktioniert, aber Schwierigkeiten mit der korrekten Erkennung der Arme hat. Weiterhin geben wir eine kurze Übersicht des medizinischen Datenannotationstools annoto, welches wir im Zusammenhang mit dem Bachelorprojekt entwickelt haben, und schlussfolgern außerdem, dass Scrum und agile Praktiken unseren Entwicklungsprozess verbessert haben.
T3 - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 153 KW - machine learning KW - deep learning KW - convolutional neural networks KW - pose estimation KW - decubitus KW - telemedicine KW - maschinelles Lernen KW - tiefes Lernen KW - gefaltete neuronale Netze KW - Posenabschätzung KW - Dekubitus KW - Telemedizin Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-567196 SN - 978-3-86956-551-4 SN - 1613-5652 SN - 2191-1665 IS - 153 PB - Universitätsverlag Potsdam CY - Potsdam ER - TY - JOUR A1 - Vollmer, Jan Ole A1 - Trapp, Matthias A1 - Schumann, Heidrun A1 - Döllner, Jürgen Roland Friedrich T1 - Hierarchical spatial aggregation for level-of-detail visualization of 3D thematic data JF - ACM transactions on spatial algorithms and systems N2 - Thematic maps are a common tool to visualize semantic data with a spatial reference. Combining thematic data with a geometric representation of their natural reference frame aids the viewer in gaining an overview and in perceiving patterns with respect to location; however, as the amount of data for visualization continues to increase, problems such as information overload and visual clutter impede perception, requiring data aggregation and level-of-detail visualization techniques. While existing aggregation techniques for thematic data operate in a 2D reference frame (i.e., a map), we present two aggregation techniques for 3D spatial and spatiotemporal data mapped onto virtual city models that hierarchically aggregate thematic data in real time during rendering to support on-the-fly and on-demand level-of-detail generation. The object-based technique performs aggregation based on scene-specific objects and their hierarchy to facilitate per-object analysis, while the scene-based technique aggregates data solely based on spatial locations, thus supporting visual analysis of data with arbitrary reference geometry. Both techniques can apply different aggregation functions (mean, minimum, and maximum) for ordinal, interval, and ratio-scaled data and can be easily extended with additional functions. Our implementation utilizes the programmable graphics pipeline and requires suitably encoded data, i.e., textures or vertex attributes. We demonstrate the application of both techniques using real-world datasets, including solar potential analyses and the propagation of pressure waves in a virtual city model. KW - Level-of-detail visualization KW - spatial aggregation KW - real-time rendering Y1 - 2018 U6 - https://doi.org/10.1145/3234506 SN - 2374-0353 SN - 2374-0361 VL - 4 IS - 3 PB - Association for Computing Machinery CY - New York ER - TY - THES A1 - Vogel, Thomas T1 - Model-driven engineering of self-adaptive software T1 - Modellgetriebene Entwicklung von Selbst-Adaptiver Software N2 - The development of self-adaptive software requires the engineering of an adaptation engine that controls the underlying adaptable software through a feedback loop. State-of-the-art approaches prescribe the feedback loop in terms of the number of feedback loops, the way the activities (e.g., monitor, analyze, plan, and execute (MAPE)) and the knowledge are structured into a feedback loop, and the type of knowledge. Moreover, the feedback loop is usually hidden in the implementation or framework and therefore not visible in the architectural design.
Additionally, an adaptation engine often employs runtime models that either represent the adaptable software or capture strategic knowledge such as reconfiguration strategies. State-of-the-art approaches do not systematically address the interplay of such runtime models, although doing so would allow developers to freely design the entire feedback loop. This thesis presents ExecUtable RuntimE MegAmodels (EUREMA), an integrated model-driven engineering (MDE) solution that rigorously uses models for engineering feedback loops. EUREMA provides a domain-specific modeling language to specify feedback loops and an interpreter to execute them. The language allows developers to freely design a feedback loop with respect to its activities and runtime models (knowledge), as well as the number of feedback loops. It further supports structuring the feedback loops in an adaptation engine that follows a layered architectural style. Thus, EUREMA makes the feedback loops explicit in the design and enables developers to reason about design decisions. To address the interplay of runtime models, we propose the concept of a runtime megamodel, which is a runtime model that contains other runtime models as well as activities (e.g., MAPE) working on the contained models. This concept is the underlying principle of EUREMA. The resulting EUREMA (mega)models are kept alive at runtime and directly executed by the EUREMA interpreter to run the feedback loops. Interpretation provides the flexibility to dynamically adapt a feedback loop. In this context, EUREMA supports engineering self-adaptive software in which feedback loops run independently or in a coordinated fashion within the same layer, as well as on top of each other in different layers of the adaptation engine. Moreover, we consider preliminary means to evolve self-adaptive software by providing a maintenance interface to the adaptation engine. This thesis discusses EUREMA in detail by applying it to different scenarios such as single, multiple, and stacked feedback loops for self-repairing and self-optimizing the mRUBiS application. Moreover, it investigates the design and expressiveness of EUREMA, reports on experiments with a running system (mRUBiS) and with alternative solutions, and assesses EUREMA with respect to quality attributes such as performance and scalability. The conducted evaluation provides evidence that EUREMA, as an integrated and open MDE approach for engineering self-adaptive software, seamlessly integrates the development and runtime environments using the same formalism to specify and execute feedback loops, supports the dynamic adaptation of feedback loops in layered architectures, and achieves an efficient execution of feedback loops by leveraging incrementality. N2 - Die Entwicklung von selbst-adaptiven Softwaresystemen erfordert die Konstruktion einer geschlossenen Feedback Loop, die das System zur Laufzeit beobachtet und falls nötig anpasst. Aktuelle Konstruktionsverfahren schreiben eine bestimmte Feedback Loop im Hinblick auf Anzahl und Struktur vor. Die Struktur umfasst die vorhandenen Aktivitäten der Feedback Loop (z. B. Beobachtung, Analyse, Planung und Ausführung einer Adaption) und die Art des hierzu verwendeten Systemwissens. Dieses System- und zusätzlich das strategische Wissen (z. B. Adaptionsregeln) werden in der Regel in Laufzeitmodellen erfasst und in die Feedback Loop integriert.
Aktuelle Verfahren berücksichtigen jedoch nicht systematisch die Laufzeitmodelle und deren Zusammenspiel, so dass Entwickler die Feedback Loop nicht frei entwerfen und gestalten können. Folglich wird die Feedback Loop während des Entwurfs der Softwarearchitektur häufig nicht explizit berücksichtigt. Diese Dissertation stellt mit EUREMA ein neues Konstruktionsverfahren für Feedback Loops vor. Basierend auf Prinzipien der modellgetriebenen Entwicklung (MDE) setzt EUREMA auf die konsequente Nutzung von Modellen für die Konstruktion, Ausführung und Adaption von selbst-adaptiven Softwaresystemen. Hierzu wird eine domänenspezifische Modellierungssprache (DSL) vorgestellt, mit der Entwickler die Feedback Loop frei entwerfen und gestalten können, d. h. ohne Einschränkung bezüglich der Aktivitäten, Laufzeitmodelle und Anzahl der Feedback Loops. Zusätzlich bietet die DSL eine Architektursicht auf das System, die die Feedback Loops berücksichtigt. Daher stellt die DSL Konstrukte zur Verfügung, mit denen Entwickler während des Entwurfs der Architektur die Feedback Loops explizit definieren und berücksichtigen können. Um das Zusammenspiel der Laufzeitmodelle zu erfassen, wird das Konzept eines sogenannten Laufzeitmegamodells vorgeschlagen, das alle Aktivitäten und Laufzeitmodelle einer Feedback Loop erfasst. Dieses Konzept dient als Grundlage der vorgestellten DSL. Die bei der Konstruktion und mit der DSL erzeugten (Mega-)Modelle werden zur Laufzeit bewahrt und von einem Interpreter ausgeführt, um das spezifizierte Adaptionsverhalten zu realisieren. Der Interpreteransatz bietet die notwendige Flexibilität, um das Adaptionsverhalten zur Laufzeit anzupassen. Dies ermöglicht über die Entwicklung von Systemen mit mehreren Feedback Loops auf einer Ebene hinaus das Schichten von Feedback Loops im Sinne einer adaptiven Regelung. Zusätzlich bietet EUREMA eine Schnittstelle für Wartungsprozesse an, um das Adaptionsverhalten im laufenden System anzupassen. Die Dissertation diskutiert den EUREMA-Ansatz und wendet diesen auf verschiedene Problemstellungen an, u. a. auf einzelne, mehrere und koordinierte sowie geschichtete Feedback Loops. Als Anwendungsbeispiel dient die Selbstheilung und Selbstoptimierung des Online-Marktplatzes mRUBiS. Für die Evaluierung von EUREMA werden Experimente mit dem laufenden mRUBiS und mit alternativen Lösungen durchgeführt, das Design und die Ausdrucksmächtigkeit der DSL untersucht und Qualitätsmerkmale wie Performanz und Skalierbarkeit betrachtet. Die Ergebnisse der Evaluierung legen nahe, dass EUREMA als integrierter und offener Ansatz für die Entwicklung selbst-adaptiver Softwaresysteme folgende Beiträge zum Stand der Technik leistet: eine nahtlose Integration der Entwicklungs- und Laufzeitumgebung durch die konsequente Verwendung von Modellen, die dynamische Anpassung des Adaptionsverhaltens in einer Schichtenarchitektur und eine effiziente Ausführung von Feedback Loops durch inkrementelle Verarbeitungsschritte. KW - model-driven engineering KW - self-adaptive software KW - domain-specific modeling KW - runtime models KW - software evolution KW - modellgetriebene Entwicklung KW - Selbst-Adaptive Software KW - Domänenspezifische Modellierung KW - Laufzeitmodelle KW - Software-Evolution Y1 - 2018 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-409755 ER -
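As a closing illustration of the feedback loops the last entry is concerned with, the following TypeScript sketch shows one iteration of a single MAPE loop operating on a shared runtime model. It is a rough, hypothetical rendering of the general MAPE scheme, not EUREMA's DSL or interpreter, which specify and execute such loops as executable megamodels.

```typescript
// Hypothetical single MAPE feedback loop over a shared runtime model.
interface RuntimeModel { failedComponents: string[]; }
interface AdaptableSystem { probe(): string[]; restart(component: string): void; }

// Monitor: reflect the observed system state into the runtime model.
const monitor = (system: AdaptableSystem, model: RuntimeModel) => {
  model.failedComponents = system.probe();
};

// Analyze: decide whether adaptation is needed.
const analyze = (model: RuntimeModel) => model.failedComponents.length > 0;

// Plan: derive adaptation actions from the runtime model.
const plan = (model: RuntimeModel) =>
  model.failedComponents.map(c => ({ action: "restart", target: c }));

// Execute: apply the planned actions to the adaptable software.
const execute = (system: AdaptableSystem, actions: { action: string; target: string }[]) =>
  actions.forEach(a => system.restart(a.target));

// One loop iteration; an engine would run this continuously, and a
// higher-layer loop could adapt this loop itself at runtime.
function runLoopOnce(system: AdaptableSystem) {
  const model: RuntimeModel = { failedComponents: [] };
  monitor(system, model);
  if (analyze(model)) execute(system, plan(model));
}
```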