TY  - JOUR
A1  - Ehrig, Lukas
A1  - Wagner, Ann-Christin
A1  - Wolter, Heike
A1  - Correll, Christoph U.
A1  - Geisel, Olga
A1  - Konigorski, Stefan
T1  - FASDetect as a machine learning-based screening app for FASD in youth with ADHD
JF  - npj Digital Medicine
N2  - Fetal alcohol-spectrum disorder (FASD) is underdiagnosed and often misdiagnosed as attention-deficit/hyperactivity disorder (ADHD). Here, we develop a screening tool for FASD in youth with ADHD symptoms. To develop the prediction model, medical record data from a German University outpatient unit are assessed including 275 patients aged 0-19 years old with FASD with or without ADHD and 170 patients with ADHD without FASD aged 0-19 years old. We train 6 machine learning models based on 13 selected variables and evaluate their performance. Random forest models yield the best prediction models with a cross-validated AUC of 0.92 (95% confidence interval [0.84, 0.99]). Follow-up analyses indicate that a random forest model with 6 variables - body length and head circumference at birth, IQ, socially intrusive behaviour, poor memory and sleep disturbance - yields equivalent predictive accuracy. We implement the prediction model in a web-based app called FASDetect - a user-friendly, clinically scalable FASD risk calculator that is freely available at https://fasdetect.dhc-lab.hpi.de.
KW  - Medical research
KW  - Psychiatric disorders
Y1  - 2023
U6  - https://doi.org/10.1038/s41746-023-00864-1
SN  - 2398-6352
VL  - 6
IS  - 1
PB  - Macmillan Publishers Limited
CY  - Basingstoke
ER  - 
TY  - JOUR
A1  - Slosarek, Tamara
A1  - Ibing, Susanne
A1  - Schormair, Barbara
A1  - Heyne, Henrike
A1  - Böttinger, Erwin
A1  - Andlauer, Till
A1  - Schurmann, Claudia
T1  - Implementation and evaluation of personal genetic testing as part of genomics analysis courses in German universities
JF  - BMC Medical Genomics
N2  - Purpose
Due to the increasing application of genome analysis and interpretation in medical disciplines, professionals require adequate education. Here, we present the implementation of personal genotyping as an educational tool in two genomics courses targeting Digital Health students at the Hasso Plattner Institute (HPI) and medical students at the Technical University of Munich (TUM).

Methods
We compared and evaluated the courses and the students ' perceptions on the course setup using questionnaires.

Results
During the course, students changed their attitudes towards genotyping (HPI: 79% [15 of 19], TUM: 47% [25 of 53]). Predominantly, students became more critical of personal genotyping (HPI: 73% [11 of 15], TUM: 72% [18 of 25]) and most students stated that genetic analyses should not be allowed without genetic counseling (HPI: 79% [15 of 19], TUM: 70% [37 of 53]). Students found the personal genotyping component useful (HPI: 89% [17 of 19], TUM: 92% [49 of 53]) and recommended its inclusion in future courses (HPI: 95% [18 of 19], TUM: 98% [52 of 53]).

Conclusion
Students perceived the personal genotyping component as valuable in the described genomics courses. The implementation described here can serve as an example for future courses in Europe.
KW  - Genomics education
KW  - Personal genotyping
KW  - Personalized medicine
Y1  - 2023
U6  - https://doi.org/10.1186/s12920-023-01503-0
SN  - 1755-8794
VL  - 16
IS  - 1
PB  - BMC
CY  - London
ER  - 
TY  - THES
A1  - Taleb, Aiham
T1  - Self-supervised deep learning methods for medical image analysis
T1  - Selbstüberwachte Deep Learning Methoden für die medizinische Bildanalyse
N2  - Deep learning has seen widespread application in many domains, mainly for its ability to learn data representations from raw input data. Nevertheless, its success has so far been coupled with the availability of large annotated (labelled) datasets. This is a requirement that is difficult to fulfil in several domains, such as in medical imaging. Annotation costs form a barrier in extending deep learning to clinically-relevant use cases. The labels associated with medical images are scarce, since the generation of expert annotations of multimodal patient data at scale is non-trivial, expensive, and time-consuming. This substantiates the need for algorithms that learn from the increasing amounts of unlabeled data. Self-supervised representation learning algorithms offer a pertinent solution, as they allow solving real-world (downstream) deep learning tasks with fewer annotations. Self-supervised approaches leverage unlabeled samples to acquire generic features about different concepts, enabling annotation-efficient downstream task solving subsequently.
Nevertheless, medical images present multiple unique and inherent challenges for existing self-supervised learning approaches, which we seek to address in this thesis: (i) medical images are multimodal, and their multiple modalities are heterogeneous in nature and imbalanced in quantities, e.g. MRI and CT; (ii) medical scans are multi-dimensional, often in 3D instead of 2D; (iii) disease patterns in medical scans are numerous and their incidence exhibits a long-tail distribution, so it is oftentimes essential to fuse knowledge from different data modalities, e.g. genomics or clinical data, to capture disease traits more comprehensively; (iv) Medical scans usually exhibit more uniform color density distributions, e.g. in dental X-Rays, than natural images. Our proposed self-supervised methods meet these challenges, besides significantly reducing the amounts of required annotations.
We evaluate our self-supervised methods on a wide array of medical imaging applications and tasks. Our experimental results demonstrate the obtained gains in both annotation-efficiency and performance; our proposed methods outperform many approaches from related literature. Additionally, in case of fusion with genetic modalities, our methods also allow for cross-modal interpretability. In this thesis, not only we show that self-supervised learning is capable of mitigating manual annotation costs, but also our proposed solutions demonstrate how to better utilize it in the medical imaging domain. Progress in self-supervised learning has the potential to extend deep learning algorithms application to clinical scenarios.
N2  - Deep Learning findet in vielen Bereichen breite Anwendung, vor allem wegen seiner Fähigkeit, Datenrepräsentationen aus rohen Eingabedaten zu lernen. Dennoch war der Erfolg bisher an die Verfügbarkeit großer annotatierter Datensätze geknüpft. Dies ist eine Anforderung, die in verschiedenen Bereichen, z. B. in der medizinischen Bildgebung, schwer zu erfüllen ist. Die Kosten für die Annotation stellen ein Hindernis für die Ausweitung des Deep Learning auf klinisch relevante Anwendungsfälle dar. Die mit medizinischen Bildern verbundenen Annotationen sind rar, da die Erstellung von Experten Annotationen für multimodale Patientendaten in großem Umfang nicht trivial, teuer und zeitaufwändig ist. Dies unterstreicht den Bedarf an Algorithmen, die aus den wachsenden Mengen an unbeschrifteten Daten lernen. Selbstüberwachte Algorithmen für das Repräsentationslernen bieten eine mögliche Lösung, da sie die Lösung realer (nachgelagerter) Deep-Learning-Aufgaben mit weniger Annotationen ermöglichen. Selbstüberwachte Ansätze nutzen unannotierte Stichproben, um generisches Eigenschaften über verschiedene Konzepte zu erlangen und ermöglichen so eine annotationseffiziente Lösung nachgelagerter Aufgaben.
Medizinische Bilder stellen mehrere einzigartige und inhärente Herausforderungen für existierende selbstüberwachte Lernansätze dar, die wir in dieser Arbeit angehen wollen: (i) medizinische Bilder sind multimodal, und ihre verschiedenen Modalitäten sind von Natur aus heterogen und in ihren Mengen unausgewogen, z.B. (ii) medizinische Scans sind mehrdimensional, oft in 3D statt in 2D; (iii) Krankheitsmuster in medizinischen Scans sind zahlreich und ihre Häufigkeit weist eine Long-Tail-Verteilung auf, so dass es oft unerlässlich ist, Wissen aus verschiedenen Datenmodalitäten, z. B. Genomik oder klinische Daten, zu verschmelzen, um Krankheitsmerkmale umfassender zu erfassen; (iv) medizinische Scans weisen in der Regel eine gleichmäßigere Farbdichteverteilung auf, z. B. in zahnmedizinischen Röntgenaufnahmen, als natürliche Bilder. Die von uns vorgeschlagenen selbstüberwachten Methoden adressieren diese Herausforderungen und reduzieren zudem die Menge der erforderlichen Annotationen erheblich.
Wir evaluieren unsere selbstüberwachten Methoden in verschiedenen Anwendungen und Aufgaben der medizinischen Bildgebung. Unsere experimentellen Ergebnisse zeigen, dass die von uns vorgeschlagenen Methoden sowohl die Effizienz der Annotation als auch die Leistung steigern und viele Ansätze aus der verwandten Literatur übertreffen. Darüber hinaus ermöglichen unsere Methoden im Falle der Fusion mit genetischen Modalitäten auch eine modalübergreifende Interpretierbarkeit. In dieser Arbeit zeigen wir nicht nur, dass selbstüberwachtes Lernen in der Lage ist, die Kosten für manuelle Annotationen zu senken, sondern auch, wie man es in der medizinischen Bildgebung besser nutzen kann. Fortschritte beim selbstüberwachten Lernen haben das Potenzial, die Anwendung von Deep-Learning-Algorithmen auf klinische Szenarien auszuweiten.
KW  - Artificial Intelligence
KW  - machine learning
KW  - unsupervised learning
KW  - representation learning
KW  - Künstliche Intelligenz
KW  - maschinelles Lernen
KW  - Representationlernen
KW  - selbstüberwachtes Lernen
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-644089
ER  - 
TY  - JOUR
A1  - Shams, Boshra
A1  - Wang, Ziqian
A1  - Roine, Timo
A1  - Aydogan, Dogu Baran
A1  - Vajkoczy, Peter
A1  - Lippert, Christoph
A1  - Picht, Thomas
A1  - Fekonja, Lucius Samo
T1  - Machine learning-based prediction of motor status in glioma patients using diffusion MRI metrics along the corticospinal tract
JF  - Brain communications
N2  - Shams et al. report that glioma patients' motor status is predicted accurately by diffusion MRI metrics along the corticospinal tract based on support vector machine method, reaching an overall accuracy of 77%. They show that these metrics are more effective than demographic and clinical variables. 

Along tract statistics enables white matter characterization using various diffusion MRI metrics. These diffusion models reveal detailed insights into white matter microstructural changes with development, pathology and function. Here, we aim at assessing the clinical utility of diffusion MRI metrics along the corticospinal tract, investigating whether motor glioma patients can be classified with respect to their motor status. We retrospectively included 116 brain tumour patients suffering from either left or right supratentorial, unilateral World Health Organization Grades II, III and IV gliomas with a mean age of 53.51 +/- 16.32 years. Around 37% of patients presented with preoperative motor function deficits according to the Medical Research Council scale. At group level comparison, the highest non-overlapping diffusion MRI differences were detected in the superior portion of the tracts' profiles. Fractional anisotropy and fibre density decrease, apparent diffusion coefficient axial diffusivity and radial diffusivity increase. To predict motor deficits, we developed a method based on a support vector machine using histogram-based features of diffusion MRI tract profiles (e.g. mean, standard deviation, kurtosis and skewness), following a recursive feature elimination method. Our model achieved high performance (74% sensitivity, 75% specificity, 74% overall accuracy and 77% area under the curve). We found that apparent diffusion coefficient, fractional anisotropy and radial diffusivity contributed more than other features to the model. Incorporating the patient demographics and clinical features such as age, tumour World Health Organization grade, tumour location, gender and resting motor threshold did not affect the model's performance, revealing that these features were not as effective as microstructural measures. These results shed light on the potential patterns of tumour-related microstructural white matter changes in the prediction of functional deficits.
KW  - machine learning
KW  - support vector machine
KW  - tractography
KW  - diffusion MRI;
KW  - corticospinal tract
Y1  - 2022
U6  - https://doi.org/10.1093/braincomms/fcac141
SN  - 2632-1297
VL  - 4
IS  - 3
PB  - Oxford University Press
CY  - Oxford
ER  - 
TY  - JOUR
A1  - Ring, Raphaela M.
A1  - Eisenmann, Clemens
A1  - Kandil, Farid
A1  - Steckhan, Nico
A1  - Demmrich, Sarah
A1  - Klatte, Caroline
A1  - Kessler, Christian S.
A1  - Jeitler, Michael
A1  - Boschmann, Michael
A1  - Michalsen, Andreas
A1  - Blakeslee, Sarah B.
A1  - Stöckigt, Barbara
A1  - Stritter, Wiebke
A1  - Koppold-Liebscher, Daniela A.
T1  - Mental and behavioural responses to Bahá’í fasting: Looking behind the scenes of a religiously motivated intermittent fast using a mixed methods approach
JF  - Nutrients
N2  - Background/Objective: Historically, fasting has been practiced not only for medical but also for religious reasons. Baha'is follow an annual religious intermittent dry fast of 19 days. We inquired into motivation behind and subjective health impacts of Baha'i fasting. Methods: A convergent parallel mixed methods design was embedded in a clinical single arm observational study. Semi-structured individual interviews were conducted before (n = 7), during (n = 8), and after fasting (n = 8). Three months after the fasting period, two focus group interviews were conducted (n = 5/n = 3). A total of 146 Baha'i volunteers answered an online survey at five time points before, during, and after fasting. Results: Fasting was found to play a central role for the religiosity of interviewees, implying changes in daily structures, spending time alone, engaging in religious practices, and experiencing social belonging. Results show an increase in mindfulness and well-being, which were accompanied by behavioural changes and experiences of self-efficacy and inner freedom. Survey scores point to an increase in mindfulness and well-being during fasting, while stress, anxiety, and fatigue decreased. Mindfulness remained elevated even three months after the fast. Conclusion: Baha'i fasting seems to enhance participants' mindfulness and well-being, lowering stress levels and reducing fatigue. Some of these effects lasted more than three months after fasting.
KW  - intermittent food restriction
KW  - mindfulness
KW  - self-efficacy
KW  - well-being
KW  - mixed methods
KW  - health behaviour
KW  - coping ability
KW  - religiously motivated
KW  - dry fasting
Y1  - 2022
U6  - https://doi.org/10.3390/nu14051038
SN  - 2072-6643
VL  - 14
IS  - 5
PB  - MDPI
CY  - Basel
ER  - 
TY  - BOOK
A1  - Kuban, Robert
A1  - Rotta, Randolf
A1  - Nolte, Jörg
A1  - Chromik, Jonas
A1  - Beilharz, Jossekin Jakob
A1  - Pirl, Lukas
A1  - Friedrich, Tobias
A1  - Lenzner, Pascal
A1  - Weyand, Christopher
A1  - Juiz, Carlos
A1  - Bermejo, Belen
A1  - Sauer, Joao
A1  - Coelh, Leandro dos Santos
A1  - Najafi, Pejman
A1  - Pünter, Wenzel
A1  - Cheng, Feng
A1  - Meinel, Christoph
A1  - Sidorova, Julia
A1  - Lundberg, Lars
A1  - Vogel, Thomas
A1  - Tran, Chinh
A1  - Moser, Irene
A1  - Grunske, Lars
A1  - Elsaid, Mohamed Esameldin Mohamed
A1  - Abbas, Hazem M.
A1  - Rula, Anisa
A1  - Sejdiu, Gezim
A1  - Maurino, Andrea
A1  - Schmidt, Christopher
A1  - Hügle, Johannes
A1  - Uflacker, Matthias
A1  - Nozza, Debora
A1  - Messina, Enza
A1  - Hoorn, André van
A1  - Frank, Markus
A1  - Schulz, Henning
A1  - Alhosseini Almodarresi Yasin, Seyed Ali
A1  - Nowicki, Marek
A1  - Muite, Benson K.
A1  - Boysan, Mehmet Can
A1  - Bianchi, Federico
A1  - Cremaschi, Marco
A1  - Moussa, Rim
A1  - Abdel-Karim, Benjamin M.
A1  - Pfeuffer, Nicolas
A1  - Hinz, Oliver
A1  - Plauth, Max
A1  - Polze, Andreas
A1  - Huo, Da
A1  - Melo, Gerard de
A1  - Mendes Soares, Fábio
A1  - Oliveira, Roberto Célio Limão de
A1  - Benson, Lawrence
A1  - Paul, Fabian
A1  - Werling, Christian
A1  - Windheuser, Fabian
A1  - Stojanovic, Dragan
A1  - Djordjevic, Igor
A1  - Stojanovic, Natalija
A1  - Stojnev Ilic, Aleksandra
A1  - Weidmann, Vera
A1  - Lowitzki, Leon
A1  - Wagner, Markus
A1  - Ifa, Abdessatar Ben
A1  - Arlos, Patrik
A1  - Megia, Ana
A1  - Vendrell, Joan
A1  - Pfitzner, Bjarne
A1  - Redondo, Alberto
A1  - Ríos Insua, David
A1  - Albert, Justin Amadeus
A1  - Zhou, Lin
A1  - Arnrich, Bert
A1  - Szabó, Ildikó
A1  - Fodor, Szabina
A1  - Ternai, Katalin
A1  - Bhowmik, Rajarshi
A1  - Campero Durand, Gabriel
A1  - Shevchenko, Pavlo
A1  - Malysheva, Milena
A1  - Prymak, Ivan
A1  - Saake, Gunter
ED  - Meinel, Christoph
ED  - Polze, Andreas
ED  - Beins, Karsten
ED  - Strotmann, Rolf
ED  - Seibold, Ulrich
ED  - Rödszus, Kurt
ED  - Müller, Jürgen
T1  - HPI Future SOC Lab – Proceedings 2019
N2  - The “HPI Future SOC Lab” is a cooperation of the Hasso Plattner Institute (HPI) and industry partners. Its mission is to enable and promote exchange and interaction between the research community and the industry partners.
  The HPI Future SOC Lab provides researchers with free of charge access to a complete infrastructure of state of the art hard and software. This infrastructure includes components, which might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB main memory. The offerings address researchers particularly from but not limited to the areas of computer science and business information systems. Main areas of research include cloud computing, parallelization, and In-Memory technologies.
  This technical report presents results of research projects executed in 2019. Selected projects have presented their results on April 9th and November 12th 2019 at the Future SOC Lab Day events.
N2  - Das Future SOC Lab am HPI ist eine Kooperation des Hasso-Plattner-Instituts mit verschiedenen Industriepartnern. Seine Aufgabe ist die Ermöglichung und Förderung des Austausches zwischen Forschungsgemeinschaft und Industrie.
  Am Lab wird interessierten Wissenschaftlern eine Infrastruktur von neuester Hard- und Software kostenfrei für Forschungszwecke zur Verfügung gestellt. Dazu zählen teilweise noch nicht am Markt verfügbare Technologien, die im normalen Hochschulbereich in der Regel nicht zu finanzieren wären, bspw. Server mit bis zu 64 Cores und 2 TB Hauptspeicher. Diese Angebote richten sich insbesondere an Wissenschaftler in den Gebieten Informatik und Wirtschaftsinformatik. Einige der Schwerpunkte sind Cloud Computing, Parallelisierung und In-Memory Technologien. 
  In diesem Technischen Bericht werden die Ergebnisse der Forschungsprojekte des Jahres 2019 vorgestellt.  Ausgewählte Projekte stellten ihre Ergebnisse am 09. April und 12. November 2019 im Rahmen des Future SOC Lab Tags vor.
T3  - Technische Berichte des Hasso-Plattner-Instituts für Digital Engineering an der Universität Potsdam - 158 
KW  - Future SOC Lab
KW  - research projects
KW  - multicore architectures
KW  - in-memory technology
KW  - cloud computing
KW  - machine learning
KW  - artifical intelligence
KW  - Future SOC Lab
KW  - Forschungsprojekte
KW  - Multicore Architekturen
KW  - In-Memory Technologie
KW  - Cloud Computing
KW  - maschinelles Lernen
KW  - künstliche Intelligenz
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-597915
SN  - 978-3-86956-564-4
SN  - 1613-5652
SN  - 2191-1665
IS  - 158
PB  - Universitätsverlag Potsdam
CY  - Potsdam
ER  - 
TY  - THES
A1  - Richly, Keven
T1  - Memory-efficient data management for spatio-temporal applications
BT  - workload-driven fine-grained configuration optimization for storing spatio-temporal data in columnar In-memory databases
N2  - The wide distribution of location-acquisition technologies means that large volumes of spatio-temporal data are continuously being accumulated. Positioning systems such as GPS enable the tracking of various moving objects' trajectories, which are usually represented by a chronologically ordered sequence of observed locations. The analysis of movement patterns based on detailed positional information creates opportunities for applications that can improve business decisions and processes in a broad spectrum of industries (e.g., transportation, traffic control, or medicine). Due to the large data volumes generated in these applications, the cost-efficient storage of spatio-temporal data is desirable, especially when in-memory database systems are used to achieve interactive performance requirements. 

To efficiently utilize the available DRAM capacities, modern database systems support various tuning possibilities to reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes structures). By considering horizontal data partitioning, we can independently apply different tuning options on a fine-grained level. However, the selection of cost and performance-balancing configurations is challenging, due to the vast number of possible setups consisting of mutually dependent individual decisions. 

In this thesis, we introduce multiple approaches to improve spatio-temporal data management by automatically optimizing diverse tuning options for the application-specific access patterns and data characteristics. Our contributions are as follows:
(1) We introduce a novel approach to determine fine-grained table configurations for spatio-temporal workloads. Our linear programming (LP) approach jointly optimizes the (i) data compression, (ii) ordering, (iii) indexing, and (iv) tiering. We propose different models which address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload, memory budgets, and data characteristics. To yield maintainable and robust configurations, we further extend our LP-based approach to incorporate reconfiguration costs as well as optimizations for multiple potential workload scenarios. 
(2) To optimize the storage layout of timestamps in columnar databases, we present a heuristic approach for the workload-driven combined selection of a data layout and compression scheme. By considering attribute decomposition strategies, we are able to apply application-specific optimizations that reduce the memory footprint and improve performance.   
(3) We introduce an approach that leverages past trajectory data to improve the dispatch processes of transportation network companies. Based on location probabilities, we developed risk-averse dispatch strategies that reduce critical delays.
(4) Finally, we used the use case of a transportation network company to evaluate our database optimizations on a real-world dataset. We demonstrate that workload-driven fine-grained optimizations allow us to reduce the memory footprint (up to 71% by equal performance) or increase the performance (up to 90% by equal memory size) compared to established rule-based heuristics. 

Individually, our contributions provide novel approaches to the current challenges in spatio-temporal data mining and database research. Combining them allows in-memory databases to store and process spatio-temporal data more cost-efficiently.
N2  - Durch die starke Verbreitung von Systemen zur Positionsbestimmung werden fortlaufend große Mengen an Bewegungsdaten mit einem räumlichen und zeitlichen Bezug gesammelt. Ortungssysteme wie GPS ermöglichen, die Bewegungen verschiedener Objekte (z. B. Personen oder Fahrzeuge) nachzuverfolgen. Diese werden in der Regel durch eine chronologisch geordnete Abfolge beobachteter Aufenthaltsorte repräsentiert. Die Analyse von Bewegungsmustern auf der Grundlage detaillierter Positionsinformationen schafft in unterschiedlichsten Branchen (z. B. Transportwesen, Verkehrssteuerung oder Medizin) die Möglichkeit Geschäftsentscheidungen und -prozesse zu verbessern. Aufgrund der großen Datenmengen, die bei diesen Anwendungen auftreten, stellt die kosteneffiziente Speicherung von Bewegungsdaten eine Herausforderung dar. Dies ist insbesondere der Fall, wenn Hauptspeicherdatenbanken zur Speicherung eingesetzt werden, um die Anforderungen bezüglich interaktiver Antwortzeiten zu erfüllen.

Um die verfügbaren Speicherkapazitäten effizient zu nutzen, unterstützen moderne Datenbanksysteme verschiedene Optimierungsmöglichkeiten, um den Speicherbedarf zu reduzieren (z. B. durch Datenkomprimierung) oder die Performance zu erhöhen (z. B. durch Indexstrukturen). Dabei ermöglicht eine horizontale Partitionierung der Daten, dass unabhängig voneinander verschiedene Optimierungen feingranular auf einzelnen Bereichen der Daten angewendet werden können. Die Auswahl von Konfigurationen, die sowohl die Kosten als auch Leistungsanforderungen berücksichtigen, ist jedoch aufgrund der großen Anzahl möglicher Kombinationen -- die aus voneinander abhängigen Einzelentscheidungen bestehen -- komplex.

In dieser Dissertation präsentieren wir mehrere Ansätze zur Verbesserung der Datenverwaltung, indem wir die Auswahl verschiedener Datenbankoptimierungen automatisch für die anwendungsspezifischen Zugriffsmuster und Dateneigenschaften anpassen. Diesbezüglich leistet die vorliegende Dissertation die folgenden Beiträge: (1) Wir stellen einen neuen Ansatz vor, um feingranulare Tabellenkonfigurationen für räumlich-zeitliche Workloads zu bestimmen. In diesem Zusammenhang optimiert unser Linear Programming (LP) Ansatz gemeinsam (i) die Datenkompression, (ii) die Sortierung, (iii) die Indizierung und (iv) die Datenplatzierung. Hierzu schlagen wir verschiedene Modelle mit unterschiedlichen Kostenabhängigkeiten vor, um optimierte Konfigurationen für einen gegebenen Workload, ein Speicherbudget und die vorliegenden Dateneigenschaften zu berechnen. Durch die Erweiterung des LP-basierten Ansatzes zur Berücksichtigung von Modifikationskosten und verschiedener potentieller Workloads ist es möglich, die Wartbarkeit und Robustheit der bestimmten Tabellenkonfiguration zu erhöhen.
(2) Um die Speicherung von Timestamps in spalten-orientierten Datenbanken zu optimieren, stellen wir einen heuristischen Ansatz für die kombinierte Auswahl eines Speicherlayouts und eines Kompressionsschemas vor. Zudem sind wir durch die Berücksichtigung von Strategien zur Aufteilung von Attributen in der Lage, anwendungsspezifische Optimierungen anzuwenden, die den Speicherbedarf reduzieren und die Performance verbessern.
(3) Wir stellen einen Ansatz vor, der in der Vergangenheit beobachtete Bewegungsmuster nutzt, um die Zuweisungsprozesse von Vermittlungsdiensten zur Personenbeförderung zu verbessern. Auf der Grundlage von Standortwahrscheinlichkeiten haben wir verschiedene Strategien für die Vergabe von Fahraufträgen an Fahrer entwickelt, die kritische Verspätungen reduzieren.
(4) Abschließend haben wir unsere Datenbankoptimierungen anhand eines realen Datensatzes eines Transportdienstleisters evaluiert. In diesem Zusammenhang zeigen wir, dass wir durch feingranulare workload-basierte Optimierungen den Speicherbedarf (um bis zu 71% bei vergleichbarer Performance) reduzieren oder die Performance (um bis zu 90% bei gleichem Speicherverbrauch) im Vergleich zu regelbasierten Heuristiken verbessern können.

Die einzelnen Beiträge stellen neuartige Ansätze für aktuelle Herausforderungen im Bereich des Data Mining und der Datenbankforschung dar. In Kombination ermöglichen sie eine kosteneffizientere Speicherung und Verarbeitung von Bewegungsdaten in Hauptspeicherdatenbanken.
KW  - spatio-temporal data management
KW  - trajectory data
KW  - columnar databases
KW  - in-memory data management
KW  - database tuning
KW  - spaltenorientierte Datenbanken
KW  - Datenbankoptimierung
KW  - Hauptspeicher Datenmanagement
KW  - Datenverwaltung für Daten mit räumlich-zeitlichem Bezug
KW  - Trajektoriendaten
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-635473
ER  - 
TY  - JOUR
A1  - Rosin, Paul L.
A1  - Lai, Yu-Kun
A1  - Mould, David
A1  - Yi, Ran
A1  - Berger, Itamar
A1  - Doyle, Lars
A1  - Lee, Seungyong
A1  - Li, Chuan
A1  - Liu, Yong-Jin
A1  - Semmo, Amir
A1  - Shamir, Ariel
A1  - Son, Minjung
A1  - Winnemöller, Holger
T1  - NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits
JF  - Computational visual media
N2  - Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level, benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation for a wide variety of image stylisation methods (both portrait-specific and general purpose, and also both traditional NPR approaches and NST) using the new benchmark dataset.
KW  - non-photorealistic rendering (NPR)
KW  - image stylization
KW  - style transfer
KW  - portrait
KW  - evaluation
KW  - benchmark
Y1  - 2022
U6  - https://doi.org/10.1007/s41095-021-0255-3
SN  - 2096-0433
SN  - 2096-0662
VL  - 8
IS  - 3
SP  - 445
EP  - 465
PB  - Springer Nature
CY  - London
ER  - 
TY  - JOUR
A1  - Vitagliano, Gerardo
A1  - Hameed, Mazhar
A1  - Jiang, Lan
A1  - Reisener, Lucas
A1  - Wu, Eugene
A1  - Naumann, Felix
T1  - Pollock: a data loading benchmark
JF  - Proceedings of the VLDB Endowment
N2  - Any system at play in a data-driven project has a fundamental requirement: the ability to load data. The de-facto standard format to distribute and consume raw data is CSV. Yet, the plain text and flexible nature of this format make such files often difficult to parse and correctly load their content, requiring cumbersome data preparation steps. We propose a benchmark to assess the robustness of systems in loading data from non-standard CSV formats and with structural inconsistencies. First, we formalize a model to describe the issues that affect real-world files and use it to derive a systematic lpollutionz process to generate dialects for any given grammar. Our benchmark leverages the pollution framework for the csv format. To guide pollution, we have surveyed thousands of real-world, publicly available csv files, recording the problems we encountered. We demonstrate the applicability of our benchmark by testing and scoring 16 different systems: popular csv parsing frameworks, relational database tools, spreadsheet systems, and a data visualization tool.
Y1  - 2023
U6  - https://doi.org/10.14778/3594512.3594518
SN  - 2150-8097
VL  - 16
IS  - 8
SP  - 1870
EP  - 1882
PB  - Association for Computing Machinery
CY  - New York
ER  - 
TY  - JOUR
A1  - Wiemker, Veronika
A1  - Bunova, Anna
A1  - Neufeld, Maria
A1  - Gornyi, Boris
A1  - Yurasova, Elena
A1  - Konigorski, Stefan
A1  - Kalinina, Anna
A1  - Kontsevaya, Anna
A1  - Ferreira-Borges, Carina
A1  - Probst, Charlotte
T1  - Pilot study to evaluate usability and acceptability of the 'Animated Alcohol Assessment Tool' in Russian primary healthcare
JF  - Digital health
N2  - Background and aims: Accurate and user-friendly assessment tools quantifying alcohol consumption are a prerequisite to effective prevention and treatment programmes, including Screening and Brief Intervention. Digital tools offer new potential in this field. We developed the ‘Animated Alcohol Assessment Tool’ (AAA-Tool), a mobile app providing an interactive version of the World Health Organization's Alcohol Use Disorders Identification Test (AUDIT) that facilitates the description of individual alcohol consumption via culturally informed animation features. This pilot study evaluated the Russia-specific version of the Animated Alcohol Assessment Tool with regard to (1) its usability and acceptability in a primary healthcare setting, (2) the plausibility of its alcohol consumption assessment results and (3) the adequacy of its Russia-specific vessel and beverage selection. Methods: Convenience samples of 55 patients (47% female) and 15 healthcare practitioners (80% female) in 2 Russian primary healthcare facilities self-administered the Animated Alcohol Assessment Tool and rated their experience on the Mobile Application Rating Scale – User Version. Usage data was automatically collected during app usage, and additional feedback on regional content was elicited in semi-structured interviews. Results: On average, patients completed the Animated Alcohol Assessment Tool in 6:38 min (SD = 2.49, range = 3.00–17.16). User satisfaction was good, with all subscale Mobile Application Rating Scale – User Version scores averaging >3 out of 5 points. A majority of patients (53%) and practitioners (93%) would recommend the tool to ‘many people’ or ‘everyone’. Assessed alcohol consumption was plausible, with a low number (14%) of logically impossible entries. Most patients reported the Animated Alcohol Assessment Tool to reflect all vessels (78%) and all beverages (71%) they typically used. Conclusion: High acceptability ratings by patients and healthcare practitioners, acceptable completion time, plausible alcohol usage assessment results and perceived adequacy of region-specific content underline the Animated Alcohol Assessment Tool's potential to provide a novel approach to alcohol assessment in primary healthcare. After its validation, the Animated Alcohol Assessment Tool might contribute to reducing alcohol-related harm by facilitating Screening and Brief Intervention implementation in Russia and beyond.
KW  - Alcohol use assessment
KW  - Alcohol Use Disorders Identification Test
KW  - screening tools
KW  - digital health
KW  - mobile applications
KW  - Russia
KW  - primary healthcare
KW  - usability
KW  - acceptability
Y1  - 2022
U6  - https://doi.org/10.1177/20552076211074491
SN  - 2055-2076
VL  - 8
PB  - Sage Publications
CY  - London
ER  -