TY  - JOUR
A1  - Thamsen, Lauritz
A1  - Beilharz, Jossekin Jakob
A1  - Tran, Vinh Thuy
A1  - Nedelkoski, Sasho
A1  - Kao, Odej
T1  - Mary, Hugo, and Hugo*
BT  - learning to schedule distributed data-parallel processing jobs on shared clusters
JF  - Concurrency and computation : practice & experience
N2  - Distributed data-parallel processing systems like MapReduce, Spark, and Flink are popular for analyzing large datasets using cluster resources. Resource management systems like YARN or Mesos in turn allow multiple data-parallel processing jobs to share cluster resources in temporary containers. Often, the containers do not isolate resource usage, which allows high degrees of overall resource utilization despite overprovisioning and the often fluctuating utilization of specific jobs. However, some combinations of jobs utilize resources better and interfere less with each other when running on the same shared nodes than others. This article presents an approach for improving resource utilization and job throughput when scheduling recurring distributed data-parallel processing jobs in shared clusters. The approach is based on reinforcement learning and a measure of co-location goodness, so that cluster schedulers learn over time which jobs are best executed together on shared resources. We evaluated this approach over the past several years with three prototype schedulers that build on each other: Mary, Hugo, and Hugo*. For the evaluation, we used exemplary Flink and Spark jobs from different application domains and clusters of commodity nodes managed by YARN. The results of these experiments show that our approach can increase resource utilization and job throughput significantly.
KW  - cluster resource management
KW  - distributed data-parallel processing
KW  - job
KW  - co-location
KW  - reinforcement learning
KW  - self-learning scheduler
Y1  - 2020
U6  - https://doi.org/10.1002/cpe.5823
SN  - 1532-0626
SN  - 1532-0634
VL  - 33
IS  - 18
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - THES
A1  - Tan, Jing
T1  - Multi-Agent Reinforcement Learning for Interactive Decision-Making
T1  - Multiagenten Verstärkendes Lernen für Interaktive Entscheidungsfindung
N2  - Distributed decision-making studies the choices made among a group of interactive and self-interested agents. Specifically, this thesis is concerned with the optimal sequence of choices an agent makes as it tries to maximize its achievement on one or multiple objectives in a dynamic environment. The optimization of distributed decision-making is important in many real-life applications, e.g., resource allocation (of products, energy, bandwidth, computing power, etc.) and robotics (heterogeneous agent cooperation on games or tasks), in various fields such as vehicular networks, the Internet of Things, and smart grids.
This thesis proposes three multi-agent reinforcement learning algorithms combined with game-theoretic tools to study strategic interaction between decision makers, using resource allocation in vehicular networks as an example. Specifically, the thesis designs an interaction mechanism based on a second-price auction, incentivizes the agents to maximize multiple short-term and long-term, individual and system objectives, and simulates a dynamic environment with realistic mobility data to evaluate algorithm performance and study agent behavior.

Theoretical results show that the mechanism has Nash equilibria and, in a stationary environment, maximizes social welfare and yields a Pareto-optimal allocation of resources. Empirical results show that in the dynamic environment, our proposed learning algorithms outperform state-of-the-art algorithms in single- and multi-objective optimization and generalize very well to significantly different environments. Specifically, with the long-term multi-objective learning algorithm, we demonstrate that by considering the long-term impact of decisions, as well as by incentivizing the agents with a system fairness reward, the agents achieve better results in both individual and system objectives, even when their objectives are private, randomized, and changing over time. Moreover, the agents show competitive behavior to maximize individual payoff when resources are scarce, and cooperative behavior in achieving a system objective when resources are abundant; they also learn the rules of the game, without prior knowledge, to overcome disadvantages in initial parameters (e.g., a lower budget).

To address practicality concerns, the thesis also provides several computational performance improvement methods, and tests the algorithm on a single-board computer. Results show the feasibility of online training and inference in milliseconds.

There are many potential future topics following this work: 1) the interaction mechanism can be modified into a double auction, eliminating the auctioneer and resembling a completely distributed, ad hoc network; 2) the objectives are assumed to be independent in this thesis, whereas a more realistic assumption may involve correlation between objectives, such as a hierarchy of objectives; 3) the current work limits information sharing between agents, a setup that befits applications with privacy requirements or sparse signaling; by allowing more information sharing between the agents, the algorithms can be modified for more cooperative scenarios such as robotics.
N2  - Die Verteilte Entscheidungsfindung untersucht Entscheidungen innerhalb einer Gruppe von interaktiven und eigennützigen Agenten. Diese Arbeit befasst sich insbesondere mit der optimalen Folge von Entscheidungen eines Agenten, der das Erreichen eines oder mehrerer Ziele in einer dynamischen Umgebung zu maximieren versucht. Die Optimierung einer verteilten Entscheidungsfindung ist in vielen alltäglichen Anwendungen relevant, z.B. zur Allokation von Ressourcen (Produkte, Energie, Bandbreite, Rechenressourcen etc.) und in der Robotik (heterogene Agenten-Kooperation in Spielen oder Aufträgen) in diversen Feldern wie Fahrzeugkommunikation, Internet of Things, Smart Grid, usw.
Diese Arbeit schlägt drei Multi-Agenten Reinforcement Learning Algorithmen kombiniert mit spieltheoretischen Ansätzen vor, um die strategische Interaktion zwischen Entscheidungsträgern zu untersuchen. Dies wird am Beispiel einer Ressourcenallokation in der Fahrzeug-zu-X-Kommunikation (vehicle-to-everything) gezeigt. Speziell wird in der Arbeit ein Interaktionsmechanismus entwickelt, der auf Basis einer Zweitpreisauktion den Agenten zur Maximierung mehrerer kurz- und langfristiger Ziele sowie individueller und Systemziele anregt. Dabei wird eine dynamische Umgebung mit realistischen Mobilitätsdaten simuliert, um die Leistungsfähigkeit des Algorithmus zu evaluieren und das Agentenverhalten zu untersuchen.

Eine theoretische Analyse zeigt, dass bei diesem Mechanismus das Nash-Gleichgewicht sowie eine Maximierung von Wohlfahrt und Pareto-optimaler Ressourcenallokation in einer statischen Umgebung vorliegen. Empirische Untersuchungen ergeben, dass in einer dynamischen Umgebung der vorgeschlagene Lernalgorithmus den aktuellen Stand der Technik bei ein- und mehrdimensionaler Optimierung übertrifft, und dabei sehr gut auch auf stark abweichende Umgebungen generalisiert werden kann.

Speziell mit dem langfristigen mehrdimensionalen Lernalgorithmus wird gezeigt, dass bei Berücksichtigung von langfristigen Auswirkungen von Entscheidungen, als auch durch einen Anreiz zur Systemgerechtigkeit, die Agenten in individuellen als auch Systemzielen bessere Ergebnisse liefern, und das auch, wenn ihre Ziele privat, zufällig und zeitveränderlich sind. Weiter zeigen die Agenten Wettbewerbsverhalten, um ihre eigenen Ziele zu maximieren, wenn die Ressourcen knapp sind, und kooperatives Verhalten, um Systemziele zu erreichen, wenn die Ressourcen ausreichend sind. Darüber hinaus lernen sie die Ziele des Spiels ohne vorheriges Wissen über dieses, um Startschwierigkeiten, wie z.B. ein niedrigeres Budget, zu überwinden.

Für die praktische Umsetzung zeigt diese Arbeit auch mehrere Methoden auf, welche die Rechenleistung verbessern können, und testet den Algorithmus auf einem handelsüblichen Einplatinencomputer. Die Ergebnisse zeigen die Durchführbarkeit von inkrementellem Lernen und Inferenz innerhalb weniger Millisekunden auf. Ausgehend von den Ergebnissen dieser Arbeit könnten sich verschiedene Forschungsfragen anschließen: 1) Der Interaktionsmechanismus kann zu einer Doppelauktion verändert und dabei der Auktionator entfernt werden. Dies würde einem vollständig verteilten Ad-Hoc-Netzwerk entsprechen. 2) Die Ziele werden in dieser Arbeit als unabhängig betrachtet. Es könnte eine Korrelation zwischen mehreren Zielen angenommen werden, so wie eine Zielhierarchie. 3) Die aktuelle Arbeit begrenzt den Informationsaustausch zwischen Agenten. Diese Annahme passt zu Anwendungen mit Anforderungen an den Schutz der Privatsphäre oder bei spärlichen Signalen. Indem der Informationsaustausch erhöht wird, könnte der Algorithmus auf stärker kooperative Anwendungen wie z.B. in der Robotik erweitert werden.
KW  - V2X
KW  - distributed systems
KW  - reinforcement learning
KW  - game theory
KW  - auction
KW  - decision making
KW  - behavioral sciences
KW  - multi-objective
KW  - V2X
KW  - Verteilte Systeme
KW  - Spieltheorie
KW  - Auktion
KW  - Entscheidungsfindung
KW  - Verhaltensforschung
KW  - verstärkendes Lernen
KW  - Multiziel
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-607000
ER  - 
TY  - JOUR
A1  - Panzer, Marcel
A1  - Bender, Benedict
T1  - Deep reinforcement learning in production systems
BT  - a systematic literature review
JF  - International Journal of Production Research
N2  - Shortening product development cycles and fully customizable products pose major challenges for production systems. These not only have to cope with an increased product diversity but also enable high throughputs and provide a high adaptability and robustness to process variations and unforeseen incidents. To overcome these challenges, deep Reinforcement Learning (RL) has been increasingly applied for the optimization of production systems. Unlike other machine learning methods, deep RL operates on recently collected sensor-data in direct interaction with its environment and enables real-time responses to system changes. Although deep RL is already being deployed in production systems, a systematic review of the results has not yet been established. The main contribution of this paper is to provide researchers and practitioners an overview of applications and to motivate further implementations and research of deep RL supported production systems. Findings reveal that deep RL is applied in a variety of production domains, contributing to data-driven and flexible processes. In most applications, conventional methods were outperformed and implementation efforts or dependence on human experience were reduced. Nevertheless, future research must focus more on transferring the findings to real-world systems to analyze safety aspects and demonstrate reliability under prevailing conditions.
KW  - Machine learning
KW  - reinforcement learning
KW  - production control
KW  - production planning
KW  - manufacturing processes
KW  - systematic literature review
Y1  - 2021
U6  - https://doi.org/10.1080/00207543.2021.1973138
SN  - 1366-588X
SN  - 0020-7543
VL  - 60
IS  - 13
PB  - Taylor & Francis
CY  - London
ER  - 
TY  - THES
A1  - Panzer, Marcel
T1  - Design of a hyper-heuristics based control framework for modular production systems
T1  - Design eines auf Hyperheuristiken basierenden Steuerungsframeworks für modulare Produktionssysteme
N2  - Volatile supply and sales markets, coupled with increasing product individualization and complex production processes, present significant challenges for manufacturing companies. These must navigate and adapt to ever-shifting external and internal factors while ensuring robustness against process variabilities and unforeseen events. This has a pronounced impact on production control, which serves as the operational intersection between production planning and the shop-floor resources, and necessitates the capability to manage intricate process interdependencies effectively. Considering the increasing dynamics and product diversification, alongside the need to maintain constant production performances, the implementation of innovative control strategies becomes crucial.
In recent years, the integration of Industry 4.0 technologies and machine learning methods has gained prominence in addressing emerging challenges in production applications. Within this context, this cumulative thesis analyzes deep learning based production systems based on five publications. Particular attention is paid to the applications of deep reinforcement learning, aiming to explore its potential in dynamic control contexts. The analyses reveal that deep reinforcement learning excels in various applications, especially in dynamic production control tasks. Its efficacy can be attributed to its interactive learning and real-time operational model. However, despite its evident utility, there are notable structural, organizational, and algorithmic gaps in the prevailing research. A predominant portion of deep reinforcement learning based approaches is limited to specific job shop scenarios and often overlooks the potential synergies in combined resources. Furthermore, the analysis highlights the rare implementation of multi-agent systems and semi-heterarchical systems in practical settings. A notable gap remains in the integration of deep reinforcement learning into a hyper-heuristic.
To bridge these research gaps, this thesis introduces a deep reinforcement learning based hyper-heuristic for the control of modular production systems, developed in accordance with the design science research methodology. Implemented within a semi-heterarchical multi-agent framework, this approach achieves a threefold reduction in control and optimisation complexity while ensuring high scalability, adaptability, and robustness of the system. In comparative benchmarks, this control methodology outperforms rule-based heuristics, reducing throughput times and tardiness, and effectively incorporates customer and order-centric metrics. The control artifact facilitates rapid scenario generation, motivating further research efforts and bridging the gap to real-world applications. The overarching goal is to foster a synergy between theoretical insights and practical solutions, thereby enriching scientific discourse and addressing current industrial challenges.
N2  - Volatile Beschaffungs- und Absatzmärkte sowie eine zunehmende Produktindividualisierung konfrontieren Fertigungsunternehmen mit beträchtlichen Herausforderungen. Diese erfordern eine Anpassung der Produktion an sich ständig wechselnde externe Einflüsse und eine hohe Prozessrobustheit gegenüber unvorhersehbaren Schwankungen. Ein Schlüsselelement in diesem Kontext ist die Produktionssteuerung, die als operative Schnittstelle zwischen der Produktionsplanung und den Fertigungsressourcen fungiert und eine effiziente Handhabung zahlreicher Prozessinterdependenzen sicherstellen muss. Angesichts dieser gesteigerten Produktionsdynamik und Produktvielfalt rücken innovative Steuerungsansätze in den Vordergrund.
In jüngerer Zeit wurden daher verstärkt Industrie-4.0-Ansätze und Methoden des maschinellen Lernens betrachtet. Im Kontext der aktuellen Forschung analysiert die vorliegende kumulative Arbeit Deep-Learning basierte Produktionssysteme anhand von fünf Publikationen. Hierbei wird ein besonderes Augenmerk auf die Anwendungen des Deep Reinforcement Learning gelegt, um dessen Potenzial zu ergründen. Die Untersuchungen zeigen, dass das Deep Reinforcement Learning in vielen Produktionsanwendungen sowohl herkömmlichen Ansätzen als auch anderen Deep-Learning Werkzeugen überlegen ist. Diese Überlegenheit ergibt sich vor allem aus dem interaktiven Lernprinzip und der direkten Interaktion mit der Umwelt, was es für die dynamische Produktionssteuerung besonders geeignet macht. Dennoch werden strukturelle, organisatorische und algorithmische Forschungslücken identifiziert. Die überwiegende Mehrheit der untersuchten Ansätze fokussiert sich auf Werkstattfertigungen und vernachlässigt dabei potenzielle Prozesssynergien modularer Produktionssysteme. Ferner zeigt sich, dass Multi-Agenten- und Mehr-Ebenen-Systeme sowie die Kombination verschiedener algorithmischer Ansätze nur selten zur Anwendung kommen.
Um diese Forschungslücken zu adressieren, wird eine auf Deep Reinforcement Learning basierende Hyper-Heuristik für die Steuerung modularer Produktionssysteme vorgestellt, die nach der Design Science Research Methodology entwickelt wird. Ein semi-heterarchisches Multi-Agenten-System ermöglicht eine dreifache Reduktion der Steuerungs- und Optimierungskomplexität und gewährleistet gleichzeitig eine hohe Systemadaptabilität und -robustheit. In Benchmarks übertrifft das Steuerungskonzept regelbasierte Ansätze, minimiert Durchlaufzeiten und Verspätungen und berücksichtigt kunden- sowie auftragsorientierte Kennzahlen. Die entwickelte Steuerungsmethodik ermöglicht einen schnellen Szenarienentwurf, um dadurch weitere Forschungsbemühungen zu stimulieren und die bestehende Transferlücke zur Realität weiter zu überbrücken. Das Ziel dieser Forschungsarbeit ist es, eine Synergie zwischen theoretischen Erkenntnissen und Praxis-relevanten Lösungen zu schaffen, um sowohl den wissenschaftlichen Diskurs zu bereichern als auch Antworten auf aktuelle industrielle Herausforderungen zu bieten.
KW  - modular production
KW  - deep learning
KW  - modulare Produktion
KW  - Produktionssteuerung
KW  - Deep Learning
KW  - Reinforcement Learning
KW  - Simulation
KW  - production control
KW  - reinforcement learning
KW  - simulation
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-633006
ER  - 
TY  - JOUR
A1  - Nebe, Stephan
A1  - Kroemer, Nils B.
A1  - Schad, Daniel
A1  - Bernhardt, Nadine
A1  - Sebold, Miriam Hannah
A1  - Mueller, Dirk K.
A1  - Scholl, Lucie
A1  - Kuitunen-Paul, Sören
A1  - Heinz, Andreas
A1  - Rapp, Michael Armin
A1  - Huys, Quentin J. M.
A1  - Smolka, Michael N.
T1  - No association of goal-directed and habitual control with alcohol consumption in young adults
JF  - Addiction biology
N2  - Alcohol dependence is a mental disorder that has been associated with an imbalance in behavioral control favoring model-free habitual over model-based goal-directed strategies. It is as yet unknown, however, whether such an imbalance reflects a predisposing vulnerability or results as a consequence of repeated and/or excessive alcohol exposure. We, therefore, examined the association of alcohol consumption with model-based goal-directed and model-free habitual control in 188 18-year-old social drinkers in a two-step sequential decision-making task while undergoing functional magnetic resonance imaging before prolonged alcohol misuse could have led to severe neurobiological adaptations. Behaviorally, participants showed a mixture of model-free and model-based decision-making as observed previously. Measures of impulsivity were positively related to alcohol consumption. In contrast, neither model-free nor model-based decision weights nor the trade-off between them were associated with alcohol consumption. There were also no significant associations between alcohol consumption and neural correlates of model-free or model-based decision quantities in either ventral striatum or ventromedial prefrontal cortex. Exploratory whole-brain functional magnetic resonance imaging analyses with a lenient threshold revealed early onset of drinking to be associated with an enhanced representation of model-free reward prediction errors in the posterior putamen. These results suggest that an imbalance between model-based goal-directed and model-free habitual control might rather not be a trait marker of alcohol intake per se.
KW  - alcohol
KW  - goal-directed
KW  - reinforcement learning
Y1  - 2017
U6  - https://doi.org/10.1111/adb.12490
SN  - 1355-6215
SN  - 1369-1600
VL  - 23
IS  - 1
SP  - 379
EP  - 393
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - THES
A1  - Maier, Corinna
T1  - Bayesian data assimilation and reinforcement learning for model-informed precision dosing in oncology
T1  - Bayes’sche Datenassimilation und Reinforcement Learning für die modellinformierte Präzisionsdosierung in der Onkologie
N2  - While patients are known to respond differently to drug therapies, current clinical practice often still follows a standardized dosage regimen for all patients. For drugs with a narrow range of both effective and safe concentrations, this approach may lead to a high incidence of adverse events or subtherapeutic dosing in the presence of high patient variability. Model-informed precision dosing (MIPD) is a quantitative approach towards dose individualization based on mathematical modeling of dose-response relationships integrating therapeutic drug/biomarker monitoring (TDM) data. MIPD may considerably improve the efficacy and safety of many drug therapies. Current MIPD approaches, however, rely either on pre-calculated dosing tables or on simple point predictions of the therapy outcome. These approaches lack a quantification of uncertainties and the ability to account for effects that are delayed. In addition, the underlying models are not improved while applied to patient data. Therefore, current approaches are not well suited for informed clinical decision-making based on a differentiated understanding of the individually predicted therapy outcome.

The objective of this thesis is to develop mathematical approaches for MIPD, which (i) provide efficient fully Bayesian forecasting of the individual therapy outcome including associated uncertainties, (ii) integrate Markov decision processes via reinforcement learning (RL) for a comprehensive decision framework for dose individualization, (iii) allow for continuous learning across patients and hospitals. Cytotoxic anticancer chemotherapy with its major dose-limiting toxicity, neutropenia, serves as a therapeutically relevant application example.

For more comprehensive therapy forecasting, we apply Bayesian data assimilation (DA) approaches, integrating patient-specific TDM data into mathematical models of chemotherapy-induced neutropenia that build on prior population analyses. The value of uncertainty quantification is demonstrated as it allows reliable computation of the patient-specific probabilities of relevant clinical quantities, e.g., the neutropenia grade. In view of novel home monitoring devices that increase the amount of TDM data available, the data processing of sequential DA methods proves to be more efficient and facilitates handling of the variability between dosing events.

By transferring concepts from DA and RL we develop novel approaches for MIPD. While DA-guided dosing integrates individualized uncertainties into dose selection, RL-guided dosing provides a framework to consider delayed effects of dose selections. The combined DA-RL approach takes into account both aspects simultaneously and thus represents a holistic approach towards MIPD. Additionally, we show that RL can be used to gain insights into important patient characteristics for dose selection. The novel dosing strategies substantially reduce the occurrence of both subtherapeutic and life-threatening neutropenia grades in a simulation study based on a recent clinical study (CEPAC-TDM trial) compared to currently used MIPD approaches.

If MIPD is to be implemented in routine clinical practice, a certain model bias with respect to the underlying model is inevitable, as the models are typically based on data from comparably small clinical trials that reflect only to a limited extent the diversity in real-world patient populations. We propose a sequential hierarchical Bayesian inference framework that enables continuous cross-patient learning to learn the underlying model parameters of the target patient population. It is important to note that the approach only requires summary information of the individual patient data to update the model. This separation of the individual inference from population inference enables implementation across different centers of care.

The proposed approaches substantially improve current MIPD approaches, taking into account new trends in health care and aspects of practical applicability. They enable progress towards more informed clinical decision-making, ultimately increasing patient benefits beyond the current practice.
N2  - Obwohl Patienten sehr unterschiedlich auf medikamentöse Therapien ansprechen, werden in der klinischen Praxis häufig noch standardisierte Dosierungsschemata angewendet. Bei Arzneimitteln mit engen therapeutischen Fenstern zwischen minimal wirksamen und toxischen Konzentrationen kann dieser Ansatz bei hoher interindividueller Variabilität zu häufigem Auftreten von Toxizitäten oder subtherapeutischen Konzentrationen führen. Die modellinformierte Präzisionsdosierung (MIPD) ist ein quantitativer Ansatz zur Dosisindividualisierung, der auf der mathematischen Modellierung von Dosis-Wirkungs-Beziehungen beruht und Daten aus dem therapeutischen Drug/Biomarker-Monitoring (TDM) einbezieht. Die derzeitigen MIPD-Ansätze verwenden entweder Dosierungstabellen oder einfache Punkt-Vorhersagen des Therapieverlaufs. Diesen Ansätzen fehlt eine Quantifizierung der Unsicherheiten, verzögerte Effekte werden nicht berücksichtigt und die zugrunde liegenden Modelle werden im Laufe der Anwendung nicht verbessert. Daher sind die derzeitigen Ansätze nicht ideal für eine fundierte klinische Entscheidungsfindung auf Grundlage eines differenzierten Verständnisses des individuell vorhergesagten Therapieverlaufs.
Das Ziel dieser Arbeit ist es, mathematische Ansätze für das MIPD zu entwickeln, die (i) eine effiziente, vollständig Bayes’sche Vorhersage des individuellen Therapieverlaufs einschließlich der damit verbundenen Unsicherheiten ermöglichen, (ii) Markov-Entscheidungsprozesse mittels Reinforcement Learning (RL) in einen umfassenden Entscheidungsrahmen zur Dosisindividualisierung integrieren, und (iii) ein kontinuierliches Lernen zwischen Patienten erlauben. Die antineoplastische Chemotherapie mit ihrer wichtigen dosislimitierenden Toxizität, der Neutropenie, dient als therapeutisch relevantes Anwendungsbeispiel.
Für eine umfassendere Therapievorhersage wenden wir Bayes’sche Datenassimilationsansätze (DA) an, um TDM-Daten in mathematische Modelle der Chemotherapie-induzierten Neutropenie zu integrieren. Wir zeigen, dass die Quantifizierung von Unsicherheiten einen großen Mehrwert bietet, da sie eine zuverlässige Berechnung der Wahrscheinlichkeiten relevanter klinischer Größen, z.B. des Neutropeniegrades, ermöglicht. Im Hinblick auf neue Home-Monitoring-Geräte, die die Anzahl der verfügbaren TDM-Daten erhöhen, erweisen sich sequenzielle DA-Methoden als effizienter und erleichtern den Umgang mit der Unsicherheit zwischen Dosierungsereignissen. Basierend auf Konzepten aus DA und RL, entwickeln wir neue Ansätze für MIPD.
Während die DA-geleitete Dosierung individualisierte Unsicherheiten in die Dosisauswahl integriert, berücksichtigt die RL-geleitete Dosierung verzögerte Effekte der Dosisauswahl. Der kombinierte DA-RL-Ansatz vereint beide Aspekte und stellt somit einen ganzheitlichen Ansatz für MIPD dar. Zusätzlich zeigen wir, dass RL Informationen über die für die Dosisauswahl relevanten Patientencharakteristika liefert. Der Vergleich zu derzeit verwendeten MIPD Ansätzen in einer auf einer klinischen Studie (CEPAC-TDM-Studie) basierenden Simulationsstudie zeigt, dass die entwickelten Dosierungsstrategien das Auftreten subtherapeutischer Konzentrationen sowie lebensbedrohlicher Neutropenien drastisch reduzieren.
Wird MIPD in der klinischen Routine eingesetzt, ist eine gewisse Modellverzerrung unvermeidlich. Die Modelle basieren in der Regel auf Daten aus vergleichsweise kleinen klinischen Studien, die die Heterogenität realer Patientenpopulationen nur begrenzt widerspiegeln. Wir schlagen einen sequenziellen hierarchischen Bayes’schen Inferenzrahmen vor, der ein kontinuierliches patientenübergreifendes Lernen ermöglicht, um die zugrunde liegenden Modellparameter der Ziel-Patientenpopulation zu erlernen. Zur Aktualisierung des Modells erfordert dieser Ansatz lediglich zusammenfassende Informationen der individuellen Patientendaten, was eine Umsetzung über verschiedene Versorgungszentren hinweg erlaubt.
Die vorgeschlagenen Ansätze verbessern die derzeitigen MIPD-Ansätze erheblich, wobei neue Trends in der Gesundheitsversorgung und Aspekte der praktischen Anwendbarkeit berücksichtigt werden. Damit stellen sie einen Fortschritt in Richtung einer fundierteren klinischen Entscheidungsfindung dar.
KW  - data assimilation
KW  - Datenassimilation
KW  - reinforcement learning
KW  - model-informed precision dosing
KW  - pharmacometrics
KW  - oncology
KW  - modellinformierte Präzisionsdosierung
KW  - Onkologie
KW  - Pharmakometrie
KW  - Reinforcement Learning
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-515870
ER  - 
TY  - THES
A1  - Koßmann, Jan
T1  - Unsupervised database optimization
BT  - efficient index selection & data dependency-driven query optimization
N2  - The amount of data stored in databases and the complexity of database workloads are ever-increasing. Database management systems (DBMSs) offer many configuration options, such as index creation or unique constraints, which must be adapted to the specific instance to efficiently process large volumes of data. Currently, such database optimization is complicated, manual work performed by highly skilled database administrators (DBAs). In cloud scenarios, manual database optimization even becomes infeasible: it exceeds the abilities of the best DBAs due to the enormous number of deployed DBMS instances (some providers maintain millions of instances), missing domain knowledge resulting from data privacy requirements, and the complexity of the configuration tasks.

Therefore, we investigate how to automate the configuration of DBMSs efficiently with the help of unsupervised database optimization. While there are numerous configuration options, in this thesis, we focus on automatic index selection and the use of data dependencies, such as functional dependencies, for query optimization. Both aspects have an extensive performance impact and complement each other by approaching unsupervised database optimization from different perspectives.

Our contributions are as follows: (1) we survey automated state-of-the-art index selection algorithms regarding various criteria, e.g., their support for index interaction. We contribute an extensible platform for evaluating the performance of such algorithms with industry-standard datasets and workloads. The platform is well-received by the community and has led to follow-up research. With our platform, we derive the strengths and weaknesses of the investigated algorithms. We conclude that existing solutions often have scalability issues and cannot quickly determine (near-)optimal solutions for large problem instances. (2) To overcome these limitations, we present two new algorithms. Extend determines (near-)optimal solutions with an iterative heuristic. It identifies the best index configurations for the evaluated benchmarks. Its selection runtimes are up to 10 times lower compared with other near-optimal approaches. SWIRL is based on reinforcement learning and delivers solutions instantly. These solutions perform within 3 % of the optimal ones. Extend and SWIRL are available as open-source implementations.

(3) Our index selection efforts are complemented by a mechanism that analyzes workloads to determine data dependencies for query optimization in an unsupervised fashion. We describe and classify 58 query optimization techniques based on functional, order, and inclusion dependencies as well as on unique column combinations. The unsupervised mechanism and three optimization techniques are implemented in our open-source research DBMS Hyrise. Our approach reduces the Join Order Benchmark’s runtime by 26 % and accelerates some TPC-DS queries by up to 58 times.

Additionally, we have developed a cockpit for unsupervised database optimization that allows interactive experiments to build confidence in such automated techniques. In summary, our contributions improve the performance of DBMSs, support DBAs in their work, and enable them to contribute their time to other, less arduous tasks.
N2  - Sowohl die Menge der in Datenbanken gespeicherten Daten als auch die Komplexität der Datenbank-Workloads steigen stetig an. Datenbankmanagementsysteme bieten viele Konfigurationsmöglichkeiten, zum Beispiel das Anlegen von Indizes oder die Definition von Unique Constraints. Diese Konfigurationsmöglichkeiten müssen für die spezifische Datenbankinstanz angepasst werden, um effizient große Datenmengen verarbeiten zu können. Heutzutage wird die komplizierte Datenbankoptimierung manuell von hochqualifizierten Datenbankadministratoren vollzogen. In Cloud-Szenarien ist die manuelle Datenbankoptimierung undenkbar: Die enorme Anzahl der verwalteten Systeme (einige Anbieter verwalten Millionen von Instanzen), das fehlende Domänenwissen durch Datenschutzanforderungen und die Komplexität der Konfigurationsaufgaben übersteigen die Fähigkeiten der besten Datenbankadministratoren.

Aus diesen Gründen betrachten wir, wie die Konfiguration von Datenbanksystemen mit der Hilfe von Unsupervised Database Optimization effizient automatisiert werden kann. Während viele Konfigurationsmöglichkeiten existieren, konzentrieren wir uns auf die automatische Indexauswahl und die Nutzung von Datenabhängigkeiten, zum Beispiel Functional Dependencies, für die Anfrageoptimierung. Beide Aspekte haben großen Einfluss auf die Performanz und ergänzen sich gegenseitig, indem sie Unsupervised Database Optimization aus verschiedenen Perspektiven betrachten.

Wir leisten folgende Beiträge: (1) Wir untersuchen dem Stand der Technik entsprechende automatisierte Indexauswahlalgorithmen hinsichtlich verschiedener Kriterien, zum Beispiel bezüglich ihrer Berücksichtigung von Indexinteraktionen. Wir stellen eine erweiterbare Plattform zur Leistungsevaluierung solcher Algorithmen mit Industriestandarddatensätzen und -Workloads zur Verfügung. Diese Plattform wird von der Forschungsgemeinschaft aktiv verwendet und hat bereits zu weiteren Forschungsarbeiten geführt. Mit unserer Plattform leiten wir die Stärken und Schwächen der untersuchten Algorithmen ab. Wir kommen zu dem Schluss, dass bestehende Lösungen häufig Skalierungsschwierigkeiten haben und nicht in der Lage sind, schnell (nahezu) optimale Lösungen für große Problemfälle zu ermitteln. (2) Um diese Einschränkungen zu bewältigen, stellen wir zwei neue Algorithmen vor. Extend ermittelt (nahezu) optimale Lösungen mit einer iterativen Heuristik. Das Verfahren identifiziert die besten Indexkonfigurationen für die evaluierten Benchmarks und seine Laufzeit ist bis zu 10-mal geringer als die Laufzeit anderer nahezu optimaler Ansätze. SWIRL basiert auf Reinforcement Learning und ermittelt Lösungen ohne Wartezeit. Diese Lösungen weichen maximal 3 % von den optimalen Lösungen ab. Extend und SWIRL sind verfügbar als Open-Source-Implementierungen.

(3) Ein Mechanismus, der mittels automatischer Workload-Analyse Datenabhängigkeiten für die Anfrageoptimierung bestimmt, ergänzt die vorigen Beiträge. Wir beschreiben und klassifizieren 58 Techniken, die auf Functional, Order und Inclusion Dependencies sowie Unique Column Combinations basieren. Der Analysemechanismus und drei Optimierungstechniken sind in unserem Open-Source-Forschungsdatenbanksystem Hyrise implementiert. Der Ansatz reduziert die Laufzeit des Join Order Benchmark um 26 % und erreicht eine bis zu 58-fache Beschleunigung einiger TPC-DS-Anfragen.

Darüber hinaus haben wir ein Cockpit für Unsupervised Database Optimization entwickelt. Dieses Cockpit ermöglicht interaktive Experimente, um Vertrauen in automatisierte Techniken zur Datenbankoptimierung zu schaffen. Zusammenfassend lässt sich festhalten, dass unsere Beiträge die Performanz von Datenbanksystemen verbessern, Datenbankadministratoren in ihrer Arbeit unterstützen und ihnen ermöglichen, ihre Zeit anderen, weniger mühsamen Aufgaben zu widmen.
KW  - Datenbank
KW  - Datenbanksysteme
KW  - database
KW  - DBMS
KW  - Hyrise
KW  - index selection
KW  - database systems
KW  - RL
KW  - reinforcement learning
KW  - query optimization
KW  - data dependencies
KW  - functional dependencies
KW  - order dependencies
KW  - unique column combinations
KW  - inclusion dependencies
KW  - funktionale Abhängigkeiten
KW  - Anfrageoptimierung
KW  - Query-Optimierung
KW  - extend
KW  - SWIRL
KW  - unsupervised
KW  - database optimization
KW  - self-driving
KW  - autonomous
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-589490
ER  - 
TY  - GEN
A1  - Friedel, Eva
A1  - Schlagenhauf, Florian
A1  - Beck, Anne
A1  - Dolan, Raymond J.
A1  - Huys, Quentin J. M.
A1  - Rapp, Michael Armin
A1  - Heinz, Andreas
T1  - The effects of life stress and neural learning signals on fluid intelligence
T2  - Postprints der Universität Potsdam : Humanwissenschaftliche Reihe
N2  - Fluid intelligence (fluid IQ), defined as the capacity for rapid problem solving and behavioral adaptation, is known to be modulated by learning and experience. Both stressful life events (SLES) and neural correlates of learning [specifically, a key mediator of adaptive learning in the brain, namely the ventral striatal representation of prediction errors (PE)] have been shown to be associated with individual differences in fluid IQ. Here, we examine the interaction between adaptive learning signals (using a well-characterized probabilistic reversal learning task in combination with fMRI) and SLES on fluid IQ measures. We find that the correlation between ventral striatal BOLD PE and fluid IQ, which we have previously reported, is quantitatively modulated by the amount of reported SLES. Thus, after experiencing adversity, basic neuronal learning signatures appear to align more closely with a general measure of flexible learning (fluid IQ), a finding complementing studies on the effects of acute stress on learning. The results suggest that an understanding of the neurobiological correlates of trait variables like fluid IQ needs to take socioemotional influences such as chronic stress into account.
T3  - Zweitveröffentlichungen der Universität Potsdam : Humanwissenschaftliche Reihe - 621 
KW  - reinforcement learning
KW  - prediction error signal
KW  - ventral striatum
KW  - stress
KW  - intelligence
Y1  - 2020
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-435140
SN  - 1866-8372
IS  - 621
SP  - 35
EP  - 43
ER  - 
TY  - THES
A1  - Afifi, Haitham
T1  - Wireless In-Network Processing for Multimedia Applications
T1  - Drahtlose In-Network-Verarbeitung für Multimedia-Anwendungen
N2  - With the recent growth of sensors, cloud computing handles the data processing of many applications. Processing some of this data on the cloud raises, however, many concerns regarding, e.g., privacy, latency, or single points of failure. Alternatively, thanks to the development of embedded systems, smart wireless devices can share their computation capacity, creating a local wireless cloud for in-network processing. In this context, the processing of an application is divided into smaller jobs so that a device can run one or more jobs.
The contribution of this thesis to this scenario is divided into three parts. In part one, I focus on wireless aspects, such as power control and interference management, for deciding which jobs to run on which node and how to route data between nodes. Hence, I formulate optimization problems and develop heuristic and meta-heuristic algorithms to allocate wireless and computation resources. Additionally, to deal with multiple applications competing for these resources, I develop a reinforcement learning (RL) admission controller to decide which application should be admitted. Next, I look into acoustic applications to improve wireless throughput by using microphone clock synchronization to synchronize wireless transmissions.
In the second part, I work jointly with colleagues from the acoustic processing field to optimize both network and application (i.e., acoustic) qualities. My contribution focuses on the network part, where I study the relation between acoustic and network qualities when selecting a subset of microphones for collecting audio data or selecting a subset of optional jobs for processing these data; too many microphones or too many jobs can reduce quality by causing unnecessary delays. Hence, I develop RL solutions to select the subset of microphones under network constraints when the speaker is moving while still providing good acoustic quality. Furthermore, I show that autonomous vehicles carrying microphones improve the acoustic qualities of different applications. Accordingly, I develop RL solutions (single and multi-agent ones) for controlling these vehicles.
In the third part, I close the gap between theory and practice. I describe the features of my open-source framework used as a proof of concept for wireless in-network processing. Next, I demonstrate how to run some algorithms developed by colleagues from acoustic processing using my framework. I also use the framework for studying in-network delays (wireless and processing) using different distributions of jobs and network topologies.
N2  - Mit der steigenden Anzahl von Sensoren übernimmt Cloud Computing die Datenverarbeitung vieler Anwendungen. Dies wirft jedoch viele Bedenken auf, z. B. in Bezug auf Datenschutz, Latenzen oder Fehlerquellen. Alternativ und dank der Entwicklung eingebetteter Systeme können drahtlose intelligente Geräte für die lokale Verarbeitung verwendet werden, indem sie ihre Rechenkapazität gemeinsam nutzen und so eine lokale drahtlose Cloud für die netzinterne Verarbeitung schaffen. In diesem Zusammenhang wird eine Anwendung in kleinere Aufgaben unterteilt, so dass ein Gerät eine oder mehrere Aufgaben ausführen kann. Der Beitrag dieser Arbeit zu diesem Szenario gliedert sich in drei Teile.

Im ersten Teil konzentriere ich mich auf drahtlose Aspekte wie Leistungssteuerung und Interferenzmanagement, um zu entscheiden, welche Aufgaben auf welchem Knoten ausgeführt werden sollen und wie die Daten zwischen den Knoten weitergeleitet werden sollen. Daher formuliere ich Optimierungsprobleme und entwickle heuristische und metaheuristische Algorithmen zur Zuweisung von Ressourcen eines drahtlosen Netzwerks. Um mit mehreren Anwendungen, die um diese Ressourcen konkurrieren, umgehen zu können, entwickle ich außerdem einen Reinforcement Learning (RL) Admission Controller, um zu entscheiden, welche Anwendung zugelassen werden soll. Als Nächstes untersuche ich akustische Anwendungen zur Verbesserung des drahtlosen Durchsatzes, indem ich Mikrofon-Taktsynchronisation zur Synchronisierung drahtloser Übertragungen verwende.

Im zweiten Teil arbeite ich mit Kollegen aus dem Bereich der Akustikverarbeitung zusammen, um sowohl die Netzwerk- als auch die Anwendungsqualitäten (d.h. die akustischen) zu optimieren. Mein Beitrag konzentriert sich auf den Netzwerkteil, wo ich die Beziehung zwischen akustischen und Netzwerkqualitäten bei der Auswahl einer Teilmenge von Mikrofonen für die Erfassung von Audiodaten oder der Auswahl einer Teilmenge von optionalen Aufgaben für die Verarbeitung dieser Daten untersuche; zu viele Mikrofone oder zu viele Aufgaben können die Qualität durch unnötige Verzögerungen verringern. Daher habe ich RL-Lösungen entwickelt, um die Teilmenge der Mikrofone unter Netzwerkbeschränkungen auszuwählen, wenn sich der Sprecher bewegt, und dennoch eine gute akustische Qualität zu gewährleisten. Außerdem zeige ich, dass autonome Fahrzeuge, die Mikrofone mit sich führen, die akustische Qualität verschiedener Anwendungen verbessern. Dementsprechend entwickle ich RL-Lösungen (Einzel- und Multi-Agenten-Lösungen) für die Steuerung dieser Fahrzeuge.

Im dritten Teil schließe ich die Lücke zwischen Theorie und Praxis. Ich beschreibe die Eigenschaften meines Open-Source-Frameworks, das als Prototyp für die drahtlose netzinterne Verarbeitung verwendet wird. Anschließend zeige ich, wie einige Algorithmen, die von Kollegen aus der Akustikverarbeitung entwickelt wurden, mit meinem Framework ausgeführt werden können. Außerdem verwende ich das Framework für die Untersuchung von netzinternen Verzögerungen unter Verwendung verschiedener Aufgabenverteilungen und Netzwerktopologien.
KW  - wireless networks
KW  - reinforcement learning
KW  - network optimization
KW  - Netzoptimierung
KW  - bestärkendes Lernen
KW  - drahtloses Netzwerk
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-604371
ER  -