TY  - JOUR
A1  - Panzer, Marcel
A1  - Bender, Benedict
T1  - Deep reinforcement learning in production systems
BT  - a systematic literature review
JF  - International Journal of Production Research
N2  - Shortening product development cycles and fully customizable products pose major challenges for production systems. These not only have to cope with an increased product diversity but also enable high throughputs and provide a high adaptability and robustness to process variations and unforeseen incidents. To overcome these challenges, deep Reinforcement Learning (RL) has been increasingly applied for the optimization of production systems. Unlike other machine learning methods, deep RL operates on recently collected sensor-data in direct interaction with its environment and enables real-time responses to system changes. Although deep RL is already being deployed in production systems, a systematic review of the results has not yet been established. The main contribution of this paper is to provide researchers and practitioners an overview of applications and to motivate further implementations and research of deep RL supported production systems. Findings reveal that deep RL is applied in a variety of production domains, contributing to data-driven and flexible processes. In most applications, conventional methods were outperformed and implementation efforts or dependence on human experience were reduced. Nevertheless, future research must focus more on transferring the findings to real-world systems to analyze safety aspects and demonstrate reliability under prevailing conditions.
KW  - Machine learning
KW  - reinforcement learning
KW  - production control
KW  - production planning
KW  - manufacturing processes
KW  - systematic literature review
Y1  - 2021
U6  - https://doi.org/10.1080/00207543.2021.1973138
SN  - 1366-588X
SN  - 0020-7543
VL  - 13
IS  - 60
PB  - Taylor & Francis
CY  - London
ER  - 
TY  - GEN
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - Deep reinforcement learning in production planning and control
BT  - A systematic literature review
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - Increasingly fast development cycles and individualized products pose major challenges for today's smart production systems in times of industry 4.0. The systems must be flexible and continuously adapt to changing conditions while still guaranteeing high throughputs and robustness against external disruptions. Deep reinforcement learning (RL) algorithms, which already reached impressive success with Google DeepMind's AlphaGo, are increasingly transferred to production systems to meet related requirements. Unlike supervised and unsupervised machine learning techniques, deep RL algorithms learn based on recently collected sensorand process-data in direct interaction with the environment and are able to perform decisions in real-time. As such, deep RL algorithms seem promising given their potential to provide decision support in complex environments, as production systems, and simultaneously adapt to changing circumstances. While different use-cases for deep RL emerged, a structured overview and integration of findings on their application are missing. To address this gap, this contribution provides a systematic literature review of existing deep RL applications in the field of production planning and control as well as production logistics. From a performance perspective, it became evident that deep RL can beat heuristics significantly in their overall performance and provides superior solutions to various industrial use-cases. Nevertheless, safety and reliability concerns must be overcome before the widespread use of deep RL is possible which presumes more intensive testing of deep RL in real world applications besides the already ongoing intensive simulations.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 198 
KW  - deep reinforcement learning
KW  - machine learning
KW  - production planning
KW  - production control
KW  - systematic literature review
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-605722
SN  - 2701-6277
SN  - 1867-5808
ER  - 
TY  - CHAP
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - Deep reinforcement learning in production planning and control
BT  - A systematic literature review
T2  - Proceedings of the Conference on Production Systems and Logistics
N2  - Increasingly fast development cycles and individualized products pose major challenges for today's smart production systems in times of industry 4.0. The systems must be flexible and continuously adapt to changing conditions while still guaranteeing high throughputs and robustness against external disruptions. Deep rein- forcement learning (RL) algorithms, which already reached impressive success with Google DeepMind's AlphaGo, are increasingly transferred to production systems to meet related requirements. Unlike supervised and unsupervised machine learning techniques, deep RL algorithms learn based on recently collected sensor- and process-data in direct interaction with the environment and are able to perform decisions in real-time. As such, deep RL algorithms seem promising given their potential to provide decision support in complex environments, as production systems, and simultaneously adapt to changing circumstances. While different use-cases for deep RL emerged, a structured overview and integration of findings on their application are missing. To address this gap, this contribution provides a systematic literature review of existing deep RL applications in the field of production planning and control as well as production logistics. From a performance perspective, it became evident that deep RL can beat heuristics significantly in their overall performance and provides superior solutions to various industrial use-cases. Nevertheless, safety and reliability concerns must be overcome before the widespread use of deep RL is possible which presumes more intensive testing of deep RL in real world applications besides the already ongoing intensive simulations.
KW  - deep reinforcement learning
KW  - machine learning
KW  - production planning
KW  - production control
KW  - systematic literature review
Y1  - 2021
U6  - https://doi.org/10.15488/11238
SN  - 2701-6277
SP  - 535
EP  - 545
PB  - publish-Ing.
CY  - Hannover
ER  - 
TY  - THES
A1  - Theuer, Hanna Katharina
T1  - Beherrschung komplexer Produktionsprozesse durch Autonomie
T1  - Mastering complex production processes by autonomy
N2  - Modern technologies enable the actors of a production process to autonomous decision-making, information-processing, and decision- execution. It devolves hierarchical controlled relationships and distributes decision-making among many actors. Positive consequences include using local competencies and fast on-site action without (time-)consuming cross-process planning run by a central control instance. Evaluating the decentralization of the process helps to compare different control strategies and thus contributes to the mastery of more complex production processes. Although the importance of the communication structure of these actors increases, no method uses this as a basis for operationalizing decentralization. 
	
This motivates the focus of this thesis. It develops a three-level evaluation model determining the decentralization of a production process based on two determinants: the communication and decision-making structure of the autonomous actors involved. Based on a definition of decentralization of production processes, it set requirements for a key value that determines the structural autonomy of the actors and selects a suitable social network analysis metric. The possibility of integrated decision-making and decision execution justifies the additional consideration of the decision structure. The differentiation of both factors forms the basis for the classification of actors; the multiplication of both values results in the characteristic value real autonomy describing the autonomy of an actor, which is the key figure of the model's first level. 
	
Homogeneous actor autonomy characterizes a high decentralization of the process step, which is the object of consideration of the second level of the model. 

Comparing the existing with the maximum possible decentralization of the process steps determines the Autonomy Index. This figure operationalizes the decentralization of the process at the third level of the model. 
	
A simulation study with two simulation experiments - a central and a decentral controlled process - at Zentrum Industrie 4.0 validates the evaluation model. The application of the model to an industrial production process underlines the practical applicability.
N2  - Moderne Technologien befähigen die beteiligten Akteure eines Produktionsprozesses die Informationsaufnahme, Entscheidungsfindung und -ausführung selbstständig auszuführen. Hierarchische Kontrollbeziehungen werden aufgelöst und die Entscheidungsfindung auf eine Vielzahl von Akteuren verteilt. Positive Folgen sind unter anderem die Nutzung lokaler Kompetenzen und ein schnelles Handeln vor Ort ohne (zeit-)aufwändige prozessübergreifende Planungsläufe durch eine zentrale Steuerungsinstanz. Die Bewertung der Dezentralität des Prozesses hilft beim Vergleich verschiedener Steuerungsstrategien und trägt so zur Beherrschung komplexerer Produktionsprozesse bei. 
	
Obwohl die Kommunikationsstruktur der an der Entscheidungsfindung beteiligten Akteure zunehmend an Bedeutung gewinnt, existiert keine Methode, welche diese als Grundlage für die Operationalisierung der Dezentralität verwendet. Hier setzt diese Arbeit an. Es wird ein dreistufiges Bewertungsmodell entwickelt, dass die Dezentralität eines Produktionsprozesses auf Basis der Kommunikations- und Entscheidungsstruktur der am Prozess beteiligten, autonomen Akteure ermittelt.
	
Aufbauend auf einer Definition von Dezentralität von Produktionsprozessen werden Anforderungen an eine Kennzahl erhoben und - auf Basis der Kommunikationsstruktur - eine die strukturelle Autonomie der Akteure bestimmenden Kenngröße der sozialen Netzwerkanalyse ermittelt. Die Notwendigkeit der zusätzlichen Berücksichtigung der Entscheidungsstruktur wird basierend auf der Möglichkeit der Integration von Entscheidungsfindung und -ausführung begründet.
	
Die Differenzierung beider Faktoren bildet die Grundlage für die Klassifikation der Akteure; die Multiplikation beider Werte resultiert in dem die Autonomie eines Akteurs beschreibenden Kennwert tatsächliche Autonomie, welcher das Ergebnis der ersten Stufe des Modells darstellt. Homogene Akteurswerte charakterisieren eine hohe Dezentralität des Prozessschrittes, welcher Betrachtungsobjekt der zweiten Stufe ist. Durch einen Vergleich der vorhandenen mit der maximal möglichen Dezentralität der Prozessschritte wird auf der dritten Stufe der Autonomie Index ermittelt, welcher die Dezentralität des Prozesses operationalisiert.
	
Das erstellte Bewertungsmodell wird anhand einer Simulationsstudie im Zentrum Industrie 4.0 validiert. Dafür wird das Modell auf zwei Simulationsexperimente - einmal mit einer zentralen und einmal mit einer dezentralen Steuerung - angewendet und die Ergebnisse verglichen. Zusätzlich wird es auf einen umfangreichen Produktionsprozess aus der Praxis angewendet.
KW  - Produktion
KW  - Autonomie
KW  - Prozessverbesserung
KW  - Dezentralität
KW  - Produktionssteuerung
KW  - autonomy
KW  - decentrality
KW  - production
KW  - production control
KW  - process improvement
Y1  - 2022
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-541842
ER  - 
TY  - GEN
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - A deep reinforcement learning based hyper-heuristic for modular production control
T2  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe
N2  - In nowadays production, fluctuations in demand, shortening product life-cycles, and highly configurable products require an adaptive and robust control approach to maintain competitiveness. This approach must not only optimise desired production objectives but also cope with unforeseen machine failures, rush orders, and changes in short-term demand. Previous control approaches were often implemented using a single operations layer and a standalone deep learning approach, which may not adequately address the complex organisational demands of modern manufacturing systems. To address this challenge, we propose a hyper-heuristics control model within a semi-heterarchical production system, in which multiple manufacturing and distribution agents are spread across pre-defined modules. The agents employ a deep reinforcement learning algorithm to learn a policy for selecting low-level heuristics in a situation-specific manner, thereby leveraging system performance and adaptability. We tested our approach in simulation and transferred it to a hybrid production environment. By that, we were able to demonstrate its multi-objective optimisation capabilities compared to conventional approaches in terms of mean throughput time, tardiness, and processing of prioritised orders in a multi-layered production system. The modular design is promising in reducing the overall system complexity and facilitates a quick and seamless integration into other scenarios.
T3  - Zweitveröffentlichungen der Universität Potsdam : Wirtschafts- und Sozialwissenschaftliche Reihe - 173 
KW  - production control
KW  - modular production
KW  - multi-agent system
KW  - deep reinforcement learning
KW  - deep learning
KW  - multi-objective optimisation
Y1  - 2023
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-605642
SN  - 1867-5808
ER  - 
TY  - JOUR
A1  - Panzer, Marcel
A1  - Bender, Benedict
A1  - Gronau, Norbert
T1  - A deep reinforcement learning based hyper-heuristic for modular production control
JF  - International journal of production research
N2  - In nowadays production, fluctuations in demand, shortening product life-cycles, and highly configurable products require an adaptive and robust control approach to maintain competitiveness. This approach must not only optimise desired production objectives but also cope with unforeseen machine failures, rush orders, and changes in short-term demand. Previous control approaches were often implemented using a single operations layer and a standalone deep learning approach, which may not adequately address the complex organisational demands of modern manufacturing systems. To address this challenge, we propose a hyper-heuristics control model within a semi-heterarchical production system, in which multiple manufacturing and distribution agents are spread across pre-defined modules. The agents employ a deep reinforcement learning algorithm to learn a policy for selecting low-level heuristics in a situation-specific manner, thereby leveraging system performance and adaptability. We tested our approach in simulation and transferred it to a hybrid production environment. By that, we were able to demonstrate its multi-objective optimisation capabilities compared to conventional approaches in terms of mean throughput time, tardiness, and processing of prioritised orders in a multi-layered production system. The modular design is promising in reducing the overall system complexity and facilitates a quick and seamless integration into other scenarios.
KW  - production control
KW  - modular production
KW  - multi-agent system
KW  - deep reinforcement learning
KW  - deep learning
KW  - multi-objective optimisation
Y1  - 2023
U6  - https://doi.org/10.1080/00207543.2023.2233641
SN  - 0020-7543
SN  - 1366-588X
SN  - 0278-6125
SP  - 1
EP  - 22
PB  - Taylor & Francis
CY  - London
ER  - 
TY  - JOUR
A1  - Panzer, Marcel
A1  - Gronau, Norbert
T1  - Enhancing economic efficiency in modular production systems through deep reinforcement learning
JF  - Procedia CIRP
N2  - In times of increasingly complex production processes and volatile customer demands, the production adaptability is crucial for a company's profitability and competitiveness. The ability to cope with rapidly changing customer requirements and unexpected internal and external events guarantees robust and efficient production processes, requiring a dedicated control concept at the shop floor level. Yet in today's practice, conventional control approaches remain in use, which may not keep up with the dynamic behaviour due to their scenario-specific and rigid properties. To address this challenge, deep learning methods were increasingly deployed due to their optimization and scalability properties. However, these approaches were often tested in specific operational applications and focused on technical performance indicators such as order tardiness or total throughput. In this paper, we propose a deep reinforcement learning based production control to optimize combined techno-financial performance measures. Based on pre-defined manufacturing modules that are supplied and operated by multiple agents, positive effects were observed in terms of increased revenue and reduced penalties due to lower throughput times and fewer delayed products. The combined modular and multi-staged approach as well as the distributed decision-making further leverage scalability and transferability to other scenarios.
KW  - modular production
KW  - production control
KW  - multi-agent system
KW  - deep reinforcement learning
KW  - discrete event simulation
Y1  - 2024
U6  - https://doi.org/10.1016/j.procir.2023.09.229
SN  - 2212-8271
VL  - 121
SP  - 55
EP  - 60
PB  - Elsevier
CY  - Amsterdam
ER  - 
TY  - THES
A1  - Panzer, Marcel
T1  - Design of a hyper-heuristics based control framework for modular production systems
T1  - Design eines auf Hyperheuristiken basierenden Steuerungsframeworks für modulare Produktionssysteme
N2  - Volatile supply and sales markets, coupled with increasing product individualization and complex production processes, present significant challenges for manufacturing companies. These must navigate and adapt to ever-shifting external and internal factors while ensuring robustness against process variabilities and unforeseen events. This has a pronounced impact on production control, which serves as the operational intersection between production planning and the shop- floor resources, and necessitates the capability to manage intricate process interdependencies effectively. Considering the increasing dynamics and product diversification, alongside the need to maintain constant production performances, the implementation of innovative control strategies becomes crucial.
In recent years, the integration of Industry 4.0 technologies and machine learning methods has gained prominence in addressing emerging challenges in production applications. Within this context, this cumulative thesis analyzes deep learning based production systems based on five publications. Particular attention is paid to the applications of deep reinforcement learning, aiming to explore its potential in dynamic control contexts. Analysis reveal that deep reinforcement learning excels in various applications, especially in dynamic production control tasks. Its efficacy can be attributed to its interactive learning and real-time operational model. However, despite its evident utility, there are notable structural, organizational, and algorithmic gaps in the prevailing research. A predominant portion of deep reinforcement learning based approaches is limited to specific job shop scenarios and often overlooks the potential synergies in combined resources. Furthermore, it highlights the rare implementation of multi-agent systems and semi-heterarchical systems in practical settings. A notable gap remains in the integration of deep reinforcement learning into a hyper-heuristic.
To bridge these research gaps, this thesis introduces a deep reinforcement learning based hyper- heuristic for the control of modular production systems, developed in accordance with the design science research methodology. Implemented within a semi-heterarchical multi-agent framework, this approach achieves a threefold reduction in control and optimisation complexity while ensuring high scalability, adaptability, and robustness of the system. In comparative benchmarks, this control methodology outperforms rule-based heuristics, reducing throughput times and tardiness, and effectively incorporates customer and order-centric metrics. The control artifact facilitates a rapid scenario generation, motivating for further research efforts and bridging the gap to real-world applications. The overarching goal is to foster a synergy between theoretical insights and practical solutions, thereby enriching scientific discourse and addressing current industrial challenges.
N2  - Volatile Beschaffungs- und Absatzmärkte sowie eine zunehmende Produktindividualisierung konfrontieren Fertigungsunternehmen mit beträchtlichen Herausforderungen. Diese erfordern eine Anpassung der Produktion an sich ständig wechselnde externe Einflüsse und eine hohe Prozessrobustheit gegenüber unvorhersehbaren Schwankungen. Ein Schlüsselelement in diesem Kontext ist die Produktionssteuerung, die als operative Schnittstelle zwischen der Produktions- planung und den Fertigungsressourcen fungiert und eine effiziente Handhabung zahlreicher Prozessinterdependenzen sicherstellen muss. Angesichts dieser gesteigerten Produktionsdynamik und Produktvielfalt rücken innovative Steuerungsansätze in den Vordergrund.
In jüngerer Zeit wurden daher verstärkt Industrie-4.0-Ansätze und Methoden des maschinellen Lernens betrachtet. Im Kontext der aktuellen Forschung analysiert die vorliegende kumulative Arbeit Deep-Learning basierte Produktionssysteme anhand von fünf Publikationen. Hierbei wird ein besonderes Augenmerk auf die Anwendungen des Deep Reinforcement Learning gelegt, um dessen Potenzial zu ergründen. Die Untersuchungen zeigen, dass das Deep Reinforcement Learning in vielen Produktionsanwendungen sowohl herkömmlichen Ansätzen als auch an- deren Deep-Learning Werkzeugen überlegen ist. Diese Überlegenheit ergibt sich vor allem aus dem interaktiven Lernprinzip und der direkten Interaktion mit der Umwelt, was es für die dynamische Produktionssteuerung besonders geeignet macht. Dennoch werden strukturelle, organisatorische und algorithmische Forschungslücken identifiziert. Die überwiegende Mehrheit der untersuchten Ansätze fokussiert sich auf Werkstattfertigungen und vernachlässigt dabei potenzielle Prozesssynergien modularer Produktionssysteme. Ferner zeigt sich, dass Multi- Agenten- und Mehr-Ebenen-Systeme sowie die Kombination verschiedener algorithmischer Ansätze nur selten zur Anwendung kommen.
Um diese Forschungslücken zu adressieren, wird eine auf Deep Reinforcement Learning basierende Hyper-Heuristik für die Steuerung modularer Produktionssysteme vorgestellt, die nach der Design Science Research Methodology entwickelt wird. Ein semi-heterarchisches Multi-Agenten-System ermöglicht eine dreifache Reduktion der Steuerungs- und Optimierungs- komplexität und gewährleistet gleichzeitig eine hohe Systemadaptabilität und -robustheit. In Benchmarks übertrifft das Steuerungskonzept regelbasierte Ansätze, minimiert Durchlaufzeiten und Verspätungen und berücksichtigt kunden- sowie auftragsorientierte Kennzahlen. Die ent- wickelte Steuerungsmethodik ermöglicht einen schnellen Szenarienentwurf, um dadurch weitere Forschungsbemühungen zu stimulieren und die bestehende Transferlücke zur Realität weiter zu überbrücken. Das Ziel dieser Forschungsarbeit ist es, eine Synergie zwischen theoretischen Erkenntnissen und Praxis-relevanten Lösungen zu schaffen, um sowohl den wissenschaftlichen Diskurs zu bereichern als auch Antworten auf aktuelle industrielle Herausforderungen zu bieten.
KW  - modular production
KW  - deep learning
KW  - modulare Produktion
KW  - Produktionssteuerung
KW  - Deep Learning
KW  - Reinforcement Learning
KW  - Simulation
KW  - production control
KW  - reinforcement learning
KW  - simulation
Y1  - 2024
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-633006
ER  -