TY - JOUR A1 - Li, Yuanqing A1 - Breitenreiter, Anselm A1 - Andjelkovic, Marko A1 - Chen, Junchao A1 - Babic, Milan A1 - Krstić, Miloš T1 - Double cell upsets mitigation through triple modular redundancy JF - Microelectronics Journal N2 - A triple modular redundancy (TMR) based design technique for double cell upsets (DCUs) mitigation is investigated in this paper. This technique adds three extra self-voter circuits into a traditional TMR structure to enable the enhanced error correction capability. Fault-injection simulations show that the soft error rate (SER) of the proposed technique is lower than 3% of that of TMR. The implementation of this proposed technique is compatible with the automatic digital design flow, and its applicability and performance are evaluated on an FIFO circuit. KW - Triple modular redundancy (TMR) KW - Double cell upsets (DCUs) Y1 - 2019 U6 - https://doi.org/10.1016/j.mejo.2019.104683 SN - 0026-2692 SN - 1879-2391 VL - 96 PB - Elsevier CY - Oxford ER - TY - JOUR A1 - Chen, Junchao A1 - Lange, Thomas A1 - Andjelkovic, Marko A1 - Simevski, Aleksandar A1 - Lu, Li A1 - Krstic, Milos T1 - Solar particle event and single event upset prediction from SRAM-based monitor and supervised machine learning JF - IEEE transactions on emerging topics in computing / IEEE Computer Society, Institute of Electrical and Electronics Engineers N2 - The intensity of cosmic radiation may differ over five orders of magnitude within a few hours or days during the Solar Particle Events (SPEs), thus increasing for several orders of magnitude the probability of Single Event Upsets (SEUs) in space-borne electronic systems. Therefore, it is vital to enable the early detection of the SEU rate changes in order to ensure timely activation of dynamic radiation hardening measures. In this paper, an embedded approach for the prediction of SPEs and SRAM SEU rate is presented. The proposed solution combines the real-time SRAM-based SEU monitor, the offline-trained machine learning model and online learning algorithm for the prediction. With respect to the state-of-the-art, our solution brings the following benefits: (1) Use of existing on-chip data storage SRAM as a particle detector, thus minimizing the hardware and power overhead, (2) Prediction of SRAM SEU rate one hour in advance, with the fine-grained hourly tracking of SEU variations during SPEs as well as under normal conditions, (3) Online optimization of the prediction model for enhancing the prediction accuracy during run-time, (4) Negligible cost of hardware accelerator design for the implementation of selected machine learning model and online learning algorithm. The proposed design is intended for a highly dependable and self-adaptive multiprocessing system employed in space applications, allowing to trigger the radiation mitigation mechanisms before the onset of high radiation levels. KW - Machine learning KW - Single event upsets KW - Random access memory KW - monitoring KW - machine learning algorithms KW - predictive models KW - space missions KW - solar particle event KW - single event upset KW - machine learning KW - online learning KW - hardware accelerator KW - reliability KW - self-adaptive multiprocessing system Y1 - 2022 U6 - https://doi.org/10.1109/TETC.2022.3147376 SN - 2168-6750 VL - 10 IS - 2 SP - 564 EP - 580 PB - Institute of Electrical and Electronics Engineers CY - [New York, NY] ER - TY - THES A1 - Chen, Junchao T1 - A self-adaptive resilient method for implementing and managing the high-reliability processing system T1 - Eine selbstadaptive belastbare Methode zum Implementieren und Verwalten von hochzuverlässigen Verarbeitungssysteme N2 - As a result of CMOS scaling, radiation-induced Single-Event Effects (SEEs) in electronic circuits became a critical reliability issue for modern Integrated Circuits (ICs) operating under harsh radiation conditions. SEEs can be triggered in combinational or sequential logic by the impact of high-energy particles, leading to destructive or non-destructive faults, resulting in data corruption or even system failure. Typically, the SEE mitigation methods are deployed statically in processing architectures based on the worst-case radiation conditions, which is most of the time unnecessary and results in a resource overhead. Moreover, the space radiation conditions are dynamically changing, especially during Solar Particle Events (SPEs). The intensity of space radiation can differ over five orders of magnitude within a few hours or days, resulting in several orders of magnitude fault probability variation in ICs during SPEs. This thesis introduces a comprehensive approach for designing a self-adaptive fault resilient multiprocessing system to overcome the static mitigation overhead issue. This work mainly addresses the following topics: (1) Design of on-chip radiation particle monitor for real-time radiation environment detection, (2) Investigation of space environment predictor, as support for solar particle events forecast, (3) Dynamic mode configuration in the resilient multiprocessing system. Therefore, according to detected and predicted in-flight space radiation conditions, the target system can be configured to use no mitigation or low-overhead mitigation during non-critical periods of time. The redundant resources can be used to improve system performance or save power. On the other hand, during increased radiation activity periods, such as SPEs, the mitigation methods can be dynamically configured appropriately depending on the real-time space radiation environment, resulting in higher system reliability. Thus, a dynamic trade-off in the target system between reliability, performance and power consumption in real-time can be achieved. All results of this work are evaluated in a highly reliable quad-core multiprocessing system that allows the self-adaptive setting of optimal radiation mitigation mechanisms during run-time. Proposed methods can serve as a basis for establishing a comprehensive self-adaptive resilient system design process. Successful implementation of the proposed design in the quad-core multiprocessor shows its application perspective also in the other designs. N2 - Infolge der CMOS-Skalierung wurden strahleninduzierte Einzelereignis-Effekte (SEEs) in elektronischen Schaltungen zu einem kritischen Zuverlässigkeitsproblem für moderne integrierte Schaltungen (ICs), die unter rauen Strahlungsbedingungen arbeiten. SEEs können in der kombinatorischen oder sequentiellen Logik durch den Aufprall hochenergetischer Teilchen ausgelöst werden, was zu destruktiven oder nicht-destruktiven Fehlern und damit zu Datenverfälschungen oder sogar Systemausfällen führt. Normalerweise werden die Methoden zur Abschwächung von SEEs statisch in Verarbeitungsarchitekturen auf der Grundlage der ungünstigsten Strahlungsbedingungen eingesetzt, was in den meisten Fällen unnötig ist und zu einem Ressourcen-Overhead führt. Darüber hinaus ändern sich die Strahlungsbedingungen im Weltraum dynamisch, insbesondere während Solar Particle Events (SPEs). Die Intensität der Weltraumstrahlung kann sich innerhalb weniger Stunden oder Tage um mehr als fünf Größenordnungen ändern, was zu einer Variation der Fehlerwahrscheinlichkeit in ICs während SPEs um mehrere Größenordnungen führt. In dieser Arbeit wird ein umfassender Ansatz für den Entwurf eines selbstanpassenden, fehlerresistenten Multiprozessorsystems vorgestellt, um das Problem des statischen Mitigation-Overheads zu überwinden. Diese Arbeit befasst sich hauptsächlich mit den folgenden Themen: (1) Entwurf eines On-Chip-Strahlungsteilchen Monitors zur Echtzeit-Erkennung von Strahlung Umgebungen, (2) Untersuchung von Weltraumumgebungsprognosen zur Unterstützung der Vorhersage von solaren Teilchen Ereignissen, (3) Konfiguration des dynamischen Modus in einem belastbaren Multiprozessorsystem. Daher kann das Zielsystem je nach den erkannten und vorhergesagten Strahlungsbedingungen während des Fluges so konfiguriert werden, dass es während unkritischer Zeiträume keine oder nur eine geringe Strahlungsminderung vornimmt. Die redundanten Ressourcen können genutzt werden, um die Systemleistung zu verbessern oder Energie zu sparen. In Zeiten erhöhter Strahlungsaktivität, wie z. B. während SPEs, können die Abschwächungsmethoden dynamisch und in Abhängigkeit von der Echtzeit-Strahlungsumgebung im Weltraum konfiguriert werden, was zu einer höheren Systemzuverlässigkeit führt. Auf diese Weise kann im Zielsystem ein dynamischer Kompromiss zwischen Zuverlässigkeit, Leistung und Stromverbrauch in Echtzeit erreicht werden. Alle Ergebnisse dieser Arbeit wurden in einem hochzuverlässigen Quad-Core-Multiprozessorsystem evaluiert, das die selbstanpassende Einstellung optimaler Strahlungsschutzmechanismen während der Laufzeit ermöglicht. Die vorgeschlagenen Methoden können als Grundlage für die Entwicklung eines umfassenden, selbstanpassenden und belastbaren Systementwurfsprozesses dienen. Die erfolgreiche Implementierung des vorgeschlagenen Entwurfs in einem Quad-Core-Multiprozessor zeigt, dass er auch für andere Entwürfe geeignet ist. KW - single event upset KW - solar particle event KW - machine learning KW - self-adaptive multiprocessing system KW - maschinelles Lernen KW - selbstanpassendes Multiprozessorsystem KW - strahleninduzierte Einzelereignis-Effekte KW - Sonnenteilchen-Ereignis Y1 - 2023 U6 - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:kobv:517-opus4-583139 ER - TY - JOUR A1 - Andjelković, Marko A1 - Chen, Junchao A1 - Simevski, Aleksandar A1 - Schrape, Oliver A1 - Krstić, Miloš A1 - Kraemer, Rolf T1 - Monitoring of particle count rate and LET variations with pulse stretching inverters JF - IEEE transactions on nuclear science : a publication of the IEEE Nuclear and Plasma Sciences Society N2 - This study investigates the use of pulse stretching (skew-sized) inverters for monitoring the variation of count rate and linear energy transfer (LET) of energetic particles. The basic particle detector is a cascade of two pulse stretching inverters, and the required sensing area is obtained by connecting up to 12 two-inverter cells in parallel and employing the required number of parallel arrays. The incident particles are detected as single-event transients (SETs), whereby the SET count rate denotes the particle count rate, while the SET pulsewidth distribution depicts the LET variations. The advantage of the proposed solution is the possibility to sense the LET variations using fully digital processing logic. SPICE simulations conducted on IHP's 130-nm CMOS technology have shown that the SET pulsewidth varies by approximately 550 ps over the LET range from 1 to 100 MeV center dot cm(2) center dot mg(-1). The proposed detector is intended for triggering the fault-tolerant mechanisms within a self-adaptive multiprocessing system employed in space. It can be implemented as a standalone detector or integrated in the same chip with the target system. KW - Particle detector KW - pulse stretching inverters KW - single-event transient KW - (SET) count rate KW - SET pulsewidth distribution Y1 - 2021 U6 - https://doi.org/10.1109/TNS.2021.3076400 SN - 0018-9499 SN - 1558-1578 VL - 68 IS - 8 SP - 1772 EP - 1781 PB - Institute of Electrical and Electronics Engineers CY - New York, NY ER -