# Methodology for Standard Cell-based Design and Implementation of Reliable and Robust Hardware Systems

**Oliver Schrape** 

Dissertation

submitted for the academic degree of

Doktor der Ingenieurwissenschaften (Dr.-Ing.)

in the scientific discipline "Rechnerarchitektur und Fehlertoleranz" (Computer Architecture and Fault Tolerance)

submitted to the Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science), Institut für Informatik und Computational Science, of the Universität Potsdam and the IHP Leibniz-Institut für innovative Mikroelektronik

Place and date of the defense: Universität Potsdam, 17 February 2023

Unless otherwise indicated, this work is licensed under a Creative Commons License Attribution 4.0 International.

This does not apply to quoted content and works based on other permissions.

To view a copy of this licence visit:

https://creativecommons.org/licenses/by/4.0

| Role             | Name and affiliation                                                                                |
|------------------|-----------------------------------------------------------------------------------------------------|
| Main supervisor: | Prof. Dr. Miloš Krstić (Universität Potsdam and IHP)                                                |
| Supervisor:      | Prof. Dr.-Ing. habil. Michael Hübner (Brandenburgische Technische Universität Cottbus-Senftenberg)  |
| Reviewer 1:      | Prof. Dr. Miloš Krstić (Universität Potsdam and IHP)                                                |
| Reviewer 2:      | Prof. Dr.-Ing. habil. Michael Hübner (Brandenburgische Technische Universität Cottbus-Senftenberg)  |
| Reviewer 3:      | Prof. Ney Laert Vilar Calazans (Pontifical Catholic University of Rio Grande do Sul (PUCRS))        |

Published online on the Publication Server of the University of Potsdam:

https://doi.org/10.25932/publishup-58932

https://nbn-resolving.org/urn:nbn:de:kobv:517-opus4-589326

"Perseverance is rewarded sooner or later, but mostly later."

— Wilhelm Busch

... for my wife and my daughters

# Contents

- Contents
- Abstract
- Zusammenfassung
- Acronyms
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Reliable and Robust Hardware Systems
  - 1.3 Contribution of this Work
  - 1.4 Publications Related to this Work
  - 1.5 Structure of the Thesis
- 2 Background
  - 2.1 Semiconductor Technologies
  - 2.2 Standard Cell Library Development
    - 2.2.1 View Generation Procedure
    - 2.2.2 Physical View
    - 2.2.3 Standard Cell Characterization
  - 2.3 Digital Design Flow
  - 2.4 Design-for-Testability and Low-Power Digital Design
    - 2.4.1 Structural Test – Scan-Test
    - 2.4.2 Clock-Gating
  - 2.5 Signal Integrity Effects
  - 2.6 Differential Logic and Signaling
    - 2.6.1 Differential Signaling
    - 2.6.2 Current-Mode Logic/Emitter-Coupled Logic
    - 2.6.3 Limitation
    - 2.6.4 Comparison of CML and CMOS Inverter
  - 2.7 Radiation-Induced Effects
    - 2.7.1 Total Ionizing Dose
    - 2.7.2 Single-Event Effects
      - 2.7.2.1 Hard-Errors – Single Event Latchup and Burnout
      - 2.7.2.2 Soft-Errors – Single Event Transient and Upset
    - 2.7.3 Charge Generation and Linear Energy Transfer (LET)
    - 2.7.4 Circuit Cross-Section
    - 2.7.5 Charge-Sharing
    - 2.7.6 Transient Masking
    - 2.7.7 Single Event Transient Quenching and Broadening
  - 2.8 Fault-Tolerant Circuits
    - 2.8.1 Triple Modular Redundancy (TMR)
    - 2.8.2 Reliability of TMR Circuits
  - 2.9 Temporal Redundancy/Temporal Sampling
- 3 Related Work
  - 3.1 Design Automation of Differential Circuits
    - 3.1.1 Secure Digital Design Flow
    - 3.1.2 Via-Programmable Flow
    - 3.1.3 Fat-Wire Approach
  - 3.2 Differential CML-based Standard Cell Library Concepts
    - 3.2.1 MCML Library Generation with Footprints
    - 3.2.2 ECL Standard Cell Libraries
  - 3.3 Standard Flip-Flop Architectures
  - 3.4
  - 3.5 Single Event Transient Mitigation
    - 3.5.1 General Transient Mitigation
    - 3.5.2 Filtering Transient with Guard-Gates/C-elements
    - 3.5.3 Transient Filters on Data Paths in TMR Circuits
    - 3.5.4 Transient Mitigation in TMR Circuits with Clock Delays
  - 3.6 Hardening by Structural Modification
  - 3.7 Radiation-Hard Flip-Flop Architectures
    - 3.7.1 Dual Interlocked Storage Cell
    - 3.7.2 Quatro Cell
    - 3.7.3 Heavy Ion Tolerant Flip-Flop
    - 3.7.4 Further Hardened Flip-Flop Compositions
  - 3.8 Radiation-Hardening-by-Design (RHBD) Flip-Flops
    - 3.8.1 Built-in Soft Error Resilient (BISER)
    - 3.8.2 Dual/Double Modular Redundancy Extensions
    - 3.8.3 Full-Triple Modular Redundancy (FTMR) Flip-Flop
    - 3.8.4 Robust Triple Modular Redundancy Flip-Flop
    - 3.8.5 Self-Correction Function
    - 3.8.6 Voter Architectures for TMR
  - 3.9 Addressed Open Issues in this Thesis
    - 3.9.1 Addressed Issues for Differential Logic Design
    - 3.9.2 Addressed Issues for Radiation-Hardening-by-Design Circuits
- 4 Concepts and Methodology for Differential Logic Design
  - 4.1 Introduction
  - 4.2 Standard Cell Concept
    - 4.2.1 Specification
    - 4.2.2 Supported Bias Concept
    - 4.2.3 Modular Standard Cells – Logical Sections
    - 4.2.4 Cell Configurations – Speed-Classes
    - 4.2.5 Speed-Class Extension
    - 4.2.6 Level-Shifter
  - 4.3 Library Aspects for Differential Logic Design
    - 4.3.1 Standard Cell Set
    - 4.3.2 Single-ended Pseudo-Gates
    - 4.3.3 Fat-wire Specification
    - 4.3.4 Fat-wire Compatible Differential Standard Cell Layouts
    - 4.3.5 Characterization of Differential Standard Cells
    - 4.3.6 Physical View Generation
  - 4.4 Design Methodology for Differential Logic Design
    - 4.4.1 Design Flow for Differential Standard Cell-based Design
    - 4.4.2 Limitations and Features for RTL Design
    - 4.4.3 Gate-Level Synthesis
    - 4.4.4 Placement and First Routing-Phase
    - 4.4.5 Design Conversion
    - 4.4.6 Second Routing-Phase
  - 4.5 Summary
- 5 Concepts and Methodology for RHBD Circuits
  - 5.1 Introduction
  - 5.2 Standard Cell Library
    - 5.2.1 Additional Cell Set
    - 5.2.2 Robust Driver Cells
    - 5.2.3 Transient Filter Cells
    - 5.2.4 Robust Memory Cells – $\Delta$TMR Cells
  - 5.3 RHBD TMR Cells – Logical Sections
    - 5.3.1 D-SET Filter Section
    - 5.3.2 Memory Cell Section
    - 5.3.3 Voter Section
    - 5.3.4 Driver Section
  - 5.4 Special RHBD TMR Cells
    - 5.4.1 RHBD TMR Scan-Flip-Flops
    - 5.4.2 RHBD TMR Clock-Gating Cells
    - 5.4.3 Self-Correcting TMR Flip-Flops
  - 5.5 Characterization of RHBD TMR Cells
  - 5.6 Layout Aspects for RHBD TMR Cells
  - 5.7 Design Methodology for Radiation-Hardening-by-Design
    - 5.7.1 Handling of Critical Nets
    - 5.7.2 Gate-Level Synthesis
    - 5.7.3 Scan-Test – Pattern Generation
    - 5.7.4 Place and Route
  - 5.8 Summary
- 6 Evaluation of the Concepts
  - 6.1 Reliable Digitally-Designed Differential Logic Design
    - 6.1.1 Architecture Overview
    - 6.1.2 Implementation
    - 6.1.3 Results and Discussion
  - 6.2 Development and Evaluation of Radiation-Hardened $\Delta$TMR Standard Cells
    - 6.2.1 Development of robust $\Delta$TMR cells
    - 6.2.2 Development of robust L-$\Delta$TMR and LM-$\Delta$TMR cells
    - 6.2.3 Development of robust S-$\Delta$TMR and SC-$\Delta$TMR cells
    - 6.2.4 Development of robust T-$\Delta$TMR cells
    - 6.2.5 Development of robust clock-gate CG-$\Delta$TMR
    - 6.2.6 Shift Register Test Vehicles
    - 6.2.7 Irradiation Campaigns
    - 6.2.8 Results and Discussion
  - 6.3 Evaluation of the Design Flow for Realistic Applications
    - 6.3.1 A 7.5-15.5 MSPS 14-bit ADC Core
    - 6.3.2 Microcontroller Chip
- 7 Conclusion
- Appendix
- List of Figures
- List of Tables
- List of Listings
- Bibliography

# Acknowledgement

First of all, I would like to thank my loved ones, my wife and my two daughters. Thank you so much for the encouragement, the patience, and not-too-loud-grumbling about my non-presence, even though I was actually physically there ...

A special thanks also goes to my parents for not deviating in many things & decisions.

Frank – Thank you for everything and for guiding me through my first steps at IHP.

Gerald, Daniel S. – You triggered my interest in differential logic design – and I really tried to forget, but I could not give it up ;)

Florian, Stephan, and Manuel – Thanks for your support!

Daniel M. – Thanks for your time while bringing these crazy analog things closer to me. I learned a lot and still benefit from it.

Steffen, Pinky, Anselm, Marko, Junchao, Marcus, Goran, Carsten, Klaus, Philip, and Alexey – Everyone needs their own "Quietscheentchen" (rubber duck) for small talk, discussion, and brainstorming. Apparently, I have more than one. So, thanks for being mine and never complaining about my (sometimes repeated) questions, my not well-thought-out ideas, and also my confusion!

I would also like to thank the whole IHP and the colleagues not mentioned by name. Many thanks to all of you!

Special thanks go also to Prof. Ney Laert Vilar Calazans for the criticism, the hints and the comments that have greatly contributed to the improvement of this Thesis.

Miloš – thank you so much ... for your time, for the new challenges, the confidence but also the patience, and especially, the discussions and hints – always served with healthy, valuable and friendly criticism ;)

# Abstract

Reliable and robust data processing is one of the hardest requirements for systems in fields such as medicine, security, automotive, aviation, and space, where critical system failures caused by changes in operating or environmental conditions must be prevented. In particular, Signal Integrity (SI) effects such as crosstalk may distort the signal information in sensitive mixed-signal designs. Radiation effects are a further challenge for hardware systems used in space. Namely, Single Event Effects (SEEs) induced by high-energy particle hits may lead to faulty computation, corrupted configuration settings, undesired system behavior, or even total malfunction.

Since these applications require extra effort in design and implementation, it is beneficial to master the standard cell design process and corresponding design flow methodologies optimized for such challenges. Especially for reliable, low-noise differential signaling logic such as Current Mode Logic (CML), a digital design flow is an orthogonal approach compared to traditional manual design. As a consequence, mandatory preliminary considerations need to be addressed in more detail. First of all, standard cell library concepts with suitable cell extensions for reliable systems and robust space applications have to be elaborated. The resulting design concepts at the cell level should enable logic synthesis for differential logic design or improve the radiation-hardness of a design. In parallel, the main objectives of the proposed cell architectures are to reduce the occupied area, power, and delay overhead. Second, a special setup for standard cell characterization is additionally required for proper and accurate logic gate modeling. Last but not least, design methodologies for mandatory design flow stages such as logic synthesis and place and route need to be developed for the respective hardware systems to keep the reliability or the radiation-hardness at an acceptable level.

This Thesis proposes and investigates standard cell-based design methodologies and techniques for reliable and robust hardware systems implemented in a conventional semiconductor technology. The focus of this work is on reliable differential logic design and robust radiation-hardening-by-design circuits. The synergistic connections of the digital design flow stages are systematically addressed for these two types of hardware systems. In more detail, a library for differential logic is extended with single-ended pseudo-gates for intermediate design steps to support the logic synthesis and layout generation with commercial Computer-Aided Design (CAD) tools. Special cell layouts are proposed to relax signal routing. A library set for space applications is similarly extended by novel Radiation-Hardening-by-Design (RHBD) Triple Modular Redundancy (TMR) cells, enabling the correction of a single fault. Therein, additional optimized architectures for glitch filter cells, robust scannable and self-correcting flip-flops, and clock-gates are proposed. The circuit concepts and the physical layout representation views of the differential logic gates and the RHBD cells are discussed. However, the quality of results of designs depends implicitly on the accuracy of the standard cell characterization, which is therefore examined for both types. The entire design flow is elaborated from the hardware design description to the layout representations. A 2-Phase routing approach together with an intermediate design conversion step is proposed after the initial place and route stage for reliable, pure differential designs, whereas a special constraining approach is presented for RHBD applications in a standard technology.
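To illustrate the single-fault correction that TMR provides, the following is a minimal behavioral sketch in Python. It is not taken from this Thesis: the function names `majority_vote` and `flip_bit` and the example word are assumptions chosen for illustration, and the model ignores all circuit-level aspects such as voter hardening or transient filtering.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant copies (hypothetical helper)."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single upset by inverting one bit of a stored word (hypothetical helper)."""
    return word ^ (1 << bit)


# Assumed example value held by all three redundant memory elements.
stored = 0b1011_0010

# One of the three copies is corrupted; the two intact copies outvote it.
single_fault = majority_vote(stored, flip_bit(stored, 3), stored)

# The same bit is corrupted in two copies; the voter now follows the faulty majority.
double_fault = majority_vote(flip_bit(stored, 3), flip_bit(stored, 3), stored)
```

In this sketch, `single_fault` equals `stored`, whereas `double_fault` does not, which is why plain TMR is described as correcting one fault only.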

The digital design flow for differential logic design is successfully demonstrated on a reliable differential bipolar CML application. A balanced routing result of its differential signal pairs is obtained by the proposed 2-Phase routing approach. Moreover, the elaborated standard cell concepts and design methodology for RHBD circuits are applied to the digital part of a 7.5-15.5 MSPS 14-bit Analog-to-Digital Converter (ADC) and a complex microcontroller architecture. The ADC is implemented in an unhardened standard semiconductor technology and successfully verified by electrical measurements. The overhead of the proposed hardening approach is additionally evaluated by design exploration of the microcontroller application. Furthermore, the first measurement results of the novel RHBD-$\Delta$TMR flip-flops show a radiation-tolerance up to a threshold Linear Energy Transfer (LET) of 46.1, 52.0, and 62.5 MeV cm<sup>2</sup> mg<sup>-1</sup> and savings in silicon area of 25-50 % for selected TMR standard cell candidates.

In conclusion, the presented design concepts at the cell and library levels, as well as the design flow modifications, are adaptable and transferable to other technology nodes. In particular, the design of hybrid solutions with integrated reliable differential logic modules together with robust radiation-tolerant circuit parts is enabled by the standard cell concepts and design methods proposed in this work.

# Summary (Zusammenfassung)

Reliable and robust data processing is one of the most important prerequisites for systems in fields such as medicine, security, automotive engineering, aviation, and space, in order to prevent critical system failures that can be caused by changes in operating conditions or environmental circumstances. In particular, Signal Integrity (SI) effects such as crosstalk, the coupling and superposition of signals, can distort the information content in sensitive mixed-signal designs. Radiation is an additional challenge for hardware systems in space applications. The resulting effects, triggered by high-energy particle hits (Single Event Effects (SEEs)), can lead to faulty computations, corrupted configuration settings, undesired system behavior, or even total malfunction.

Since these applications require additional effort in design and implementation, it is advantageous to have standard cell design concepts and corresponding design flow methods that are optimized for exactly such challenges. Especially for reliable, low-noise differential logic such as Current Mode Logic (CML), a digital design flow is an orthogonal approach compared to the traditional manual design concept. Consequently, mandatory preliminary considerations have to be treated in more detail. First, concepts for standard cell libraries with suitable cell extensions for reliable systems and robust space applications have to be developed. The resulting design concepts at the cell level should enable logic synthesis for differential logic design or improve the radiation hardness of a design. In parallel, the main objectives of the proposed cell architectures are to reduce the occupied silicon area and the power dissipation and to minimize the delay overhead. Second, a special setup for the characterization of standard cells is required to enable adequate and accurate modeling of the logic gates. Last but not least, methods for design stages such as logic synthesis and Place and Route (PnR) have to be developed for the respective hardware systems in order to keep the reliability or the radiation hardness at an acceptable level.

This thesis proposes and investigates standard cell-based design methods and techniques for reliable and robust hardware systems that are implemented in a conventional semiconductor technology. Reliable differential logic circuits and robust radiation-hardened circuits are considered. The synergistic connections of the digital design flow are addressed systematically for these two hardware systems. In detail, a library for differential logic is extended with single-ended pseudo-gates for intermediate steps, which support logic synthesis and layout generation with today's development tools. A special frame for the cell layout is proposed to simplify signal routing. The library for space applications is similarly extended by novel Radiation-Hardening-by-Design (RHBD) cells with Triple Modular Redundancy (TMR), which allows 1-bit error correction. In addition, optimized architectures for glitch filter cells, robust scannable and self-correcting flip-flops, and clock-gates are proposed. The circuit concepts and the physical layout representation of the differential logic gates and of the proposed RHBD cells are discussed. However, the quality of results of the designs depends implicitly on the accuracy of the standard cell characterization, which is therefore examined for both types. The entire design flow is elaborated from the hardware design description to the generated layout representation. As a result, a 2-phase routing approach together with an intermediate design conversion step is proposed after the initial PnR phase for reliable differential designs, while a special constraining approach is presented for RHBD applications.

The digital design flow for differential logic is successfully demonstrated on a reliable bipolar differential CML application. The 2-phase routing approach achieves a balanced routing result for differential signal pairs. Furthermore, the developed standard cell concepts and the design methodology for RHBD circuits are applied to the digital part of a 7.5-15.5 MSPS 14-bit Analog-to-Digital Converter (ADC) and to a complex microcontroller architecture. The ADC was implemented in an unhardened standard semiconductor technology and successfully verified by electrical measurements. The overhead of the hardening approach is additionally evaluated by design exploration of the microcontroller application. Moreover, first measurement results of the novel RHBD-$\Delta$TMR flip-flops show a radiation tolerance up to a Linear Energy Transfer (LET) threshold of 46.1, 52.0, and 62.5 MeV cm<sup>2</sup> mg<sup>-1</sup> and savings in silicon area of 25-50 % for selected TMR standard cell candidates.

The presented design concepts at the cell and library levels as well as the modifications of the design flow are adaptable and transferable to other technology nodes. In particular, the design of hybrid solutions with integrated reliable differential logic modules together with robust radiation-tolerant circuit parts is enabled by the concepts and design methods proposed in this work.

# Acronyms

ADC Analog-to-Digital Converter.
ALU Arithmetic Logic Unit.
AMS Analog Mixed-Signal.
APB Advanced Peripheral Bus.
ASCII American Standard Code for Information Interchange.
ASIC Application Specific Integrated Circuit.
ATPG Automatic Test Pattern Generation.

BiCMOS Bipolar CMOS.
BISER Built-In Soft Error Resilience.
BTS Buffer Tree Synthesis.

CAD Computer-Aided Design.
CG Clock-Gating [Cell].
CML Current-Mode Logic.
CMOS Complementary Metal-Oxide Semiconductor.
CQFP Ceramic Quad Flat Package.
CT Clock Tree.
CTS Clock Tree Synthesis.
CUT Cell Under Test.
CZ Characterization.

D2S differential-to-single-ended.
DDF Digital Design Flow.
DEF Design Exchange Format (DEF/.def file).
DfT Design-for-Testability.
DICE Dual Interlocked Storage Cell.
DIFF differential.
DLG differential logic gate.
DMR Dual/Double Modular Redundancy.
DRC Design Rule Check.
DRV Design Rule Violation.
DSP Digital Signal Processing.

ECL Emitter-Coupled Logic.
EGR Enhanced Guard Rings.
ELT Enclosed Layout Transistor.
ERC Electrical Rule Check.

FF Flip-Flop.
FFT Fast Fourier Transform.
FPU Floating Point Unit.
FTMR full-TMR.
FW fat-wire.

GDS Graphic Database System II.
GG Guard-Gate.
GGB Guard-Gate Buffer.
GGI Guard-Gate Inverter.
GLS Gate-Level Synthesis.

HBT Heterojunction Bipolar Transistor [device].
HDL Hardware Description Language.
HIT Heavy Ion Tolerant.

IC Integrated Circuit.
ICG Integrated Clock-Gating [cell].
IO Input/Output (e.g. pads, pins, or ports).
IP Intellectual Property.

LCG Logically-Connected Group.
LEAP Layout Design through Error-Aware Transistor Positioning.
LEF Layout Exchange Format.
LET Linear Energy Transfer.
LUT Lookup Table.
LVDS Low Voltage Differential Signaling.
LVS Layout versus Schematic.

MBU Multiple-Bit Upset.
MCML MOS Current-Mode Logic.
MNCC Multiple Node Charge Collection.
MOS Metal-Oxide Semiconductor.
MPU Memory Protection Unit.
MS mixed-signal.

 NM Noise Margin.
 NMOS N-channel metal-oxide semiconductor.
 NMR N-Modular Redundancy.
 NRE Non-Recurring Engineering [costs].

**OPC** Operating Current.

PA Power Analysis.
PDK Process Design Kit.
PDN Power Distribution Network.
PGA Pin Grid Array.
PLL Phase-Locked Loop.
PMOS P-channel metal-oxide semiconductor.
PnR Place and Route.
PVT Process, Voltage and Temperature.
RF radio frequency.
RHBD Radiation-Hardness/Hardening-by-Design.

**RHBP** Radiation-Hardening-by-Process.

**RISC** Reduced Instruction Set Computer. **RnR** reliable and robust [[hardware] system]. **RnRS** reliable and robust [hardware] system(s). **RTL** Register Transfer Level. S2D single-ended-to-differential. SCL Source-Coupled Logic. SDC Synopsys Design Constraint Format. **SDF** Standard Delay Format. **SE** single-ended. **SEB** Single Event Burnout. **SEE** Single Event Effect. **SEL** Single Event Latchup. SEPG Single-ended Pseudo-Gate. **SET** Single Event Transient. **SEU** Single Event Upset. SI Signal Integrity. SoC System-on-[a]-Chip. SotA State-of-the-Art. **SPI** Serial Peripheral Interface. **STA** Static Timing Analysis.

TCL Tool command language.
TID Total Ionizing Dose.
TMR Triple Modular Redundancy.
TR Temporal Redundancy.
TSPC True Single-Phase Clock.

VHDL Very High Speed Integrated Circuit Hardware Description Language.

 ${\bf VLSI}$  Very Large Scale Integration.

- **WDDL** Wave Dynamic Differential Logic.
- **XML** Extensible Markup Language.
- **ZTC** Zero Temperature Coefficient (Current).

"Holzhacken ist deshalb so beliebt, weil man bei dieser Tätigkeit den Erfolg sofort sieht." — Albert Einstein

## Chapter 1

# Introduction

This Chapter introduces the motivation of this Thesis, the definition of a reliable and robust hardware system, the contribution, and an overview of the first-author publications and patents related to this work. Finally, the structure of the thesis is listed at the end of this Chapter.

### 1.1 Motivation

Very Large Scale Integration (VLSI) circuits or System-on-[a]-Chip (SoC) designs for critical applications in the fields of medicine, security, automotive, aviation, and space have to operate reliably under all circumstances. They require a specific, *fault-tolerant* implementation in order to prevent system failures caused by changes in operating conditions, environmental conditions, or component malfunctions. If the system parameters are out of specification, the result may be degraded signal quality, uncovered functional states, or total system failure.

One of the root causes of faults in aviation and space applications is radiation, which leads to bit-flips in memory cells, voltage glitches, displacement damage, or ionizing dose effects [1]. These *radiation-induced* effects can result in structural damage to the semiconductor devices, in long-term degradation of device performance, or in temporary logical errors. One recent example is the loss of up to 40 SpaceX Starlink satellites due to a solar storm [2]. Furthermore, an Airbus A330 nearly went out of control in 2008 when it pitched downward twice in rapid succession [3]. The cause of this onboard computer malfunction was later traced to errors induced by cosmic rays. Another sad example, closely connected to the malfunction of sensors and control in critical systems, is the two crashes of Boeing 737 MAX 8 aircraft in 2018 and 2019 [4, 5, 6].

Other sources that reduce a system's reliability are issues related to Signal Integrity (SI), which degrade the quality of signals. Among them, the most important effects for SoC designers are signal crosstalk, voltage drop, and noise [7, 8]. In addition, noise is also a serious concern in security applications: the secret keys of crypto-cores can be extracted by analyzing the switching noise generated on the power supply of hardware designs [9, 10].

Both signal integrity and radiation-induced effects thus reduce the reliability of hardware systems. They increase criticality and may bring the system into an undesirable, uncovered functional state. Therefore, there is a high demand for hardware systems that are able to tolerate faults and errors and to mitigate or manage these kinds of effects. Moreover, when VLSI circuits are targeted, the digital design approach with its standard cell-based design flow is immediately addressed. Any proposed cell concept and design methodology to obtain Reliable and Robust Hardware Systems (RnR/RnR-Systems (RnRS)) for such critical applications is closely connected to the digital standard cell-based design flow.

Depending on the application field of RnR-Systems, different development approaches can be selected. The use of *differential logic design* with standard cell gates is a well-known solution to obtain *reliable* hardware systems and to improve the design's SI with respect to power supply noise [11]. Similarly, fast differential bipolar current-switch-based logic gates are used in [12] to achieve the target speed requirements for computation with a low impact on noise performance.

For the field of space applications, special radiation-hard standard cell libraries and IPs are offered for some popular CMOS technology nodes, e.g., [13, 14]. They allow robust hardware systems to be developed with the conventional standard cell-based design flow. The alternative approach is known as *Radiation-Hardness/Hardening-by-Design (RHBD)*. It uses commercial, unhardened semiconductor technologies and standard logic gates together with additional design techniques to achieve the target robustness [15]. Moreover, additional *hardware redundancy* can be added to a design to increase its robustness and fault tolerance. As can be seen, the standard cell-based design approach is an attractive and popular solution to develop reliable and robust hardware systems (RnRSs).

This Thesis covers two types of RnR-Systems: reliable differential logic designs and robust radiation-hardening-by-design circuits. When standard cell-based RnRSs are targeted, the following aspects have to be considered.

First of all, any new cell required for RnR circuits has to be developed as a standard cell. Several essential views, such as circuit schematic, layout view, functional model, and timing and power models, have to be generated to enable an effective use of the standard digital design flow. As a consequence, *compliance* with the standard cell-based design process is one of the most important requirements. In particular for RHBD circuit design, compatible concepts for mandatory low-power and Design-for-Testability (DfT) features have to be considered.

However, reliability and robustness go hand in hand with design overhead. Larger robust cells, complex differential logic gates, hardware redundancy, and modifications introduced at design level result in penalties in silicon *area* occupation, an increase in *power* or energy consumption, or additional *delay* overhead. Thus, these three criteria have to be strictly considered for new cell concepts and for interactions during the different design phases.

Furthermore, alternative circuit concepts and design techniques that increase the reliability or robustness of a circuit may require a deep understanding of circuit design. Thus, RnR cells and standard cell libraries with a more modular, construction-kit-like development approach are preferable. They abstract the internal circuit complexity and may reduce the Non-Recurring Engineering (NRE) costs of the development phase.

Finally, concepts and methodologies restricted to one technology are of limited value and use. A more generic and portable approach for the design of RnRSs is therefore more beneficial and also more attractive for industry, since it can be transferred to other standard semiconductor technology nodes. Moreover, when the concepts of both systems are developed coherently with each other, a new type of RnR hardware system can be obtained: a *hybrid* system as a reliable and robust mixed-signal SoC. It could consist of complex RHBD IP and reliable differential logic designs on the same die in the same semiconductor technology.

### 1.2 Reliable and Robust Hardware Systems

Every system has to function properly within its specification under well-defined operating conditions. Among them are temperature ranges, voltage level definitions for power supply and data signals, and target timing requirements in terms of operating speed. These conditions are referred to in this Thesis as *first-order conditions* for hardware systems. They are individually specified for each system and have to be fulfilled without any degradation under each condition.

In addition to this group, significant changes in the environmental conditions may simultaneously affect the performance of such a hardware system. They may occur during operation due to effects of the environment or nature, or be caused by, e.g., noise from components inside a system in package or from on-chip components in an SoC. Malfunctions can have many causes. Probably the most relevant among them are self-heating, vibration, humidity, electromagnetic interference, and functional errors caused by radiation-induced effects. When a hardware system is additionally specified for operation under such conditions, it must be reliable and robust beyond its electrical specifications. Hence, these conditions are called *second-order conditions* of a hardware system in this work.

Every system or integrated circuit (IC) can generally assume two different acting roles. On the one hand, a system can suffer from induced effects, which puts the overall system into a defensive *victim* role (see Figure 1.1). The most obvious example of such a scenario is radiation to which a circuit may be exposed. The resulting Single Event Effects (SEEs), such as glitches on signal paths or bit-flips in storage cells, can corrupt data processing or control. On the other hand, systems can also take on the opposite, offensive role of an *aggressor*. As an example, the quality of an electrical signal of a sensitive Analog Mixed-Signal (AMS) design may be affected by large, noisy digital parts with high switching activity. This could lead to noise on the signal or power supply, glitches, crosstalk between signals, or, in the worst case, total loss of signals.



Figure 1.1: Possible effects and acting roles general ICs have to deal with

In principle, any system designed to meet the target requirements imposed by *first-order* conditions may still be vulnerable to *second-order* conditions. While a common IC would be completely at the mercy of these additional effects, a reliable and robust hardware system (RnRS) is capable of coping with them. Such a system has to ensure the two key actions illustrated in Figure 1.2 in addition to its basic function (under first-order conditions). As can be seen in the figure, the first action reduces the extent to which an RnRS acts as an *aggressor*, whereas the second one counteracts the *victim* role.



Figure 1.2: Key actions of an RnR-System (RnRS)

Two groups of effects defined by the *second-order* conditions are considered in this work, both of which require additional reliability and robustness of a system. The first group comprises signal integrity (SI) effects. The second group comprises radiation-induced effects, i.e., Total Ionizing Dose (TID) and, in particular, SEEs such as latchups, transients, and upsets resulting from energetic particle hits in the silicon.

Thus, within the scope of this Thesis, an RnRS can be generally defined as follows:

A reliable and robust hardware system (RnRS) is capable of adapting to changes of *second-order conditions*. It is able to perform the following two key actions: a) minimizing effect generation, and b) maximizing effect mitigation. Such a system is able to cope with any such effect by means of mechanisms and design techniques implemented inside the hardware system. An RnRS itself is realized in a standard semiconductor technology.

### 1.3 Contribution of this Work

In this Thesis, cell concepts and design methodologies are presented to obtain reliable and robust hardware systems. Two types of systems are addressed: reliable differential logic designs and robust radiation-hardening-by-design circuits. The proposed solutions for cell design and standard cell libraries are applicable to the standard digital design flow. Thus, a VLSI standard cell-based RnR system can be developed. However, since reliability and robustness at the hardware level always involve additional effort in design and resources, the focus is also set on the reduction of the introduced overhead in terms of area, power, and delay independent of the application field. Moreover, the modeling aspect of new cells is additionally covered by the introduction of the characterization presented in this work.

#### Contribution to Reliable Differential Logic Design

This work proposes a standard cell concept and a design flow for bipolar Current-Mode Logic (CML)-based standard cell applications. The presented circuit concepts for differential logic gates are restricted to the CMOS design approach and comply with the standard cell design flow. The respective design flow extension enables the layout generation of such circuits. A standard cell-based solution is proposed to obtain in-parallel routes of differential signal pairs for technologies with a low number of usable routing layers (here: three). Moreover, NRE costs for cell and library design are kept as low as possible by a more modular development approach.

#### Contribution to Robust Radiation-hardened Circuits

The Thesis proposes new circuit concepts for radiation-tolerant cells and robust Triple Modular Redundancy (TMR)-based gates, and discusses the corresponding impact on the design flow during design generation. The novel cells are based on the initial idea and circuit arrangement of $\Delta$TMR presented in [16]. Therein, the memory cells are equipped with integrated filters ($\Delta$) for transient mitigation on the data path. The circuit proposals in this work present many novel, distinctive $\Delta$TMR flip-flop configurations with significant improvements in performance (power, area, and delay), functionality, and robustness against radiation. These cells are essential for a portfolio when radiation-hardening-by-design Application Specific Integrated Circuit (ASIC) design is targeted. They have complementary features that allow various requirements in realistic applications to be met.

The following cell developments are proposed:

- robust transient filter standard cells for critical net protection
- novel baseline $\Delta$TMR flip-flops with:
  - high-density layouts for area savings
  - improved robustness to heavy ions up to $62.5\,\mathrm{MeV\,cm^2\,mg^{-1}}$
  - low area overhead and a short propagation delay
- a novel robust $\Delta$TMR clock gate that enables clock gating to save power
- novel $\Delta$TMR flip-flops with:
  - full scan-test support to improve the testability of a design
  - a self-correction function to correct internal errors without clock activity.

All new $\Delta$TMR flip-flops are characterized by better quality of results in terms of power, area, and delay overhead compared to recent existing solutions in the same 130 nm technology. The configurations have been developed by dividing their internal logic scheme into separate sections. Moreover, the robustness of selected candidates is finally confirmed by irradiation measurements.
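The two ingredients named above, triplication with majority voting and a delay-based transient filter, can be illustrated with a small behavioral sketch. It is not a model of the $\Delta$TMR circuits proposed in this work. The function names, the sampled-waveform representation, and the guard-gate-style hold behavior are assumptions chosen for illustration only.

```python
from __future__ import annotations


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


def filter_transients(samples: list[int], delay: int) -> list[int]:
    """Assumed guard-gate-style filter on a sampled waveform.

    The output follows the input only when the input equals its own copy
    delayed by `delay` samples; otherwise the previous output is held.
    Pulses shorter than `delay` samples are therefore suppressed.
    """
    if not samples:
        return []
    out: list[int] = []
    held = samples[0]
    for t, value in enumerate(samples):
        delayed = samples[t - delay] if t >= delay else samples[0]
        if value == delayed:
            held = value
        out.append(held)
    return out


# One of the three copies is upset; the vote still returns the stored value.
print(majority(1, 0, 1))
# The one-sample glitch at index 2 is removed; the real edge passes, delayed.
print(filter_transients([0, 0, 1, 0, 0, 1, 1, 1, 1], delay=2))
```

The sketch also shows the price of filtering: the legitimate transition appears at the output only after the filter delay has elapsed, which corresponds to the delay overhead discussed in this work.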

### 1.4 Publications Related to this Work

In the context of this work, six papers have been published with the author of this Thesis as first author in recent years. The publications listed below already give an overview of the entire picture of this Thesis.

A design methodology for standard cell-based CML circuits together with the proposed 2-Phase routing approach is published in [17]. It allows digitally designed, differential bipolar circuits to be generated with differential signal pairs routed in parallel.

[17] O. Schrape, M. Herrmann, F. Winkler, and M. Krstić, "Routing Approach for Digital, Differential bipolar Designs using Virtual fat-wire Boundary Pins," in 20th IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems, DDECS 2017, Dresden, Germany, April 19-21, 2017, pp. 122–126, 2017.

The challenges with respect to the characterization of the proposed $\Delta$TMR flip-flop standard cell gates are discussed in [18].

[18] O. Schrape, A. Breitenreiter, S. Zeidler, and M. Krstić, "Aspects on Timing Modeling of Radiation-Hardness by Design Standard Cell-Based $\Delta$TMR Flip-Flops," in 2019 22nd Euromicro Conference on Digital System Design (DSD), pp. 639–642, Aug 2019.

An example for demonstration purposes of the applicability of the design methodology for RHBD circuits is published in [19]. The proposed methodology is described and applied to the digital part of an AMS design.

[19] O. Schrape, A. Breitenreiter, L. Lu, E. P. Garcia, and M. Krstić, "Mit konventioneller Technologie zum strahlungsharten AMS-Design," in *Workshop Testmethoden und Zuverlässigkeit von Schaltungen und Systemen*, 2021.

An alternative circuit concept for radiation-hardened D-flip-flops using D-latch-based standard cell gates is published in [20]. The radiation hardness is confirmed by irradiation measurements.

[20] O. Schrape, A. Breitenreiter, C. Schulze, S. Zeidler, and M. Krstić, "Radiation-Hardness-by-Design Latch-based Triple Modular Redundancy Flip-Flops," in 2021 12th IEEE Latin American Symposium on Circuits and Systems, 2021.

The performance of a $\Delta$TMR-based flip-flop is significantly improved by using fast True Single-Phase Clock (TSPC) flip-flops. The TSPC cells are used for triplication inside the $\Delta$TMR cell as proposed in [21]. Thus, the delay introduced by $\Delta$TMR and its area overhead are reduced.

[21] O. Schrape, M. Andjelković, A. Breitenreiter, A. Balashov, and M. Krstić, "Design Concept for Radiation-Hardening of Triple Modular Redundancy TSPC Flip-Flops," in 2020 23rd Euromicro Conference on Digital System Design (DSD), Aug 2020.

One of the recent key papers was published in the journal *IEEE Transactions on Circuits and Systems I: Regular Papers* [22]. It summarizes the main contribution of the design methods and the evaluation of radiation-hardened TMR standard cell flip-flops.

[22] O. Schrape, M. Andjelković, A. Breitenreiter, S. Zeidler, A. Balashov, and M. Krstić, "Design and Evaluation of Radiation-Hardened Standard Cell Flip-Flops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, pp. 1–14, 2021.

Finally, two patents and one pending patent application are closely connected to this Thesis. They are mentioned here in order to underline commercialization aspects of two out of three proposed circuit concepts for self-correction in RHBD TMR-based memory cells.

One solution for data recovery is achieved by open slave latch compositions, which form the self-corrected feedback together with the evaluated output of the voter. In addition, this feedback is connected to an internal second transient filter.

[EP3965296A1/US11527271B2] O. Schrape, A. Breitenreiter, F. Vater, and M. Krstić, "Self-correcting Modular-Redundancy-Memory Device," Sep. 2022/Dec. 2022.

Alternatively, the feedback is formed by distributed Guard-Gate (GG) components which realize the voter function and calculate the required internal feedback signals in parallel.

[EP20194684.5] O. Schrape, M. Andjelković, A. Breitenreiter, and M. Krstić, "Corrigible Comparator for Triple Modular Redundancy Cell," filed in Sep. 2020.

### 1.5 Structure of the Thesis

The structure of the thesis is as follows:

**Chapter 2** summarizes the technical background related to this Thesis. It starts with a technology overview, the main tasks for standard cell library development and for the digital design of VLSI circuits. Moreover, an overview of signal integrity, radiation effects, differential signaling, and hardware redundancy is given.

**Chapter 3** presents the related work for differential logic design and RHBD circuit solutions. The design flow and cell concepts are presented therein. Moreover, concepts of single-event effect mitigation and recent robust flip-flop circuit concepts are presented. The chapter concludes with a discussion of the open issues addressed in this Thesis to obtain RnRSs.

**Chapter 4** introduces the library concept and the respective modular standard cell concepts for differential logic design. The related design methodology with its extension of the standard digital design flow for reliable applications with differential signaling is presented. The focus is set on the library development, from cell design to modeling, which enables compliant use within the standard design flow tool chain. Another aspect is the reduction of the power consumption during design generation. A third one is the objective of enabling in-parallel routing of the differential signal pairs for reliability purposes.

**Chapter 5** presents novel standard cell concepts and discusses the proposed design methodology to obtain RHBD circuits in a standard semiconductor technology. Both together enable the robustness of later applications through radiation effect mitigation. Architectures for transient mitigation cells and novel, more complex RHBD-$\Delta$TMR standard cell flip-flops are proposed that are robust against radiation-induced bit-flips. The circuit schemes and the introduced overhead in terms of area occupation, power, and delay are discussed in detail.

**Chapter 6** presents the experimental results from the evaluation of the concepts for both types of hardware systems. The applicability of the methodology for reliable differential logic design is demonstrated on a high-speed part of a clock generator circuitry using the proposed cell concepts and design methodology. The results are extracted from the implementation of the obtained routing solution of this differential design. Afterwards, the novel $\Delta$TMR cell concepts are evaluated in test vehicles for electrical verification and irradiation campaigns. The radiation hardness of the new cells is confirmed by irradiation measurements. Selected $\Delta$TMR cell configurations are compared to the unhardened reference cells in terms of the key criteria, i.e., area, power/energy, and delay overhead. The Chapter concludes with the implementations of two realistic design examples for space applications, which have gone through the proposed design flow for standard cell-based RHBD circuit design. The usability of the methodology and the overhead introduced by the hardening approach are also discussed.

**Chapter 7** summarizes the main results and gives an outlook on future work.

## Chapter 2

# Background

This Chapter covers the technical background closely related to this work. In particular, it addresses standard cell development and the standard cell-based digital design flow, and gives an overview of signal integrity and radiation-induced effects.

### 2.1 Semiconductor Technologies

Semiconductor foundries provide technologies in different nodes for various application fields. They offer Process Design Kits (PDKs), which contain the essential information and technology-related data for ASIC design with modern CAD tools. A PDK typically consists, among other things, of parametrized cells (PCells) for active and passive devices, technology data and different views of the cells (e.g., layout representation), verification decks with the respective rule files, device models, and often several metal-stack options. Moreover, standard cell libraries or special IPs are provided to benefit from design reuse and to enable the design of more complex VLSI circuits and SoCs. In addition, some of today's silicon CMOS foundries also offer radiation-hardened technologies or libraries as a platform for space application development [13, 14]. This approach is also referred to as Radiation-Hardening-by-Process (RHBP), which is a commercially attractive design solution to cope with radiation effects. On the other hand, special RF-PDKs are also available on the market, which offer additional devices and/or models to improve the performance of RF or mixed-signal (MS) circuits.

As an alternative, standard semiconductor technologies can be used instead for the design of reliable and robust hardware systems. They do not offer special devices or process features and are therefore also attractive with respect to design costs. For the evaluation of the proposed concepts and design methodologies in this work, the following standard technologies are selected:

**SGB25V** The implementations and experimental analyses for the differential logic design concepts and design methodology are evaluated with the SGB25V technology of IHP. This is a low-cost 250 nm BiCMOS process. The metal stack of this technology offers three thin and two thick metal layers. It is a larger-scaled process with 2.5 V core logic and 3.3 V IO-logic CMOS devices. The devices of the bipolar module reach $f_T = 75\,\text{GHz}$ and $f_{max} = 95\,\text{GHz}$.

**SG13S** The evaluation for RHBD applications is done with IHP's high-performance 130 nm silicon-germanium SG13S technology. It is characterized by 1.2 V low-power core logic CMOS transistors and 3.3 V IO-CMOS devices. The bipolar module of this technology offers npn-HBTs with cut-off frequencies up to an $f_T$ of 250 GHz and an $f_{max}$ of 340 GHz. The metal stack provides five thin and two thick metal layers for implementation. The technology is commercially qualified and radiation-assessed [23]. Moreover, the provided digital standard cell library is robust against Single Event Latchup (SEL) up to a Linear Energy Transfer (LET) of 67 MeV cm<sup>2</sup> mg<sup>-1</sup> [24].

### 2.2 Standard Cell Library Development

The Thesis is closely related to the design of standard cell-based ICs. As a consequence, the main design steps of the standard cell library development are discussed in this Section. An overview of the general procedure, the generation of abstract layout representations, and the standard cell characterization are addressed. The result of this task is a standard cell library with all essential views (e.g., layout, timing and behavioral models), ready to use for the design generation using the standard digital design flow. This development process is discussed in the context of Cadence Design Systems CAD tools.

#### 2.2.1 View Generation Procedure

The initial inputs for the view generation procedure are the layout view and the transistor-level circuit scheme (referred to as schematic in the following) of every cell of the library to be prepared. The cell layouts of the library database are verified against the design rules in a Design Rule Check (DRC), using the rules defined by the selected technology and the rule files provided by the PDK. Moreover, the netlist of each cell schematic is compared to its layout counterpart in a Layout versus Schematic (LVS) check. Afterwards, all devices and the parasitic capacitances and resistances are extracted for each cell into an analog simulation netlist (here: Spectre .scs-file). These files contain the connectivity of the cells at transistor level. Together with the models of all devices, they allow more accurate power and timing values to be obtained in the later standard cell characterization.
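The order of the steps described above can be written down as a small dependency check. This is only a sketch of the data flow between the steps. The step and artifact names are hypothetical labels, and no real tool is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs: frozenset   # artifacts that must exist before the step runs
    gives: frozenset   # artifacts the step produces


# Hypothetical artifact names; they only mirror the text above.
STEPS = [
    Step("DRC", frozenset({"layout", "rule_files"}),
         frozenset({"drc_clean_layout"})),
    Step("LVS", frozenset({"drc_clean_layout", "schematic"}),
         frozenset({"lvs_clean_layout"})),
    Step("extraction", frozenset({"lvs_clean_layout"}),
         frozenset({"extracted_netlist"})),
    Step("characterization", frozenset({"extracted_netlist", "device_models"}),
         frozenset({"timing_power_model"})),
]


def run_view_generation(steps, available):
    """Walk the steps in order and fail early if an input is missing."""
    have = set(available)
    order = []
    for step in steps:
        missing = step.needs - have
        if missing:
            raise ValueError(f"{step.name}: missing {sorted(missing)}")
        have |= step.gives
        order.append(step.name)
    return order, have


order, artifacts = run_view_generation(
    STEPS, {"layout", "rule_files", "schematic", "device_models"}
)
print(order)
```

Leaving out the schematic makes the LVS step fail, which reflects that extraction and characterization are only meaningful for cells that have passed both checks.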

As for the schematics, an abstract physical representation of the detailed layout of every cell has to be generated. These views contain the essential information used by Place and Route (PnR) tools for digital layout generation. The result is an abstract layout view in the form of a human-readable ASCII Library Exchange Format (LEF) file (.lef) [25].

Both processes for the generation of necessary models/views are introduced in the next two subsections following the chart depicted in Figure 2.1.



Figure 2.1: Chart for standard cell library model/view generation

#### 2.2.2 Physical View

Place and route tools for digital design require layout information of all used cells and macros together with the provided technology information (e.g., the tech.lef file). In this file, all routing metal layers, vias (i.e., the "stairs" between two neighboring layers of the metal stack), and a subset of master layers (such as wells or implants) are defined with their electrical and technology-related characteristics. This includes the capacitances, resistances, heights, and current densities of all layers and vias. The layer definitions follow a strict routing grid G derived from the respective routing pitch P, i.e., the distance between the centerlines of two adjacent wires. Thus, the pitch P is given by the sum of the metal layer width W and the *same-net* spacing S.

$$P_{layer} = W_{layer} + S_{layer} \tag{2.1}$$

Moreover, all layers are declared with preferred routing directions in the tech.lef file. Based on the relation given in Eq. 2.1, several routing grids can be generated using the minimum pitch or a multiple of the layer pitch. As a result, available routing tracks with preferred directions and empty routing channels can be derived and considered for routing solutions in later circuit implementations. In addition to the routing grids, the placement grid is defined as a multiple of the pitch in X and Y direction, which is referred to as a SITE in the LEF context. A *site* is the minimum possible step in micrometers by which a cell can be moved in X and Y direction. A LEF example of such a SITE definition for core cells is shown in Listing 2.1.
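As a worked illustration of Eq. 2.1, the following sketch computes a layer pitch and the resulting track coordinates. The width and spacing values are hypothetical and not taken from any PDK. Integer nanometers are used to avoid floating-point rounding, and the function names are chosen for illustration only.

```python
from __future__ import annotations


def routing_pitch(width_nm: int, spacing_nm: int) -> int:
    """Eq. 2.1: pitch = wire width + spacing (all values in nanometers)."""
    return width_nm + spacing_nm


def track_positions(pitch_nm: int, span_nm: int, multiple: int = 1,
                    offset_nm: int = 0) -> list[int]:
    """Centerline coordinates of routing tracks inside [offset_nm, span_nm].

    `multiple` selects a coarser grid built from a multiple of the pitch.
    """
    if pitch_nm <= 0 or multiple <= 0:
        raise ValueError("pitch and multiple must be positive")
    return list(range(offset_nm, span_nm + 1, pitch_nm * multiple))


# Hypothetical layer: 200 nm wide wires with 300 nm spacing.
pitch = routing_pitch(200, 300)
print(pitch)                                     # 500
print(track_positions(pitch, 3780))              # minimum-pitch grid
print(track_positions(pitch, 3780, multiple=2))  # double-pitch grid
```

With these assumed values, a span of 3780 nm holds eight tracks on the minimum-pitch grid and four tracks on the double-pitch grid.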

All cells assigned to *site* CoreSite will be placed in the pre-defined rows in the core area of the chip. In the given example, they can be moved in steps of $0.5\,\mu\text{m}$ in X direction and $3.78\,\mu\text{m}$ in Y direction. Moreover, the cells can be flipped by default in both directions to improve the placement. These site definitions and many other technology parameters (e.g., layer height and width) can be taken from the technology LEF file, which is directly exported by the design environment used for the selected PDK.

Listing 2.1: A 1-row standard cell SITE definition

```
SITE CoreSite
  CLASS CORE ;
  SYMMETRY X Y ;
  SIZE 0.5 BY 3.78 ;
END CoreSite
```
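To make the role of the SITE definition concrete, the following sketch reads the three statements of Listing 2.1 with regular expressions and snaps an arbitrary cell origin down to the placement grid. It is not a LEF parser. The function names and the example coordinates are assumptions for illustration.

```python
import re
from decimal import Decimal

SITE_LEF = """\
SITE CoreSite
  CLASS CORE ;
  SYMMETRY X Y ;
  SIZE 0.5 BY 3.78 ;
END CoreSite
"""


def parse_site(text):
    """Extract name, size, and symmetry from a single SITE block."""
    name = re.search(r"^\s*SITE\s+(\S+)", text, re.M).group(1)
    width, height = re.search(
        r"SIZE\s+([\d.]+)\s+BY\s+([\d.]+)\s*;", text
    ).groups()
    symmetry = re.search(r"SYMMETRY\s+([^;]+);", text).group(1).split()
    return {
        "name": name,
        "width": Decimal(width),    # step in X, micrometers
        "height": Decimal(height),  # step in Y (row height), micrometers
        "symmetry": symmetry,
    }


def snap_to_site(x_um, y_um, site):
    """Snap a non-negative cell origin down to the nearest legal location."""
    x = (Decimal(str(x_um)) // site["width"]) * site["width"]
    y = (Decimal(str(y_um)) // site["height"]) * site["height"]
    return x, y


site = parse_site(SITE_LEF)
print(site["name"], site["width"], site["height"], site["symmetry"])
print(snap_to_site(1.73, 8.0, site))
```

Decimal arithmetic is used so that coordinates such as 3.78 are represented exactly. A PnR tool works on integer database units for the same reason.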

Similarly, the abstract layout information must be generated for every individual standard cell gate, including the signal shape definitions on every used metal layer. With this information, the PnR tool is able to find an efficient placement location for the cells and a suitable routing solution. The generation of this information is done with a commercial abstract generation tool (see Figure 2.1).

As an example, the detailed layout of an inverter is depicted in Figure 2.2 (a). It consists of shapes for gate drawings, contacts, metals, vias, wells, and active region. Based on this geometrical structure, the abstract view illustrated in Figure 2.2 (b) is generated.



Figure 2.2: Layout views of an inverter: (a) mask layout, (b) abstract layout

The shapes of all pins are extracted, i.e., the blue shapes in Figure 2.2 (a) for the signals A and Q, and for the special nets VSS and VDD, respectively. A resulting H-structure is generated for pin Q, whereas a small rectangular shape is extracted for pin A. In particular, both special nets have larger rectangles on the north and south side of the cell frame. These shapes limit the total cell width. They act as horizontal supply rails, which are immediately connected when the standard cells are placed next to each other in a row. Furthermore, the signal extraction can be controlled to consider either the entire pin-shape connection below a pin or the individual pin-shape itself. This implicitly allows region and blockage definitions to be controlled in order to prevent routing in these areas during implementation. Moreover, as indicated by a small triangle in the corner of the cell frame in Figure 2.2 (b), the default orientation of the standard cell is also specified. This is essential information for cell flipping and rotation during the placement phase. For illustration purposes, an appropriate LEF representation of the inverter of Figure 2.2 is shown in Listing 7.1 in the Appendix on page 159.

However, with respect to the standard cell library preparation, the abstract layout views are generated from the detailed layout database of all standard cell library gates. As a consequence, a standard cell library is properly prepared for use in place and route phase with tools such as [26] during standard digital design generation. Nevertheless, even though cells can be placed and signals can be physically connected by the router, all cells have to be modeled similarly for their timing and power characteristics. Moreover, a behavioral model for every standard cell gate, which is taken as an input file for timing simulation in later design stages, has to be generated. Thus, the respective development tasks to obtain these essential views are introduced in the next section.

#### 2.2.3 Standard Cell Characterization

The standard cell characterization (CZ) is a high-effort task, which is the left branch of the view generation chart illustrated in Figure 2.1. This step can be done manually by scripting or by the use of commercial standard cell characterization tools or environments, such as the Synopsys PrimeLib solution [27] or Cadence Liberate<sup>TM</sup> [28]. The procedures discussed in this Section are aligned to the latter tool.

Two of the main results of this step are timing/power models in terms of so-called .lib-files (also known as Liberty<sup>TM</sup> files) and behavioral models in, e.g., Verilog<sup>®</sup> (.v) format. The .lib-files contain essential timing, power, and area information as well as the logical function of each standard cell gate. They are used by other CAD tools for design tasks such as logic synthesis, and place and route. Their information is also considered during design and timing extraction for power analysis (PA) and Static Timing Analysis (STA). Thus, the design costs and the design overhead can be estimated. As a result, the timing for every pin of each mapped cell and timing path is calculated based on the connected load (fan-out) and input transition times. Moreover, the design is additionally checked for timing violations against the applied timing constraints such as clock periods, input and output delays, or minimum and maximum delays for specific paths. However, in order to obtain a good quality of results for STA and PA in later design phases, the essential information (.lib-files) needs to be generated accurately.
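How a tool may derive a cell delay from the connected load and the input transition time can be sketched with a two-dimensional lookup table and bilinear interpolation between the characterized points. The sketch below is a simplified illustration under stated assumptions: the table values, axis points, and function names are hypothetical and are not taken from any .lib-file of this work or from a specific tool.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return index i of the table interval [axis[i], axis[i+1]] used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew index][load index]."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical 2x2 delay table: rows = input transition (ns), columns = load (pF).
SLEWS_NS = [0.1, 0.5]
LOADS_PF = [0.01, 0.05]
DELAY_NS = [[0.10, 0.20],
            [0.14, 0.24]]

print(lookup_delay(SLEWS_NS, LOADS_PF, DELAY_NS, 0.3, 0.03))
```

In this sketch, values outside the table are linearly extrapolated from the nearest table interval. The accuracy of such an interpolation depends directly on how well the table points were characterized, which is why the .lib generation needs the care described above.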

As can be seen in Figure 2.1, the transistor models (device-models), the pre-defined input cell templates (.tcl), and the extracted netlists of the selected gates are applied as input for the characterization tool. The templates cover global specifications with respect to Lookup Table (LUT) sizes in terms of dimensions for constraint definitions and delay measurements determining the number of automatically generated simulation vectors. Furthermore, additional information such as temperature ranges, model corner settings, and power supply voltage definitions can be specified. As an example, for the selected 130 nm standard technology introduced in Section 2.1, three digital process, voltage and temperature (PVT) corners are typically covered with respect to the low voltage MOS transistor models, i.e., min/fast (1.32 V at -40 °C), typ/avg/nom (1.20 V at 25 °C), and the max/slow (1.08 V at 125 °C) corner.
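The relation between the template settings and the characterization effort can be made concrete with a small Python sketch. The three corners are the ones named above; the 7x7 table size, the number of timing arcs, and all names in the code are assumptions chosen for illustration, not the settings used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PvtCorner:
    name: str
    process: str
    voltage_v: float
    temperature_c: float


# Digital PVT corners of the 130 nm technology as listed in the text.
CORNERS = [
    PvtCorner("min", "fast", 1.32, -40.0),
    PvtCorner("typ", "typical", 1.20, 25.0),
    PvtCorner("max", "slow", 1.08, 125.0),
]


def delay_simulations(arcs, slew_points, load_points, corners=CORNERS):
    """Number of delay simulations: one per arc, table entry and corner (simplified)."""
    return arcs * slew_points * load_points * len(corners)


# Hypothetical example: an inverter with two timing arcs (rise and fall)
# and a 7x7 lookup table template.
print(delay_simulations(arcs=2, slew_points=7, load_points=7))
```

Even for this simple gate, several hundred simulations result, which indicates why the vector count of a complete library quickly reaches the thousands.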

Based on that information, vectors are generated in order to characterize each cell in terms of delay, energy, leakage power, input capacitances, and timing constraints such as setup/hold, recovery/removal or min-pulse-width for constraint pins. The resulting CZ-database consists of several thousands of vectors and checking directives, which are executed in parallel, launching analog simulations invoked by the characterization tool. When all simulation runs have passed, the model files (.lib/.v-files) can be generated and used during design generation.

Nevertheless, the characterization of sequential cells, such as flip-flops, requires particular attention. One additional constraint is that the data must be stable before the sensitive clock edge arrives. This timing window is called setup time $(t_{setup})$. The counterpart window is the hold time $(t_{hold})$, in which the data must remain stable after the active clock edge has occurred. Moreover, if a memory cell is equipped with asynchronous control signals, similar timing windows such as recovery time $(t_{rec})$ and removal time $(t_{rem})$ have to be considered additionally.

Figure 2.3 illustrates the most important timing windows of a rising-edge-triggered D-flip-flop with a low-active asynchronous reset function. If one of these windows remains violated after implementation, the flip-flop can enter a metastable state and might capture wrong intermediate data. Thus, the resulting output value (node Q, not shown here) is unpredictable and unknown.



Figure 2.3: Important timing windows of a rising-edge-triggered D-flip-flop

As a consequence, these windows have to be accurately calculated during the standard cell characterization task. The logical validation is done by comparison of an expected value to the obtained output value measured at a specified primary probe node Q.
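The meaning of the setup and hold windows can be illustrated with a minimal Python check that flags data transitions falling into the forbidden interval around an active clock edge. The function name and all timing values are hypothetical and serve only as an example; they are not characterization results.

```python
def check_setup_hold(data_change_times, clock_edge, t_setup, t_hold):
    """Return a list of (violation type, time) for data changes inside the windows."""
    violations = []
    for t in data_change_times:
        if clock_edge - t_setup < t < clock_edge:
            # Data changed too late before the active clock edge.
            violations.append(("setup", t))
        elif clock_edge <= t < clock_edge + t_hold:
            # Data changed too early after the active clock edge.
            violations.append(("hold", t))
    return violations


# Hypothetical values in ns: clock edge at 10.0, t_setup = 0.2, t_hold = 0.1.
print(check_setup_hold([9.5, 9.9, 10.05], clock_edge=10.0, t_setup=0.2, t_hold=0.1))
```

The transition at 9.5 ns is safe, whereas the transitions at 9.9 ns and 10.05 ns fall into the setup and hold window, respectively.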

When the CZ-setup and the templates are finalized, the characterization process is executed for all defined PVT corners and cells. The final resulting files, i.e., the timing/power models (.lib) and an appropriate behavioral model file (.v), are extracted. They are ready to be used by other CAD tools for the design of standard cell-based applications. For illustration, the content of a .lib-file representation of an abstract D-flip-flop is given in the Appendix in Listing 7.2 on page 161.

### 2.3 Digital Design Flow

As a result of increasing IC complexity over the past six decades, following Moore's Law, the development paradigm has changed from designing small circuits with a few components to generating designs with millions or billions of pre-designed logic gates. Thus, a solution had to be found for the design of such complex VLSI circuits and systems. As a consequence, the standard Digital Design Flow (DDF) has been established. One of the main changes of the paradigm is that, instead of explicitly stating sub-circuits of transistor arrangements to realize a desired function, the design behavior is specified by an abstract behavioral description in a register transfer language. This allows complex digital circuits to be generated with pre-designed standard cell gates designed in a selected technology node. The flow has been adapted and extended, and the related CAD tools have been simultaneously improved to support new technology nodes and design features. An overview of the DDF and its different design stages is illustrated in Figure 2.4. In abstract terms, the flow is essentially the use of selected CAD tools in a specific order (the design stages) to obtain a standard cell-based VLSI design.



Figure 2.4: Standard digital design flow in a vertical arrangement

Finally, one result of the standard DDF is a mask set in binary GDS format, which serves as the input for fabrication. As a consequence, this flow is also known as the RTL2GDS flow. The individual design flow stages (DF-stages) are introduced in the next paragraphs. A more detailed overview of the general design steps with the chosen tool chain of this work can also be found in the literature [29].

**DF-Stage 1 – Design Entry** At this stage, the design behavior is described in RTL according to a design specification with the use of hardware description languages (HDLs), such as VHDL, Verilog®, or System-Verilog®. The functionality can be written in technology-unrelated or technology-related manner. Moreover, the behavior can be described in structural and behavioral styles.

**DF-Stage 2 – Simulation** This stage is continuously repeated and can be performed after any other design flow stage. Herein, the functional behavior is verified by simulation with the use of HDL simulator environments. The informative value of the simulation and its functional coverage fully depend on the quality of the applied test benches with their stimuli. If the input of a simulation is a pure RTL design, the simulation is called RTL behavioral simulation. Alternatively, if the input is a technology-dependent netlist file, an ideal Gate-Level simulation is started. Moreover, if extracted timing (e.g., an .sdf-file, standard delay format (SDF)) from another design stage, such as logic synthesis or place and route, is additionally applied, the simulation is referred to as SDF-Timing Simulation. Nevertheless, the behavioral model of each used master component of the design has to be applied to the simulation environment in order to obtain a correct simulator setup.

**DF-Stage 3 – Synthesis** In a third stage, the HDL source files of the functionally verified design are analyzed and compiled by an HDL Compiler of a logic synthesis tool. As a result, a technology-dependent gate-level netlist is finally obtained. Within this work, the different synthesis steps are reduced to two parts, i.e., the RTL-Synthesis and the Gate-Level Synthesis (GLS) step.

During RTL-Synthesis, the analyzed design and its behavior are transformed into a generic, technology-independent netlist. The Boolean equations are solved and realized by interconnecting multiple generic gates. Any sequential description is mapped to generic memory cells. All explicit hard-macro IP instantiations are directly mapped to their respective model representations. However, no technology information such as timing, area, or power is annotated to the design. At this stage, the design is just prepared for technology mapping and Boolean optimization.

In the next phase (Gate-Level Synthesis), the generic netlist scheme is mapped to the applied technology. The timing and power information of all objects is provided in the pre-characterized model files (see .lib in Section 2.2.3). Furthermore, design rule constraints, power and timing constraints are set by respective SDC or TCL commands. As a result, the design is realized with standard cell gates and IP blocks in the target technology. Afterwards, timing and power are extracted for the design. Thus, all paths are annotated with their respective delays and every violated path or design rule can be optimized incrementally. When all constraints are met, the design parameters in terms of number of gates, area occupation, power estimation, and timing budget can be extracted and analyzed. The gate-level Verilog® netlist (.v-file) is finally exported together with the obtained timing in SDF (.sdf-file). Both output files are taken as input by the design simulator for a post-synthesis SDF timing simulation. When all timing simulations have passed, the layout generation of the synthesized gate-level netlist can be started.

However, one hard limitation has to be considered with respect to the design of reliable and robust hardware systems in this work. Today's most popular logic synthesis tools, such as Design Compiler [30] or Cadence Genus Synthesis Solution [31], support only the classical, well-known single-ended signaling<sup>1</sup>. Interestingly, a single-ended signaling of differential logic gates was still supported in the early 1990s. As a consequence, when differential logic design is addressed for the design of RnR hardware systems, additional effort is required to make it compliant with the design tools and the DDF, and thus to benefit from the strengths of modern logic synthesis tools.

**DF-Stage 4 – Place & Route (PnR)** The generated netlist and the design constraints from the third stage are taken as input data for layout generation, i.e., the PnR or physical implementation phase. The technology information is additionally provided in terms of abstract layout views and timing and power models of all applicable cells and IP blocks. All essential internal steps of this stage are described below. The flow is explained in alignment with the Cadence Innovus Implementation System [26] selected in the context of this work.

The physical implementation flow illustrated in Figure 2.5 consists of five main stages. In a first stage, a floorplan is created for the target design. It covers the creation of the cell rows according to the specified cell sites (see Listing 2.1 of Section 2.2), and the definition of the routing and placement grids. Moreover, this task further includes an ordered pin and I/O pad placement, the pre-placement of IP blocks, and the drawing of a robust Power Distribution Network (PDN) grid with ring and stripe arrangements for special nets, i.e., ground and power supply. Finally, the connections of these special nets of the standard cells are realized for each row by additional horizontal rails. After floorplan creation, the placement of the standard cells and an early signal routing are executed according to the respective grid definitions.



Figure 2.5: Main stages of the physical implementation flow

In the second II-preCTS stage, the placed design is optimized and prepared for Clock Tree (CT) insertion, i.e., Clock Tree Synthesis (CTS). Design rule or timing constraint violations are optimized by gate-up-sizing, netlist restructuring, fan-out optimization, or remapping to standard cells with more beneficial characteristics, e.g., lower area occupation or power consumption. Moreover, high fan-out nets, such as clock and reset signals, are mostly excluded from optimization at this stage. They are declared as ideal nets or networks and are either implemented with dedicated buffer tree insertion engines, or directly optimized during the CTS step. In particular, all high-fanout nets are buffered with special driver cells that maintain requirements such as insertion delay, skew, or transition time constraints in order to obtain a good quality of results for the CT. Moreover, when the design is timing critical, alternative clock stealing techniques, such as useful skew, can be enabled to resolve timing conflicts. After implementing the CT, all previously ideal networks are fully implemented and have a natural signal propagation.

<sup>1</sup> also called single-rail logic/signaling; here, one wire carries one piece of information, e.g., one bit

Consequently, the design can be optimized for timing with respect to setup and hold time violations in a third stage (III-postCTS). When the setup time is violated, one solution is to reduce the length of the data path by optimization, whereas a hold time violation is fixed by additionally delaying the data path with delay cells. Alternatively, a similar strategy can be applied to the clock net in the opposite direction, e.g., delaying the clock to fix the hold time. However, global clocks are critical nets with special routing attributes, which may require more routing resources. As a consequence, a local path optimization may be more advantageous in many cases. Nevertheless, when the timing constraints are met, the logical signals of the design can be routed in detail, followed by design optimization in the fourth stage IV-postRoute.
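The effect of these fixes can be sketched with the commonly used slack relations for a register-to-register path. This is a simplified textbook model under stated assumptions, not the timing engine of the implementation tool: the skew is defined here as capture clock arrival minus launch clock arrival, and all values (in ps) and names are hypothetical.

```python
def setup_slack(period, skew, t_clk2q, t_data_max, t_setup):
    """Required time minus arrival time at the capturing flip-flop (next clock edge)."""
    return (period + skew) - (t_clk2q + t_data_max + t_setup)


def hold_slack(skew, t_clk2q, t_data_min, t_hold):
    """Earliest data arrival minus the hold requirement (same clock edge)."""
    return (t_clk2q + t_data_min) - (skew + t_hold)


# Hypothetical short path in ps: the capture clock arrives 500 ps after the launch clock.
path = dict(skew=500, t_clk2q=500)
print(hold_slack(t_data_min=200, t_hold=400, **path))   # negative: hold violation
print(hold_slack(t_data_min=500, t_hold=400, **path))   # 300 ps delay cell inserted

# The same delay cell reduces the setup slack of the path by 300 ps.
print(setup_slack(period=10000, t_data_max=200, t_setup=300, **path))
print(setup_slack(period=10000, t_data_max=500, t_setup=300, **path))
```

The sketch shows the trade-off described above: delaying the data path repairs the hold violation at the cost of setup slack, which is uncritical only as long as the setup margin of the path is large enough.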

In the last implementation stage V-SignOff, the design is checked for geometry issues, such as overlaps, shorts, unconnected pins and connectivity mismatches. The final timing and the corresponding gate-level netlist are exported afterwards. Moreover, a Design Exchange Format (DEF/.def-file) and/or GDS file is generated, either for further integration, final verification or design submission for fabrication.

**DF-Stage 5 – SignOff** In general, a generated design requires some deeper verification before fabrication. With the DRC, a design is verified against the technology-related design rules, such as wire widths, via enclosures, spacings or metal density. The Electrical Rule Check (ERC) checks for electrical compliance, such as voltage levels, opens and power supply connections of the used transistors. Finally, the LVS checks the components and signal connections of a netlist against the extracted netlist of the corresponding layout representation. After passing all checks, the design is *clean* and the layout view is ready for fabrication.

Nevertheless, the results of a digitally-designed chip fully rely on well-modeled timing/power models (.lib), accurate abstract layout information (.lef), and correct behavioral models (e.g., Verilog® models). Consequently, proper modeling and characterization of all used standard cells and IP blocks are essential to efficiently take advantage of the standard DDF. Thus, it is possible to generate good and, above all, correct complex digital designs.

### 2.4 Design-for-Testability and Low-Power Digital Design

The increased complexity of ICs with billions of devices on the same die demands mechanisms at various levels in order to cope, e.g., with design correctness after fabrication or self-heating issues. The first aspect is covered by DfT, and one technique to discover faults in a design is the use of *scan-test* approaches. Self-heating, on the other hand, can be controlled by low-power techniques such as voltage-scaling, power-gating or *clock-gating*. These are well-known design features to improve the power consumption of a design. In this Section, the concepts of *scan-test* and *clock-gating* are introduced as background information in the context of this Thesis.

#### 2.4.1 Structural Test – Scan-Test

The insertion of scan chains is one technique to improve the testability of a design and to discover internal faults. For this purpose, modern CAD tools select scannable memory cells at the Gate-Level Synthesis (GLS) step, e.g., flip-flops available in the employed standard cell library, and combine them into long shift registers. Each shift register forms a *scan chain* and provides external input and output pins with special scan control, additional scan data (SD) and enable signals (SE). Chains can be initialized with pre-defined test patterns. Afterwards, clock cycles are executed and the resulting data is shifted out and compared to the expected values. Thus, defects in the design such as stuck-at faults, opens and shorts can be discovered. The scan-test technology itself was introduced by Kobayashi *et al.* in [32]. A design with a test coverage of more than 99 % for stuck-at faults, for example, is considered well testable [33].

Figure 2.6 shows a register-to-register path with combinational logic in between, whereas the scan path realization is illustrated in Figure 2.7. As can be seen, memory cells are replaced by scannable counterparts. The scan path is controllable by the scan-enable signal to bypass the combinational logic part.



Figure 2.6: Register-to-register path with combinational logic



Figure 2.7: Scan path and mapped scannable flip-flops

The respective test patterns for a design are obtained by Automatic Test Pattern Generation (ATPG) tools, such as Synopsys TestMAX<sup>TM</sup> [34] or the Cadence Modus DFT Software solution [35]. They are generated by processing and tracing the entire connectivity through the logic of a design. Thus, scan chain insertion and test pattern generation with a sufficient test coverage may also require slight changes to the RTL design.
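The shift, capture and compare principle of a scan-test can be illustrated with a small behavioral Python model. It is a sketch under simplifying assumptions: a single chain of three scan flip-flops, a made-up combinational function, and one injected stuck-at-0 fault; none of the names or circuits below stem from a real design or ATPG tool.

```python
def shift_in(pattern):
    """SE = 1: shift the pattern serially into the chain (first bit ends in the last flop)."""
    chain = [0] * len(pattern)
    for bit in pattern:
        chain = [bit] + chain[:-1]
    return chain


def shift_out(chain):
    """SE = 1: observe the scan output (last flop) while shifting the chain out."""
    chain, observed = list(chain), []
    for _ in range(len(chain)):
        observed.append(chain[-1])
        chain = [0] + chain[:-1]
    return observed


def good_logic(q):
    # Hypothetical combinational logic between the registers.
    return [q[0] & q[1], q[1] | q[2], q[0] ^ q[2]]


def faulty_logic(q):
    # Same logic with the AND output stuck at 0.
    return [0, q[1] | q[2], q[0] ^ q[2]]


def scan_test(logic, pattern):
    chain = shift_in(pattern)  # load the test pattern
    chain = logic(chain)       # SE = 0: one capture clock cycle
    return shift_out(chain)    # unload the response


for pattern in ([1, 1, 0], [1, 1, 1]):
    expected = scan_test(good_logic, pattern)
    observed = scan_test(faulty_logic, pattern)
    print(pattern, "fault detected" if observed != expected else "fault not detected")
```

Only the second pattern sets both AND inputs to one and therefore exposes the fault, which illustrates why the achievable test coverage depends on the generated pattern set.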

#### 2.4.2 Clock-Gating

Since the dynamic power $P_{dyn}$ is proportional to the frequency $f$ $(P_{dyn} \propto f C_L V_{DD}^2)$, one efficient power reduction method is to temporarily switch off the clock activity, i.e., to perform clock-gating (CG). As known from literature, up to 50 % or even more of the dynamic power can be spent in clock buffers [36]. Moreover, as reported in [37], CG in a 180 nm chip has shown power savings of 34 % to 43 %, depending on the operating mode.

Standard cell libraries often provide standard gates to support the clock-gating feature. They offer special integrated clock-gating (ICG) cells for this purpose. The clock control or the ICG cell instantiation can be described at the RTL level or be automatically invoked during the logic synthesis step. However, clock-gating cells can be realized with different architectures. One standard cell-based solution for clock-gating is a D-latch with an enable signal (EN) as data input, triggered by the clock signal (CK) to be gated. The output of the latch is further connected to an AND gate, which additionally controls the original clock signal CK (see Figure 2.8 (a)). As a result, when EN is low, the controllable output clock signal GCLK is safely gated, whereas the next rising edge of the clock propagates when the enable signal has been set to logical one, as illustrated in Figure 2.8 (b).



Figure 2.8: The clock-gate: (a) standard clock-gate scheme, (b) waveform of a clock-gate

In contrast to alternative purely combinational solutions, such as the use of a single AND gate, the arrangement with a D-latch allows the clock activity to be controlled without producing any intermediate clock pulse (glitch) on GCLK. Moreover, critical setup and hold time windows for the enable and clock signal are implicitly checked and considered during design generation. Thus, this solution is widely used in the field of digital VLSI design to reliably gate clock signals.
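The difference between both gating schemes can be reproduced with a small sampled-waveform model in Python. The sketch assumes a latch that is transparent while CK is low, which is a common realization but not stated explicitly above; the waveforms and function names are likewise made up for illustration.

```python
def and_gate_only(ck, en):
    """Purely combinational gating: GCLK = CK AND EN."""
    return [c & e for c, e in zip(ck, en)]


def latch_based_icg(ck, en):
    """Latch-based gating: EN is latched while CK is low, GCLK = CK AND latch output."""
    q, gclk = 0, []
    for c, e in zip(ck, en):
        if c == 0:  # latch is transparent during the low phase (assumption)
            q = e
        gclk.append(c & q)
    return gclk


def pulse_widths(wave):
    """Widths (in samples) of all high pulses of a sampled waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# EN toggles while CK is high; a full clock pulse is two samples wide.
CK = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
EN = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(pulse_widths(and_gate_only(CK, EN)))
print(pulse_widths(latch_based_icg(CK, EN)))
```

The AND-only solution produces two shortened pulses, i.e., glitches, whereas the latch-based gate passes exactly one complete clock pulse.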

### 2.5 Signal Integrity Effects

Signal Integrity (SI) is a generic term for the quality of an electrical signal. Even though signals are interpreted as *zeros* and *ones* in the digital world, they are of analog nature in reality, i.e., a non-discrete, continuous voltage level compared to a reference level. Moreover, signals are also subject to influences such as noise or distortion caused by changes in environmental conditions, generated by external components in an assembly, or directly on the chip. The quality of an electrical signal depends on many factors such as cell and macro placement, driver strength, routing metals and layers, or wire spacings and signal wire widths. In particular, when long wires are routed in parallel to each other, SI effects such as *crosstalk* (Xtalk) could lead to an undesired and unpredictable behavior of a system. The electromagnetic fields developed around the wires may interact with each other, causing a cross-coupling of the energy between adjacent signals [38]. As a consequence, a transient pulse can be generated on a closely routed victim net. The glitch may propagate through the network and interact with subsequently connected logic, bringing a system into an undesired or undefined configuration state.

An example of such Xtalk is shown in Figure 2.9. The illustrated scenario starts with a properly set initialization. The field generated by a transition (aggressor net) interacts with the field of an asynchronous reset network (victim net). Thus, all subsequent flip-flops are reinitialized to zero by the generated pulse. Another effect caused by SI is *ringing*, which is also known as the settling oscillation of a signal. However, both types of effects may cause additional, undesired current flows and would directly reduce the battery lifetime of mobile or portable systems.



Figure 2.9: Generated Xtalk and impact in a system configuration

The *reflection* of an original signal along a wire is another SI effect. The main reasons are changes in the wires' electrical characteristics and the resulting impedance mismatches. In particular, the effect of reflections is amplified at higher switching speeds. One of the mitigation techniques is the use of terminating resistors connected to the sensitive signal, or a differential signaling transmission. This is also realized in many high-speed interfaces such as DisplayPort, Peripheral Component Interconnect Express (PCIe), Thunderbolt or the newest standard of Universal Serial Bus (USB), operating in the multi-Gbps range above 20 Gbps.

However, the trend to integrate billions of devices on the same die while increasing the system frequency leads to more switching activity in a chip. In a fully synchronous system, this causes the circuit to pull cycle-by-cycle on the power supply network. The resulting *power supply noise* is increased and may reduce the speed performance of the used transistors when the PDN is not properly stabilized. Similarly, if a high-speed circuit suffers from too high dynamic switching currents, local *voltage drops* (IR-drops) may occur, degrading the performance of the devices in the affected region. In more detail, the resistive path to the nearby-placed power supply connections leads to a drop of the power supply potential in the local area. Moreover, the larger the transient currents on output nodes, the longer the voltage drop lasts.

In order to deal with these aforementioned effects in highly-complex ICs, today's CAD tools for digital circuit design already address SI issues. They offer, e.g., crosstalk mitigation techniques and optimization strategies. For instance, the routing result of the design is analyzed for SI-hotspot areas [26], and alternative device locations, routes, spacings, and wire widths are chosen in order to decouple aggressor from victim objects. Furthermore, a strategy to address IR-drop reduction is to place special decoupling cells close to the affected regions in order to stabilize the PDN. Another technique is to split high fan-out nets into sub-trees and route the wire segments on different routing layers. Thus, the local dynamic IR-drop can be reduced.

However, all these countermeasures mentioned above are mainly suitable for digital designs realized in the well-known rail-to-rail CMOS logic. When differential logic design is selected with its smaller signal swing and static power consumption characteristic, such SI effects are less frequent.

### 2.6 Differential Logic and Signaling

The mitigation of SI effects such as reflections, attenuation and noise can be obtained with the use of differential signaling. Especially in the field of high-speed applications and off-chip communication, this concept is the most commonly selected signaling technique to maintain a reliable transmission. In addition, circuits realized in differential Current-Mode Logic (CML) benefit from this signaling concept and achieve high performance. This Section covers the most relevant aspects of differential logic in the context of this work, seen from the digital design point of view. More details about differential logic design can be found in the literature, e.g., in [39].

#### 2.6.1 Differential Signaling

In comparison to a single-signal transmission, each differential signal is transmitted as a pair of a true signal path and a false counterpart in case of differential signaling. As an example, the differential buffer with a positive/true Ap and a negative/false signal counterpart An is shown in Figure 2.10 (a).

Both signals are transmitted as a pair (b) with a defined peak-to-peak voltage $V_{pp}$ and a voltage offset (i.e., the common mode CM), whereas their difference Ap-An is logically interpreted as the one-signal representation value (c). As can be seen in the figure, the resulting swing for a differential transmission, i.e., the difference, is two times larger than the specified $V_{pp}$.



Figure 2.10: Differential signaling: (a) a differential buffer, (b) the respective differential signaling, (c) the difference of the differential signal pair
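The relation between the signal pair and its difference can be reproduced numerically with a few lines of Python. The common-mode and peak-to-peak values, as well as the function names, are assumptions chosen for this illustration only.

```python
def differential_pair(bit, v_cm, v_pp):
    """Return the levels (Ap, An) of a differential pair for one logical bit."""
    half = v_pp / 2
    return (v_cm + half, v_cm - half) if bit else (v_cm - half, v_cm + half)


def decode(ap, an):
    """The receiver interprets only the sign of the difference Ap - An."""
    return 1 if ap - an > 0 else 0


V_CM, V_PP = 1.0, 0.25  # hypothetical common mode and peak-to-peak voltage in V

one = differential_pair(1, V_CM, V_PP)
zero = differential_pair(0, V_CM, V_PP)
swing = (one[0] - one[1]) - (zero[0] - zero[1])
print(swing)  # the difference swings over 2 * V_PP

# A disturbance coupled equally onto both wires cancels out in the difference.
noise = 0.0625
print(decode(one[0] + noise, one[1] + noise), decode(zero[0] + noise, zero[1] + noise))
```

Because only the difference is evaluated, a common-mode disturbance does not change the decoded value, which is one reason for the robustness of differential transmission.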

The conventional circuit scheme of a respective bipolar differential CML amplifier stage is shown in Figure 2.11. It consists of two bipolar transistors, each with its collector connected to a resistor, and with both emitters coupled to a common current source.



Figure 2.11: Differential amplifier CML stage

The peak-to-peak voltage  $V_{pp}$  of the differential signals is given by:

$$V_{pp} = I_{SS} \cdot R_L, \tag{2.2}$$

whereas the resulting voltage swing  $V_{SW}$  of the difference is defined by

$$V_{SW} = I_{SS} \cdot 2R_L = 2 \cdot V_{pp}. \tag{2.3}$$

As a consequence, and derived from Eq. 2.2, the amount of the current  $I_{SS}$  and the size of the resistance  $R_L$  determine the desired  $V_{pp}$  voltage.
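As a brief worked example of Eq. 2.2 and Eq. 2.3, assume a tail current of $I_{SS} = 1\,\text{mA}$ and a load resistance of $R_L = 300\,\Omega$. Both values are chosen purely for illustration and are not design values of this work.

$$V_{pp} = 1\,\text{mA} \cdot 300\,\Omega = 0.3\,\text{V}, \qquad V_{SW} = 2 \cdot V_{pp} = 0.6\,\text{V}$$

Halving the current to $0.5\,\text{mA}$ would require doubling $R_L$ to $600\,\Omega$ in order to keep the same $V_{pp}$.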

In addition, a metric for reliability is the Noise Margin (NM) and as known from literature, the NM strongly depends on the specified voltage swing. According to [39] it can be approximated as,

$$NM \cong \frac{V_{SW}}{2} - V_T \ln\left(\frac{V_{SW}}{V_T} - 2\right), \text{ with } V_T = \frac{kT}{q}, \tag{2.4}$$

where T is the temperature in Kelvin, k is Boltzmann's constant, and q is the elementary charge. As a consequence, it can be stated that the larger the voltage swing $V_{SW}$, the larger the NM, and the more reliable the signal transmission. Furthermore, the minimum acceptable values of NM are in the order of 100 mV or above [39]. In contrast to CMOS-based logic circuits, CML cells have a continuous stable current flow to maintain the logic levels of the differential output signals. The resulting amount of static power consumption $P_{stat}$ scales linearly with the current $I_{SS}$ and the selected power supply $V_{CC}$ of a differential amplifier as:

$$P_{stat} = I_{SS} \cdot V_{CC}. \tag{2.5}$$

As a result, this stable current does not create additional noise on the power supply. Moreover, the lower voltage swing of a few hundred millivolts together with the constant current flow contributes to the better noise performance of differential logic cells.
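Eq. 2.4 and Eq. 2.5 can be evaluated numerically with the following Python sketch. The formulas are taken from the text; the operating points (swing, temperature, current and supply voltage) and the function names are assumptions for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """V_T = kT/q."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def noise_margin(v_sw, temp_k=300.0):
    """Approximate noise margin according to Eq. 2.4."""
    v_t = thermal_voltage(temp_k)
    if v_sw <= 2 * v_t:
        raise ValueError("Eq. 2.4 requires V_SW > 2 * V_T")
    return v_sw / 2 - v_t * math.log(v_sw / v_t - 2)


def static_power(i_ss, v_cc):
    """Static power of one differential stage according to Eq. 2.5."""
    return i_ss * v_cc


# Hypothetical operating points: two voltage swings at 300 K, 1 mA at 3.3 V.
for v_sw in (0.4, 0.6):
    print(v_sw, round(noise_margin(v_sw), 3))
print(static_power(1e-3, 3.3))
```

For these example values, the larger swing of 0.6 V yields a noise margin of roughly 0.22 V, compared to roughly 0.13 V for a swing of 0.4 V, in line with the statement that the NM grows with $V_{SW}$.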

#### 2.6.2 Current-Mode Logic/Emitter-Coupled Logic

One circuit topology for differential signaling is Current-Mode Logic (CML). This type of logic consists of active transistors, mostly arranged as stacked differential amplifiers, a constant current source, and pure passive resistor devices. The early concept used an opposite arrangement with bipolar transistors connected to a negatively supplied current source and resistors connected to the ground potential. It was initially known as current steering logic and invented by Hannon S. Yourke in 1956 [40]. The topology where resistors are connected to a positive power supply is illustrated in Figure 2.12. It shows a CML buffer with additionally connected bipolar transistors to the emitter nodes. This circuit style is known as Emitter-Coupled Logic (ECL). The arrangement in the figure is interfacing a positive power supply. Thus, it is also named positive ECL (pECL). The CML circuits depicted in Figure 2.11 and Figure 2.12 have the same static behavior, voltage swing, and noise margin. They only differ in performance and output node voltage levels.



Figure 2.12: Differential CML amplifier with emitter-follower (ECL)

Emitter-Coupled Logic (ECL) played a dominant role in high-speed computation from the 1960s to the 1980s. An ECL implementation of the Motorola 88000 RISC architecture is published in [41]. It consists of 50000 ECL gates arranged in a macro cell array structure. Moreover, the B5000 Sparc integer unit was the first highly integrated ECL microprocessor [42]. As reported, it became possible to design small ECL systems at lower cost that performed on a par with the large mainframe computers of that time.

### 2.6.3 Limitation

More complex CML-based cells such as 2- or 3-input gates can be realized by stacking differential amplifiers. As an example, the circuit scheme of a CML 2:1 multiplexer is illustrated in Figure 2.13 (a).



Figure 2.13: CML gates at transistor-level: (a) a differential CML 2:1 multiplexer, (b) an 8-input wired-OR CML gate

The multiplexer forwards the logical value of the differential input pair A to the output nodes Xp/Xn when the differential select pair S (Sp-Sn) is logical one, and the value of pair B when S is logical zero. As can also be seen, the output levels are determined by the power supply $V_{CC}$ and the voltage swing. The lower nodes are shifted down by one base-emitter voltage ($V_{BE}$), which is typically around 0.8 V. As a consequence, the power supply limits the number of stacked differential stages (n) that can be used for CML gates following this approach. The relation is given in Eq. 2.6.

$$n \le \frac{V_{CC} - V_{pp} - V_{ISS}}{V_{BE}} \tag{2.6}$$
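As a rough illustration of Eq. 2.6, the following minimal Python sketch computes the largest integer number of stacked stages for a given supply. The function name and the example values for $V_{pp}$ (taken here to be the voltage swing) and $V_{ISS}$ (taken here to be the headroom needed by the current source) are assumptions for illustration only. Only $V_{BE} \approx 0.8$ V and the 2.5 V supply come from the text.

```python
import math

def max_stacked_stages(v_cc, v_pp, v_iss, v_be=0.8):
    """Largest integer n that satisfies Eq. 2.6 (all voltages in volts)."""
    headroom = v_cc - v_pp - v_iss
    if headroom < 0:
        return 0
    # The small epsilon guards against floating-point rounding at exact multiples.
    return math.floor(headroom / v_be + 1e-9)

# Hypothetical swing (0.3 V) and current-source headroom (0.4 V).
n_2v5 = max_stacked_stages(v_cc=2.5, v_pp=0.3, v_iss=0.4)
n_3v3 = max_stacked_stages(v_cc=3.3, v_pp=0.3, v_iss=0.4)
print(n_2v5, n_3v3)
```

Under these assumed values, a 2.5 V supply leaves room for two stacked stages and a 3.3 V supply for three.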

More complex functions can be realized by cascading simpler cells, e.g., 2-input logic gates, in series. An alternative circuit design for an OR function with more inputs is the wired-OR function, which is illustrated in Figure 2.13 (b). It is advantageous with respect to delay and power due to the smaller number of active devices and current sources. However, the opposite node of the differential switch is biased by a reference voltage ($V_{ref}$). Thus, the respective voltage swing, and therefore the NM, is reduced by half. Moreover, a similar mode can be applied to fully differential CML/ECL gates (e.g., the 2:1 MUX) when all opposite nodes are biased with their corresponding reference voltage. In this case, the cells operate in single-ended mode (SE-mode).

### 2.6.4 Comparison of CML and CMOS Inverter

Differential signaling has many advantages in comparison to the full-swing rail-to-rail signaling (ground to power supply) used in most CMOS circuits. Its cell propagation delay is less dependent on the connected fan-out or load than that of classical CMOS. It is more robust against signal and power noise and other SI effects. Furthermore, a lower power consumption at higher speeds is advantageous for circuits in the field of reliable high-speed applications.

For demonstration and comparison purposes, transistor-level simulations of CMOS and CML inverter circuits are performed in the 250 nm BiCMOS standard technology (see Section 2.1). The results illustrated in Figure 2.14 (a) and (b) are obtained under nominal conditions, i.e., a 2.5 V power supply, nominal device models, and a temperature of 25 °C. As can be seen in Figure 2.14 (a), the slope of the fan-out-dependent cell delay is less steep for CMOS than for the CML stage. Similarly, the average current of the CMOS inverter depicted in Figure 2.14 (b) scales with the frequency and is also proportional to the fan-out load. In contrast, the CML circuit performance benefits from the constant current source of the differential stage. Independent of the connected fan-out, indicated by the FX-suffix in the figure, the CML inverter draws less current, especially at higher speeds. The fan-out notation for Figure 2.14 (b) is as follows: F1 for a fan-out of one, F3 for three, F5 for five, and F7 for a fan-out of seven unit inverters, respectively.



Figure 2.14: Comparison of CMOS and bipolar CML inverter

The required masks and the used layers of MOS and HBT devices are also different as depicted in Figure 2.15. With respect to routing layers, the terminals (source, gate, and drain) of a single-finger MOS transistor are provided at the lowest metal layer as indicated by the dark-blue shapes.

Accordingly, the HBT layout contains the second metal layer for the emitter (E) connection in the center of the cell. Moreover, the core devices are encapsulated by a special guard-ring structure. Thus, the layout frame of the bipolar device is already more than 12 times larger than that of the MOS counterpart.



Figure 2.15: Layout views of transistors: (a) NMOS transistor, (b) npn-HBT

As a consequence, each single bipolar transistor has a huge impact on the area occupation when standard cell logic gates are addressed.

However, with increasing complexity in the integration of VLSI and higher switching frequencies of the MOS transistors, the implementation of MOS-based differential switches became more and more important. The CML approach realized with active MOS devices is known as Source-Coupled Logic (SCL), CMOS-CML, and MOS Current Mode Logic (MCML) as well. It provides a good tradeoff between power consumption, area occupation, and speed performance.

On the other hand, the routing of differential signal pairs is more challenging and occupies more routing resources of a design. It is known from the literature that the penalty for differential high-performance designs is mainly paid in additional silicon area [39]. In particular, a balanced routing with RC-matched wire characteristics of the differential pair nets is preferable in order to obtain good signal quality. Nevertheless, when a standard technology provides bipolar devices with switching speeds well beyond the range of MOS transistors, bipolar CML and ECL are attractive circuit solutions for reliable, timing-critical high-speed designs.

# 2.7 Radiation-Induced Effects

Whenever integrated circuits are intended for operation under radiation, e.g., for space applications, the robustness against radiation-induced effects is the main concern [43, 44, 45]. These effects can be triggered by particles from galactic cosmic rays, solar flares, or radiation belts. In this work, two groups of effects that an RnR-System has to deal with are addressed, i.e., the TID effect and the single-event effect (SEE). They are introduced in the two subsequent sections.

### 2.7.1 Total Ionizing Dose

Total Ionizing Dose (TID) effects are long-term radiation effects that result from radiation-induced charge deposition in the transistors' gate oxide. They cause threshold voltage shifts of devices, leading to an increased leakage current and a performance degradation of the affected transistor. Thus, a functional error could occur in timing-critical systems. Moreover, larger-scaled CMOS technologies are more sensitive to TID effects due to thicker oxides. The total dose in many space missions is below 100 kRad, and CMOS technologies already withstand doses up to 300 kRad [46]. There are also measures at the transistors' layout level to improve the TID robustness. MOS transistors can be drawn in an alternative Enclosed Layout Transistor (ELT) arrangement [24, 47, 48, 49] to maintain the leakage current level even after irradiation. The structure of an ELT is shown in Figure 2.16.



Figure 2.16: Abstract layout of an NMOS ELT transistor

However, the ELT layout may require more silicon area and an increase of the dimension of the overall logic gate size has to be considered.

### 2.7.2 Single-Event Effects

The second group of effects are Single-Event Effects (SEEs), which are more important for digital circuits. They can directly result in functional errors or structural defects in the silicon, leading to improper function or system deadlock. They were first observed in 1974, when the interaction of cosmic rays led to unexpected triggers in digital circuits. These effects were later called SEEs, since the erroneous behavior is the result of a single event [1], caused by a single energetic particle strike.

The four most common types of particles that can lead to SEEs are ions, protons, neutrons, and alpha particles. They are present in cosmic rays, solar flares, or the radiation belt of the earth. The energy deposition in the silicon material is a result of electron-hole pairs generated by the ionization effect, or ions generated by *Bremsstrahlung* (braking/deceleration radiation) [1]. Both can lead to undesired current paths in the material, which result in changes of the voltage levels in the silicon of semiconductor devices.

Figure 2.17: An energetic particle strike which may cause an SEE

The phenomenon in a MOS transistor is illustrated in Figure 2.17. As can be seen in this cross-section, a strike by a single energetic particle induces a current flow along its ionization path. This current flow could result in a voltage drop on the drain node and finally lead to an SEE in an off-state transistor.

### 2.7.2.1 Hard-Errors – Single Event Latchup (SEL) and Burnout (SEB)

SEEs can be classified into two main groups. The first group is known as *hard-errors*. They can damage the internal structures of a silicon device and are therefore the most serious type of errors. One of the best-known and most dangerous effects is the Single Event Latchup (SEL) [50]. It manifests as a high operating current above the device's specification, resulting from a short circuit formed by a low-impedance path between the power supply rails. A parasitic structure equivalent to stacked PNP-NPN transistors is triggered, in which both transistors turn on one after another.

In most cases, a power reboot procedure is required in order to recover from such effect. However, a high current up to several amperes might damage or destroy the devices, or even melt the connections. In this case, this hard-error is also classified as a Single Event Burnout (SEB).

### 2.7.2.2 Soft-Errors – Single Event Transient (SET) and Upset (SEU)

The second group comprises non-destructive, temporary errors called *soft-errors*, two of which are highly important for digital circuits. For example, when an energetic particle interacts with combinational logic and deposits sufficient charge, a voltage glitch might be generated. The resulting pulse may propagate through the circuitry. This kind of effect is called a Single Event Transient (SET).

If such a transient arrives on the data path of memory cells within the sensitive timing window when the clock edge arrives, the transient pulse is captured by the clock and results in an error, i.e., a Single Event Upset (SEU) – a stored bit-flip as a result of a transient. In addition, when a transient propagates through clock driver cells, the glitch on the critical clock net may also result in upsets. A second scenario for an SEU is a direct hit of the memory cell by a single high-energetic particle. Moreover, when an SEE results in multiple bit-flips in storage cells, a Multiple-Bit Upset (MBU) has occurred. Both aforementioned scenarios are illustrated in Figure 2.18 (a) and (b).

As can be seen, sequential cells such as latches or flip-flops are vulnerable to soft-errors in many different ways. They are sensitive to transients (SET), captured transients resulting in an upset (SEU), and SEUs as a result of direct particle hits. Furthermore, as reported in [51], induced soft-errors could remain unrecoverable in a system. This leads to the conclusion that sequential cells, i.e., cells with a memorizing function, have to be considered and treated as the most critical components in a digital design.



Figure 2.18: Particle hit in digital circuits: (a) SET generation and resulting SEU, (b) generated SEU by direct hit of a memory cell

### 2.7.3 Charge Generation and Linear Energy Transfer (LET)

When semiconductor material is struck by high-energy particles, the particles lose energy along their penetration paths. The energy deposited in the silicon material generates electron-hole pairs (free charge carriers) or ions, which can result in current paths in the material. The charge $Q_{gen}$ generated by a specific particle hit in a material is determined by the Linear Energy Transfer (LET), which is a measure of the particle's interaction with the material. The LET is given in MeV cm<sup>2</sup> mg<sup>-1</sup> and denotes the energy lost by a particle per unit track length (L). According to [52], [53], the particle's LET and the respective generated charge $Q_{gen}$ are related by the following equation:

$$Q_{gen} = 1.035 \times 10^{-2} \times L \times LET \tag{2.7}$$

Furthermore, the critical charge ($Q_{crit}$) is a measure of the sensitivity of a single circuit node. It represents the minimum collected charge required to generate a transient (inside a logic gate). The lower $Q_{crit}$ is, the more vulnerable a node is with respect to a transient event. However, a change in the logical value only results when the voltage of the affected node exceeds half of the supply voltage of a MOS device. Moreover, the value of $Q_{crit}$ depends on technology parameters, the applied voltage, and the temperature.

A metric similar to the critical charge, which defines the onset of errors (e.g., transients), is the threshold LET (LET<sub>th</sub>) [53]. Using Eq. 2.7, the relation between $Q_{crit}$ and LET<sub>th</sub> can be approximated in the same form as

$$Q_{crit} = 1.035 \times 10^{-2} \times L \times LET_{th}. \tag{2.8}$$
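The two relations can be evaluated numerically with a minimal Python sketch. The text does not state the units of the constant. The sketch assumes the convention commonly used with this value for silicon, i.e., charge in pC, track length in µm, and LET in MeV cm<sup>2</sup> mg<sup>-1</sup>. The function names and the example values (1 µm track length, 0.05 pC critical charge) are hypothetical.

```python
# Constant of Eq. 2.7; assumed unit: pC per um per (MeV cm^2/mg).
CHARGE_PER_LET_UM = 1.035e-2

def generated_charge_pc(let, track_length_um):
    """Eq. 2.7: generated charge in pC for a given LET and track length."""
    return CHARGE_PER_LET_UM * track_length_um * let

def threshold_let(q_crit_pc, track_length_um):
    """Eq. 2.8 solved for LET_th: the LET at which Q_gen reaches Q_crit."""
    return q_crit_pc / (CHARGE_PER_LET_UM * track_length_um)

# Hypothetical example values, for illustration only.
q_gen = generated_charge_pc(let=30.0, track_length_um=1.0)
let_th = threshold_let(q_crit_pc=0.05, track_length_um=1.0)
print(f"Q_gen  = {q_gen:.4f} pC")
print(f"LET_th = {let_th:.2f} MeV cm^2/mg")
```

Under these assumptions, a node with a smaller $Q_{crit}$ yields a proportionally smaller LET<sub>th</sub>, i.e., errors start at lower particle LET.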

$Q_{crit}$ can be characterized by simulation in order to assess the circuit's robustness to transient effects. Alternatively, the robustness of a circuit can be demonstrated by irradiation campaigns in which the resulting logical errors are counted while the circuit is irradiated with particles of a certain ion species (with the associated energy (LET)). Campaigns for SEE investigations are called heavy ion tests.

Finally, the effective LET  $(LET_{eff})$  of an LET can be derived as follows:

$$LET_{eff} = \frac{LET}{\cos\theta},\tag{2.9}$$

where  $\theta$  is the angle of a particle strike. When  $\theta = 0^{\circ}$ , it represents the orthogonal position of the beam to the surface [54].
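Eq. 2.9 can be sketched in a few lines of Python. The function name and the example LET of 10 MeV cm<sup>2</sup> mg<sup>-1</sup> are illustrative assumptions. The angle is restricted to values below 90°, since the expression diverges for grazing incidence.

```python
import math

def effective_let(let, theta_deg):
    """Eq. 2.9: effective LET for a beam tilted by theta from the surface normal."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("theta must be in the range [0, 90) degrees")
    return let / math.cos(math.radians(theta_deg))

# Hypothetical beam LET, tilted in steps from normal incidence.
for theta in (0.0, 30.0, 45.0, 60.0):
    print(f"theta = {theta:4.1f} deg -> LET_eff = {effective_let(10.0, theta):.2f}")
```

Tilting the device under test therefore allows higher effective LET values to be probed with the same ion; at 60° the effective LET is doubled.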

However, most energetic particles in space have the LET below  $30 \,\mathrm{MeV \, cm^2 \, mg^{-1}}$ , while the particles with higher LET are rare [55].

### 2.7.4 Circuit Cross-Section

The probability that a particle hits a sensitive region in a logic gate is defined by the ratio between the counted *Errors* and the particle *Fluence*. According to [54], the apparent cross-section $\sigma$ is calculated as:

$$\sigma = \frac{Errors}{Fluence}. \tag{2.10}$$

The result is given in  $cm^2$  and fluence denotes the intensity of radiation (exposure rate) as a number of particles per unit area. Moreover, fluence per unit time is known as flux and is given in particles/ $cm^2/s$ .
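The relation between flux, fluence, and cross-section can be illustrated with a short Python sketch. All numbers (flux, exposure time, error count) are hypothetical and only serve to show the unit handling.

```python
def fluence_per_cm2(flux_per_cm2_s, exposure_s):
    """Fluence (particles/cm^2) accumulated at a constant flux."""
    return flux_per_cm2_s * exposure_s

def cross_section_cm2(errors, fluence):
    """Eq. 2.10: apparent cross-section in cm^2."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence

# Hypothetical run: 1e4 particles/cm^2/s for 1000 s with 25 counted errors.
fluence = fluence_per_cm2(1e4, 1000.0)
sigma = cross_section_cm2(25, fluence)
print(f"fluence = {fluence:.1e} particles/cm^2, sigma = {sigma:.2e} cm^2")
```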

To compare and weigh a circuit's robustness, the cross-section is plotted over LET based on the numbers of errors obtained from heavy-ion irradiation measurement campaigns. The extracted points are typically fitted by a *Weibull* curve.



Figure 2.19: Cross-section and Weibull fit

The saturation is the flat top of this curve, whereas the threshold LET is the start of the first error. For example, Figure 2.19 shows a *Weibull* fit  $(G_W)$  and the saturation (dashed line) of the cross-section over LET. The cross-section points are given per flip-flop device and for illustration purpose only.
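The shape of such a curve can be sketched in Python. The text does not give the fit function. The sketch assumes the four-parameter Weibull form commonly used for SEE cross-sections, $\sigma(LET) = \sigma_{sat}\,(1 - e^{-((LET - LET_{th})/W)^{s}})$, with width $W$ and shape $s$. All parameter values are hypothetical and are not taken from Figure 2.19.

```python
import math

def weibull_cross_section(let, sigma_sat, let_th, width, shape):
    """Four-parameter Weibull curve (assumed form) for the cross-section in cm^2."""
    if let <= let_th:
        return 0.0  # no errors below the threshold LET
    return sigma_sat * (1.0 - math.exp(-(((let - let_th) / width) ** shape)))

# Hypothetical fit parameters, chosen only for illustration.
params = dict(sigma_sat=1e-8, let_th=5.0, width=20.0, shape=1.5)
let_points = (1.0, 5.0, 10.0, 30.0, 60.0, 120.0)
curve = [weibull_cross_section(let, **params) for let in let_points]
for let, sigma in zip(let_points, curve):
    print(f"LET = {let:6.1f} MeV cm^2/mg -> sigma = {sigma:.3e} cm^2")
```

The curve is zero up to the threshold LET, rises monotonically, and approaches the saturation cross-section for high LET values.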

### 2.7.5 Charge-Sharing

When a particle strikes circuits in larger-scaled CMOS technologies, the charge is typically collected in one single device. However, with technology downscaling, the number of transistors in the same area increases, whereas the spacing between them is reduced. Consequently, sensitive nodes are closer together and a particle hit may affect the area of more than one transistor. The collected charge is then a result of the *charge-sharing* effect, which is also known as Multiple Node Charge Collection (MNCC) [56].

Based on the analysis reported in [57], the affected area information acquired for a 150 nm SRAM technology can also be used for particle strikes with the same energy in a 45 nm and 15 nm technology. As stated therein, particles with an energy of 144 MeV may affect an area around  $4.61 \,\mu\text{m}^2$ , while 47 MeV affect only  $1.9 \,\mu\text{m}^2$ . Furthermore, a range for a 90 nm technology of  $4.5 - 5.0 \,\mu\text{m}$  from a strike location is reported in [58]. Others propose also a *nodal separation* of  $8.5 \,\mu\text{m}$  to reduce SEUs in memory gates in this technology node [59].

### 2.7.6 Transient Masking

According to [60], transient propagation may not result in errors in memory cells (such as flip-flops or latches) due to one of three phenomena illustrated in Figure 2.20.

- **Logical Masking** occurs when the propagation of the logical value resulting from the transient is interrupted, blocked, or suppressed by subsequent logic gates. For instance, if an AND gate drives logical zero because both inputs are zero, a transient arriving at one input causes no change at the output node.
- **Electrical Masking** occurs when the pulse resulting from an SET is attenuated by subsequent logic gates due to their electrical characteristics. In more detail, more complex logic gates may have larger pin capacitances or propagation delays and therefore mitigate the incoming transients.
- **Latching-window Masking**, also known as *Temporal Masking*, occurs when a transient arrives at the data input of a memory cell such as a flip-flop outside the critical capturing window of the cell. The probability of a potentially resulting upset depends on the pulse width of the transient and on the setup and hold time of the memory cell (see also Section 2.2.3).
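The logical masking case can be checked with a small Python sketch. It models a transient as a temporary inversion of one gate input, which is a strong simplification that ignores pulse shape and timing. The helper names are hypothetical.

```python
def and2(a, b):
    """Ideal 2-input AND gate on single bits."""
    return a & b

def set_is_masked(gate, a, b):
    """True if inverting input a (the modeled SET) leaves the output unchanged."""
    return gate(a, b) == gate(a ^ 1, b)

# Enumerate all static input combinations of the AND gate.
masking_table = {(a, b): set_is_masked(and2, a, b) for a in (0, 1) for b in (0, 1)}
for (a, b), masked in masking_table.items():
    print(f"a={a} b={b} -> transient on a masked: {masked}")
```

For the AND gate, a transient on one input is masked exactly when the other input holds the controlling value zero.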



Figure 2.20: Masking effects of resulting transient by single particle hit

### 2.7.7 Single Event Transient Quenching and Broadening

The width of an SET pulse may vary from a few picoseconds to several nanoseconds depending on the affected interacting logic gates and the selected technology. Transients can be quenched by inter-gate charge-sharing with adjacent transistors [61]. On the other hand, the propagating pulse can be broadened (widened) by subsequent logic gates [62].

However, based on mixed-level simulations, transients approaching the 1 ns mark for LETs above 50 MeV cm<sup>2</sup> mg<sup>-1</sup> had already been predicted for bulk CMOS technology in [63]. SET pulses between 400 ps and 700 ps in a 130 nm CMOS technology, and pulses in the range of 500 ps to 900 ps in a 90 nm process, have been verified by measurements as published in [64]. A maximum SET width of 200 ps is reported in [65] for a 65 nm bulk-CMOS technology, whereas 165 ps is the average SET pulse width of a 28 nm planar bulk CMOS process [66].

# 2.8 Fault-Tolerant Circuits

In particular, in space environments it is nearly impossible to provide any maintenance. Hence, a robust implementation of semiconductor circuits with high reliability is the main requirement [67]. One method to improve the fault-tolerance to radiation-induced effects of a circuitry is to add redundancy at software or hardware levels. Two of the most common hardware redundancy techniques are Dual/Double Modular Redundancy (DMR), and Triple Modular Redundancy (TMR). Since TMR is selected as the redundancy concept in this work, an introduction is given in this Section.

### 2.8.1 Triple Modular Redundancy (TMR)

Triple Modular Redundancy is an efficient redundancy design technique to improve the reliability of a design or a system. It can be applied at many levels of abstraction, e.g., entire processor systems, modules, cells, or, at a finer grain, individual devices. In the context of this work, TMR is applied to memory cells such as flip-flops. This means that more than one circuit, i.e., replica circuits, represent the memory cell content. A term which reflects such an arrangement is *spatial redundancy*.

However, the baseline concept of a TMR circuitry was already introduced by Von Neumann in [68]. It consists of three redundant circuit parts of sensitive logic and majority voter circuits. The bare TMR applied to flip-flops is a simple, feedback-less stable solution that enables a one-bit error correction. The circuit scheme of this baseline TMR circuit for flip-flops is illustrated in Figure 2.21 (a).



Figure 2.21: Scheme and transient response of a TMR gate: (a) baseline TMR scheme applied to flip-flops, (b) transient response

As can be seen in the left figure, the outputs of the triplicated sensitive units (here flip-flops) are evaluated in a comparator/voter module (MAJ), which passes the majority-voted result to the primary output Q. In the transient simulation of a TMR circuit shown in Figure 2.21 (b), a fault occurs in the third flip-flop (node q2). Nevertheless, it is finally corrected, i.e., the fault is masked, by the voter logic.
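The voting behavior can be sketched in Python at the bit level. The two-out-of-three Boolean expression is the usual majority function. The helper `tmr_read` and its fault-injection argument are hypothetical and do not model timing.

```python
def majority(a, b, c):
    """Two-out-of-three majority function of the voter (MAJ)."""
    return (a & b) | (b & c) | (a & c)

def tmr_read(stored_bit, flipped=()):
    """Voted output when the listed flip-flop copies (indices 0..2) are upset."""
    copies = [stored_bit] * 3
    for index in flipped:
        copies[index] ^= 1
    return majority(*copies)

# Every single upset is masked; two simultaneous upsets are not.
single_fault_ok = all(tmr_read(bit, (k,)) == bit for bit in (0, 1) for k in range(3))
double_fault_ok = all(tmr_read(bit, (0, 1)) == bit for bit in (0, 1))
print(single_fault_ok, double_fault_ok)
```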

However, when TMR is directly applied at the cell level (flip-flops) by adding two additional replica flip-flops, it interacts locally at the lowest possible level of abstraction. The increased overhead in terms of silicon area, power, and cell propagation delay is not negligible. Already for the scheme presented above, the sequential area is immediately triplicated, and the cell delay is prolonged by the voter delay in addition to the flip-flops' delay. Moreover, with respect to the overhead reduction of TMR, it is known from the work presented in [69] that, e.g., 80% of the soft-errors in ISCAS89 benchmark circuits [70] are related to just 50% of the logic gates. A more effective approach is therefore to protect only sensitive cells or modules in the design, i.e., *selective-hardening*. Thus, the overhead introduced by hardware redundancy can be reduced to an acceptable minimum in real applications.

The baseline TMR is capable of masking one single fault in one of the three flip-flops. The fault may also be a result of an upset generated by a direct particle hit. However, this basic TMR flip-flop is not robust against SETs and the resulting upsets. In particular, any transient that propagates on the data or clock path to all three flip-flops finally leads to an MBU. Nevertheless, TMR is one of the most widely used design techniques in fault-tolerant designs and reliable systems for space applications. However, the reliability of TMR also depends on the reliability of the individual components, as discussed in the next section.

### 2.8.2 Reliability of TMR Circuits

According to [71], a non-redundant module follows the exponential failure law and has a reliability  $R_M$  with L as the failure rate in failures/hour. The reliability is given as:

$$R_M = e^{-Lt}. \tag{2.11}$$

As reported in [72], the reliability for an N-Modular Redundancy (NMR) system  $(R_{NMR})$  constructed with an ideal voter can be expressed as:

$$R_{NMR} = \sum_{i=0}^{n} \binom{N}{i} (1 - R_M)^i R_M^{N-i}, \quad n = \frac{N-1}{2}. \tag{2.12}$$

$R_M$ is the probability that a module generates correct outputs, and $\binom{N}{i} = \frac{N!}{i!(N-i)!}$ represents the number of possible combinations of *i* faulty modules and N-i correct modules [73]. In such a system, a maximum number of n = (N-1)/2 faulty modules can be reliably tolerated with N redundancy levels. As a consequence, with N = 3 redundant levels as in a TMR system, n = 1 faulty module can be corrected (e.g., one fault of one flip-flop). The reliability $R_{TMR}$ of a TMR system with an ideal voter can subsequently be derived as

$$R_{TMR} = \sum_{i=0}^{1} \binom{3}{i} (1 - R_M)^i R_M^{3-i} \tag{2.13}$$

$$= \binom{3}{0} (1 - R_M)^0 R_M^3 + \binom{3}{1} (1 - R_M)^1 R_M^2 \tag{2.14}$$

$$= 3R_M^2 - 2R_M^3. \tag{2.15}$$

The relation of Eq. 2.15 assumes an ideal, fault-free voter. When the reliability of the voter ($R_V$) is additionally considered, the following relation for the TMR system can be derived by multiplying the fault-free probabilities:

$$R_{TMR_V} = R_{TMR} \cdot R_V = 3R_M^2 R_V - 2R_M^3 R_V. \tag{2.16}$$

The reliability $R_S$ of a single system S, the reliability $R_{TMR}$ of a TMR system with a fault-free voter, and the reliability of a TMR system with a faulty voter are depicted in Figure 2.22.

As can be derived from the figure, replicating an unreliable system to TMR, i.e., if $R_M \ll 0.5$, also reduces the reliability of the TMR system. The triplication of an unreliable system is not beneficial, since the risk of faults is increased ($R_{TMR} \ll R_M \ll 0.5$). On the other hand, systems with $R_M > 0.5$ benefit from a triplication ($R_{TMR} > R_M$). Moreover, the reliability is additionally degraded when the voter itself is not reliable (see the dashed curve of $R_{TMR_V}$ in Figure 2.22).
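The break-even point at $R_M = 0.5$ can be reproduced numerically with a minimal Python sketch. The NMR sum runs over the tolerable number of faulty modules, as in Eq. 2.13 for N = 3. The function names and the voter reliability of 0.95 are illustrative assumptions.

```python
import math

def r_module(failure_rate, t):
    """Eq. 2.11: reliability of a non-redundant module after time t."""
    return math.exp(-failure_rate * t)

def r_nmr(r_m, n_modules):
    """NMR reliability with an ideal voter, tolerating (N-1)/2 faulty modules."""
    tolerated = (n_modules - 1) // 2
    return sum(math.comb(n_modules, i) * (1 - r_m) ** i * r_m ** (n_modules - i)
               for i in range(tolerated + 1))

def r_tmr(r_m, r_v=1.0):
    """Eq. 2.15 and Eq. 2.16: TMR reliability, optionally with voter reliability r_v."""
    return (3 * r_m ** 2 - 2 * r_m ** 3) * r_v

for r_m in (0.3, 0.5, 0.9):
    print(f"R_M = {r_m:.1f}: R_TMR = {r_tmr(r_m):.3f}, "
          f"R_TMR_V (R_V = 0.95) = {r_tmr(r_m, 0.95):.3f}")
```

For $R_M = 0.9$ the triplication pays off, for $R_M = 0.3$ it lowers the reliability, and at $R_M = 0.5$ both are equal. A non-ideal voter scales the whole curve down.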



Figure 2.22: Reliabilities of a single system S (red), a fault-free TMR arrangement (blue), and TMR with faulty voter (black)

# 2.9 Temporal Redundancy/Temporal Sampling

According to [74], temporal sampling is the term for the use of internal data or clock delays, or multiple clocks in TMR arrangements to mitigate SEEs. In both cases the temporal redundancy (TR) is applied to a single signal. Two related approaches can be distinguished. The first one is to add TR on the data path of a TMR structure in such a way that the data path is split and delayed individually by, e.g.,  $\delta$  (see Figure 2.23 (a)). In the second approach, the delays are added on the clock signal resulting in *temporal clocks* as illustrated in Figure 2.23 (b).



Figure 2.23: Temporal redundancy applied to: (a) the data path, (b) the clock path

In both cases, the amount of $\delta$-delay is chosen with respect to the maximum expected transient pulse width. The waveforms of both solutions are depicted in Figure 2.24. As can be seen, the delay elements shift the data in time. As long as the pulse width of the transient is shorter than the $\delta$-delay, the transient is captured by only one flip-flop. Consequently, the voter corrects this single fault. Similarly, the sample window of the flip-flops is separated in the second approach of Figure 2.23 (b), and the resulting upset, i.e., the transient captured by the second flip-flop (second dashed line of Figure 2.24 (b)), is corrected by the voter.
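The condition on the $\delta$-delay can be checked with a simplified Python model of the data path-related approach. It assumes three ideal flip-flops with a zero-width sampling instant (no setup or hold window), data delays of 0, $\delta$, and 2$\delta$, and times in picoseconds. These assumptions and all names are illustrative and not taken from the figures.

```python
def captured_flags(pulse_start, pulse_width, clock_edge, delta, copies=3):
    """Which flip-flops sample the transient on the shared data input.

    Flip-flop k receives the data delayed by k*delta, so at the clock edge
    it samples the undelayed signal at time clock_edge - k*delta.
    """
    flags = []
    for k in range(copies):
        sample_time = clock_edge - k * delta
        flags.append(pulse_start <= sample_time < pulse_start + pulse_width)
    return flags

# Hypothetical values: delta = 500 ps, clock edge at t = 1000 ps.
# A 400 ps pulse (shorter than delta) is swept across all arrival times.
narrow_ok = all(sum(captured_flags(start, 400, 1000, 500)) <= 1
                for start in range(-2000, 2001, 10))
# A 600 ps pulse (longer than delta) can be captured by two flip-flops.
wide_flags = captured_flags(450, 600, 1000, 500)
print(narrow_ok, wide_flags)
```

In this model, a pulse shorter than $\delta$ upsets at most one flip-flop for any arrival time, so a two-out-of-three voter still delivers the correct value, whereas a longer pulse can defeat the voting.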



Figure 2.24: Waveforms of TR approaches: (a) single error correction by data path-related TR, (b) single error correction by clock path-related TR

One advantage of the clock-related TR is its robustness against transients on the clock path, i.e., C-SET. The data path-related TR is capable of mitigating transients on the data path (D-SET), but any transient on the clock path would result in an MBU. In contrast, when a transient occurs on one of the clock paths and the clock network is not shared (also known as TMR clocks), the second solution with *delaying-the-clock* is less sensitive to C-SET.

"Altern ist ein hochinteressanter Vorgang: Man denkt und denkt und denkt – plötzlich kann man sich an nichts mehr erinnern"

Ephraim Kishon

# Chapter 3

# **Related Work**

This Chapter introduces the State-of-the-Art (SotA) related to both hardware systems addressed in this Thesis. It covers design flows and standard cell concepts for differential logic design and radiation-hardness-by-design circuits. Moreover, the methodologies for SEU and SET mitigation are presented. The chapter concludes with a discussion and the main objectives of this Thesis.

# 3.1 Design Automation of Differential Circuits

Differential logic and its signaling concept are quite popular in the field of analog applications, but remain a niche for digital standard cell-based design. The logical function and the signaling of differential logic gates are not applicable in a straightforward manner for design generation with standard cell gates using modern digital design tools. Instead, these CAD tools for digital ASIC design and the respective design flow are optimized for classical single-ended/single-rail CMOS-based VLSIs.

Nevertheless, the aforementioned advantages of differential logic have already attracted interest in using the CAD tools of the digital design flow. In particular, bipolar CML/ECL was the dominating logic for high-performance designs in the early 1990s. An automatic netlist generation using an ECL logic synthesis system is published in [75]. As reported, efficient designs can be obtained by exploiting the wired-OR function in combinational logic. Results similar to a hand-made design were obtained in terms of area, power, and delay. Thus, the applicability of a synthesis tool for ECL logic has been demonstrated. However, the differential signaling and various signal voltage levels are not considered in that work.

Standard design tools for logic synthesis and place and route are not optimized for complex differential gates and their handling during logical optimization. When differential logic design and signaling are targeted, additional design effort is required to enable the design generation with commercial design tools. Thus, one can benefit from the digital standard cell-based design flow. The next sections introduce two of the most relevant approaches, and the idea of the fat-wire design which enables the design generation.

### 3.1.1 Secure Digital Design Flow

A VLSI design flow for generation of secure, side-channel attack (SCA) resistant integrated circuits is presented in [76]. An improved resistance against differential power analysis is obtained with the use of CMOS-based Wave Dynamic Differential Logic (WDDL) gates. These WDDL gates are compositions of cells from a standard cell library. Figure 3.1 shows the conventional AOI22 implementation (a) and the WDDL circuit counterpart (b).



Figure 3.1: Different logic cells: (a) a static CMOS logic cell, (b) the WDDL counterpart

WDDL gates have complementary inputs and outputs. A matching of the interconnect capacitances of the differential signal wires is crucial for the countermeasures to succeed [76]. This can be achieved with parallel signal routes on the same metal layers with similar resistance and capacitance (RC-match). The proposed extensions on the standard design flow to maintain this additional requirement are illustrated in Figure 3.2.
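The complementary input/output property can be illustrated with a small Python sketch of a WDDL AND gate. The pairing of an AND on the true inputs with an OR on the complementary inputs is the construction commonly described for WDDL. It is used here as an assumption and is not taken from Figure 3.1, which shows an AOI22 cell.

```python
from itertools import product

def wddl_and(a, a_n, b, b_n):
    """WDDL AND (assumed construction): AND on true rails, OR on complement rails."""
    return a & b, a_n | b_n

# Drive the gate with valid complementary input pairs only.
results = []
for a, b in product((0, 1), repeat=2):
    q, q_n = wddl_and(a, 1 - a, b, 1 - b)
    results.append((a, b, q, q_n))
    print(f"a={a} b={b} -> q={q} q_n={q_n}")

complementary = all(q_n == 1 - q for _, _, q, q_n in results)
functional = all(q == (a & b) for a, b, q, _ in results)
```

Both checks hold for all input combinations, i.e., the output pair stays complementary and the true rail implements the AND function using only standard library gates.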



Figure 3.2: Main changes on the standard digital design flow proposed in [76]

The main differences are two additional stages, namely the *cell substitution* after synthesis and the *interconnect decomposition* after the place and route stage. Moreover, the *stream out* of the design to the final layout view, e.g., GDSII, uses an alternative library setup. In more detail, the logic synthesis result contains standard static CMOS logic gates (rtl.v). The content of this netlist is afterwards substituted by special WDDL gates by scripting, resulting in fat.v. Moreover, a differential *inverter-less* netlist is generated by crossing the differential signal pairs. The semantics and the behavior of both netlists are equal, except that the differential signal pairs are modeled as one single representation. The layout is generated with the fat variants of the netlist and the respective physical views of all cells in terms of .lef-files. As a result, a fat-wire layout of the WDDL design in .def-format is generated. This is translated in the interconnect decomposition and stream out stages to a corresponding differential counterpart. In these phases, a duplication and translation of the fat wires and a width reduction with the differential library representations are executed during the design's stream out. Thus, a fully differential WDDL design is obtained with similar RC-characteristics of the differential signal pairs.

### 3.1.2 Via-Programmable Flow

A design flow and implementation using a via-programmable MOS-based CML (MCML) universal logic cell is presented in [11] and [77]. The result is a homogeneous, regular array layout with differential signaling. This work is evaluated with circuits in a 180 nm CMOS technology.

The core logic of this flow is the generic universal MCML cell. It is an expanded universal gate that can implement all Boolean functions with three inputs and a significant subset of functions with four or five inputs. Its function is programmed after the place and route stage by means of a special translation layer. As a result, a fully differential MCML design is obtained, arranged as a matrix array of these universal MCML gates.

The efficiency of the differential design is compared to a respective CMOS implementation of large-input majority decision units and a Radix 4 complex FFT processor. Independent of the area, power, and delay performance of the MCML processor, one important result is the reduction of the power supply noise. According to simulation results, the MCML solution shows two orders of magnitude lower generated power supply noise compared to CMOS.

The changes to the standard digital design flow that enable such a design generation are shown in Figure 3.3. In contrast to the *Secure Digital Design Flow*, the universal gate is a custom cell. Thus, a standard cell characterization task is required (not shown in the figure). The cell interface with the differential inputs and outputs has to be modeled with single-ended pin counterparts to enable logic synthesis and fat-wire place and route. This is achieved by extracting information from the universal logic gate's characterization result.



Figure 3.3: Differences of the design flow after [11, 77]

An intermediate pseudo-single-ended netlist is generated by a commercial logic synthesis tool using the extracted characterization result (ss\_syn\_lib in Figure 3.3). To enable signal inversion by crossing a differential signal pair, the library is equipped with special inverter cells with zero delay and area. These are removed by an additional netlist conversion step (iss.v $\rightarrow$ ss.v). The layout generation at the place and route stage is done with the single-ended *inverter-less* netlist.

Afterwards, the fat wires are split in a wire splitting stage, leading to the final differential layout representation. The functional programming is realized by inserting a special translation layer that enables pin access to the real pin shapes of the universal logic gates. Finally, logical equivalence checks can be executed to verify the converted design against its golden functionality.

### 3.1.3 Fat-Wire Approach

All aforementioned design flows utilize an intermediate netlist and layout view in order to obtain in-parallel routed differential signal pairs. Thus, the differential standard cell-based design generation is enabled. The views at logical level are often called *single-ended*, whereas the corresponding layout view is a *fat-wire* representation. Both views are used for pseudo-design generation and require subsequent design tasks to obtain real technology data. The design methodology and the essential requirements for cell design are presented in [78]. Both enable the place and route of secure standard cell design with differential logic.

According to [78], a fat wire covers two differential wires and has a width $W_f$, which is the sum of the pitch $P_n$ and two times half of the normal wire width $W_n$, i.e.:

$$W_f = P_n + 2W_n/2. \tag{3.1}$$

The fat-wire pitch  $P_f$  is given by:

$$P_f = 2W_f/2 + \Delta \tag{3.2}$$

with  $\Delta$  being a user-defined distance to reduce cross-talk effects.

Figure 3.4 illustrates both wire structures. As can be seen, the centerline of the fat-wire segment has the same offset as the centerline of the differential pair. Thus, it is guaranteed that all differential wires are on the same signal routing track after design conversion by wire splitting.



Figure 3.4: Wire structures with pin-access and centerlines: (a) fat-wire signal, (b) the differential (split) counterpart
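The relations (3.1) and (3.2) and the splitting of a fat wire into its differential pair can be checked with a few lines of arithmetic. The following is a minimal sketch: the numeric values (in nanometers) are hypothetical and do not belong to any particular technology, and the splitting rule, which places the two wires symmetrically at a distance of $P_n/2$ around the fat-wire centerline, is an assumption derived from the shared centerline described above.

```python
def fat_wire_width(p_n: float, w_n: float) -> float:
    """Eq. (3.1): normal pitch plus two times half of the normal wire width."""
    return p_n + 2 * (w_n / 2)


def fat_wire_pitch(w_f: float, delta: float) -> float:
    """Eq. (3.2): two times half of the fat-wire width plus the spacing delta."""
    return 2 * (w_f / 2) + delta


def split_fat_wire(center: float, p_n: float, w_n: float):
    """Return (centerline, width) of both differential wires.

    Assumption: the pair is placed symmetrically around the fat-wire
    centerline, one normal pitch apart.
    """
    return [(center - p_n / 2, w_n), (center + p_n / 2, w_n)]


if __name__ == "__main__":
    # Hypothetical values in nm, chosen for illustration only.
    p_n, w_n, delta = 560, 280, 140
    w_f = fat_wire_width(p_n, w_n)
    p_f = fat_wire_pitch(w_f, delta)
    pair = split_fat_wire(0.0, p_n, w_n)
    print("W_f =", w_f, "P_f =", p_f, "pair =", pair)
```

With this splitting rule, the outer edges of the two normal wires coincide with the edges of the fat wire, so the split pair occupies exactly the area that was reserved during fat-wire routing.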

One further important restriction is the use of the preferred routing direction during routing. If a signal is routed with wire segments in non-preferred routing directions, turns of this signal might be realized within the same metal layer. This could lead to shorts after splitting the wires. Consequently, it is advantageous to use each metal layer in its preferred direction only. Thus, no overlap of split wire segments is generated, since each turn is realized by vias connecting two segments on valid directions and different metal layers.

With the use of this approach, the routing of a differential WDDL design is obtained with balanced capacitive load at the differential outputs (see also Section 3.1.1). The requirements defined in [78] implement the fundamental concept for fat-wire routing and enable the digital design flow for security applications [76, 79]. The experimental results show that this routing approach is beneficial to successfully complicate differential side channel attacks.

# 3.2 Differential CML-based Standard Cell Library Concepts

This Section gives an overview of two further existing differential standard cell library concepts in addition to the approaches mentioned in the previous sections.

### 3.2.1 MCML Library Generation with Footprints

A generic methodology for rapid generation and implementation of standard cell libraries for differential circuit design styles is presented in [80]. In this work, more than 4500 standard cells are generated based on 19 cell footprints. The generation is done by connectivity evaluation of the differential core logic with binary decision diagrams. An extensive cell library is regenerated starting from the footprint templates. The methodology is presented for MCML and evaluated in a 180 nm CMOS technology. The approach is kept general so that it can be employed for other differential logic styles as well.

### 3.2.2 ECL Standard Cell Libraries

An ECL standard cell library for high-performance applications is specified in [81] for the low-cost 250 nm BiCMOS technology of IHP. It consists of a rich cell set with fully-differential combinational gates, latches, and flip-flops. The cells are compatible with a 2.5 V power supply. More complex NOR gates with four or eight inputs are additionally offered. They are internally realized with the wired-OR function in order to reduce power and area overhead and to improve the propagation delay. The cells are optimized for a robust design with a nominal peak voltage of 400 mV between the high and low level. Furthermore, almost all differential gates can be driven in single-ended mode (SE-mode) with a small increase in propagation delay but an improvement in signal routing.

The library is logically separated into three *Speed-Classes*. This term represents the implemented load resistance of an ECL gate, i.e., $300\,\Omega$, $500\,\Omega$, and a special pmos class. For this class, the resistance is realized with standard pmos devices biased by a control voltage. These gates can be tuned for the required speed and thereby improve the overall power consumption. Whereas the $300\,\Omega$ and $500\,\Omega$ classes consume up to 5 mW and 2 mW per gate, respectively, the pmos class can be tuned down to 80 $\mu$W.

Moreover, all cells provide the same signal interface in terms of the  $V_{BE}$  levels (see Figure 2.13). Consequently, the essential level-shifters are included inside the ECL standard cell gates. The library also provides converter cells for CMOS and Low Voltage Differential Signaling (LVDS) signaling. In particular, this library offers VHDL behavioral models with generic parameters to enable digital timing simulations. Thus, any structurally-designed ECL application can be immediately simulated with digital simulation tools without design steps such as logic synthesis, or timing extraction. With the use of these standard cells, an ECL clock divider as part of a fractional-N low noise frequency synthesizer has been designed [12]. However, the layout design of the standard cell divider has been done with the use of the analog design approach.

Finally, a similar library concept with additional macros, e.g., Arithmetic Logic Unit (ALU), registers, memories, and Phase-Locked Loop (PLL) is used to obtain high-speed operational nodes [82, 83]. The selected technology node is IHP's 130 nm BiCMOS technology (see Section 2.1). The operational nodes are suitable for systems in the field of communication or real-time measurements with data rates above 30 Gbps per channel.

# 3.3 Standard Flip-Flop Architectures

Rising-edge triggered D-flip-flops capture the incoming data with the rising clock edge. Internally, they are often arranged as a composition of master and slave D-latches (see the transistor-level scheme of Figure 3.5). The two latches are controlled by opposite phases of the clock signal.



Figure 3.5: Transistor-level scheme of a master-slave D-flip-flop

In more detail, while the master latch is closed and holds the data (e.g., clock phase is high), the slave latch is transparent and propagates the captured data of the master. This behavior swaps when the clock phase is low. It can be clearly seen that this memory cell is logically separated into a master and a slave section.
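The alternating transparency of the two latches can be reproduced with a small behavioral model. This is a minimal sketch at the logic level only, assuming a rising-edge triggered arrangement in which the master is transparent during the low clock phase and the slave during the high clock phase. The class names are hypothetical, and transistor-level effects such as setup and hold times are ignored.

```python
class DLatch:
    """Level-sensitive latch: transparent while enabled, otherwise holding."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: bool) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Rising-edge triggered flip-flop built from two latches."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d: int, clk: int) -> int:
        # Master is transparent while clk is low and holds while clk is high.
        m = self.master.update(d, enable=(clk == 0))
        # Slave is transparent while clk is high and holds while clk is low.
        return self.slave.update(m, enable=(clk == 1))


if __name__ == "__main__":
    ff = MasterSlaveDFF()
    stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]  # (d, clk) pairs
    for d, clk in stimulus:
        print("d =", d, "clk =", clk, "q =", ff.step(d, clk))
```

The third stimulus changes the data while the clock is high. Since the master is closed in this phase, the change does not reach the output until the next rising edge.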

This kind of flip-flop architecture relies on two clock phases for operation. It consists of more than 20 transistors and has a moderate propagation delay. An alternative flip-flop architecture operating with only one clock phase is the True Single-Phase Clock (TSPC) flip-flop, which does not require an inverted clock phase. This type of dynamic logic was introduced in the 1980s in [84] and [85] and has found widespread use in digital design [86]. It is usually characterized by a short propagation delay, a low transistor count and hence low area occupation, and the use of a single clock phase. Moreover, due to the improved switching speed, it may contribute to a lower total energy consumption in more complex systems. In addition, the resulting clock network has lower complexity, since the number of clocked transistors is reduced in a TSPC design. The circuit scheme of an inverting TSPC flip-flop is shown in Figure 3.6.



Figure 3.6: An inverting TSPC flip-flop after [85]

Without an output inverter, it consists of only three inverting stages in the arrangement after [85]. One of the nodes sensitive to discharge by leakage is node i, which can bring the internal state to an uncertain value. Moreover, if the clock signal slope is too low, transistors are not properly in cut-off mode, leading to a captured glitch in the third stage. In general, transistor leakage is one of the main concerns in dynamic circuits, since leakage currents may corrupt the stored states if the clock frequency is excessively low [86]. As a consequence, these effects demand deep investigation by simulations during the cell design phase.

Nevertheless, as reported in [87], the power consumption of a TSPC flip-flop is improved by at least 12% compared to a standard master-slave D-flip-flop. Moreover, the propagation delay is improved by more than 40%, and 60% of the area is saved due to the low transistor count. Thus, TSPC flip-flops are attractive alternatives for numerous applications.

# 3.4 Single Event Transient Characterization

When a standard cell library is sufficiently tolerant to TID and SEL, it can be selected for the design of RHBD circuits. Consequently, a characterization of possible SET widths is an essential task and a high-effort prework before starting RHBD circuit design. This Section mainly summarizes the results of the SET characterization of the standard cell library for the selected 130 nm SG13S technology (see Section 2.1). The results have been published in the TCAS-I Journal paper [22], whereas the SET characterization is a contribution of M. Andjelković as part of his thesis [53]. Consequently, this is addressed as related work.

An overview of the estimated transient widths obtained by simulation or measurements has already been given in Section 2.7.7. For SET characterization, the estimation is done through simulation experiments at transistor level with the use of Cadence Spectre [88]. For that purpose, a bias-dependent current model [89] was used as the SET model for this type of characterization analysis. The timing constants of the current model, i.e., a rise time of 10 ps and a fall time of 100 ps, have been chosen based on the results reported in [90]. The simulations are performed at nominal conditions, i.e., 1.2 V power supply voltage and a nominal temperature of 25 °C. Finally, a correlation between the threshold LET estimated from the critical charge $Q_{crit}$ (see Eq. 2.8) and the extracted SET pulse width is obtained.
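To give an impression of how such timing constants shape an injected current pulse, the sketch below uses the classical double-exponential current pulse. This is explicitly a simpler stand-in and not the bias-dependent model of [89] used for the characterization. Treating the 10 ps and 100 ps values as the two time constants of the pulse, the 100 fC charge, and all function names are assumptions made for illustration.

```python
import math


def double_exp_current(t: float, scale: float, tau_r: float, tau_f: float) -> float:
    """Double-exponential pulse I(t) = scale * (exp(-t/tau_f) - exp(-t/tau_r))."""
    if t < 0:
        return 0.0
    return scale * (math.exp(-t / tau_f) - math.exp(-t / tau_r))


def scale_for_charge(q: float, tau_r: float, tau_f: float) -> float:
    """Scale factor so that the pulse integrates to q: q = scale * (tau_f - tau_r)."""
    return q / (tau_f - tau_r)


def collected_charge(scale: float, tau_r: float, tau_f: float,
                     t_end: float, steps: int = 20000) -> float:
    """Integrate the pulse numerically from 0 to t_end (trapezoidal rule)."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        i0 = double_exp_current(k * dt, scale, tau_r, tau_f)
        i1 = double_exp_current((k + 1) * dt, scale, tau_r, tau_f)
        total += 0.5 * (i0 + i1) * dt
    return total


if __name__ == "__main__":
    tau_r, tau_f = 10e-12, 100e-12   # assumed time constants in seconds
    q = 100e-15                      # hypothetical injected charge in coulombs
    scale = scale_for_charge(q, tau_r, tau_f)
    print("scale current [A]:", scale)
    print("integrated charge [C]:", collected_charge(scale, tau_r, tau_f, 2e-9))
```

Note that `scale` is the prefactor of the pulse and not its peak value. The actual peak is lower because the two exponentials partially cancel.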

Since large inverters are required in order to mitigate SETs in critical nets such as reset, or clock trees, the analyses are done for most available driver strengths. Figure 3.7 shows the SET as a function of LET for different inverter gates.



Figure 3.7: SET characterization results of inverter cells

As can be seen in the figure, a pulse width above 550 ps can be expected for the LET range up to $60 \text{ MeV cm}^2 \text{ mg}^{-1}$. Interestingly, the INVX8 inverter already halves the resulting transients in comparison to the weakest INVX1 cell. On the other hand, the benefit of the larger INVX16 inverter is low with respect to the doubled area and power penalties compared to INVX8. Thus, the X8 inverter cell in this technology has enough critical charge and is therefore a robust candidate for most applications with lower area and power penalties.

# 3.5 Single Event Transient Mitigation

Propagating Single Event Transients (SETs) may lead to errors in memory cells when captured within the sensitive timing windows (see Section 2.7.2). Resulting upsets can destroy the current configuration of a system. Thus, SET mitigation is an important topic to be addressed in order to maintain reliable data processing.

### 3.5.1 General Transient Mitigation

A well-known concept is to fork one signal path into several branches delayed by $\delta$, whereas the amount of $\delta$-delay matches the maximum width of a transient the circuit can tolerate. The gate-level scheme of this concept is illustrated in Figure 3.8. The depicted structure is capable of filtering a transient which occurs on the D input or on its internal output nodes (here: D0, D1, D2). This is the TR concept applied to a single signal acting as a transient filter. It is part of the minimal temporal sampling latch presented in [74].



Figure 3.8: Transient mitigation: (a) block scheme of transient filter with voter stage, (b) waveform shows properly-filtered transient (green) and propagating transient which exceeds the implemented  $\delta$ -delay of the filter (red)

The function is similar to the aforementioned single data path-related TR (see Section 2.9). When the implemented $\delta$-delay is longer than the generated transient pulse, the comparison stage (VOT) suppresses a change on the output node Q. In contrast, the transient propagates to the output node if the pulse is longer than the implemented $\delta$-delay of the SET filter. Thus, it is important to characterize the size of the $\delta$-delay by simulation (e.g., as presented in Section 3.4) or by extraction of irradiation experiment results.
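The dependency between the pulse width and the $\delta$-delay can be reproduced with a discrete-time model of the filter. The following is a minimal sketch, assuming three branches with delays of 0, $\delta$, and $2\delta$ feeding a majority voter, with time measured in arbitrary sample steps. The helper names are hypothetical.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


def delayed(signal: List[int], delay: int) -> List[int]:
    """Shift a sampled signal by `delay` steps, repeating its initial value."""
    if delay == 0:
        return list(signal)
    return [signal[0]] * delay + signal[:len(signal) - delay]


def transient_filter(signal: List[int], delta: int) -> List[int]:
    """Vote over the undelayed signal and its copies delayed by delta and 2*delta."""
    d0 = delayed(signal, 0)
    d1 = delayed(signal, delta)
    d2 = delayed(signal, 2 * delta)
    return [majority(a, b, c) for a, b, c in zip(d0, d1, d2)]


def pulse(length: int, start: int, width: int) -> List[int]:
    """A single positive pulse on an otherwise low signal."""
    return [1 if start <= t < start + width else 0 for t in range(length)]


if __name__ == "__main__":
    delta = 5
    short = transient_filter(pulse(40, 10, 3), delta)  # pulse shorter than delta
    wide = transient_filter(pulse(40, 10, 8), delta)   # pulse longer than delta
    print("short pulse filtered:", not any(short))
    print("wide pulse propagates:", any(wide))
```

In this idealized model a pulse is suppressed as long as its width does not exceed $\delta$, because it is then never present on two branches at the same time. A real circuit needs additional margin, since gate delays and pulse shaping are not modeled here.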

### 3.5.2 Filtering Transient with Guard-Gates/C-elements

Another approach is the use of special 2-input logic gates, which hold the data when both inputs are unequal to each other, or propagate (and capture) the value if both are equal. This function can be realized by C-elements/guard-gates (GGs), also known as Two-input AND-Gate (TAG) [91]. These gates are not frequently offered in standard cell libraries. However, they are well-known in the asynchronous world. The baseline circuit scheme of such a feedback-less GG inverter (GGI) and its symbol are shown in Figure 3.9 (a). It consists of only four transistors. The same circuit with an additional output inverter, or in combination with an optional feedback of a back-to-back inverter pair, forms the non-inverting GG or Guard-Gate Buffer (GGB) (b). The respective truth table of the GG/GGB is shown in Figure 3.9 (c).



Figure 3.9: Guard-gate circuits: (a) guard-gate inverter (GGI), (b) GGI with driver and optional feedback inverter (GG/GGB), (c) truth table of a GG/GGB

A transient filter can be arranged by adding a delay of $\delta$ in the path of one of the GG's inputs, as depicted in Figure 3.10 (a). When a transient occurs on the input signal D, it propagates directly to input A of the GG. However, the output remains stable at its previous value, since the transient is not present at the second input of the GG at the same time due to the $\delta$-delay (see Figure 3.10 (b)).



Figure 3.10: State-of-the-art transient filter architecture with guard-gates: (a) circuit scheme, (b) waveform in error-free and error scenario
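The hold behavior of the guard-gate and its use as a transient filter can be sketched in a few lines. This is a minimal behavioral model of the non-inverting variant (GGB), assuming discrete time steps and an ideal delay element. The function names are hypothetical.

```python
from typing import List


def guard_gate(a: int, b: int, q_prev: int) -> int:
    """Non-inverting guard-gate: follow equal inputs, otherwise hold the output."""
    return a if a == b else q_prev


def gg_filter(signal: List[int], delta: int, q_init: int = 0) -> List[int]:
    """Guard-gate fed by the signal (input A) and its delta-delayed copy (input B)."""
    out = []
    q = q_init
    for t, a in enumerate(signal):
        b = signal[t - delta] if t >= delta else signal[0]
        q = guard_gate(a, b, q)
        out.append(q)
    return out


def pulse(length: int, start: int, width: int) -> List[int]:
    """A single positive pulse on an otherwise low signal."""
    return [1 if start <= t < start + width else 0 for t in range(length)]


if __name__ == "__main__":
    delta = 5
    print("short pulse filtered:", not any(gg_filter(pulse(30, 10, 3), delta)))
    print("wide pulse propagates:", any(gg_filter(pulse(30, 10, 8), delta)))
    step = [0] * 10 + [1] * 20                 # regular data transition at t = 10
    print("data appears at t =", gg_filter(step, delta).index(1))
```

A regular data transition passes the filter but appears at the output $\delta$ later, which illustrates the delay penalty of this arrangement.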

Nevertheless, a successful filtering always depends on the realized internal filter delay ($\delta$-delay) and the input's transient width. One example for an application is the protection of single signals or as part of a memory cell to improve the robustness against SEU [91]. As reported, the threshold LET is increased for a flip-flop with such an integrated transient filter arrangement. Similarly, this transient filter concept is also used in storage cells for immunity to soft errors in [92].

### 3.5.3 Transient Filters on Data Paths in TMR Circuits

The aforementioned concepts are also suitable for hardware redundancy circuits such as TMR applied to flip-flops. The critical signals can be equipped with transient filter structures in order to improve the TMR's robustness against D-SETs and to prevent resulting upsets. One approach is to add $\delta$-delay elements on data input signals to obtain temporal redundancy (TR) on a single signal (see Section 2.9). By splitting the data path into three individual branches, the data is separated and shifted in time. The delay elements act as transient filters, since at least two of the branches contain the correct data. A TMR flip-flop with such an arrangement and its waveform is shown in Figure 3.11. In this Thesis, the concept of Figure 3.11 (a) is called DSET-D1D2. It is the same structure used for the TMR cells presented in [16]. Therein, the delay chains are arranged with alternating high and low driver strength inverters. Thus, the low-pass filter characteristic leads to a more efficient delay generation. However, delaying a signal is costly, and the additional active silicon area affects the total power consumption of a design.



Figure 3.11: DSET-D1D2 transient filter concept with inverter cells as delay elements: (a) the gate-level scheme, (b) the waveform in the error-free case (green) and a captured SET with resulting upset (red)

Alternatively, filtering of SETs on the data path can be realized by using basic AND and OR gates in combination with an in-front connected  $\delta$ -delay stage. As an example, an arrangement with an additional multiplexer applied for SET suppression for TMR is proposed in [93]. The advantage of this solution is the reduced number of expensive delay stages to one stage only (see block scheme in Figure 3.12 (a)). This improves the overall performance of a flip-flop.

Figure 3.12: Transient filter concepts: (a) transient suppressor with AND-OR-MUX structure [93], (b) concept with guard-gates and one delay element [94]

Another TMR-related circuit concept is the transient mitigation with GGs or C-elements in combination with $\delta$-delay element structures as proposed in [94]. Therein, the internal nodes D0, D1, D2 in front of the data inputs of the TMR flip-flops are protected by individual guard-gates to filter the transients. The main benefit of this solution is the reduction of the propagation delay and power consumption. However, from a functional point of view, it is not required to suppress transients with guard-gates on all three internal D-nodes as illustrated in Figure 3.12 (b).

### 3.5.4 Transient Mitigation in TMR Circuits with Clock Delays

The alternative concept is to delay the individual clock signals (*delaying-the-clock*) of the TMR memory cells separately (see Section 2.9). One example is the STMR concept with triplicated clock trees and skewed clocks [95]. Thus, a temporal sampling of the data is obtained. The delays can be implemented by dedicated delay element insertion on the clock paths, or by direct use of the common clock tree insertion techniques of standard tools.
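The effect of temporal sampling with skewed clocks can be illustrated by counting how many of the three sampling instants fall into a data transient. This is a minimal sketch, assuming ideal sampling instants spaced by $\delta$ and ignoring setup and hold windows. The function names and the numeric values (in picoseconds) are hypothetical.

```python
from typing import List


def sample_instants(first_edge: float, delta: float, copies: int = 3) -> List[float]:
    """Sampling instants of the TMR memory cells with clocks skewed by delta."""
    return [first_edge + k * delta for k in range(copies)]


def corrupted_samples(pulse_start: float, pulse_width: float,
                      first_edge: float, delta: float) -> int:
    """Number of memory cells that sample while the transient is present."""
    pulse_end = pulse_start + pulse_width
    return sum(pulse_start <= t < pulse_end
               for t in sample_instants(first_edge, delta))


def voted_output_corrupted(pulse_start: float, pulse_width: float,
                           first_edge: float, delta: float) -> bool:
    """The majority vote fails once at least two cells captured the transient."""
    return corrupted_samples(pulse_start, pulse_width, first_edge, delta) >= 2


if __name__ == "__main__":
    # Hypothetical values in ps: first clock edge at 110, skew of 50 per copy.
    print("40 ps transient:", voted_output_corrupted(100, 40, 110, 50))
    print("80 ps transient:", voted_output_corrupted(100, 80, 110, 50))
```

As long as the transient is not wider than the clock skew $\delta$, at most one of the three cells can capture it, and the voter masks the error.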

As published in [96, 97], the concept is applied to TMR pulse-clocked latch (TPL)-based flip-flops with local pulse clock generators in order to mitigate SEE and to reduce the power overhead of TMR. The memory cells are combined into 16-bit flip-flop macros, and the clock trees are implemented in a TMR skewed clock tree manner, i.e., every first flip-flop of the TMR is connected to the first clock tree, every second to the second, and so on (see Figure 3.13).



Figure 3.13: TMR distributed skewed clocks in multi-bit registers designs after [96]

A pulse-generator (PG) generates local pulses for the TPL flip-flops of the multi-bit register. Moreover, the delays of the clocks are generated by a programmable clock generator for adjustable SET hardness. With the approach of using TMR clocks, a minimum overall power saving of 15 % is obtained compared to previous designs [96]. In addition, adding a clock-gating feature is also simplified with this approach.

This delaying-the-clock approach with temporal clocks is a suitable solution for many applications and allows a design that is robust against D-SET and C-SET effects at the same time. Nevertheless, clock tree implementation is challenging in timing-critical designs. Maintaining the $\delta$-spaced clock insertion delays required for SET robustness while also applying aggressive skewing techniques such as clock stealing/useful skew might be difficult to achieve simultaneously.

# 3.6 Hardening by Structural Modification

One straightforward measure to harden a cell or device is to increase its critical charge $Q_{crit}$ by up-sizing the transistors' channel width. The robustness can be additionally improved by applying special layout methods, such as guard-ring insertion [98, 24, 49], well-contact array arrangements [99], dummy transistors [100], or an increased cell distance [99]. Moreover, the use of special ELT devices is proposed in [47] and [48] in order to improve the radiation tolerance. In particular, the probability of the most dangerous effect, SEL, can be reduced with approaches at design level such as a specific well-contact spacing [101], error correcting codes and an intelligent power line implementation [102], or the use of latchup protection switches [103].

Nevertheless, all presented concepts are based on modifications of the internal cell structure or require complex cells. As a consequence, significant additional design effort has to be considered when standard cell-based design and libraries are addressed.

# 3.7 Radiation-Hard Flip-Flop Architectures

One approach to ensure a fault-free operation of flip-flops and memory cells against SEEs is to harden their internal structures. This Section introduces popular circuit concepts and their subsequent further developments.

### 3.7.1 Dual Interlocked Storage Cell (DICE)

One of the best-known and commercially most used approaches to radiation-hard flip-flops is DICE [104]. In a DICE architecture, two conventional cross-coupled inverter latch structures are connected by feedback inverters. Internal nodes store the data as two pairs of complementary values. The control is realized by transmission gates. Each of them is controlled by two independent nodes. The circuit scheme of the DICE storage cell and its internal nodes A, B, C, D is illustrated in Figure 3.14.

Figure 3.14: The DICE storage cell scheme after [104]

This standard DICE approach is a suitable radiation-hardening technique for larger-scaled technologies. However, it is less effective for lower-scaled technology nodes as published in [105]. Moreover, the standard DICE flip-flop does not address SEEs induced by the charge-sharing effect (see Section 2.7.5). An alternative solution is the Double-DICE/Dual-DICE architecture proposed in [59] and evaluated in a 90 nm technology. Furthermore, an error-aware layout hardening concept for DICE is proposed in [106]. This approach is called Layout Design through Error-Aware Transistor Positioning (LEAP). It also addresses charge-sharing and hardens against the potential resulting MBU effect. According to radiation experiments with a 180 nm CMOS test chip, five times fewer errors compared to the reference DICE and 2,000 times fewer errors compared to a conventional D-flip-flop have been encountered [106].

### 3.7.2 Quatro Cell

Another hardening approach, known as Quatro, is based on Cascode Voltage Switch Logic (CVSL). As reported in [107], this memory cell architecture facilitates a differential read access and has an improved noise margin. Moreover, it is characterized by a better leakage current performance and a lower transistor count, and it is more robust against soft errors than DICE. The circuit scheme of a Quatro cell without external control signals is shown in Figure 3.15.



Figure 3.15: Transistor scheme of a Quatro cell

As also confirmed by heavy-ion experiments, the Quatro cell shows at least 20 times improvement over a standard D-flip-flop and 3.5 times improvement over the DICE flip-flop at higher LET values [108].

### 3.7.3 Heavy Ion Tolerant Flip-Flop

Finally, a radiation-hardened cell concept which is used in space missions and provides high robustness is the Heavy Ion Tolerant (HIT) flip-flop architecture [109]. The HIT flip-flop concept is incorporated in Imec's Design Against Radiation Effects (DARE) library [110], and the robustness is obtained by the use of larger transistor dimensions. According to [110], the threshold LET ranges from $33.1$ to $59 \text{ MeV cm}^2 \text{ mg}^{-1}$.

### 3.7.4 Further Hardened Flip-Flop Compositions

Even though SETs are less frequent and can be masked by the circuit's combinational logic, they could result in upsets in latches or flip-flops when the corrupted data is captured by the sensitive clock edge or active phase (see Figure 2.18). As a consequence, additional measures at design level are required in order to address the SET robustness.

A Delay-Filtered DICE (DF-DICE) concept is proposed in [92]. In this approach, SET filters are integrated in front of each input of the DICE flip-flop. The filter structures are realized by two-input C-elements/GGs, which combine a directly propagated and a delayed version of the input signal to enable the filtering (see Figure 3.10 (a)). The designs were laid out in a 250 nm technology with a low area overhead of less than 13.5%.

A hybrid flip-flop circuitry with temporal master and slave DICE latches in combination with tunable delay elements is proposed in [111]. This architecture is known as temporal-DICE flip-flop and has fewer transistors than a hardened flip-flop using only temporal latches. Test circuits were implemented in a commercial 130 nm bulk CMOS technology, and a threshold LET over $45 \text{ MeV cm}^2 \text{ mg}^{-1}$ is demonstrated. Similarly, a flip-flop in DICE-based architecture with an SET pulse discriminator is proposed in [112]. As published therein, the flip-flops exhibited a high immunity against SEU/SET. Pulses on the clock signal were completely suppressed by the proposed pulse discriminator for ions with an LET of $64 \text{ MeV cm}^2 \text{ mg}^{-1}$. The design was implemented in a 90 nm bulk CMOS technology.

A soft error robust TSPC-Quatro flip-flop is proposed in [87]. It consists of a fast transfer unit realized as a TSPC flip-flop. The output is shorted to the nodes of a Quatro storage cell to improve the SEE robustness. According to simulation results, a slight increase in area of only 5% has to be considered, with a similar power-delay product in comparison to a standard master-slave D-flip-flop. The concept was evaluated in a 90 nm CMOS technology.

An alternative solution is the TSPC-DICE flip-flop proposed in [113]. In this solution, the output of the TSPC flip-flop is connected to a DICE latch. The final output is generated by a 2-input C-element (feedback-less GG) to mask the propagation of transients to the output. Thus, another area and power-efficient radiation-hard flip-flop is obtained. The concept of the TSPC-DICE is evaluated in a commercial 65 nm CMOS technology. The abstract block schemes of both TSPC-based approaches, i.e., the TSPC-Quatro and TSPC-DICE concepts are depicted in Figure 3.16.



Figure 3.16: Block schemes: (a) the TSPC-Quatro, and (b), the TSPC-DICE

Finally, one of the most recent radiation-hardened flip-flop solutions is the Dual-Modular TSPC flip-flop (DM-TSPC FF) proposed in [114]. In detail, the input modules consist of two TSPC pre-charge stages whose outputs are connected to a clock-controlled feedback-less GG driving an output stage. According to this work, the DM-TSPC FF has the best performance with respect to power-delay product and the smallest layout area compared to similar TSPC-Quatro and TSPC-DICE implementations. The circuits have been realized in a 28 nm CMOS technology.

# 3.8 Radiation-Hardening-by-Design (RHBD) Flip-Flops

The solutions presented in the previous section require deeper knowledge of circuit design in order to obtain radiation-hardened memory cells such as flip-flops. An alternative approach is the radiation-hardening of standard cells with the use of special circuit-, layout-, or design-level techniques also known as Radiation-Hardening-by-Design (RHBD) [15]. In this Section, the state-of-the-art circuit solutions of flip-flops and the concepts of RHBD TMR-based flip-flops are presented.

### 3.8.1 Built-In Soft Error Resilience (BISER)

The first example of a radiation-hardened flip-flop designed with unhardened components is the Built-In Soft Error Resilience (BISER) flip-flop [115]. This architecture is made of two standard flip-flops or master-slave latches (see Figure 3.17). Their outputs are connected to a C-element/GG.



Figure 3.17: BISER in a latch-based flip-flop design

The input signal of the first flip-flop can be additionally delayed in order to mitigate potential SETs. With the GG connected to the outputs of both flip-flops, it is ensured that the output remains stable if a soft error occurs in one flip-flop. This architecture implements a kind of Dual/Double Modular Redundancy (DMR), which compares two input stages and processes the output signal. An implementation in a 65 nm technology is published in [116].

### 3.8.2 Dual/Double Modular Redundancy Extensions

An alternative approach is to modify the baseline DMR concept with an additional feedback loop from the output voter in order to enable a self-voting mechanism, as presented in [117]. At the cost of a slight increase in area occupation, transient mitigation is enabled. Nevertheless, a correct and safe implementation of unregistered, i.e., combinational, feedback loops for all specified delay corners (e.g., min, typ, and max) is quite challenging and may introduce extra overhead. Moreover, a Guarded Dual Modular Redundancy (GDMR) technique to mitigate SETs in combinational logic with less impact on power, area, and delay compared to TMR is proposed in [118].

However, the baseline DMR is able to detect an error but unable to correct it without circuit modifications. Correction can be achieved with an additional second replica of the cell to be protected and one or more decision units. The resulting concept finally leads to TMR-based circuits.
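The difference between error detection in DMR and error masking in TMR can be made explicit by injecting single faults into a small model. This is a minimal sketch at the logic level, assuming a plain comparator for DMR and a two-out-of-three majority voter for TMR. The function names are hypothetical.

```python
from typing import List, Tuple


def dmr_compare(a: int, b: int) -> Tuple[int, bool]:
    """DMR: return the first copy and an error flag when both copies differ."""
    return a, a != b


def tmr_vote(a: int, b: int, c: int) -> int:
    """TMR: two-out-of-three majority vote masks a single faulty copy."""
    return (a & b) | (a & c) | (b & c)


def flip(copies: List[int], index: int) -> List[int]:
    """Inject a single bit flip into one of the redundant copies."""
    faulty = list(copies)
    faulty[index] ^= 1
    return faulty


if __name__ == "__main__":
    for value in (0, 1):
        for index in range(2):
            out, error = dmr_compare(*flip([value, value], index))
            print("DMR: stored", value, "output", out, "error flag", error)
        for index in range(3):
            out = tmr_vote(*flip([value, value, value], index))
            print("TMR: stored", value, "voted output", out)
```

In the DMR case the error flag is raised for every single fault, but the comparator cannot tell which copy is correct, so the output is wrong whenever the first copy is hit. The TMR voter returns the stored value for each of the three possible single faults.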

#### 3.8.3 Full-Triple Modular Redundancy (FTMR) Flip-Flop

TMR flip-flops are capable of masking a single internal error. Nonetheless, the basic TMR flip-flop is not robust against SETs, as discussed in Section 2.8.1. When a wide-enough transient occurs on the clock path, i.e., a C-SET, all three flip-flops capture the current faulty intermediate data, leading to an invalid majority-voted value at the output. On the other hand, if a transient is generated on the data path, i.e., a D-SET, close to the sensitive clock edge, all flip-flops wrongly capture the transient as valid data. One countermeasure to address this issue is full-TMR (FTMR), in which three voter modules are used together with a triplication of the data path logic. The concept depicted in Figure 3.18 is robust against D-SETs. However, in order to reduce the C-SET impact on TMR, it is advantageous to manage the clock paths individually (see Section 3.5.4).



Figure 3.18: Scheme of FTMR with shared clock network

One of the main drawbacks for FTMR is the increase of complexity of the design. First, the penalty in realistic applications in area occupation and power consumption by triplication of the combinational logic in front of the triplicated flip-flops has to be considered. Second, there is also an impact on DfT complexity, such as scan chain insertion, with access to the internal memory cells (see Section 2.4.1). Another aspect is the challenge on the design flow itself to decouple the redundancy of FTMR from the original design description, e.g., the HDL design. Even though the flip-flops are implemented as FTMR standard cell gates, a custom netlist parser can accomplish the essential design transformation to obtain an FTMR design [16]. However, this is additional design effort on top of the general aforementioned penalties of FTMR.

#### 3.8.4 Robust Triple Modular Redundancy Flip-Flop

An alternative to complex FTMR is to describe single-voter radiation-hardened TMR flip-flops as stand-alone standard cell gates [16]. Thus, there is no need to replicate any logic and the DfT support is easier to design. These flip-flops contain one voter and are equipped with an integrated delay-element-based D-SET transient filter ($\rightarrow \Delta$). Consequently, these cells are called $\Delta$TMR flip-flops. As shown in the circuit scheme of Figure 3.19, they have the same pin interface as a standard flip-flop and are therefore fully compliant with the standard digital design flow. In contrast to the FTMR approach, no additional design task is required. The concept of $\Delta$TMR (applied to flip-flops) was first evaluated with the use of the 250 nm technology of IHP (see Section 2.1).



Figure 3.19: Circuit scheme of a  $\Delta$ TMR flip-flop with transient filter elements on the data path and a single majority voter (MAJ)

However, the integrated transient mitigation scheme of $\Delta$TMR flip-flops reduces the speed performance of the entire memory cell. As illustrated in the figure, three delay elements with a width of $\delta$ form the transient filter in this arrangement. The input-to-register delay is dominated by the longest chain of two elements to the D input of the third flip-flop. As a consequence, the data must be present and stable more than $2\delta$ before the sensitive clock edge arrives, which increases the setup time of these cells. On the other hand, the shortest path leads from D to the first flip-flop and dominates the hold time of the $\Delta$TMR flip-flop (see Section 2.2.3 for more details). Moreover, the delay elements of these flip-flops are realized with inverter chains, which contribute to an increase of the cells' power consumption.
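The filtering principle and its timing cost can be sketched with an idealized behavioural model. The sketch below assumes ideal delay elements and ideal sampling at the clock edge; the names `sample_delta_tmr` and `d_with_set` and the chosen transient times are hypothetical, while $\delta = 0.5$ ns is one of the filter sizes mentioned for the test structures.

```python
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)


def sample_delta_tmr(data, t_clk, delta):
    """Sample D, D delayed by delta, and D delayed by 2*delta at the clock edge.

    `data` maps a time (in ns) to the logic value on the D pin at that time.
    A copy delayed by k*delta shows at t_clk what D carried at t_clk - k*delta,
    so D has to be valid from t_clk - 2*delta onwards (increased setup time).
    """
    samples = [data(t_clk - k * delta) for k in range(3)]
    return samples, majority(*samples)


def d_with_set(t, set_start=9.8, set_width=0.4):
    """D is stable at 1, except for a transient (SET) that pulls it to 0."""
    return 0 if set_start <= t < set_start + set_width else 1


# A SET narrower than delta corrupts at most one of the three samples.
print(sample_delta_tmr(d_with_set, t_clk=10.0, delta=0.5))

# A SET wider than delta can corrupt two samples and is no longer filtered.
print(sample_delta_tmr(lambda t: d_with_set(t, 9.4, 0.8), t_clk=10.0, delta=0.5))
```

The model also shows why the filter width has to be matched to the expected transient width: transients longer than $\delta$ pass the filter in this simplified view.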

As with FTMR, DfT support by means of scan-test is difficult to achieve with this baseline $\Delta$TMR solution. If a TMR cell is considered as a single flip-flop, even when remapped to scannable counterparts as illustrated in Figure 3.20, the internal structures (memory cells, flip-flops) are not accessible during ATPG. Thus, any internal fault inside a TMR flip-flop is masked by the voter. A reconfigurable TMR flip-flop with scan function is proposed in [119]; in this solution, the replica circuits can be switched off when not required in order to save power. With respect to $\Delta$TMR, however, alternative circuit concepts are required.



Figure 3.20: Concept for scan-test after [16]

Another aspect is the mitigation of the charge-sharing effect by component spacing and internal power and ground stripe connections inside the $\Delta$TMR cell. Even though 250 nm is a larger-scaled technology node, the additional resources for, e.g., an MNCC-aware component placement are not negligible. When lower-scaled technologies are targeted, this challenge becomes more important due to smaller structure sizes. The concept has also been applied to flip-flop designs in IHP's 130 nm technology. Figure 3.21 shows the layout frame of such a $\Delta$TMR flip-flop. As can be seen in this illustration, the inverter-based transient filter ($\delta$-delay regions) occupies more than 50% of the overall cell area. Moreover, the cells are equipped with pre-drawn vertical power and ground stripes in order to improve the reliability. Their connection is implicitly done during a legal placement of the cells into the standard cell rows. However, the metallization occupied inside the $\Delta$TMR cell affects the placement locations and the routing result in later designs.



Figure 3.21: Cell frame of a  $\Delta$ TMR flip-flop in 130 nm technology

Several $\Delta$TMR cells with different flip-flop spacings of, e.g., 35 and 45 µm and transient filter sizes of $\delta = 0.5$ and 1 ns were implemented and arranged in separate test vehicles for heavy-ion irradiation test campaigns. According to the radiation test reports of these structures, the highest obtained threshold LET is 32 MeV cm<sup>2</sup> mg<sup>-1</sup>. However, the waste of active silicon area by internal spacing, the occupied routing layers for internal signaling, and the power consumption due to transient filter complexity limit the suitable usage of these solutions.

Nevertheless, the work of [16] and the general concept of  $\Delta$ TMR flip-flops have been the main inspiration for the novel proposed circuit concepts discussed in this Thesis, leading to improved, applicable DfT-compatible RHBD TMR flip-flops with a lower overhead, available in several configurations depending on the target application.

#### 3.8.5 Self-Correction Function

As known from the literature and as discussed in Section 2.4.2, clock-gating is an efficient technique to save power in a design. However, memory cells such as flip-flops refresh their data with the arriving clock edge. When the clock is switched off (i.e., the clock is gated), the current data is held and an upset is not overwritten afterwards. Thus, errors would accumulate over time under radiation. One technique to address this issue is the use of a *self-correction* mechanism inside the memory cells. The use of an extra redundant latch integrated in the master or slave stage to improve immunity to SEUs is proposed in [120]. The error is detected by local comparison, and the correction is applied by controlling multiplexers inside the flip-flop.

An alternative concept for self-correcting circuits for RHBD logic is proposed in [121]. In this work, the state of the TMR is voted in the state element feedback path. The voting is realized with a complete C-element/GG-based voter in each flip-flop. Clock-gating and the scan function are supported. The concept is applied to a pipeline design and implemented in a 90 nm CMOS technology. According to the SEE test results, an SEE hardness of over $100 \text{ MeV cm}^2 \text{ mg}^{-1}$ is obtained.

Another development for TMR flip-flops is the use of distributed C-element/GG-based voters inside the multi-bit register cells [122]. Two voter architectures are proposed, one for the redundant, and another one for non-redundant mode. In this approach, the clock signals are also separated for each group of the TMR. The circuit concept of one flip-flop of the TMR arrangement is shown in Figure 3.22.



Figure 3.22: Self-correcting flip-flop without mode control after [122]

The signals for self-correction are SA, SB, and SC, respectively. As can be seen, the slave latch contains a GG connected to a transmission gate (dashed box). In this case, if both signals SC and SB are equal while the slave latch is activated, the equal signal is propagated.

In the other (error) case, the self-correction is started in the other TMR stage leading to equal SC and SB signals. Thus, the feedback signals of the slave latches are always generated (and corrected) by the two other output signals of the TMR arrangement.
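The effect of error accumulation under clock-gating, and how a feedback generated from the two other replicas counteracts it, can be sketched behaviourally. This is a simplified model under stated assumptions and not the circuit of [122]: time is discretized, each replica is refreshed in every step, and the names `self_correct` and `run_gated` are hypothetical.

```python
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)


def self_correct(state):
    """Refresh each replica from the two others if they agree (guard-gate behaviour)."""
    a, b, c = state
    return [
        b if b == c else a,
        a if a == c else b,
        a if a == b else c,
    ]


def run_gated(upsets, steps, correct):
    """Hold the value 1 in three replicas while the clock is gated.

    `upsets` maps a time step to the index of the replica that is flipped.
    """
    state = [1, 1, 1]
    for step in range(steps):
        if step in upsets:
            state[upsets[step]] ^= 1
        if correct:
            state = self_correct(state)
    return state, majority(*state)


upsets = {2: 0, 7: 1}  # two upsets hit different replicas at different times
print(run_gated(upsets, steps=10, correct=False))  # upsets accumulate, the vote fails
print(run_gated(upsets, steps=10, correct=True))   # each upset is repaired before the next one
```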

#### 3.8.6 Voter Architectures for TMR

The voter is an essential element or module in TMR-based circuits. When applied to logic, it is capable of masking an internal one-bit error, such as a transient (SET) or a bit-flip/upset (SEU). The most popular majority voter architecture is based on 2-input AND gates and a 3-input OR gate. It can be easily derived from its logical function

$$Y_{\text{MAJ}}(A, B, C) = AB \lor BC \lor AC. \tag{3.3}$$

The resulting circuit is illustrated in Figure 3.23 (a). It is a direct mapping of Eq. 3.3 to the gate-level realized with standard cell components.



Figure 3.23: Standard cell-based voter solutions: (a) 3 x AND2 and OR3 composition, (b) a single OA222 gate, (c) XOR-multiplexer-based voter of [73]

However, this baseline solution has a moderate power, area, and delay performance. Many other standard cell-based configurations can be realized with different performance characteristics. According to the extracted results for a 32/28 nm CMOS process in [123], one of the most suitable standard cell-based voter solutions is the OA222 gate implementing the circuit of Figure 3.23 (b). A lower-overhead architecture with a multiplexer and an XOR gate is proposed in [73] (Figure 3.23 (c)). The XOR gate compares two of the three input nodes and controls the selector of the multiplexer in order to choose the third node in case of $A \neq B$.
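That the XOR-multiplexer arrangement computes the same function as Eq. 3.3 can be checked exhaustively. In the following sketch, the function names are hypothetical, and the choice of input A as the multiplexer data input for the case $A = B$ is an assumption made here; the exact pin assignment of [73] is not reproduced.

```python
from itertools import product


def voter_and_or(a, b, c):
    """Direct mapping of Eq. 3.3: three AND2 gates followed by one OR3 gate."""
    return (a & b) | (b & c) | (a & c)


def voter_xor_mux(a, b, c):
    """XOR-multiplexer voter: if A and B differ, C decides; otherwise A (= B) is taken."""
    select = a ^ b
    return c if select else a


# Exhaustive comparison over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert voter_and_or(a, b, c) == voter_xor_mux(a, b, c)
    print(a, b, c, "->", voter_and_or(a, b, c))
```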

Another voter solution is the use of in-parallel connected GGs, or C-elements (see Section 3.5.2) as also used in [122]. The respective voter circuit arrangement with GGs is illustrated in Figure 3.24.

As shown therein, the arrangement is made of two stacked NMOS and PMOS transistors in each stage. The function is as follows: in the fault-free case, i.e., if the inputs A, B, and C are equal to each other, all three Guard-Gate Inverters (GGIs) drive the same signal at node QN. In contrast, if one input differs (faulty mode), two GGI stages are in the off-state and QN is driven by a single GGI only. Moreover, the guard-gate-based voter is characterized by a low transistor count of only 14 transistors.
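The described behaviour can be reproduced with a simple switch-level abstraction. The sketch assumes that the three GGIs are connected to the input pairs (A, B), (B, C), and (A, C), that an inactive stage is in a high-impedance state, and that the driver is a two-transistor inverter; these details and all names are assumptions for illustration, chosen to be consistent with the stated count of 14 transistors.

```python
def ggi(x, y):
    """Guard-gate inverter: drives NOT(x) if both inputs agree, else high impedance (None)."""
    return 1 - x if x == y else None


def ggi_voter(a, b, c):
    """Three GGIs in parallel on node QN, followed by an inverting driver."""
    drivers = [v for v in (ggi(a, b), ggi(b, c), ggi(a, c)) if v is not None]
    assert len(set(drivers)) == 1  # active stages never drive opposite values
    qn = drivers[0]
    return {"active_stages": len(drivers), "QN": qn, "Q": 1 - qn}


# Assumed transistor budget: 3 stages x (2 NMOS + 2 PMOS) + 2 for the driver inverter.
TRANSISTOR_COUNT = 3 * 4 + 2

print(ggi_voter(1, 1, 1))  # fault-free: all three stages drive QN
print(ggi_voter(1, 0, 1))  # one input differs: a single stage drives QN
print(TRANSISTOR_COUNT)
```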



Figure 3.24: Guard-gate inverter-based (GGI-based) voter architecture with driver

In comparison to the baseline AND-OR variant of Figure 3.23 (a), in which every logic gate consists of at least six devices, the reduction of the total transistor count from 24 to 14 is a significant improvement. Nevertheless, these special GGs or C-elements are rarely available as single gates in common standard cell libraries.

### 3.9 Addressed Open Issues in this Thesis

The main objective of this work is to obtain standard cell-based reliable and robust hardware systems with the use of standard technologies. As a consequence, design concepts and compliant methodologies for the digital design of such systems have to be developed. In this Thesis, two hardware systems are considered: the reliable differential logic design and the radiation-hardened circuit design, both of which use a standard cell-based design approach.

#### 3.9.1 Addressed Issues for Differential Logic Design

This Thesis addresses the following issues with respect to differential logic design:

- 1. The existing approaches for standard cell-based differential logic design focus on MOS-based applications. The generation of logic gates as discussed in Section 3.2 is suitable for generic cell layouts. In this work, the selected standard technology offers more attractive high-performance HBT devices with larger, more complex cell frames. Thus, the concept for differential MOS-based logic is not applicable to bipolar differential logic design. Consequently, modifications to the design flow, especially for the place-and-route phase, are required for this type of CML-based circuits.
- 2. Standard cell and library design is a high-effort task. Several views of each cell have to be generated and provided by the library in order to enable the generation of standard cell designs. To reduce the overall design costs, a *modular standard cell library concept* to derive alternative or additional standard cell configurations is proposed. Moreover, the impact on area and power of a design has to be considered.
- 3. The timing and power models are among the most important views of standard cells. They are essential to accurately estimate the design's characteristics in terms of performance during the design phases. Thus, the *characterization* (CZ) of differential logic gates is very important. It is beneficial to provide flows that are independent of the support of differential signaling by the characterization tool.
- 4. Since differential signaling in target technologies with few available routing layers increases the routing complexity, finding suitable routing solutions for differential digital designs is challenging. Hence, there is a strong need to offer a strategy to enable RC-matched/in-parallel routing of such a differential standard cell-based design for technologies with limited routing resources.

#### 3.9.2 Addressed Issues for Radiation-Hardening-by-Design Circuits

RHBD circuits require robust memory cells, such as latches and flip-flops. TMR applied to flip-flops is a popular hardware redundancy approach to correct a one-bit error. Furthermore, the concept of $\Delta$TMR is a suitable solution to obtain a robust flip-flop design. It is capable of sustaining the robustness against SEEs. However, the following issues have not been addressed in recent work and are therefore the subject of this Thesis.

- 1. RHBD often implies additional overhead leading to penalties in terms of area, power, and delay. In particular, $\Delta$TMR flip-flops are expensive gates due to the occupied resources required for the robustness. Thus, it is necessary to divide the complex architecture of $\Delta$TMR cells into *logical sections* and optimize each of them individually. As a result, a construction-kit-like $\Delta$TMR flip-flop cell set could be provided, offering distinctive configurations for different application scenarios. The cell concepts should address the penalties in terms of area, power, delay, and the robustness against SEEs.
- 2. The masking effect of the voter inside a $\Delta$TMR cell also affects its characterization result. The internal structure may hide real setup or hold violations during standard cell *characterization*, leading to inaccurate timing behavior at any design stage. Thus, the required changes to the CZ flow have to be discussed.
- 3. The use of RHBD $\Delta$TMR flip-flops does not automatically result in a robust VLSI design. Additional cells, such as transient filters and robust driver cells, are required. Consequently, the challenges at the design generation stages of the standard digital design flow to obtain RHBD applications with these cells are also covered in this work by the proposed *design methodology* for RHBD circuits.
- 4. The recent $\Delta$TMR flip-flop solutions just support the memorizing function of a one-bit value. Moreover, DfT in terms of scan-test is not fully supported. For complex VLSI designs, structural tests such as *scan-tests* are mandatory to discover faults. Hence, a concept for $\Delta$TMR flip-flops that allows the TMR-internal flip-flops to be evaluated by scan-tests is highly preferable. However, the robustness to radiation effects remains obligatory.

- 5. Another aspect is the support of low-power design features such as *clock-gating* to improve the overall power consumption of a design. The  $\Delta$ TMR concept for clock-gates is not evaluated in recent work. Thus, it is addressed in this Thesis.
- 6. When the clock signal is switched off (clock-gated), the memory cells are not able to refresh their data. Even though TMR is used, the upsets induced by radiation accumulate over time. As presented in Section 3.8.5, approaches addressing this issue integrate *self-correction* functionality inside the cell. However, the presented works do not consider the $\Delta$TMR concept, which uses a common clock signal. Moreover, some approaches require a deeper understanding of circuit design. Accordingly, easily applicable concepts for $\Delta$TMR flip-flops are essential. They would contribute to an improved power performance of complex digital designs with the use of *self-correcting* $\Delta$TMR memory cells.
- 7. Finally, the recent  $\Delta$ TMR cells obtained a robustness up to a threshold LET of  $32 \,\mathrm{MeV}\,\mathrm{cm}^2\,\mathrm{mg}^{-1}$ . More robust flip-flops are advantageous for applications with harder requirements. Hence, new circuit schemes and  $\Delta$ TMR *layout concepts* with stricter MNCC-aware internal component spacing are highly required to mitigate SEEs and the charge-sharing effect.

All proposed concepts and methodologies must be compliant to the general standard digital design flow. This Thesis finally draws an entire picture from standard cell library development, over hardware design description to real implementations with the applied methodologies. Consequently, the outcome of this work can also be seen as a chain from CZ, over RTL to GDS (CZ-RTL2GDS) to obtain *Reliable and Robust Hardware Systems*.

"Krawehl, krawehl! Taubtrüber Ginst am Musenhain trübtauber Hain am Musenginst Krawehl, krawehl!" — Loriot als Dichter Lothar Frohwein

Chapter 4

# **Concepts and Methodology for Reliable Standard Cell-based Differential Logic Design**

The design of reliable differential logic and its respective signaling are mainly obtained with the use of the analog design approach. When differential standard cell-based designs are targeted, additional effort is required at cell and library architectural level to offer design flow-compliant views (e.g., timing/power models, layout view) for standard digital design tools. This Chapter presents exactly these concepts and proposes a methodology to generate reliable differential standard cell-based hardware systems.

## 4.1 Introduction

Thanks to the higher integration of smaller-sized MOS device structures and the increase in performance at low power consumption, the CMOS logic style and its conventional rail-to-rail signaling are widely used for the digital design of complex VLSIs. The CAD tools for the associated standard digital design flow are optimized for this kind of logic family with its characteristics and respective signaling, i.e., ground to power supply voltage swing. They allow highly-integrated standard cell-based SoCs to be developed. However, these paradigms change completely when differential logic standard cell design is addressed.

First, the pin interface of differential cells and their logical differential function are not comprehensible to most design tools. The cells have to be transformed to compliant counterparts, described and modeled for their use. Consequently, an increase of modeling and development costs has to be considered for differential logic standard cell libraries.

Second, the selected differential logic gates in this work are realized with differential amplifiers made of bipolar transistor devices. They are supplied by a constant current source in the range of several hundreds of microamps up to more than one milliamp per logic gate. Both the HBT devices and the constant current source have a more serious impact on the area occupation and power consumption in real applications in comparison to standard CMOS logic design. Therefore, the cell set offered by a differential logic gate library must be chosen deliberately to cover most of the target applications efficiently.

Third, the standard design flow has to be modified and extended for differential logic design in order to enable the design generation with standard design tools at different design stages. In particular, the logical mapping from an intermediate design with pseudo logic gates to real differential counterparts with an appropriate signaling is essential.

Fourth, the layout frames of the offered HBTs in the selected standard technology already use two of the three offered thin routing metal layers. As a consequence, a solution has to be provided to enable the routing of differential logic designs with few routing layers using PnR tools for digital design. Moreover, the differential signal pairs should be routed in parallel in order to achieve similar parasitic effects.

This work provides the following approaches to enable a standard cell-based design of differential logic bipolar applications:

- 1. A concept for *modular standard cell libraries* is proposed to reduce the development time and costs. The differential gates are divided into *Logical Sections* to easily derive further cell configurations in a shorter time.
- 2. The challenge with respect to power consumption is addressed with the proposed cell concept. It is compliant with an already existing bias approach, which allows different bias currents to be set. The proposal provides several cell configurations in a library, classified into Speed-Classes, which represent the cell power-delay performance.
- 3. The compliance to the standard digital design flow is enabled by offering modeling flows for cell characterization, *Single-ended Pseudo-Gates (SEPGs)* at logical level, *fat-wire* views for layout generation and appropriate counterparts for final implementation. Thus, the flow for differential logic standard cell-based design is split into two domains and extended with a *Design Conversion* stage.
- 4. An intermediate routing action is proposed in a first routing phase. The limitation of few available routing layers is solved by the introduced *virtual fat-wire boundary pins* in the fat-wire (FW) cell layouts. The final pin-access is realized during a second routing phase of the differential counterpart design.
- 5. A design methodology for differential logic design with the use of the proposed concept is presented and evaluated on a relevant example circuit.



The contribution of this Thesis is illustrated in Figure 4.1.

Figure 4.1: Overview of the contribution to obtain reliable differential logic designs

## 4.2 Standard Cell Concept

This Section introduces the standard cell concept for the differential logic design library. It starts with a specification and the supported bias concept, followed by the circuit-level architectures. The proposed cell set addresses power consumption and challenges in area occupation. The logic gates are based on bipolar CML. Their concepts are experimentally investigated and evaluated on IHP's standard 250 nm BiCMOS technology with HBT devices offering $f_T$ and $f_{max}$ of 75 and 95 GHz, respectively (see Section 2.1).

#### 4.2.1 Specification

Table 4.1 gives an overview of the library specification. A nominal power supply voltage of VCC = 3.3 V is set for the bipolar circuits. This enables the design of more complex gates with stacked differential amplifiers, whereas 2.5 V is defined for conventional MOS devices in this technology. For reliability and power aspects, a nominal bias current $I_b$ of 75 µA is specified for the logic gates to obtain a large-enough voltage swing ($V_{SW}$) with an acceptable power consumption.

| Parameter               | Value      | Remark                                                  |
|-------------------------|------------|---------------------------------------------------------|
| VCC [V]                 | 3.3        | CML/ECL power supply                                    |
| VDD [V]                 | 2.5        | CMOS power supply                                       |
| $V_{pp} [mV]$           | 150.0      | Min. peak-to-peak voltage                               |
| $\mathrm{I_{b}}[\mu A]$ | 75         | Min. bias current                                       |
| Pitch                   | compatible | $\mathrm{CMOS}\leftrightarrow\mathrm{CML}/\mathrm{ECL}$ |

Table 4.1: CML/ECL library specification

The standard cell layouts must have compliant routing pitch definitions given by the respective CMOS standard cell library. Thus, the design integration in later design stages or the assembly of mixed CMOS/CML hierarchical standard cell designs is supported.

Finally, a modular library concept is proposed with a minimum cell set for most target applications. This reduces the design costs and enables an efficient derivation of new cell configurations.

#### 4.2.2 Supported Bias Concept

It is a natural fact that the delay performance of standard semiconductor devices degrades with higher ambient temperatures. Furthermore, as known from the literature, noise margin (NM) is one measure for reliable signaling. With regard to differential CML gates, NM is mainly determined by the voltage swing  $(V_{SW})$  of a differential signal (see Eq. 2.4 on page 25). This voltage swing is set by the respective peak-to-peak voltage  $(V_{pp})$ , defined by the selected resistance of the load resistors  $(R_L)$  and the local bias current  $(I_{SS})$  of a differential stage. Consequently, in order to be adaptable to changes of the operating conditions or to compensate parasitic effects, it is necessary to adjust the resulting nominal NM, and therefore, to enhance the reliability. This can be done by controlling the global bias currents  $(I_b)$  for the differential logic. For this purpose, an existing programmable Zero Temperature Coefficient (Current) (ZTC) block is selected. It provides several constant bias currents  $I_b$  in order to supply different, isolated or distributed CML modules in parallel. The current  $I_b$  can be programmed with a three-bit encoded single-ended CMOS-compatible control input bus (A). The ZTC block is designed so that it can be easily extended by abutting cascaded components that provide additional  $I_b$  currents for further CML/ECL modules.

Table 4.2 gives an overview of the four different configurations of the baseline  $I_b$  bias currents. It shows the respective peak-to-peak voltage  $V_{pp}$ , the resulting voltage swing and the obtained noise margin (NM).

| A[2:0]      | $I_{b}[\mu A]$ | $V_{pp} \ [mV]$ | $V_{SW}$ [mV] | NM [mV] |
|-------------|----------------|-----------------|---------------|---------|
| $\{0,0,0\}$ | 75.0           | 150.0           | 300           | 92      |
| $\{0,0,1\}$ | 82.5           | 165.0           | 330           | 104     |
| $\{0,1,0\}$ | 100.0          | 200.0           | 400           | 133     |
| $\{1,0,0\}$ | 125.0          | 250.0           | 500           | 177     |

Table 4.2: Bias current configurations and resulting voltage swing and NM
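The relation between the columns of Table 4.2 can be cross-checked numerically. The sketch below assumes an unscaled mirror ratio and an effective resistance of 2 kΩ; this value is not taken from the text but inferred from the first table row (150 mV / 75 µA). The factor of two between $V_{pp}$ and $V_{SW}$ is likewise read from the table. The NM column depends on Eq. 2.4 and is not recomputed here.

```python
R_EFF_KOHM = 2.0  # assumption: inferred from 150 mV / 75 uA in Table 4.2

# A[2:0] -> (I_b in uA, V_pp in mV, V_SW in mV), copied from Table 4.2
TABLE_4_2 = {
    (0, 0, 0): (75.0, 150.0, 300.0),
    (0, 0, 1): (82.5, 165.0, 330.0),
    (0, 1, 0): (100.0, 200.0, 400.0),
    (1, 0, 0): (125.0, 250.0, 500.0),
}


def swing_from_bias(i_b_ua, r_kohm=R_EFF_KOHM):
    """Return (V_pp, V_SW) in mV for a bias current given in uA."""
    v_pp_mv = i_b_ua * r_kohm  # uA * kOhm = mV
    v_sw_mv = 2.0 * v_pp_mv    # differential swing as listed in the table
    return v_pp_mv, v_sw_mv


for code, (i_b, v_pp, v_sw) in TABLE_4_2.items():
    print(code, i_b, swing_from_bias(i_b), "table:", (v_pp, v_sw))
```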

Figure 4.2 shows the ZTC block in application. An $I_b$ output current is provided at every B output. It has to be locally scaled to a natural multiple, i.e., $I_{SS}$, in every differential logic gate (DLG) in order to achieve the specified $V_{pp}$ depending on the implemented resistance (see Eq. 2.2 in Section 2.6.1). For that purpose, a left branch of a current mirror is selected (i.e., the vbmrrl cell) with a 1-times-scaled NMOS transistor interfacing a ZTC output. The resulting signal (*vbias*) can be used to bias several types of differential logic gates.

Figure 4.2: Bias concept for two domains of differential logic

Every differential cell contains a scaled NMOS transistor counterpart. It implements the individual right branch of a current mirror and is located in a dedicated logical section for that purpose. As a result, an n-multiple current is locally generated inside the logic cell with the use of the exact n-times-scaled width of the NMOS transistor of the vbmrrl cell. The cell-specific $I_{SS}$ current is therefore given by the ratio between the left and right branch transistors. The ZTC block supplies two different CML/ECL modules in the arrangement shown in Figure 4.2. Each of them is equipped with the left branch vbmrrl interface cell. The baseline bias current $I_b$ is configured by the CMOS input bus A={0,0,1} of the ZTC block and set to 82.5 µA (see Table 4.2). This results in a peak-to-peak voltage of 165.0 mV and a larger NM of 104 mV, which is above the minimum accepted value of 100 mV for NM (see Section 2.6.1).

Another aspect is the reduction of the power consumption of differential circuits. By appropriate configuration of input bus A, the global bias current can be decreased. For example, a change from $A=\{1,0,0\}$ to $A=\{0,0,0\}$ already saves more than 35% of the total power. Moreover, the concept can be easily extended with additional MOS transistors, which tie *vbias* to the ground potential to fully bring a CML module into a power-saving state.
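As a rough cross-check of this figure, assume that the static power $P$ of the CML modules scales linearly with the global bias current at a fixed supply voltage. This is a simplification made here, in which current-independent contributions are ignored. With the values of Table 4.2, the relative saving is

$$\frac{P(A=\{1,0,0\}) - P(A=\{0,0,0\})}{P(A=\{1,0,0\})} = 1 - \frac{75\,\mu\text{A}}{125\,\mu\text{A}} = 0.4,$$

i.e., a saving of up to 40% under this simplification, which is consistent with the stated value of more than 35%.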

#### 4.2.3 Modular Standard Cells – Logical Sections

A modular standard cell concept is proposed for the differential logic gates in order to reduce the NRE costs for standard cell library design. Thanks to this approach, new cell configurations with, e.g., lower area occupation, more suitable speed and power performance can be rapidly derived. Thus, the cell scheme of the DLGs is divided into three *Logical Sections*:

- 1. The *Rload-Section*, which consists of the load resistors.
- 2. The *npn-Section* with its npn-network/HBT transistor arrangement.
- 3. The Bias-Section, in which the cell-related bias current is generated.

The scheme applied to a differential CML latch with asynchronous reset function is shown in Figure 4.3. As can be seen therein, the npn-Section interfaces the nodes connected to the Rload-Section and the drain connection of the NMOS current source of the Bias-Section. The logical function is realized inside the gate by the HBT network in the npn-Section. Moreover, this section is internally divided into base-emitter-voltage-level-dependent subsections ($V_{BE}$ L-levels) as indicated by the dashed lines. The number of different voltage levels is determined by the number of stacked bipolar devices of a cell. Each $V_{BE}$ L domain has the same swing but a reduced offset of 0.8 V due to the base-emitter voltage-drop (see Section 2.6).

Figure 4.3: Logical sections of a differential CML latch

However, as known from the literature, the HBT performance depends on the applied collector current. Thus, several logic gate variants with different performances can be obtained by exchanging *Rload/Bias-Section* pairs. The pair-wise selection maintains parameters such as  $V_{pp}$ . These section-pairs can be applied to the same npn-network of npn-Section as long as the adjusted collector current does not impact the circuit scheme.

As at the circuit scheme level, this modular aspect is also advantageous for the layout design. The bipolar transistor devices in this technology have a dedicated scalable layout frame. Their terminals such as base, emitter, and collector have fixed positions in a rectilinear device frame surrounded by an individual guard-ring. As a consequence, the npn-Section of the cell and its routing at layout level can be reused for further CML configurations. The modular pair-wise selection of *Rload-Section* and *Bias-Section* can also be applied at the layout level. Nonetheless, the standard cell frame size of a DLG is mainly determined by the dimensions of the *npn-Section* realization. Thus, the required area is nearly constant, independent of the cell configuration of one logic type, e.g., a latch.

However, new cell configurations can be easily derived in short design time with the use of this modular standard cell concept. The function of a static differential logic gate is given by the logical connectivity of the npn-network of npn-Section, whereas the performance is mainly adjusted by the size of the NMOS transistor providing the required current ( $I_{SS}$ ). The concept is illustrated in Figure 4.4. As illustrated in this example, four different configurations can be derived from one npn-Section with one

given functionality. The resulting cells consist of *Rload/Bias-Section* pairs. They differ in speed and power performance but have similar area occupation due to the npn-Section frame.



Figure 4.4: Derivable configurations with one npn-Section A

This is in stark contrast to the design of CMOS standard cell gates, in which individual transistor widths are scaled to meet performance targets.

#### 4.2.4 Cell Configurations – Speed-Classes

One of the main characteristics of static differential logic is its static power consumption, which depends linearly on the bias current and the supply voltage. Moreover, the provided current controls the speed performance of the HBT. As a consequence, cells with different delay/power configurations must also be part of a CML/ECL standard cell library. Thus, the power consumption challenges can be addressed during design generation with standard design tools.

As mentioned in the previous section, the modularity of the cells allows new cells to be designed more conveniently. The cells resulting from different Rload/Bias-Section pair combinations are classified into *Speed-Classes*, which is a known term for this kind of grouping in other ECL libraries (see Section 3.2.2). The classes are distinguished by the load resistance, which is added as a suffix to the cell name.

A new configuration is obtained with a scaled bias current  $I_{SS}$  realized by a number of identical (multiplied) NMOS transistors (m) in the Bias-Section. The resistors  $R_L$  of the Rload-Section are sized to maintain the minimum nominal  $V_{pp}$ . The changes in the current directly affect the speed performance of a differential gate. Moreover, the sizing of the NMOS-transistor and the load resistors is chosen such that the scaling factors are applicable at the layout-level.

As a starting point, three speed-classes are proposed to offer alternatives for power consumption and speed performance during design optimization. One class is offered for low-speed, another one for middle-speed, and a third one for high-speed design parts. Table 4.3 shows the extracted results from parametric transistor-level simulations of a CML latch implemented in the 250 nm technology. The bandwidth (BW) is extracted for a fan-out of one connected to the cell under test.

As can be seen therein, the CML-latch in the fastest speed-class has a bandwidth of 12.90 GHz and requires a  $100 \Omega$  load resistor pair and a bias current of 20 times  $I_b$, i.e., a total of

| Speed-Class ($R_L$ [$\Omega$]) | 100   | 500  | 2000  |
|--------------------------------|-------|------|-------|
| $I_{SS}$ [mA]                  | 1.5   | 0.3  | 0.075 |
| m (factor)                     | 20    | 4    | 1     |
| P [mW] (at 3.3 V)              | 4.95  | 0.99 | 0.25  |
| BW -3 dB [GHz]                 | 12.90 | 5.67 | 1.65  |

Table 4.3: Characteristics of initial speed-classes of a 3.3 V-compatible CML latch

1.5 mA. Even though a 50  $\Omega$  configuration (not shown in the table) would offer a marginally higher bandwidth of just above 13.1 GHz, twice the current would be required to maintain a  $V_{pp}$  of 150 mV. Thus, the 50  $\Omega$  speed-class is less attractive for larger designs.

For middle-speed applications, the 500  $\Omega$-class is proposed. Although it has less than one half of the bandwidth of the fastest 100  $\Omega$-class, it consumes only about one fifth of the power. The variant with the lowest, default bias current is the 2000  $\Omega$  solution. It still provides a bandwidth of 1.65 GHz at only a quarter of a milliwatt and is therefore suitable for timing-uncritical paths.
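The current, resistance, and power entries of Table 4.3 can be cross-checked with three simple relations: $I_{SS} = m \cdot I_b$, $R_L = V_{pp} / I_{SS}$, and $P = V_{CC} \cdot I_{SS}$. The following Python sketch recomputes them. It assumes full current steering (so that the swing equals $I_{SS} \cdot R_L$), a unit bias current of $I_b = 75\,\mu$A taken from the $m = 1$ column, a nominal $V_{pp}$ of 150 mV, and a 3.3 V supply; the function and variable names are illustrative only.

```python
# Sketch: recompute the speed-class parameters of Table 4.3.
# Assumptions (not given as formulas in the text): full current steering,
# so V_pp = I_SS * R_L, and static power P = V_CC * I_SS.

I_B_UA = 75        # unit bias current in microampere (m = 1 column)
V_PP_MV = 150      # nominal output swing in millivolt
V_CC = 3.3         # supply voltage in volt


def speed_class(m):
    """Return (R_L in ohm, I_SS in mA, P in mW) for m parallel NMOS devices."""
    i_ss_ua = m * I_B_UA                   # scaled bias current in microampere
    r_load = V_PP_MV * 1000 / i_ss_ua      # mV / uA gives kOhm, times 1000 gives Ohm
    power_mw = V_CC * i_ss_ua / 1000       # V * uA gives uW, divided by 1000 gives mW
    return r_load, i_ss_ua / 1000, power_mw


for m in (20, 4, 1):
    r_load, i_ss, power = speed_class(m)
    print(f"m={m:2d}  R_L={r_load:6.0f} Ohm  I_SS={i_ss:.3f} mA  P={power:.2f} mW")
```

The bandwidth values are simulation results and cannot be derived this way.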

A minimum library with at least three speed-classes is a good tradeoff between library design costs and the ability for the use in high-speed, middle-speed, and low-speed applications<sup>1</sup>. The 500  $\Omega$  and the 2000  $\Omega$ -variant can be selected to improve the power consumption of a differential design. The cells of the fastest class are more suitable for higher-speed circuit parts. Nevertheless, the more complex the application, the more paths have to be optimized. Consequently, it may still be challenging to achieve sufficient design solutions with the use of these three initial classes.

#### 4.2.5 Speed-Class Extension

Good quality of results of design optimization can only be obtained with a rich cell set offered in a library. The three initial speed-classes for logic gates offer at least some alternatives for design generation. Nonetheless, the performance gap between the classes is too wide for an efficient standard cell-based design solution of more complex circuits. As a consequence, a finer-grained scaling of speed-classes is preferable. Certainly, this increases the design costs for library development, but these costs are small compared with the achievable gains in efficiency.

The fastest  $100 \Omega$  class requires 1.5 mA to maintain a  $V_{pp}$  of 150 mV. This current exceeds the optimal operating point for maximum device speed performance of the smallest possibly-sized HBT in this technology. Thus, the next larger-scaled HBT with its respective larger-sized layout frame is selected. As a result, the occupied area is significantly increased, since nearly all transistors of the network in npn-Section are affected. Consequently, other additional speed-classes are proposed as a library extension with the use

<sup>&</sup>lt;sup>1</sup>The author would like to thank Silicon Radar GmbH (https://siliconradar.com/) for the implementation of the first version of the CML/ECL library.

of the smallest possible HBT device. They fill the gaps between the initial classes. The speed-class extensions and the extracted bandwidth results are listed in Table 4.4.

| Speed-Class ($R_L$ [$\Omega$]) | 100   | 125   | 200   | 250  | 500  | 1000 | 2000  |
|--------------------------------|-------|-------|-------|------|------|------|-------|
| $I_{SS}$ [mA]                  | 1.5   | 1.20  | 0.75  | 0.6  | 0.3  | 0.15 | 0.075 |
| m (factor)                     | 20    | 16    | 10    | 8    | 4    | 2    | 1     |
| P [mW] (at 3.3 V)              | 4.95  | 3.96  | 2.48  | 1.98 | 0.99 | 0.5  | 0.25  |
| BW -3 dB [GHz]                 | 12.90 | 12.50 | 10.67 | 9.57 | 5.83 | 3.12 | 1.66  |

Table 4.4: Speed-class extensions of a 3.3 V CML latch

The bandwidth gap of more than 6 GHz between the  $125 \Omega$  and the  $500 \Omega$  variant is bridged by two additional classes. The 200  $\Omega$  and 250  $\Omega$  configurations are obtained by halving the bias currents of the 100  $\Omega$  and 125  $\Omega$  variants, respectively. They still provide a moderate bandwidth of 10.67 GHz and 9.57 GHz. Similarly, a 1000  $\Omega$-configuration is proposed to fill the performance gap between the 500  $\Omega$  and the slowest 2000  $\Omega$-class with its lowest power consumption. As already mentioned in Section 4.2.3, the area does not scale linearly with the speed-class. The reason is the layout dimension of the npn-Section, which mainly determines the cell width, whereas the Rload- and Bias-Section have a minor impact in most cases. Thus, the area is not listed in the table for comparison.
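The benefit of the finer-grained classes can be illustrated with a small Python sketch that picks the lowest-power latch variant meeting a given bandwidth target. The power and bandwidth figures are copied from Table 4.4; the selection rule and the 9 GHz target are simplifying assumptions made for illustration only, since a synthesis tool optimizes path delays from the characterized timing models rather than bandwidths.

```python
# Sketch: pick the lowest-power latch speed-class that meets a bandwidth target.
# Data: Table 4.4 (R_L in ohm -> (power in mW, -3 dB bandwidth in GHz)).
# The selection rule is a simplification for illustration only.

EXTENDED = {
    100: (4.95, 12.90), 125: (3.96, 12.50), 200: (2.48, 10.67),
    250: (1.98, 9.57), 500: (0.99, 5.83), 1000: (0.50, 3.12),
    2000: (0.25, 1.66),
}
# The three initial speed-classes, using the same Table 4.4 figures.
INITIAL = {r: EXTENDED[r] for r in (100, 500, 2000)}


def cheapest_class(library, min_bw_ghz):
    """Return R_L of the lowest-power class with bandwidth >= min_bw_ghz."""
    candidates = [(power, r) for r, (power, bw) in library.items()
                  if bw >= min_bw_ghz]
    if not candidates:
        raise ValueError("no speed-class meets the bandwidth target")
    return min(candidates)[1]


target = 9.0  # GHz, hypothetical requirement
for name, lib in (("initial", INITIAL), ("extended", EXTENDED)):
    r = cheapest_class(lib, target)
    print(f"{name}: {r} Ohm class, {lib[r][0]} mW")
```

For this hypothetical target, the initial set has to fall back to the 100 $\Omega$ class (4.95 mW), whereas the extended set offers the 250 $\Omega$ class (1.98 mW).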

However, this speed-class extension provides several additional candidates in a library, which can be selected for path and design optimization to obtain better design performance. To demonstrate the benefit of this proposal, logic synthesis runs have been performed on a subset of ISCAS89 benchmark circuits [70] offered in the IWLS 2005 Benchmarks bundle [124]. All runs used the same constraints and were only tuned by suitable cell and speed-class selections. The area and power values are extracted for a mapping to the ECL\_I library (three initial speed-classes) and to an ECL\_E library with speed-class extensions (see Table 4.4). The results are listed in Table 4.5.

| ISCAS89 circuit | ECL_I area [mm$^2$] | ECL_I power [mW] | ECL_E area [mm$^2$] | ECL_E power [mW] | Area savings, ECL_I vs. ECL_E [%] | Power savings, ECL_I vs. ECL_E [%] |
|-----------------|---------------------|------------------|---------------------|------------------|-----------------------------------|------------------------------------|
| s344            | 0.39                | 271.06           | 0.37                | 224.99           | 5.12                              | 17.34                              |
| s349            | 0.38                | 265.61           | 0.34                | 188.34           | 10.53                             | 29.09                              |
| s382            | 0.44                | 282.31           | 0.44                | 273.17           | 0.00                              | 3.24                               |
| s400            | 0.44                | 287.30           | 0.40                | 217.73           | 9.09                              | 24.22                              |
| s526*           | 0.45                | 264.00           | 0.41                | 160.31           | 8.89                              | 39.28                              |
| s838_1          | 0.65                | 414.39           | 0.65                | 389.64           | 0.00                              | 3.80                               |

Table 4.5: Extracted results for some ISCAS89 benchmark circuits

As can be seen in the table, the power values are significantly improved for all circuits when the mapping is done to the extended library. The power is given in milliwatt and a reduction of several tens of mW can be achieved. In particular, more than 100 mW are saved for the ECL\_E-solution of s526 circuit. This is enabled with additional mapping directives for some flip-flops in selected timing paths. Nevertheless, the area occupation is quite similar as expected.

As confirmed by these practical evaluations, the challenges on power consumption of differential standard cell-based designs can be addressed with a range of different speed-classes in a library. The extension of speed-classes is easily implemented by using the modular standard cell concept. The components for Rload and Bias-Section can be exchanged to derive a new speed-class of a standard cell. The initial library ECL\_I is already sufficient for several circuits in the analog design approach. However, CAD tools require more cell configurations (ECL\_E) from which they can select efficiently during standard cell-based design generation.

#### 4.2.6 Level-Shifter

The input stages of high-speed differential ECL gates expect dedicated input voltage-levels in order to operate at their optimum. As a consequence, special voltage-level-shifter cells are required for voltage level conversion to the respective  $V_{BE}$-level. They interface the differential output voltage of a driving circuitry and act as emitter-follower to provide the correct  $V_{BE}$-level voltage for the receiving input stage. In a standard cell-based design, two different approaches are applicable.

One proposal is to integrate the level-shifters directly into each differential logic gate in front of the input stages as introduced in Section 3.2.2. Thus, all differential pin pairs have the same first voltage level interface ( $V_{BE}L1$ ) which is seen by the external side. Furthermore, there is no additional action required by the logic synthesis tool to optimize or correct the interface signaling levels.

The selected alternative solution is the use of individual speed-class-dependent level-shifter cells as part of the standard cell library. The correct cell and pin connections can be made during the mapping phase of the logic synthesis run. Hence, it is proposed to define an additional connection\_class attribute in the pin section of the .lib-file. As a consequence, the input and output pin-related  $V_{BE}$-voltage levels are properly defined. However, this relies on an extra attribute that the selected logic synthesis tool must support, which may limit its use. Moreover, all level-shifters must be suitable to drive all required  $V_{BE}$-levels. Consequently, several speed-class-dependent level-shifter cells for the second and third voltage level must be offered separately in the library.

Figure 4.5 (a) shows the transistor-level scheme of the supported level-shifter concept of this work. The level-shifters are stand-alone cells. As can be seen, both voltage levels  $V_{BE}L1$  and  $V_{BE}L2$  are generated from the differential input signal Dp/Dn. The differential output pairs Y and Z imply the second and third  $V_{BE}$ -level, respectively. An application of such level-shifters in a CML latch is illustrated in Figure 4.5 (b). The



Figure 4.5: Level-shifter: (a) transistor-level scheme with both output levels, (b) level-shifter in application for the reset and clock signals of a CML latch

input voltage levels for the clock and a reset signal are generated by two separate cells. As can be seen, level-shifter cells are expensive gates. They consist of two current sinks realized by NMOS transistors. Thus, the use of level-shifters has a major impact on the overall static power consumption and area occupation of the generated standard cell-based ECL applications. The appropriate mapping is performed during the synthesis run, so that the power is considered during design optimization.
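The effect of a pin-level attribute such as connection\_class can be pictured as a simple connectivity check: an output may only drive an input that expects the same $V_{BE}$-level, otherwise a level-shifter output of the matching level has to be inserted. The following Python sketch is purely illustrative. The cell names, the level labels (`L1` to `L3`), and in particular the assignment of levels to the latch pins are assumptions and do not reproduce the actual .lib-entries; only the level-shifter pin names D, Y, and Z follow Figure 4.5 (a).

```python
# Sketch: V_BE-level compatibility check between driver and receiver pins.
# Cell names, level labels and the latch pin-to-level assignment are
# hypothetical; they only illustrate the idea of a pin-level class attribute.

PIN_LEVEL = {
    ("LATCH", "Q"): "L1",      # logic outputs deliver the first V_BE-level
    ("LATCH", "D"): "L1",      # data input expects the first level
    ("LATCH", "CLK"): "L2",    # assumed: clock input expects the second level
    ("LATCH", "RST"): "L3",    # assumed: reset input expects the third level
    ("LVLSHIFT", "D"): "L1",   # level-shifter input
    ("LVLSHIFT", "Y"): "L2",   # level-shifter output, second level
    ("LVLSHIFT", "Z"): "L3",   # level-shifter output, third level
}


def needs_level_shifter(driver, receiver):
    """True if the driver pin does not deliver the level the receiver expects."""
    return PIN_LEVEL[driver] != PIN_LEVEL[receiver]


def shifter_output_for(receiver):
    """Name the level-shifter output pin that matches the receiver's level."""
    wanted = PIN_LEVEL[receiver]
    for (cell, pin), level in PIN_LEVEL.items():
        if cell == "LVLSHIFT" and pin != "D" and level == wanted:
            return pin
    return None


print(needs_level_shifter(("LATCH", "Q"), ("LATCH", "D")))
print(needs_level_shifter(("LATCH", "Q"), ("LATCH", "CLK")))
print(shifter_output_for(("LATCH", "CLK")))
```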

## 4.3 Library Aspects for Differential Logic Design

Standard cell-based design flows for the creation of digital differential logic applications differ from the conventional one due to the differential signaling concept. The compliance with the flow must be enabled at the logical and layout generation stages. This implies the execution of additional design tasks with at least two extra libraries: one for the compliant intermediate design generation, i.e., single-ended domain (SE-domain), and another one for the final implementation in the differential domain (DIFF-domain). Both require coherent library concepts. They are presented in this section. In particular, the proposed solution to obtain bipolar differential circuits with limited routing resources is given.

#### 4.3.1 Standard Cell Set

All combinational gates that can be realized with a single current source should be part of a minimum differential standard cell library. Among them are two-input logic gates such as OR, NOR, AND, NAND, XOR, or XNOR. Furthermore, including a three-input 2:1 multiplexer cell (MUX) is also beneficial. Although its function can be realized by combining, e.g., three NAND gates, the classical differential single-gate 2:1 MUX cell has a one-stage propagation delay and requires only one $I_{SS}$ bias current (see Figure 2.13 (a)). Thus, power consumption and area occupation are significantly reduced. Similarly, level-shifters, flip-flops and latches, and inverting differential amplifiers should be provided. They are essential gates and are often mapped during design generation with logic synthesis tools. All these cells must exist in all selected speed-classes (see Sections 4.2.4 and 4.2.5). Moreover, in order to interface CMOS logic or LVDS signaling, converter cells with an ECL interface in both directions, i.e., cmos-to-ecl and lvds-to-ecl and vice versa, are also advantageous. Finally, all these cells should be developed in their *base function*, i.e., the implicit resulting cell functionality without crossing the differential signal pairs. This cell set forms the *baseline library*, which is finally used for ASIC fabrication. Moreover, it is the input for the later library creation for the SE-domain to enable the use of the standard cell-based design flow.

Nevertheless, differential logic gates can also be driven in single-ended mode as introduced in Section 3.2.2. To enable this mode in parallel, the effort in library design and standard cell characterization has to be increased. Moreover, it complicates the logic synthesis and layout generation stages. As a consequence and in order to reduce these NRE costs, the pins of all logical cells of this baseline library have a differential pin interface. Nonetheless, the proposed design methodology still supports single-ended signaling such as CMOS signals.

#### 4.3.2 Single-ended Pseudo-Gates

To make the differential baseline gates comprehensible to standard digital design tools, the logical function of each differential gate has to be modeled as a single-ended counterpart. These gates are named Single-ended Pseudo-Gates (SEPGs) and constitute the SE-domain at logical (netlist) level. Their views are used during logic synthesis for mapping from generic netlist objects to an intermediate netlist with pseudo-gates. They differ from the differential gates in the pin interface and the logical function. The cell-specific parameters such as timing, power and area are identical. In more detail, each differential logic gate is able to realize various logical functions through permutations obtained by crossing the differential pin pairs. As an example, a buffer can act as an inverter if either the input or the output pin pair is crossed. As a consequence, the SEPG variants can be easily obtained through implicit negation by crossing the differential signal pairs of the gates.

Based on this fact, several variants with different logic functions can be derived from the original differential baseline gate. As a result, the derived SEPGs do not provide a differential pin interface. They offer single-ended (SE) pins, compliant with the single-ended signaling that is well-supported by standard CAD tools. Moreover, the cell richness of this resulting SE library is increased. These alternative SE variants can be used during logic synthesis or place and route stage and improve the quality of results.

Some gate transformations of a differential baseline OR-gate are depicted in Figure 4.6 as examples. The OR-gate with its differential signal pairs is illustrated in Figure 4.6 (a). The single-ended OR-based gates are illustrated in Figure 4.6 (b). Particularly, when the output pin pair of X is crossed, the single-ended NOR function is obtained. Furthermore, an implicit NAND function can be realized by crossing the pin pairs A and B of the baseline cell. Moreover, with an additional inverted output X, the AND function can be accomplished (see Figure 4.6 (c)). Similarly but not shown in the figure, the function  $\overline{A} \vee B$  or  $A \vee \overline{B}$  can be obtained by crossing one of the corresponding differential input pin pairs and transforming the function to the respective single-ended behavior.



Figure 4.6: Gate transformation of an original differential logic gate to SEPG variants: (a) the baseline differential gate with OR-function, (b) the derived single-ended OR/NOR-variants, and (c), the AND/NAND gates
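The transformations of Figure 4.6 can be checked with a short Python sketch. It models a differential signal as a pair of complementary booleans, treats crossing as a swap of the two wires, and verifies that the crossed OR-gate variants match the NOR, NAND, and AND truth tables. The helper names (`to_diff`, `cross`, `diff_or`, `sepg`) are hypothetical and not part of the library generation scripts.

```python
# Sketch: derive single-ended functions from a differential OR gate by
# crossing differential pin pairs. A differential signal is modeled as a
# (p, n) tuple of complementary booleans; crossing swaps the two wires.
from itertools import product


def to_diff(value):
    """Encode a single-ended boolean as a differential (p, n) pair."""
    return (value, not value)


def cross(pair):
    """Swap the p and n wires of a differential pair (implicit inversion)."""
    p, n = pair
    return (n, p)


def diff_or(a, b):
    """Baseline differential OR gate: X.p = A.p or B.p, X.n is its complement."""
    x = a[0] or b[0]
    return (x, not x)


def sepg(cross_a=False, cross_b=False, cross_x=False):
    """Return the single-ended function seen for a given crossing choice."""
    def function(a, b):
        da = cross(to_diff(a)) if cross_a else to_diff(a)
        db = cross(to_diff(b)) if cross_b else to_diff(b)
        dx = diff_or(da, db)
        return (cross(dx) if cross_x else dx)[0]
    return function


nor_gate = sepg(cross_x=True)
nand_gate = sepg(cross_a=True, cross_b=True)
and_gate = sepg(cross_a=True, cross_b=True, cross_x=True)

for a, b in product((False, True), repeat=2):
    assert nor_gate(a, b) == (not (a or b))
    assert nand_gate(a, b) == (not (a and b))
    assert and_gate(a, b) == (a and b)
print("all crossing variants match their single-ended truth tables")
```

Crossing only one input pair, e.g. `sepg(cross_a=True)`, yields $\overline{A} \vee B$ in the same way.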

With the use of that approach, all single-ended (SE)-variants are created by scripting and derived from the baseline library as illustrated in Figure 4.7. The essential views (timing and behavioral models) for several design stages are generated. To enable this model generation, two approaches are proposed. Either the new models are derived from an SE library with uncrossed baseline function, or the models are generated from uncrossed variants with differential pin pairs. Which approach is selected depends on how the characterization, i.e., the timing and power model generation of the baseline logic cells, is performed (see Section 4.3.5).



Figure 4.7: Two approaches for SEPG-model generation from different baseline libraries

Finally, it has to be emphasized again that the SEPG-library cells have no real, physically existing object behind them. They are only required to model the differential logical function as single-ended counterparts to enable the implementation with digital design tools. These intermediate views are virtual views, replaced by their differential baseline counterparts before chip fabrication. However, with the proposed generation for SEPGs, the library development and maintenance costs of this library type are reduced. In total, the views of 612 SEPGs are generated by scripts from 61 logic gates of the differential baseline library. Any change of a baseline gate is automatically derived and included in the SEPG by model generation.

#### 4.3.3 Fat-wire Specification

Similarly as for the logical level, design tools for digital standard cell-based design require suitable cell descriptions at the physical implementation stage. If a layout is generated from a fully-differential netlist, the differential signal pairs are treated and routed as single-ended wires. Thus, the wire lengths and the influence of parasitic effects of a differential pair are not similar. Moreover, the logical function and an implicit signal inversion by crossing a differential signal pair are not supported. Both of these limit the design optimization opportunities. Consequently, a similar SE counterpart – an intermediate view – is required at physical implementation stage in order to enable the full tool support during PnR. This view is an abstract fat-wire (FW) layout view.

With the use of these FW layout views, an intermediate layout design with the connectivity of an SE netlist can be obtained in a first place and route phase. After a successful physical implementation, the design has to be transformed to the real-existing differential gates (DIFF). The SE pins and wires are split and relabeled to their differential counterparts. The final differentially routed paths must be logically connected to the new pins, and the routes must fit to the new pin-shapes and the differential routing grid. As a consequence, both layout variants of the FW and the DIFF cells have to be drawn in their respective grid definitions to ensure consistency between the two design domains.
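The split-and-relabel step can be pictured with a minimal Python sketch. The `p`/`n` suffix convention follows the net names used in Figure 4.8 (net01 and net01p/n); the function itself, its name, and its `crossed` parameter are hypothetical illustrations and not the conversion procedure used in this work.

```python
# Sketch: relabel single-ended (SE) connections to differential pairs.
# The "p"/"n" suffix convention follows the net names in Figure 4.8;
# the function itself is a hypothetical illustration.

def split_connection(pin, net, crossed=False):
    """Map one SE pin-to-net connection to its two differential connections.

    If the pseudo-gate variant implies a crossed pin pair (implicit
    inversion), the p and n wires are swapped at the pin.
    """
    p_net, n_net = net + "p", net + "n"
    if crossed:
        p_net, n_net = n_net, p_net
    return {pin + "p": p_net, pin + "n": n_net}


print(split_connection("A", "net01"))
print(split_connection("B", "net02", crossed=True))
```

In the uncrossed case, pin `Ap` connects to `net01p` and `An` to `net01n`; in the crossed case, the two nets are exchanged at the pin.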

The FW concept to obtain in-parallel-routed differential signal pairs has already been introduced in Section 3.1.3. The recommendations of [78] are followed for the routing grids, wire widths and spacings in order to maintain a compliant layout definition for the CML cells of the FW and DIFF domain. In addition, all parameters are selected such that they match the specifications of the corresponding CMOS standard cell library to enable the design of, e.g., flat, hybrid CMOS/CML/ECL standard cell circuits.

The pitch P of a single differential wire d is defined by the sum of its width W and its same-net spacing S.

$$P_d = W_d + S_d \tag{4.1}$$

The pitch P of a fat-wire f must be a k-multiple of  $P_d$  in order to align both grids to each other:

$$P_f = k \cdot P_d \tag{4.2}$$

The width W of fat-wire f can be derived immediately as the sum of the differential width and its pitch to:

$$W_f = P_d + W_d \tag{4.3}$$

, whereas the spacing S for a fat-wire f is given by:

$$S_f = P_f - W_f \tag{4.4}$$

The fat-wire placement grid  $(G_f)$  and its differential counterpart  $(G_d)$  are already given by the pitch specifications. They are defined by:

$$G_f = P_f \quad \text{and} \quad G_d = n \cdot P_d \tag{4.5}$$

Figure 4.8 illustrates the definitions above on two fat-wire signals (net01, net02) and the differential pair routes (net01p/n, net02p/n).



Figure 4.8: Illustration of fat-wire and differential wire definitions

To be compatible with both placement grids, the cell height of a DLG must be a multiple of the fat-wire pitch  $P_f$. Table 4.6 lists the specified parameters for the libraries in the 250 nm technology node.

Table 4.6: Placement and routing specification for the selected 250 nm technology

| Parameter   | Value                 |
|-------------|-----------------------|
| $W_d = S_d$ | $0.84\,\mu\mathrm{m}$ |
| $W_f$       | $2.52\,\mu\mathrm{m}$ |
| $S_f$       | $0.84\,\mu\mathrm{m}$ |
| $G_f = P_f$ | $3.36\,\mu\mathrm{m}$ |

If a library fulfills these requirements, an on-grid placement and a proper on-track routing for the FW and the DIFF domains are achievable. Furthermore, the parameters are chosen such that they are a multiple of the pitch definitions of the additionally offered CMOS standard cell library. Thus, a compatible fat-wire design for standard cell-based differential circuits is enabled.
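Equations (4.1) to (4.4) can be checked against the values of Table 4.6 with a few lines of Python. The sketch below works in integer nanometers to avoid rounding effects. The factor $k = 2$ is not stated explicitly in the text; it is inferred here from the ratio $P_f / P_d = 3.36\,\mu\mathrm{m} / 1.68\,\mu\mathrm{m}$, and the function name is illustrative.

```python
# Sketch: evaluate Eqs. (4.1)-(4.4) in integer nanometers.
# k = 2 is inferred from Table 4.6 (P_f / P_d = 3.36 um / 1.68 um);
# it is not stated explicitly in the text.

def fat_wire_rules(w_d, s_d, k):
    """Return the pitches plus the fat-wire width and spacing."""
    p_d = w_d + s_d          # Eq. (4.1): differential wire pitch
    p_f = k * p_d            # Eq. (4.2): fat-wire pitch
    w_f = p_d + w_d          # Eq. (4.3): fat wire spans both wires and their gap
    s_f = p_f - w_f          # Eq. (4.4): remaining fat-wire spacing
    return {"P_d": p_d, "P_f": p_f, "W_f": w_f, "S_f": s_f}


rules = fat_wire_rules(w_d=840, s_d=840, k=2)
for name, value_nm in rules.items():
    print(f"{name} = {value_nm / 1000:.2f} um")
```

With $W_d = S_d = 0.84\,\mu\mathrm{m}$ this gives $P_f = 3.36\,\mu\mathrm{m}$, $W_f = 2.52\,\mu\mathrm{m}$, and $S_f = 0.84\,\mu\mathrm{m}$, in agreement with the table.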

#### 4.3.4 Fat-wire Compatible Differential Standard Cell Layouts

Based on the logical section scheme introduced in Section 4.2.4 and the aforementioned requirements for fat-wire routing, the baseline layouts of the differential logic gates can be developed. Since digital standard cell design is intended, the layouts have to be drawn according to standard cell design rules. This includes among other things, a common pitch and grid definition and the rail arrangement of special net pin-shapes for power and ground connections. Moreover, in order to reduce the NRE costs for library design, it is proposed to apply a modular concept also for the layout views.

A fat-wire compatible differential standard cell layout frame is illustrated in Figure 4.9. As can be seen, all three sections are limited by the width of the VCC power and VSS ground rail shapes. The *Bias-Section* is placed above the VSS rail. The essential *vbias* signal itself is treated as a special net and shaped like a thin additional rail at the lowest metal layer.



Figure 4.9: A fat-wire compatible standard cell layout frame of a differential CML latch

The *npn-Section* is realized by an array-like placement of the HBTs. The function is internally routed with as few metal layers as possible in order not to block top-level routing tracks at later design stages. On top of that, the *Rload-Section* is located with its resistors connected to the VCC rail and collector nodes. However, the selected 250 nm technology offers a metal stack with three thin and two thick layers. Moreover, the layout frames of the available HBT devices already consist of shapes of two of the three thin metal layers as mentioned before. Thus, most of the interconnections in the *npn-Section* and the pin-shapes for a final pin-access need to be drawn at the topmost thin metal layer, i.e., the third metal layer.

In addition, it has to be emphasized that the same class of metals is used for the logical differential pins as discussed in the previous section. Thicker metals would require a larger minimal width and have different track and pitch definitions. This would lead to off-track routes and block adjacent routing channels after design conversion. Moreover, this technology-related limitation also implies that only three metal layers can be used for the entire routing of a digitally-designed differential logic circuit. As a consequence, such designs are more congested due to the low number of available thin routing metal layers in this technology.

Due to these limitations, it is proposed to move the FW pins to the cell boundary in order to free routing channels for final pin-access. Consequently, these pins are called *virtual fat-wire boundary pins*. These are virtual pin-shapes and only present in a FW cell layout view. They are routed in a first routing-phase at PnR stage, whereas the final pin-routes are generated in a preserved area in a later implementation step afterwards.

Figure 4.10 illustrates the abstract layout views used for design generation and final manufacturing mask generation. The view in Figure 4.10 (a) is required for the placement and the first routing phase of the SEPG netlist design with the use of a complete FW layout library. The pin-access area preserved by the definition of the *virtual fat-wire boundary pin* locations of a differential logic gate is depicted in Figure 4.10 (b). The intermediate routing blockages of the FW counterpart do not exist in this view. However, this layout view is physically identical to the layout shown in Figure 4.9. It is only presented here again for context.



Figure 4.10: Layout views: (a) fat-wire layout with fat-wire boundary pin-shapes and additional routing blockages, (b) layout of the differential counterpart with depicted preserved area for the final pin-access

Similarly as for the SEPG library, the layout frames for this database are automatically generated by scripts out of the differential counterpart baseline library with the respective original abstract layout views as input. The detailed procedure is presented in Section 4.3.6 right after a short introduction about the characterization of differential logic gates.

#### 4.3.5 Characterization of Differential Standard Cells

CAD tools for digital standard cell-based designs require power and timing information of each used standard cell of a design in order to estimate and predict the timing budget and power consumption at different design stages such as logic synthesis or place and route. The way this information is generated is known as standard cell characterization (CZ). As a result of this task, the .lib-files for power/timing and .v-files as behavioral models are generated (see Section 2.2.3).

Today's standard cell characterization tools are optimized for large cell sets of CMOS logic gates. The support of differential signaling is still a niche, since it is not widely used in digital design. Moreover, the characterization setup for differential logic differs from the CMOS-like single-ended signaling concept. Consequently, approaches for characterization of differential standard cell gates are required to provide the essential information to the design tools. Two different ones are proposed in this work. The first one can be used when the CZ-tool does not support differential signaling, whereas the second one requires the support of this signaling concept.

In the first approach, all differential gates to be characterized are encapsulated by signal wrapper cells. These wrappers provide a single-ended compatible pin interface and convert all signals to differential signal pairs. The differential input pin pairs are connected to single-ended-to-differential (S2D) wrappers. The differential outputs are post-processed by differential-to-single-ended (D2S) counterparts. As a result, both types of wrappers provide a CMOS-compatible interface to the CZ-tool and generate the internal differential voltage-levels for the gate under characterization. The thresholds for delay and transition time measurements are defined in percent for the single-ended pins. Thus, the timing and power characteristic of each wrapped differential logic gate can be properly modeled with the use of a commercial library characterization tool. This approach applied to a differential CML latch is shown in Figure 4.11.



Figure 4.11: Encapsulation of a differential CML cell with wrappers for CZ

The second approach is useful when the CZ tool supports differential signaling. In that case, the logic gate can be directly specified for characterization without any input and output wrappers in between. The input/output characteristics can be specified directly for all differential pin pairs. The points for signal measurements are given by the crossing midpoints of the individual differential signal pairs. As a consequence, the delay and transition times of the differential logic gates can be exactly measured.
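The crossing-point measurement can be sketched in Python as follows. The function locates the time at which the two wires of a differential pair cross, using linear interpolation between samples, and the propagation delay is the difference between the output and input crossing times. The waveform samples (times in ps, voltages in mV) are invented for illustration and are not simulation results; the function name is hypothetical and unrelated to any characterization tool.

```python
# Sketch: propagation delay between the crossing points of two differential
# signal pairs. Waveforms are lists of (time_ps, v_p_mV, v_n_mV) samples;
# the sample values below are made up for illustration.

def crossing_time(samples):
    """Return the first time at which v_p - v_n reaches or passes zero."""
    for (t0, p0, n0), (t1, p1, n1) in zip(samples, samples[1:]):
        d0, d1 = p0 - n0, p1 - n1
        if d0 * d1 <= 0 and d0 != d1:
            # Linear interpolation of the differential voltage between samples.
            return t0 + (t1 - t0) * d0 / (d0 - d1)
    raise ValueError("differential pair never crosses")


# Input pair rises between 0 ps and 20 ps, output pair falls between 30 ps and 50 ps.
input_pair = [(0.0, 3150, 3300), (20.0, 3300, 3150)]
output_pair = [(30.0, 3300, 3150), (50.0, 3150, 3300)]

delay_ps = crossing_time(output_pair) - crossing_time(input_pair)
print(f"propagation delay: {delay_ps:.1f} ps")
```

Because the measurement refers to the point where both wires carry the same voltage, it is independent of a percentage threshold on a single wire.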

Both approaches are experimentally examined with the use of different commercial standard cell characterization tools. The first characterization of the CML/ECL library is done with Cadence Encounter Library Characterizer (ELC). The second approach is performed with the use of Cadence Liberate<sup>TM</sup> [28], a modern library characterizer capable of processing complex logic gates.

However, in any case, only models of the basic libraries with cells of the uncrossed variants are obtained, i.e., either SE views (.lib/.v) as a result of approach 1 or a fully differential library bundle as an outcome of approach 2. On the other hand, the more functional variants are offered in a library (richness), the more alternative gates can be selected during design generation using CAD tools. As a consequence, the additional SEPG model variants are generated from the characterized results by custom library conversion tools and scripts as already mentioned in Section 4.3.2. The functional behavior in terms of Verilog® behavioral models is generated in parallel.





Figure 4.12: Two approaches for model generation

The purpose of the SEPG model libraries is the generation of standard cell-based circuits in differential logic, designed with intermediate views of single-ended-compliant pseudo cells. These views are used at logic synthesis, simulation, or FW layout-level in the SE domain to obtain good quality of results. The differential (DIFF) model libraries are applied at the same design stages for a differential design counterpart. Consequently, the view generation of the layout information for both domains (SE/DIFF) has to be developed additionally. The procedure is described in the next section.

#### 4.3.6 Physical View Generation

The selected layout generation tool [26] requires abstract geometrical information in terms of .lef-files (LEF) of every instantiated cell in the gate-level netlist. With known cell dimensions, pin locations and blockage areas, the PnR tool is able to generate design-rule-conformal layouts for a given connectivity in a netlist. The design parameters such as timing, power and the functional behavior are extracted using the cell-related information provided by the linked .lib-model libraries. With respect to this work, the proposed standard cell layout frames for differential logic cells follow a modular concept (see Section 4.2.3). It supports fat-wire routing to obtain in-parallel-routed differential signal pairs (see Section 4.3.3 and 4.3.4). In order to save NRE costs and library maintenance effort, it is preferable to also derive every fat-wire SEPG layout counterpart automatically from the original cell layout. This improves the adaptability when the layout of a master cell has changed. Nevertheless, some limitations have to be considered for this physical view generation of differential master cell layouts.

Pin-shapes with multiple segments are freely accessible by the router. Thus, it might happen that a pin is not connected at the midpoint of a signal path, but instead somewhere at an edge or close to a border of the extracted pin. Moreover, the FW layout view generation with its fat-wire pin calculation relies on dedicated pin-shape locations, definitions, and dimensions inside the baseline master cell layouts. As a consequence, the pin-access region and the pin locations are constrained by using the extracted, exactly drawn pin-shape only. Their locations can be extracted from the .lef-information of the master cell layouts. However, the layouts of the gates lead to higher routing congestion, which has to be addressed already by the fat-wire cell layout itself.

The proposed procedure to cope with these challenges is illustrated in Figure 4.13. The procedure itself is depicted in Figure 4.13 (a), whereas the corresponding LEF syntax for the master pin locations and for the fat-wire variant is depicted in (b) and (c), respectively.



Figure 4.13: Forming the fat-wire pins: (a) procedure for fat-wire pin generation, (b) the respective LEF syntax of the differential layout, and (c) the generated FW-LEF content

As can be seen in the figure, one rectangle shape is defined for each differential pin (e.g., Dp and Dn). The fat-wire boundary pin-shape geometry for the single-ended pin D is calculated by merging the lower-left corner (LL) of the negative pin (Dn) with the upper-right corner (UR) of the positive pin (Dp). An additional pitch offset in X direction towards the cell boundary can be added to free more routing tracks for final pin-access. Furthermore, the original differential pin locations are marked as local blockages in order to avoid shorts in later PnR phases.
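The corner merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rect` type, the function name `fat_wire_pin`, the sign convention of `x_offset`, and the coordinates in database units are hypothetical and are not taken from the LEF Library Generator tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: lower-left (llx, lly), upper-right (urx, ury)."""
    llx: int
    lly: int
    urx: int
    ury: int


def fat_wire_pin(dn: Rect, dp: Rect, x_offset: int = 0):
    """Return (fat-wire pin, blockages) for one differential pin pair.

    The fat-wire pin spans from the LL corner of Dn to the UR corner of Dp.
    x_offset shifts the pin in X (assumed convention: negative = towards
    the left cell boundary). The original pin shapes become blockages.
    """
    pin = Rect(dn.llx + x_offset, dn.lly, dp.urx + x_offset, dp.ury)
    if pin.llx >= pin.urx or pin.lly >= pin.ury:
        raise ValueError("Dn must lie below/left of Dp to form a valid pin")
    return pin, [dn, dp]


# Hypothetical pin shapes of a master cell (database units).
Dn = Rect(420, 840, 1260, 1680)
Dp = Rect(420, 2520, 1260, 3360)

pin_d, blockages = fat_wire_pin(Dn, Dp)
shifted_d, _ = fat_wire_pin(Dn, Dp, x_offset=-420)
```

The sketch also shows why a single rectangle per differential pin is needed: with several rectangles per pin, the LL and UR corners of the pair are no longer uniquely defined.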

A complete LEF library is generated following the procedure presented above with a commercial abstract generator tool. The fat-wire layout views for all logical SEPG variants are derived with a custom *LEF Library Generator* tool as shown in Figure 4.14. It uses these new baseline FW macro definitions and realizes new variants with logical negation by pin-pair crossing. However, the calculation and extraction of a FW pin are only possible if the differential pin is explicitly defined with one single shape. If the entire signal metallization below such a pin-shape is extracted as well, the resulting number of rectangles does not allow the target fat-wire pin location to be derived. As a consequence, it is mandatory to extract only the pin-shape without the metallization below during physical view generation of the differential cells' master layouts.



Figure 4.14: Generation of physical views for differential logic design

With the physical FW-views for the SEPGs and the differential master cells (DIFF), all data necessary for layout generation is available to design standard cell-based differential logic circuits with commercial design tools. The FW views are used for intermediate layout generation of a SEPG netlist design, whereas the LEF information of the actual differential gates is used in a second routing phase.

## 4.4 Design Methodology for Differential Logic Design

This section introduces the design methodology based on the proposed cell and library concepts. It covers the proposed design flow and discusses the design limitations and flow modifications. Moreover, it can be seen as a guideline for obtaining compliant differential logic standard cell-based designs.

#### 4.4.1 Design Flow for Differential Standard Cell-based Design

All recent works for differential logic design propose well-established flows maintaining the compliance to the standard design flow (cf. Section 3.1). Cell-substitutions or design conversion stages implement the interface between the single-ended (SE) and differential (DIFF) signaling domains. As a consequence, a similar flow concept with slight modifications can also be applied to the design of standard cell-based bipolar differential logic applications. The design flow for digital standard cell-based circuit design is already presented in Section 2.3. Consequently, only the differences, the extensions in comparison to the standard flow (see Figure 2.4), and the implicit interactions are introduced here.

The proposed flow for differential logic design is depicted in Figure 4.15. The starting point, which differs from the standard design flow, is a logic synthesis netlist obtained at GLS stage with the use of SEPG libraries and models. In the modified flow, the PnR stage is split into two phases. The first phase generates an intermediate fat-wire layout design with single-ended FW cell variants. The second phase implements the differential counterpart with split nets and differential logic gates for final pin-access. A conversion step is introduced after the first PnR phase which transforms the design into the differential domain (DIFF). From then on, the appropriate libraries with their differential master cell layouts and timing models are considered by the tools.



Figure 4.15: Proposed design flow for standard cell-based differential logic design

As a result, a digitally-designed differential standard cell-based design can be obtained. However, some limitations and features at RTL level have to be discussed for compliance aspects.

#### 4.4.2 Limitations and Features for RTL Design

The RTL implementation for a reliable differential logic design does not require extra effort or particular attention. The behavior can be described in the traditional single-ended manner in order to be compliant with the logic synthesis step and the proposed SEPG concept. However, the extension of the standard design flow brings with it some special features and limitations at this level that should not go unmentioned. They are addressed in this section.

First of all, differential logic generally has interfaces to further CMOS modules. Their behavioral description can be organized module-wise in a hybrid design, or written in a fully mixed manner. Even though it is possible to generate flat hybrid CMOS/CML/ECL designs following the proposed concepts and methodology, a mixed design description in the same module is not recommended. One reason is that the technology is unknown at RTL stage, since the mapping is realized during the logic synthesis step. Another reason is the challenge at layout generation stage, since these logic families have totally different characteristics in terms of cell frame size and signaling concept. As a consequence, it is proposed to follow a strict design hierarchy with a clear domain separation: the functional descriptions for single-ended logic (such as conventional CMOS) and the behavioral directives for differential logic design are placed in separate modules.

Furthermore, the essential interface wrappers for signal conversion from one domain to another, such as cmos-to-ecl/ecl-to-cmos cells, can be explicitly instantiated inside a module. Thus, the modules are implemented fully independently of each other, which gives enhanced design control.

Moreover, it is important to avoid implicit direct connections between single-ended pins and differential pin pairs. For instance, if such a faulty connection remains in a design without a converter cell, the electrical signal is improperly driven at an incorrect voltage level leading to unpredictable transitions. Both can be prevented with the aforementioned decisive logical separation of the logic-related behavior into different modules. Thus, each module can be generated in its target technology during logic synthesis. Furthermore, additional attributes such as **connection\_class** for the SEPGs can be defined which are evaluated at GLS stage.

On the other hand, a feature of the proposed flow is the general handling of single-ended signaling. As an example, the timing-uncritical initialization of a configuration realized by flip-flops is selected. At RTL level, this can be understood as an asynchronous control behavior. The respective VHDL code for such a high-active asynchronous initialization is shown in Listing 4.1.

Listing 4.1: Asynchronous initialization of an n-bit register

```
process (clk, init)
begin
    if init = '1' then -- high-active asynchronous initialization
        dout <= (others => '0');
    elsif rising_edge(clk) then
        dout <= foo;
    end if;
end process;
```


At this design stage, it is still open how the *init* signal is handled and finally realized. The decision whether differential or single-ended signaling is used depends on the flip-flops available for the selected logic at GLS stage. If a SEPG library provides both types of flip-flop (FF) cells, i.e., cells with single-ended control pins and cells with a fully differential interface, the final signaling is decided by the type of flip-flop mapped in the netlist. The respective pin interface is evaluated by the configuration at *Conversion* stage and results in an implicit signaling.

As a conclusion, the proposed design flow supports single-ended signaling inside a differential logic module, whereas a clear modular separation between the different domains is preferred to ease the implementation.

#### 4.4.3 Gate-Level Synthesis

The Gate-Level Synthesis (GLS) for differential logic designs is enabled with the use of the SEPG standard cell library (see Section 4.3.2). As a result, intermediate single-ended netlist designs can be generated with commercial logic synthesis tools. Moreover, the gate-level designs can be functionally verified by post-synthesis timing simulations. Although the design methodology for these designs generally follows the standard design flow, three aspects have to be addressed at this stage to maintain correct signaling and to simplify the implementation of differential logic SEPG or hybrid CMOS-SEPG designs.

The first aspect addresses the usage of different level shifter concepts. If the level shifters are integrated inside the logic gates, the voltage levels of all signals are equal for a DLG. The respective  $V_{BE}$  levels for CML are internally generated and no additional effort is required. In contrast, one solution to obtain correct designs with separate level shifter cells is the use of the connection\_class. It can be specified for each logical pin in the .lib-file. Thus, the synthesis tool checks for potential signaling violations based on evaluation of the attribute definitions. If the selected synthesis tool supports this additional attribute, the violations are corrected automatically, whereas manual interactions are required in the other case.
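The kind of check a synthesis tool performs on this attribute can be illustrated with a small Python sketch. It uses a simplified compatibility model (two pins may be connected if they share at least one class name, or if either is `universal`); the class names, pin names, and the netlist structure below are hypothetical and serve only as an example.

```python
def compatible(driver_classes: str, load_classes: str) -> bool:
    """Simplified rule: pins match if they share a class or one is universal."""
    drv = set(driver_classes.split())
    load = set(load_classes.split())
    return "universal" in drv or "universal" in load or bool(drv & load)


def find_violations(nets, pin_classes):
    """Return (net, driver, load) tuples whose connection classes mismatch.

    nets:        {net_name: (driver_pin, [load_pin, ...])}
    pin_classes: {pin_name: "class1 class2 ..."}
    """
    violations = []
    for net, (driver, loads) in nets.items():
        for load in loads:
            if not compatible(pin_classes[driver], pin_classes[load]):
                violations.append((net, driver, load))
    return violations


# Hypothetical example: pin X2/B expects a shifted level ("lvl2"),
# but is driven directly by an unshifted output ("lvl1").
pin_classes = {
    "X1/Q": "lvl1",
    "X2/A": "lvl1",
    "X2/B": "lvl2",
    "X3/A": "universal",
}
nets = {"n1": ("X1/Q", ["X2/A", "X2/B", "X3/A"])}

violations = find_violations(nets, pin_classes)
```

In this example the only reported violation is the connection to `X2/B`, which is the place where a separate level shifter cell would have to be inserted.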

Another aspect is related to the units used for constraint definitions in hybrid designs when different types of logic, such as low-speed/-power CMOS and high-speed/-power CML/ECL, are used in the same circuit. The design rules and constraints for clock definitions, false paths, input and output delays, and static or dynamic power can be set depending on one primary library. On the other hand, standard cell characterization results are provided in different orders of magnitude if libraries with fully contrary cell performances are used at the same time. Thus, the physical units for timing of high-speed CML logic can be given in ps, while ns are often given for low-speed logic gates (e.g., CMOS). Similarly, the power consumption for CML is given in mW instead of, e.g., $\mu$W or nW for CMOS. As a consequence, proper constraining and careful evaluation of the reported result files are required during hybrid design development.
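When reports from both library types are evaluated together, it can help to normalize all values to one common unit first. The following Python sketch illustrates this; the helper names and the example values are hypothetical and do not stem from the characterized libraries.

```python
# Scale factors to a common base unit (ps for time, uW for power).
TIME_TO_PS = {"ps": 1.0, "ns": 1.0e3}
POWER_TO_UW = {"nW": 1.0e-3, "uW": 1.0, "mW": 1.0e3}


def to_ps(value: float, unit: str) -> float:
    """Convert a timing value given in 'ps' or 'ns' to picoseconds."""
    return value * TIME_TO_PS[unit]


def to_uw(value: float, unit: str) -> float:
    """Convert a power value given in 'nW', 'uW' or 'mW' to microwatts."""
    return value * POWER_TO_UW[unit]


# Hypothetical report entries: a CML path reported in ps, a CMOS path in ns.
cml_delay_ps = to_ps(35.0, "ps")
cmos_delay_ps = to_ps(0.25, "ns")
slowest_ps = max(cml_delay_ps, cmos_delay_ps)
```

An unknown unit string raises a `KeyError`, which is intended here: silently assuming a unit is exactly the kind of mistake the normalization is meant to prevent.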

However, maintaining a well-separated design hierarchy with logical domains, e.g., CMOS and SEPG modules, is the most important requirement for the GLS step. This improves the design control and enables the individual physical implementation of the different logic modules. Consequently, in hybrid, mixed-logic designs the compilation has to be done per logic family and in a hierarchical manner. The constraints can be derived from the top level for all submodules. Thus, every block can be synthesized individually with its respective constraints and in the target logic family by appropriate control of library and cell selection. In order to keep essential signal interface cells in the design, they are excluded from any optimization run. Moreover, every synthesized submodule is finally set to *untouchable* in upper hierarchy levels during implementation. The cell selection for individual modules and the prevention of optimization can be controlled with common SDC commands, such as set\_dont\_touch and set\_dont\_use. Finally, each module has to be validated against the design rules, especially with respect to compliance with connection\_class when the library concept uses separate level shifter cells. Nonetheless, this procedure also allows pure SEPG designs to be generated, either in a flat or hierarchical manner.

To demonstrate the methodology, a synthesis solution of a SEPG design mapped to the proposed CML/ECL library in 250 nm technology is illustrated in Figure 4.16. The circuit is a divide-by-4/divide-by-5 clock divider (div4/div5) as part of a high-speed clock generator. As can be seen in the figure, an instance of a level shifter U11 is automatically inserted in order to provide the correct  $V_{BE}L2$  signal voltage for the U10/B pin. Based on the STA, this clock divider is able to operate up to 10.0 GHz. It consumes 33.31 mW static power and occupies 18000.00  $\mu$m<sup>2</sup> silicon area.



Figure 4.16: SEPG scheme of a div4/div5 clock divider

Following this methodology, hierarchical-hybrid CMOS-SEPG or pure SEPG designs can be developed with the use of standard logic synthesis tools. The obtained results give a first impression for area occupation, timing behavior, and power consumption of these pseudo-differential standard-cell-based designs. The netlist and its corresponding timing can be extracted and used for post-synthesis timing simulation. When the designs have passed all simulations for functional verification, the layout generation of this intermediate design view can be started.

#### 4.4.4 Placement and First Routing-Phase

The compliance at logical level is addressed with the use of single-ended gate variants (SEPGs), whereas the FW-layout views are used for the generation of initial pseudo-layouts which require some additional post-processing to obtain a final differential counterpart. Similar to the recent works introduced in Section 3, the PnR stage is divided into phases. In a first phase, the layout of a SEPG-netlist is generated using the fat-wire libraries together with a commercial PnR tool. The output of this stage is an intermediate FW-layout design with its optimized post-route SEPG-netlist. This section describes the flow of the first PnR phase.

The floorplan for a SEPG design is developed similarly to that of a standard digital design. One exception for CML-based logic is the high static power consumption of up to several mW per single gate. As a consequence, the Power Distribution Network (PDN) has to be implemented more robustly in comparison to CMOS designs. Moreover, it is proposed to create additional ring and stripe structures for the common bias signal (*vbias*). Thus, this special net is properly connected by cell abutment during placement. In addition, special tap cells can be inserted below the vertical stripes to improve the biasing. Furthermore, one global ZTC block and the left branches of the current mirror are added and placed, one per CML/ECL domain. Afterwards, all standard design tasks such as blockages, channels, placement-guide definitions, and pin placements can be done to finalize the floorplan.

To prevent shorts from being generated later during design conversion, all routing tracks are defined only in their preferred vertical or horizontal direction. Thus, shorts resulting from wire splitting can be avoided, since every routed corner of a wire segment is formed by at least two different metal layers and one via. This generates an implicit separation seen by the conversion tool. Moreover, only metal layers with the same width and pitch definitions are allowed for FW-signaling. As a consequence, three thin metal layers can be used for routing in the selected 250 nm technology: two for horizontal routes and only one for vertical routes. After this routing specification, the FW cells of the logical SEPGs can be placed. Furthermore, the CT can be implemented and the design can be optimized to meet the design and timing constraints. However, a generated FW-design solution has to be verified additionally against a design-rule-conformal level-shifter instantiation. The first routing can be invoked afterwards.

By default, all signal nets are treated and routed as fat-wire signals. If other wire dimensions are preferred, most PnR tools support alternative signal net-related routing attributes. Moreover, wires not-to-be-split can be excluded from routing in this first phase. All other signals are routed and the FW-pin-access is obtained by interfacing the predefined virtual fat-wire pin-shapes located at the cell boundary. This preserves free routing channels for final pin-access in a second phase. In addition to the standard implementation checks, all FW-designs have to be checked for *wrong-way* or *wrong-track* violations indicating a wire segment is routed on a non-preferred routing layer. After fixing all violations, the design is extracted and the post-route timing is calculated. As a result, the design is exported with the corresponding post-layout SEPG-netlist (.v), timing data (.sdf), and with the related FW-layout information as a .def-file. The generated data is ready for verification by post-layout timing simulation and design conversion to obtain a natural differential logic design with balanced differential signal pair routes.
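The idea behind the *wrong-way* check can be sketched in Python as follows. The layer names, the preferred-direction table, and the segment tuples are hypothetical; a commercial PnR tool performs this check internally on its own database.

```python
def direction(seg):
    """Classify a segment (layer, x1, y1, x2, y2) as 'H', 'V', 'POINT' or 'DIAG'."""
    _, x1, y1, x2, y2 = seg
    if x1 == x2 and y1 == y2:
        return "POINT"
    if y1 == y2:
        return "H"
    if x1 == x2:
        return "V"
    return "DIAG"


def wrong_way_segments(segments, preferred):
    """Return all segments not routed in the preferred direction of their layer.

    Zero-length segments are ignored; diagonal segments are always flagged.
    """
    flagged = []
    for seg in segments:
        d = direction(seg)
        if d == "POINT":
            continue
        if d == "DIAG" or d != preferred[seg[0]]:
            flagged.append(seg)
    return flagged


# Assumed layer setup: two horizontal layers and one vertical layer.
preferred = {"M1": "H", "M2": "V", "M3": "H"}
segments = [
    ("M1", 0, 0, 1680, 0),        # horizontal on M1: fine
    ("M2", 1680, 0, 1680, 3360),  # vertical on M2: fine
    ("M2", 0, 840, 840, 840),     # horizontal on M2: wrong-way
    ("M3", 0, 0, 0, 0),           # zero-length: ignored
]

flagged = wrong_way_segments(segments, preferred)
```

Only the third segment is flagged. A wrong-way segment matters here because a corner formed on a single layer would produce a short between the two branches once the fat wire is split.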

#### 4.4.5 Design Conversion

The generated FW-layout views and the connectivity as SEPG-netlists still contain virtual pseudo-gates and wiring. Both are required to enable compliance with the standard design flow and to obtain standard cell-based differential logic designs. Nevertheless, a design conversion step is required to translate this intermediate design data to real technology-dependent counterparts.

For that purpose, a custom design converter tool is proposed. It is written in C++ and translates the SEPG-netlist and fat-wire layout data to a differential design. In more detail, the SEPG gate variants are remapped to their differential master cell counterparts. When the converter identifies a pin as a differential pin, the pre-routed wire segments of the first routing phase are handled as wires to be split. Every fat-wire is divided into a positive and a negative branch with the specified spacings and widths. Moreover, all routed IO pins are translated into differential pin-shapes as well. The conversion settings can be predefined in an XML configuration file, in which the pitch, cell mapping rules, and spacings can be set (see Figure 4.17).



Figure 4.17: Design conversion: from SEPG to DIFF design

Together with additionally defined cell and pin information, the designs are parsed and the differential objects are identified, remapped, or split. Thus, the design converter is capable of translating designs of different technologies by configuration. When both views (netlist and layout) are converted, all wire segments of the differential signal pairs are implicitly routed in parallel after splitting. Furthermore, one feature of the converter is to split wires also if they are labeled with special attributes for an inverse behavior. This functionality can be configured and is intended to enable future processing of pre-routed hybrid digital designs made of, e.g., SEPG logic and CMOS arranged in one flat module. Moreover, when the conversion is only applied to a post-synthesis netlist, the newly generated differential netlist (.v) can be used to extract the timing in SDF format by explicit timing engines or logic synthesis tools. As a consequence, digital timing simulations of differential gate-level designs are enabled directly after the GLS stage.
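The wire splitting performed during conversion can be illustrated with a short Python sketch. It is not the C++ converter itself: the function name, the centerline representation of a segment, the interpretation of `pitch` as the center-to-center distance of the two branches, and the side on which the positive branch is placed are assumptions made for this example.

```python
def split_fat_wire(net, seg, pitch=1680, suffix_p="p", suffix_n="n"):
    """Split one fat-wire centerline segment into a positive and negative branch.

    seg is (x1, y1, x2, y2) in database units and must be horizontal or
    vertical. Each branch is shifted by pitch/2 perpendicular to the wire.
    Assumed convention: the positive branch lies above (horizontal wires)
    or to the right (vertical wires) of the fat-wire centerline.
    """
    x1, y1, x2, y2 = seg
    half = pitch // 2
    if y1 == y2:        # horizontal wire: shift in Y
        dx, dy = 0, half
    elif x1 == x2:      # vertical wire: shift in X
        dx, dy = half, 0
    else:
        raise ValueError("only preferred-direction segments can be split")
    return {
        net + suffix_p: (x1 + dx, y1 + dy, x2 + dx, y2 + dy),
        net + suffix_n: (x1 - dx, y1 - dy, x2 - dx, y2 - dy),
    }


# Hypothetical fat-wire segments of a net named "d".
horizontal = split_fat_wire("d", (0, 5040, 8400, 5040))
vertical = split_fat_wire("d", (3360, 0, 3360, 6720))
```

The sketch leaves the segment end points unchanged. In a real layout the ends of the two branches at corners and vias need to be adjusted as well, which is presumably what converter settings for shrinking and extending wire segments are used for.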

Listing 4.2 shows a configuration of the converter tool. The behavior for pre-routed signals and vias is specified therein. It is further possible to delete all signal vias during conversion. Finally, wire segments can be shrunk or extended by specific settings.

Listing 4.2: XML configuration for the design converter

```
<config>
  <default>
     <splitP value="p" /> <!-- default suffix: p -->
     <splitN value="n" /> <!-- default suffix: n -->
     <keepNetVias value="1" /> <!-- default: 1 , (boolean) -->
     <defSwapPNNet value="0" /> <!-- default: 0 , (boolean) -->
     <defSwapPNPin value="0" /> <!-- default: 0 , (boolean) -->
     <instancePinCrossing value="0" /> <!-- default: 1 | 0 or 1 (boolean) -->
     <pitch value="1680" /> <!-- units | default: 1680 -->
     <rowConversion rowsite="ECLCoreSite" fatstep="3360" thinstep="840" /> <!-- SITE Conversion -->
     <ioPinConversion fatsize="2520x2520" thinsize="840x840" /> <!-- I/O pin Conversion -->
     <spacing value="840" /> <!-- units | default: 840 -->
     <shrink value="0" /> <!-- units | default: 0 -->
     <extend value="0" /> <!-- 0 or any number | default: 840 -->
     <rule enabled="false" name="ECL_FATWIRE_RULE" /> <!-- if enabled, add NDR -->
  </default>
  <rulemap>
     <rule name="ECL_FATWIRE_RULE" toname="ECL_THINWIRE_RULE" pitch="0.84" />
  </rulemap>
</config>
```
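How such a configuration might be read can be sketched with Python's standard `xml.etree.ElementTree` module. The snippet embeds a reduced copy of the configuration so that it is self-contained; the function `load_config` and the returned data structures are hypothetical and say nothing about how the C++ converter parses the file.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the converter configuration (subset of the elements).
CONFIG = """<config>
  <default>
    <splitP value="p" />
    <splitN value="n" />
    <keepNetVias value="1" />
    <pitch value="1680" />
    <spacing value="840" />
  </default>
  <rulemap>
    <rule name="ECL_FATWIRE_RULE" toname="ECL_THINWIRE_RULE" pitch="0.84" />
  </rulemap>
</config>"""


def load_config(text):
    """Return (defaults, rulemap) parsed from the XML configuration text."""
    root = ET.fromstring(text)
    # Collect every <default> child that carries a 'value' attribute.
    defaults = {
        el.tag: el.get("value")
        for el in root.find("default")
        if el.get("value") is not None
    }
    # Map each fat-wire routing rule name to its thin-wire counterpart.
    rulemap = {
        rule.get("name"): rule.get("toname")
        for rule in root.find("rulemap").findall("rule")
    }
    return defaults, rulemap


defaults, rulemap = load_config(CONFIG)
pitch = int(defaults["pitch"])
keep_vias = defaults["keepNetVias"] == "1"
```

All attribute values arrive as strings, so numeric and boolean settings are converted explicitly after parsing.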

As an example, the result of a fat-wire routed design is depicted in Figure 4.18 (a). As can be seen, only one wire segment is routed for every signal derived from the SEPG connectivity in the netlist. The routing result after design conversion is illustrated in Figure 4.18 (b). All former fat-wires are now split into two wire branches that maintain the spacings set in the XML configuration file. After this intermediate step, the differential wires do not yet properly connect to adjacent wire segments or pins.



Figure 4.18: Routing of a design, (a) fat-wire design, (b) the split variant after conversion

The segments are still interfacing the virtual locations of the fat-wire boundary pins in a converted design. As a consequence, at this stage the design is full of open nets from the connectivity point of view. Thus, the final detailed connections have to be completed in a second routing phase in order to finalize the detailed pin-access and to obtain well-connected differential logic designs with parallel routes for RC-balanced characteristics.

#### 4.4.6 Second Routing-Phase

This design flow extension stage is required in order to generate the final pin routes in addition to the split but unfinished pre-routes produced by the design conversion tool. This can be done by a commercial PnR tool with the use of the output files of the conversion stage, i.e., the differential gate-level netlist (.v) and the new split-wire layout design in DEF format.

In contrast to the first PnR phase with the SEPG-FW library setup, the virtual fat-wire boundary pin locations and shapes are invisible, since the differential standard cell master layouts are linked at this stage. As a consequence, the connectivity check will flag many open-net violations which must be corrected. However, it is proposed to set the differential pre-routes generated by the converter tool to *fixed* to avoid any distortion of in-parallel routed wires. Otherwise, these open routes are rerouted for detour avoidance or alternative pin-access. Furthermore, an implicit signal inversion of a differential pair is realized by straightforward routing to the target pins. They are finished by means of differential pair crossing and may result in remaining routing artifacts in the final layout design (see slight detours on the left side in Figure 4.19).



Figure 4.19: Detailed pin-access generated in the second routing phase

With this step, the layout generation of differential logic standard cell-based designs is complete. The finalized design database can be exported and taken as input for post-route timing simulation or design verification. It is prepared for further integration or fabrication.

## 4.5 Summary

The proposed design methodology, library and cell concepts for differential logic design enable the generation of reliable, differential standard cell-based applications. The modular library cell set and the standard cell arrangement are presented. In-parallel-routed differential signaling is obtained by the well-known fat-wire design approach. For technologies with few routing metal layers, the use of special *virtual fat-wire boundary pins* is a suitable solution to enable pin-access. A configurable design converter remaps a SEPG (netlist) and FW (layout) design to the differential counterpart. The layout generation is therefore divided into two phases, of which the second realizes the final connections in the differential design. In conclusion, the proposed modular standard cell concept and the benefit of speed-classes in general make it possible to obtain efficient, digitally designed, standard cell-based reliable hardware systems.

"Du dödel di! Dö dudel dö ist zweites Futur bei Sonnenaufgang!" — Loriot – Die Jodelschule

## Chapter 5

# Concepts and Methodology for Radiation-Hardening-by-Design Circuits

The design of Radiation-Hardening-by-Design (RHBD) circuits with the use of standard unhardened technology requires additional effort in cell and library development. Applicable concepts at different levels have to be addressed to counteract radiation effects with low penalties in terms of area, power, and delay overhead. Moreover, a corresponding design methodology has to be developed to enable the design of robust hardware systems with standard design tools.

## 5.1 Introduction

Robust applications for space can be directly obtained with the use of special radiation-hardened platforms offered by today's silicon technology foundries (cf. Section 2.1). They provide a wide range of transistor devices, IP cores, rich standard cell library sets and special PDKs aligned to fabrication process features to cope with radiation effects such as TID and SEEs (cf. Section 2.7). The alternative approach to obtain the design's robustness is Radiation-Hardening-by-Design (RHBD) with special circuit-, layout-, or design-level techniques using standard technologies.

However, countermeasures are necessary to cope with radiation effects. Serious issues in CMOS technologies are hard errors such as Single Event Latchup (SEL), a high increase of the power supply current that can damage silicon devices. One approach for SEL-hardening is structural modification of the devices and cells. Less dangerous but more frequent are voltage glitches on signal paths (i.e., SET) and bit-flips in storage cells (i.e., SEU). They are a result of direct energetic particle hits or captured transients. Both effects are non-destructive and classified as soft errors. They can be mitigated with many different measures at the circuit level.

There are various distinctive memory cell architectures proposed for SEU mitigation as presented in Section 3.7. Among them are the well-known DICE architecture, the HIT flip-flop, and the Quatro storage cell. Moreover, delay-filtered DICE flip-flops and the temporal-DICE flip-flop are improved circuit concepts for additional SET mitigation. Furthermore, the TSPC-Quatro flip-flop, TSPC-DICE, and the Dual-Modular TSPC flip-flop are concepts with better power-delay performance.

Adding redundancy is one alternative solution to obtain robustness. The development of hardware redundancy requires less in-depth knowledge of cell design and benefits from the reuse of already existing unhardened components. The most popular approaches are the Built-in Soft Error Resilient (BISER) flip-flop or general Dual/Double Modular Redundancy (DMR) (cf. Section 3.8). However, these approaches are capable of detecting an error but not of correcting it.

Triple Modular Redundancy (TMR) is another hardware redundancy concept, which provides single error correction. In TMR, the sensitive circuit part is triplicated and additional majority voter structures mask the error. The triplication can be applied to whole structures, i.e., to storage cells and combinational parts with multiple voters (FTMR). Another concept is a TMR flip-flop standard cell with integrated SET filter, known as the  $\Delta$ TMR flip-flop (cf. Section 3.8.4). It is a promising solution for RHBD circuits. However, these  $\Delta$ TMR flip-flops have a significant cell-related overhead in area, power, and additional delay. As a consequence, when TMR is the selected hardware redundancy concept, an additional library with several robust TMR cells in different flavors is highly preferable to reduce the penalty of the introduced overhead. Thus, a system's target requirements are more likely to be achieved.
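The masking effect of the majority voter can be illustrated with a bitwise Python model. This is a purely functional sketch with a hypothetical function name and example values; it does not model the voter circuit, its timing, or the SET filter of a  $\Delta$ TMR cell.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


# Three copies of a 4-bit register value; one copy suffers a single bit-flip.
golden = 0b1011
upset = golden ^ 0b1000          # bit 3 flipped in one copy

voted = majority(golden, golden, upset)

# Two upsets at the same bit position in different copies are not masked.
double_upset = majority(golden, upset, upset)
```

The voted value equals the original data as long as at most one copy per bit position is corrupted. The second case shows the limit of the scheme, which is why accumulated upsets in registers that are not refreshed are a concern.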

Furthermore, the desired single-error correction capability of a TMR flip-flop implies that structural defects inside such an internally replicated flip-flop are masked as well. If one considers a TMR unit as a single flip-flop, direct application of structural scan tests will not correctly detect internal defects. Therefore, for testing, there is a strong demand to enable individual scanning of each sequential subcell inside the TMR cell. As a consequence, TMR-based scannable flip-flops are desirable.

Another aspect is the reduction of the power consumption of a synchronous system when TMR is applied at memory cells such as flip-flops. Since the dynamic power in CMOS scales linearly with the clock frequency, putting the clock into a steady state, i.e., clock-gating (CG), is an effective solution to save power. However, when the clock is inactive, the TMR or the  $\Delta$ TMR registers are not able to refresh their data. As a consequence, induced upsets by radiation could accumulate over time. This has to be addressed by new  $\Delta$ TMR flip-flop architectures. Moreover, a concept for a robust clock-gate is simultaneously required to reliably turn-on/-off the clock.
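The linear dependence on the clock frequency follows from the commonly used first-order model of dynamic power in CMOS. The symbols below are introduced here only for illustration and are not part of the original notation: $\alpha$ is the switching activity, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, and $f_{clk}$ the clock frequency.

$$P_{dyn} = \alpha \cdot C_L \cdot V_{DD}^2 \cdot f_{clk}$$

With a gated clock, the clock nodes of the affected registers no longer toggle, so their contribution to $P_{dyn}$ vanishes while the clock is inactive.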

Nevertheless, the $\Delta$TMR approach relies on a robust clock signal since the clock nodes are shared. Any transient in this clock network has to be suppressed in order to prevent internal MBUs of $\Delta$TMR cells. Consequently, there is a strong need to provide special filter cells for this purpose. To summarize, an additional standard cell library with robust TMR flip-flops with different characteristics and features, robust clock-gates, and a selected set of essential driver and transient filter cells is mandatory in order to obtain RHBD circuits. Even though the robustness is achievable by redundancy at the logical level, special spacing constraints at the layout level have to be considered additionally to cope with the charge-sharing effect. Moreover, standard digital design tools require accurate timing/power models (.lib files) to optimize designs more efficiently. Since the voter masks internal faults of a TMR flip-flop, a special setup for the characterization of $\Delta$TMR cells is necessary. Finally, a respective design methodology to obtain RHBD circuits with the use of the new cells together with an unhardened standard cell library has to be developed.

This work makes the following contributions:

1. An additional standard cell library with essential gates for RHBD is proposed. Among them are transient filter cells, robust driver cells and RHBD memory cells.
2. A new modular $\Delta$TMR cell concept is proposed to obtain several variants with individual features and characteristics in terms of area and power efficiency, delay overhead, and design costs. Cell architectures are separated into four *Logical Sections*. Each section can be realized with the use of different baseline cells and modules to obtain cell candidates with the best-fitting characteristics.
3. To improve the testability of a design, a methodology for applying structural scan tests and test pattern generation is proposed for $\Delta$TMR flip-flops.
4. Reliable clock-gating can be obtained with the use of the proposed $\Delta$TMR-based *clock-gate*. Moreover, novel $\Delta$TMR flip-flops with a *self-correction* feature are presented.
5. A proposal for $\Delta$TMR-based cell characterization is additionally presented.
6. A *methodology* for the design of RHBD standard cell-based hardware systems using the new cell concepts is presented.



The contribution of this Thesis is illustrated in Figure 5.1.

Figure 5.1: Overview of the contribution to obtain robust RHBD circuits

## 5.2 Standard Cell Library

In this Section, the cell set and its circuit concepts for an additional standard cell library for the generation of RHBD hardware systems are proposed. The essential cell set for the library extension is defined and the cell concepts are introduced. Among them are circuit solutions for transient filters and various novel RHBD-$\Delta$TMR-based flip-flops with different functional features. The cells are separated into *Logical Sections*, and each section is investigated and compared individually to demonstrate and derive future alternative $\Delta$TMR-based flip-flop configurations. All experimental results are obtained by simulations of the implemented circuits in IHP's 130 nm BiCMOS technology. The selected standard cell library for concept realizations is SEL-free up to an LET of 67 MeV cm<sup>2</sup> mg<sup>-1</sup> (see also Section 2.1). Consequently, this semiconductor technology and the related unhardened but latchup-robust standard cell library are suitable for the validation of novel RHBD cell concepts.

#### 5.2.1 Additional Cell Set

The approach of this Thesis proposes the use of an unhardened standard cell library together with an additional cell set to develop robust standard cell-based hardware systems. Thus, an important question to be answered is which cells to select.

Apart from TID and SEL effects, digital designs that operate in radiation environments have to deal with two main types of induced effects (see Section 2.7.2). First, a particle hit can generate a transient pulse at an output node (SET). The transient propagates through the logic and could lead to functional errors. The second effect is the bit-flip in a memory device such as a flip-flop, i.e., an SEU, as a result of a captured transient or a direct particle hit.

Consequently, it is required to enhance the library with *transient filter cells* improving the robustness by transient mitigation. These cells can be added individually at nodes in the design to filter locally induced glitches. Moreover, critical nets such as clock, or asynchronous control signals are sensitive as well and require additional measures for SET mitigation. Thus, *robust driver cells* (inverters) and radiation-hardened SET filter configurations must be offered in the extra library.

Since memory cells such as flip-flops are the most critical components in digital systems, there is a high demand for *robust memory cells*. A well-known concept is the $\Delta$TMR flip-flop architecture with improved robustness by SET mitigation parts on the data path (see Section 3.8.4). However, the overhead introduced by this hardware redundancy concept requires alternative circuit solutions to address the penalties in area, power and delay. Furthermore, a specific MNCC-aware placement of the components inside the memory cell is required to improve the radiation hardness by coping with the charge-sharing effect and the multiple upsets that may result from it.

Finally, concepts that support the low-power clock-gating (CG) design feature are highly recommended to reduce the overall power consumption. Furthermore, concepts for robust *scan-flip-flops* have to be proposed in order to enable structural tests for fabrication defects, i.e., design-for-testability.

The following subsections propose the cell concepts in detail, which comprise the additional cell set for a standard cell library to obtain robust RHBD hardware systems.

#### 5.2.2 Robust Driver Cells

Typically, standard cell libraries offer several types of buffers and inverters in different sizes and drive capabilities. In most cases, larger-dimensioned gates with higher drive strength are used for high fan-out net optimization or as driver cells during clock tree implementation. These cells occupy more silicon area and consume more energy. On the other hand, they have a higher critical charge $Q_{crit}$ and are therefore more robust against transients caused by energetic particle strikes (see Section 3.4). As a consequence, these cells should either be part of the unhardened standard cell library or be offered in the additional cell library for RHBD.

As introduced in Section 3.4, the critical charge can be determined by analog transistor-level simulations. Based on the technology-related investigations for the selected 130 nm technology therein, an inverter with a drive strength equal to or greater than x8 is denoted as a "robust" driver cell. It provides a good tradeoff between area/power penalty and robustness, especially compared to larger-scaled cells (see Figure 3.7). Thus, it is a suitable driver cell for critical nets in most applications.

#### 5.2.3 Transient Filter Cells

To address SET mitigation, special filter cells are required in order to eliminate glitches or to reduce the width of transients occurring on a signal path. The most relevant state-of-the-art concepts have been presented in Section 3.5. One approach is to delay a signal $d$ by a delay $\delta$ (giving $d'$) and by $2 \times \delta$ (giving $d''$). The output is generated by a majority voter with the function $q = \mathrm{MAJ}(d, d', d'')$ (see the scheme in Figure 3.8 on page 49). Another concept is the use of delay elements and AND/OR gates in combination with a multiplexer (MUX). As already illustrated in Figure 3.12 (a) on page 52, the selector path of the MUX is realized in a feedback loop. However, delaying a signal is expensive in terms of power and area efficiency, and in many cases, delays are implemented as inverter cell chains.
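The majority-based temporal filter described above can be illustrated with a small discrete-time sketch. It is not taken from the thesis: waveforms are idealized as lists of 0/1 samples, gate delays are ignored, and the names `delay`, `maj_filter` and the value `DELTA = 3` samples are hypothetical choices made only for this illustration.

```python
def delay(signal, steps, init=0):
    """Shift a sampled waveform by `steps` samples, padding with `init`."""
    return [init] * steps + signal[:len(signal) - steps]


def maj(a, b, c):
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)


def maj_filter(d, delta):
    """q = MAJ(d, d', d'') with d' delayed by delta and d'' by 2 * delta."""
    d1 = delay(d, delta)
    d2 = delay(d, 2 * delta)
    return [maj(x, y, z) for x, y, z in zip(d, d1, d2)]


DELTA = 3                      # hypothetical delay, counted in samples
quiet = [0] * 20

short = list(quiet)
short[5:7] = [1, 1]            # transient of width 2 < DELTA

wide = list(quiet)
wide[5:10] = [1] * 5           # transient of width 5 > DELTA

print(maj_filter(short, DELTA))  # all zeros: the short pulse is masked
print(maj_filter(wide, DELTA))   # ones appear: the wide pulse gets through
```

In this idealized model a transient shorter than $\delta$ never appears on two of the three branches at the same time, so the voter masks it, while a wider pulse overlaps with its own delayed copy and reaches the output.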

An alternative concept for implementing a delay on a signal is the use of dedicated *delay cells* of the standard cell library. In most cases, their internal arrangement is a customized inverter chain of standard inverters and inverter cells operating in weak inversion. Although an energy-related overhead still exists, this solution is more efficient. These delay cells are directly optimized for their purpose: to generate longer delays of up to several nanoseconds while keeping both area occupation and power overhead low.

With the use of two of these cells, a purely combinational standard cell transient filter with an AND and an OR gate is proposed. Any transient on a signal D is filtered in a serial manner. The circuit scheme of this concept is depicted in Figure 5.2 (a). A derived optimization is the NAND-based variant of Figure 5.2 (b).



Figure 5.2: Proposed pure standard cell-based transient filter cells: (a) AND-OR-based, (b) the NAND-based optimization

Therein, the AND and OR gates are replaced by successive NAND gates. This results in a slight improvement in terms of delay and a lower transistor count. Nevertheless, the limiting factor in both approaches is the two $\delta$-delay stages. The size of $\delta$ determines the transient robustness on the one hand, but restricts the timing and power performance on the other hand. However, both concepts are suitable for filtering transients on less-critical signals and do not require custom cell design. Figure 5.3 shows the waveform of the NAND-based transient filter solution (see Figure 5.2 (b)). As can be seen, two transients with a width of around 300 ps are filtered.



Figure 5.3: Waveform of NAND-based transient filter in action

Transient filter cells are also added on highly critical nets such as clock or asynchronous control signals. As a consequence, the minimum transient filter cell set of an extra library should contain radiation-hardened configurations with higher critical charges in addition to the unhardened filter cells. Increasing the critical charge of the transistors leads to larger devices. Applying this to both aforementioned solutions would result in more area and power overhead for the robust filter variants. In particular, the number of affected internal devices is quite large for 2-input combinational gates. Hence, a robust implementation of the AND/OR/NAND-based variants would have a significant impact on the overall performance as well.

Consequently, GG-based circuit solutions as used in [92] and depicted in Figure 3.10 (a) are proposed as transient filters for critical net protection. Even though guard-gates (GGs) are less frequently offered in a standard cell library, they consist of fewer internal nodes and devices whose critical charges have to be increased. For this approach, the robust variant can be easily obtained by up-sizing only the driver stage(s), or by selecting another drive type for the cell.

In summary, with all unhardened and robust variants for critical net protection available in an additional library, any net can be protected by suitable transient filter cells. Furthermore, the AND-OR/NAND-based solutions presented here are also beneficial. In contrast to the GG solutions, they have the additional advantage of effectively shortening transients instead of propagating them if arising transients exceed the implemented $\delta$-delay (see Figure 3.10 (b)). Thus, a truncated transient could be properly filtered by a subsequent transient filter cell in the signal path.

#### 5.2.4 Robust Memory Cells – $\Delta$ TMR Cells

SET mitigation is one challenge for RHBD circuits. The other important group of soft errors are SEUs, i.e., radiation-induced bit-flips in memory cells. These upsets can be evaluated by the subsequently connected logic and are either masked or lead directly to faults at the primary outputs. To address this, DMR- or TMR-based hardware redundancy approaches applied to memory cells are suitable techniques (cf. Section 3.8). They do not require custom-designed flip-flop architectures. Instead, they are built from standard logic gates, at the cost of some overhead. However, the penalty of triplicating sequential elements and combinational logic in FTMR is not negligible.

Alternatively, a baseline TMR arrangement of memory cells with a single voter is preferred to correct one internal bit-flip (see Chapters 2 and 3). The default arrangement is not robust against SETs as stated in Section 2.8.1. In more detail, if an SET occurs on the data or clock net of such a TMR cell close to the sensitive clock edge, a multiple-bit upset inside the TMR cell may occur when the clock is shared. Moreover, during the layout design of the cells, the area affected by particle hits has to be considered to avoid simultaneous charge collection at multiple devices, i.e., charge-sharing or Multiple Node Charge Collection (MNCC). Consequently, additional TMR cell concepts for RHBD memory elements have to be developed that both maintain the implicit SEU robustness of TMR and provide SET mitigation.

The cell concepts proposed in this work are novel developments of the initial idea of the $\Delta$TMR standard cell flip-flops presented in [16] (see Section 3.8.4 and Figure 3.19). The concept therein is taken as a starting point to obtain further-improved and more robust $\Delta$TMR cells with different functional features. The $\Delta$TMR gates are characterized by a shared clock path and an integrated transient filter for data path-related SET mitigation (D-SET). Moreover, the charge-sharing effect is addressed with a stricter MNCC-aware placement of internal components.

The concept of $\Delta$TMR cells offers the best compliance with the standard digital design flow. The pin interface, the internal signaling and the realized function are equal to those of their respective standard unhardened counterparts. Consequently, they can be handled as common components by digital design tools. Hence, $\Delta$TMR is selected as the RHBD concept for memory cells of any kind in this work.

## 5.3 RHBD TMR Cells – Logical Sections

Wherever hardware redundancy is introduced, a low overhead in area occupation, propagation delay, and power/energy consumption is the main concern. These key criteria conflict with each other and require deeper investigation. With regard to the generation of VLSI designs with thousands to millions of mapped gates, the individual characteristics of expensive TMR-based flip-flop cells become more and more important. As a consequence, a concept is preferable that divides complex flip-flops into sections and allows alternative efficient TMR flip-flop solutions to be derived easily. Thus, each section can be investigated and optimized separately. In the best case, new robust TMR flip-flops can be developed by choosing a solution from each modular section to meet the target requirements such as area, delay, power, or function.

For that purpose, Logical Sections are also proposed in this work for the development of radiation-hardening-by-design  $\Delta$ TMR flip-flops at the gate-level. This divides complex memory cell architectures into different modules independent of their later layout representation. The general concept for a  $\Delta$ TMR memory cell is shown in Figure 5.4. Every RHBD- $\Delta$ TMR cell can be logically divided into four parts: a D-SET Filter Section, Memory Cell Section, Voter Section and a Driver Section.



Figure 5.4: General concept of logical sections of RHBD- $\Delta$ TMR cell

The first section is the *D-SET Filter Section*, in which different concepts of transient filters on the data paths can be integrated. Their integration is indicated by the $\Delta$ in the name of the radiation-hardened $\Delta$TMR cells. It is characterized by a 1-input/3-output module. For simplification, the primary input is D with its local outputs D0, D1, and D2. The second section is the *Memory Cell Section* with the triplication of sensitive flip-flops or latches. The individual outputs of the first section are separately connected to the three internal memory cells. As illustrated in the figure, the clock node CK is shared by all three memory elements in this $\Delta$TMR concept, whereas the primary data path D is separated by the D-SET filter module. The output of each *baseline* memory cell is fed to the inputs of the *Voter Section* in a third stage. Here, the majority function is computed and the final result is passed to the last section. This is the optional *Driver Section*, in which the internal TMR logic is decoupled from the connected external load.

By following this concept, each section can be individually realized and exchanged by various types of architectures depending on the target requirements. Some of the most promising configurations for each logical section are presented and compared in the next subsections.

#### 5.3.1 D-SET Filter Section

The D-SET Filter Section realizes the transient mitigation on the data path of the memory cells in a $\Delta$TMR arrangement. One of the best-known concepts is temporal redundancy on a single signal (see Section 2.9) by forking the input data signal (D) into individual branches, e.g., D0, D1, and D2, each delayed by a multiple of a delay ($\delta$-delay). The $\delta$-delay is sized to match the expected transient width. Consequently, when an edge arrives on the clock (CK) of a TMR memory cell and the transient is shorter than the $\delta$-delay, the transient is mitigated by redundancy in time. In contrast, when the transient width exceeds the implemented $\delta$-delay, two faulty internal nodes are captured by the clock, resulting in an internal upset. This concept applied to the D-SET Filter Section of a $\Delta$TMR memory cell is illustrated in Figure 5.5. The corresponding waveforms of a successful transient mitigation and of the error case have already been illustrated in Figure 3.11 (b) of Section 3.5.3 on page 51.



Figure 5.5: Transient filter concept applied to D-SET Filter Section
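The capture behavior of this arrangement at the shared clock edge can be sketched with an idealized sampled model. This is only an illustration under stated assumptions: gate delays, setup and hold windows are ignored, one list entry corresponds to one time step, and the function names as well as `DELTA = 3` steps are hypothetical.

```python
def delay(signal, steps):
    """Shift a sampled waveform by `steps` samples, holding the first sample."""
    return [signal[0]] * steps + signal[:len(signal) - steps]


def dset_d1d2(d, delta):
    """D-SET filter: fork D into D0, D1 (delayed by delta), D2 (by 2 * delta)."""
    return d, delay(d, delta), delay(d, 2 * delta)


def capture(d, edge, delta):
    """Sample D0..D2 at a shared clock edge and vote on the stored bits."""
    stored = tuple(branch[edge] for branch in dset_d1d2(d, delta))
    voted = 1 if sum(stored) >= 2 else 0
    return stored, voted


def worst_case_upsets(d, delta, golden=0):
    """Largest number of internal cells storing a wrong bit over all edges."""
    return max(
        sum(bit != golden for bit in capture(d, edge, delta)[0])
        for edge in range(len(d))
    )


DELTA = 3                          # hypothetical delta-delay in time steps

narrow = [0] * 30
narrow[10:12] = [1, 1]             # transient shorter than DELTA

wide = [0] * 30
wide[10:15] = [1] * 5              # transient longer than DELTA

print(worst_case_upsets(narrow, DELTA))  # 1
print(worst_case_upsets(wide, DELTA))    # 2
print(capture(wide, 13, DELTA))          # ((1, 1, 0), 1)
```

Sweeping the clock edge over all positions shows that the narrow pulse can corrupt at most one internal cell, which the voter corrects, whereas the wide pulse can corrupt two cells at once and flips the voted output.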

Even though alternative approaches using additional guard-gates or more complex explicit AND-OR-MUX constructs have been introduced in Section 3.5, the first approach shown in Figure 5.5 is already a suitable solution for $\Delta$TMR flip-flops as presented in [16]. The transient filter length and the energy consumption scale with the number of internal delay elements used for the $\delta$-delay realization. Moreover, the expected transient width mitigated by the $\delta$-delay is technology-dependent and defines the robustness against SETs. As a consequence, the more delay is required for the mitigation of longer transients, the more cells need to be concatenated and internally routed. Thus, energy consumption and area occupation are again the main concerns and require detailed investigation. As an initial solution, the baseline approach depicted in Figure 5.5 is realized with inverter chains (cf. also Figure 3.11 (a)). This concept is denoted as DSET-D1D2 in this work. In order to address the aforementioned area and energy aspects, the use of available dense standard delay cells is proposed as an alternative (cf. also the scheme of Figure 3.19). These cells are optimized for longer delay generation with less silicon area occupation as introduced in Section 5.2.3. This second D-SET filter solution is denoted as DSET-D1D2-D in this work. However, the aforementioned D1D2 approaches require three $\delta$-delay element structures, realized either by inverter chains or by dedicated delay cells. Both have a high impact on energy consumption.

According to Section 3.5.3, an alternative configuration for TMR is the mitigation concept with GGs or C-elements in combination with $\delta$-delay element structures. Therein, the internal nodes D0, D1, D2 are protected by individual guard-gates to filter the transient. The main benefit of this solution is the reduction of the propagation delay and energy by one $\delta$-delay element in comparison to both DSET-D1D2 configurations. However, it is not required to suppress transients with guard-gates on all internal D-nodes. The smallest solution is a structure with one $\delta$-delay element for D1 and one GG for D2, whereas D0 is a feed-through from the primary D input [125]. Nevertheless, it is preferable to equalize the propagation delays in order to also balance the timing windows for the setup and hold times of the internal flip-flops. As a consequence, a second GG is proposed for node D0 in this context. The resulting concept forms the third D-SET filter configuration (DSET-D1-GG) for $\Delta$TMR memory cells. The gate-level scheme is illustrated in Figure 5.6 (a).



Figure 5.6: Scheme and waveforms of the proposed DSET-D1-GG configuration: (a) gate-level scheme, (b) properly-filtered transient, (c) propagation of large transient to all three internal nodes, (d) one directly-induced internal transient

Figures 5.6 (b)-(d) show the waveforms of the behavior of the DSET-D1-GG in different scenarios. In the first one (b), the transient on D is only propagating to one internal node D1. In a TMR flip-flop arrangement, the second memory cell captures the faulty value of D1 (indicated by the dashed line), whereas the primary output Q (not shown here) would be corrected by the voter section afterwards. In the second waveform (c), the induced transient is wider than the  $\delta$ -delay of the D-SET filter section. Thus, the pulse is propagating and shifted by  $\delta$  in time to all internal D-nodes. It would be captured with the next rising edge of the shared clock CK. Consequently, an internal multiple-bit upset would occur. In the third scenario shown in Figure 5.6 (d), even though an SET is induced at node D1 (e.g., at the  $\delta$ -delay element), both nodes D0 and D2 would not be affected and would provide the correct value. As a result, a single bit error in the second flip-flop would be finally corrected by the majority voter.
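The three scenarios can be reproduced with a simplified logic-level sketch of the DSET-D1-GG section. The model rests on assumptions that are not confirmed by the schematic: each guard-gate is treated as a non-inverting element that follows its inputs when D and the delayed D agree and holds its previous value otherwise, gate delays are ignored, and all names and the value `DELTA = 3` steps are hypothetical.

```python
def delay(signal, steps):
    """Shift a sampled waveform by `steps` samples, holding the first sample."""
    return [signal[0]] * steps + signal[:len(signal) - steps]


def guard_gate(a, b, init):
    """Idealized non-inverting guard-gate: update on agreement, else hold."""
    out, state = [], init
    for x, y in zip(a, b):
        if x == y:
            state = x
        out.append(state)
    return out


def dset_d1_gg(d, delta, d1_upset=()):
    """Return (D0, D1, D2); `d1_upset` lists time steps with an SET on D1."""
    d1 = delay(d, delta)
    d1 = [1 - v if t in d1_upset else v for t, v in enumerate(d1)]
    d0 = guard_gate(d, d1, d[0])
    d2 = guard_gate(d, d1, d[0])
    return d0, d1, d2


DELTA = 3
quiet = [0] * 30

narrow = list(quiet)
narrow[10:12] = [1, 1]                      # scenario (b): width < DELTA
wide = list(quiet)
wide[10:15] = [1] * 5                       # scenario (c): width > DELTA

b0, b1, b2 = dset_d1_gg(narrow, DELTA)
c0, c1, c2 = dset_d1_gg(wide, DELTA)
e0, e1, e2 = dset_d1_gg(quiet, DELTA, d1_upset={10, 11})   # scenario (d)

print(any(b0), any(b1), any(b2))   # only D1 sees the narrow transient
print(c0 == c1 == c2)              # wide transient reaches all three nodes
print(any(e0), any(e1), any(e2))   # direct SET on D1 stays on D1
```

Under these assumptions only D1 is disturbed in scenarios (b) and (d), so a single stored bit is wrong at most and the voter can correct it, while in scenario (c) the same shifted pulse appears on D0, D1 and D2.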

In order to evaluate these different approaches, all three methods are implemented in the selected 130 nm standard technology and compared. The area results obtained from layout views and the results of analog transient simulations are listed in Table 5.1. All numbers are normalized to the DSET-D1D2 solution with inverter chains. The $\delta$-delay size is set to 500 ps for all three configurations based on the outcome of the SET characterization introduced in Section 3.4.

| Concept     | # $\delta$-delay elements | Energy (normalized avg.) | Area (normalized avg.) | Delay (normalized avg.) | Min., max. path ($D \to X$) |
|-------------|---------------------------|--------------------------|------------------------|-------------------------|-----------------------------|
| DSET-D1D2   | 3                         | 1.00                     | 1.00                   | 1.00                    | D0, D2                      |
| DSET-D1D2-D | 3                         | 0.20                     | 0.20                   | 1.00                    | D0, D2                      |
| DSET-D1-GG  | 1                         | 0.08                     | 0.15                   | 0.55                    | D1, D0/D2                   |

Table 5.1: Comparison of different D-SET filter section configurations

As can be seen in the delay column, all concepts have a direct impact on the timing performance of the resulting RHBD-$\Delta$TMR standard cell. The maximum delay of $2 \times \delta = 1$ ns is already given by the architecture of both DSET-D1D2 configurations. This delay is determined by the maximum path $D \rightarrow D2$. In contrast, the DSET-D1-GG solution with two guard-gates (see Figure 5.6) reduces the delay by almost half (0.55 versus 1.00) and the energy by more than half (0.08 versus 0.20) in comparison to the improved DSET-D1D2-D solution. Moreover, the largest penalty in terms of energy and area is given for DSET-D1D2, whereas the DSET-D1D2-D configuration with delay cells is already a quite attractive solution.

In conclusion, three configurations for the D-SET Filter Section have been presented in this work. The key criteria such as area, delay or energy overhead are addressed by the different solutions. The best performance is obtained with the third concept, DSET-D1-GG. However, additional design effort has to be considered when the selected standard cell library does not offer guard-gates as standard cells for the concept realization. Nevertheless, since overhead in delay is already introduced by the D-SET Filter Section stage, the next goal is to reduce or compensate this penalty by suitable alternative configurations of the next two sections, i.e., the *Memory Cell Section* and the *Voter Section*.

#### 5.3.2 Memory Cell Section

The second part of a $\Delta$TMR arrangement is the *Memory Cell Section*. It defines the baseline memory cells used for triplication in the TMR arrangement. These cells, such as flip-flops and latches, are unhardened components to which the hardware redundancy approach is applied. The RHBD is obtained by a decisive logical scheme and an SEE-aware implementation of the proposed $\Delta$TMR cells. In contrast to some other hardware redundancy approaches that use separated or delayed clock signals for soft-error mitigation (cf. Section 3.5.4), the concept of this work uses one shared clock signal for all internal memory cells.

However, since the sequential components are triplicated, any overhead of the unhardened baseline cell affects the area, delay, or energy performance of the final $\Delta$TMR cell three times. Therefore, it is worthwhile to explore some applicable circuit solutions for this second section of the logical scheme. In addition, while the D-SET filter section mainly affects the setup and hold time margins of a memory cell, the total cell propagation delay of a complete RHBD-$\Delta$TMR cell is mainly determined by the performance of the used memory cells together with the respective voter configuration. As for the evaluation of the first section, the proposed concepts are implemented in the selected 130 nm standard technology. The results are extracted from layout data and transistor-level simulations.

As a starting point, the baseline cell in the second section can be realized with a provided D-flip-flop of the unhardened library (i.e., DFF for comparison). In many cases, the circuit scheme is characterized by a highly-integrated internal master-slave D-latch-based structure, similarly as illustrated in Figure 3.5 in Section 3.3. About 30 MOS transistors are required for such a D-flip-flop with asynchronous reset function offered by the selected standard cell library.

An alternative composition is a full master-slave D-latch architecture with the use of stand-alone standard latch gates instead. This approach (LDFF) is beneficial when an unhardened library for the RHBD developments does not offer applicable D-flip-flops. Moreover, a decomposition gives access to the internal nodes between the master and slave stages. In a further proposal (LMDFF), the active latch output can be selected depending on the clock phase with an additional multiplexer cell. This introduces overhead at the logical level on the one side, but may improve the propagation delay in an optimized solution. Figure 5.7 illustrates some configurations for the memory cell section with single flip-flop and D-latch-based compositions.



Figure 5.7: Different D-flip-flop configurations with standard cell gates: (a) single D-flip-flop gate (DFF), (b) master-slave D-latch arrangements (LDFF), (c) decomposition with multiplexed D-latch outputs (LMDFF)

As can be seen in Figure 5.7, the single D-flip-flop of (a) is logically split in a decomposition to separated master and slave D-latch stages. The two clock phases are provided by either additional inverter cells or with the use of the opposite sensitive latch type counterparts (b), whereas an additional multiplexer selects the output in the LMDFF approach shown in Figure 5.7 (c).

Nevertheless, all presented circuits are limited by the performance of the selected memory standard cell gate. The concepts above use both phases of the clock signal, and the signal has to propagate through many internal stages, which costs speed. Consequently, another circuit solution using high-speed memory cells as baseline components might be more beneficial for timing-critical designs. This can be accomplished with the use of TSPC flip-flops (see Section 3.3) as alternative standard cell candidates for the memory cell section. Nonetheless, TSPC is a dynamic logic style and requires some additional restrictions for proper function. For the selected 130 nm technology, the limitations are clock transitions below 1 ns and a clock frequency above 50 MHz.

All four concepts are developed at gate and transistor level and verified by simulation under nominal conditions. The obtained results are normalized to the standard unhardened flip-flop DFF and listed in Table 5.2. As can be seen, the average normalized cell delay of a non-inverting TSPC flip-flop (TDFF) is halved compared to the single-gate D-flip-flop circuit style DFF. In a $\Delta$TMR arrangement, the internal inverter of the TDFF can be removed. Interestingly, the delay performance of TDFF and LMDFF is quite similar, whereas the energy consumption and area occupation are more than two times higher for LMDFF. However, it has to be emphasized that all variants except the TSPC solution are equipped with asynchronous control functionality.

| Concept | Energy (normalized avg.) | Area (normalized avg.) | Delay CK $\rightarrow$ Q (normalized avg.) |
|---------|--------------------------|------------------------|--------------------------------------------|
| DFF     | 1.00                     | 1.00                   | 1.00                                       |
| LDFF    | 1.68                     | 1.47                   | 1.04                                       |
| LMDFF   | 1.47                     | 1.71                   | 0.58                                       |
| TDFF    | 0.59                     | 0.59                   | 0.50                                       |

Table 5.2: Comparison of different baseline memory cells proposed for triplication

In conclusion, four different circuit solutions are proposed for the memory cell section of a complex RHBD-$\Delta$TMR flip-flop. Depending on the targeted requirements, a suitable memory cell can be selected for triplication. All of them provide different characteristics in terms of energy, area, or delay overhead. The first three variants can be derived by pure standard cell-based design. In particular, the latch-based configuration LDFF is advantageous if no suitable flip-flop is offered in the selected unhardened library. Nevertheless, the candidate with the best performance is the TSPC flip-flop variant TDFF. However, a slight increase in cell-design effort has to be considered if the TSPC flip-flop is not available.
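The requirement-driven selection described above can be sketched as a small lookup over the values of Table 5.2. The selection function, its parameters and the data structure are hypothetical and serve only as an illustration; the sketch encodes the asynchronous-control remark and the 50 MHz lower clock limit stated for the TSPC variant, but not its clock transition limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineCell:
    name: str
    energy: float         # normalized to DFF (Table 5.2)
    area: float           # normalized to DFF (Table 5.2)
    delay: float          # normalized CK -> Q delay (Table 5.2)
    async_ctrl: bool      # asynchronous control available
    min_clock_mhz: float  # clock must be faster than this (0 = no limit)


CELLS = [
    BaselineCell("DFF",   1.00, 1.00, 1.00, True,  0.0),
    BaselineCell("LDFF",  1.68, 1.47, 1.04, True,  0.0),
    BaselineCell("LMDFF", 1.47, 1.71, 0.58, True,  0.0),
    BaselineCell("TDFF",  0.59, 0.59, 0.50, False, 50.0),
]


def pick_cell(metric, need_async=False, clock_mhz=100.0):
    """Return the admissible cell with the lowest value of `metric`."""
    pool = [
        c for c in CELLS
        if (c.async_ctrl or not need_async) and clock_mhz > c.min_clock_mhz
    ]
    return min(pool, key=lambda c: getattr(c, metric))


print(pick_cell("delay").name)                   # TDFF
print(pick_cell("delay", need_async=True).name)  # LMDFF
print(pick_cell("area", clock_mhz=20.0).name)    # DFF
```

With these numbers, TDFF wins on every metric as long as its restrictions are acceptable; once asynchronous control is required or the clock is too slow for dynamic logic, LMDFF becomes the fastest remaining option and DFF the smallest.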

#### 5.3.3 Voter Section

In the third section of the logical scheme, the individual outputs of the triplicated memory cells are post-processed by a single majority voter. Thus, one single error of the memory cells is masked. As for the previous sections, the propagation delay of the voter circuit contributes significantly to the overall gate delay of the $\Delta$TMR cell. Moreover, area occupation and energy overhead are also important aspects to consider. Consequently, it is worthwhile to provide a range of voter solutions to enable a selection depending on the target requirements. Several standard cell-based majority voters have been introduced in Section 3.8.6. Among them are the most popular AND-OR-gate-based structures, the XOR-multiplexer solution, and the C-element/guard-gate-based concept illustrated in Figure 3.24 on page 62. However, all of them have to compute the majority function $Y_{\rm MAJ}$ of three inputs A, B, and C.

The most intuitive voter configuration realized with basic components can be derived from its Boolean equation listed in Eq. 5.1. It consists of 2-input AND and one or more OR-gates to correct (mask) one single error. The circuit scheme is illustrated in Figure 3.23 (a) on page 61. This solution is the first candidate of a voter set and denoted as V-AO in this work. An alternative NOR/OR-based configuration is denoted as V-NO.

Nevertheless, the use of AND and OR gates implies an internal negation of signals, which leads to additional energy consumption and area occupation. Consequently, an alternative voter configuration that omits the inverters, and thereby achieves both a shorter propagation delay and less silicon area, is a beneficial addition to the voter set. A solution with a lower gate count can be derived by transforming the Boolean function of Eq. 5.1 into Eq. 5.2, which uses NAND operators only.

$$Y_{\rm MAJ}(A, B, C) = AB \lor BC \lor AC \tag{5.1}$$

$$= \overline{\overline{AB} \wedge \overline{BC} \wedge \overline{AC}} \tag{5.2}$$

As can be derived from Eq. 5.2, three 2-input NAND gates and one 3-input NAND gate are required to realize the NAND-based voter configuration V-ND (see Figure 5.8 (a)). All these configurations can be fully implemented with the use of existing unhardened gates of the standard cell library. Nonetheless, another alternative to save more silicon area by reducing the number of transistor devices is the use of guard-gate inverters (GGIs) for a voter arrangement. The respective transistor scheme is illustrated in Figure 3.24, whereas the corresponding gate-level schemes for the inverting voter V-GGN, and the non-inverting solution V-GG are shown in Figure 5.8 (b), and (c) respectively.
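The equivalence between the AND-OR form of Eq. 5.1 and the NAND-only structure (three 2-input NAND gates feeding one 3-input NAND gate) can be checked exhaustively. The following is a minimal behavioral sketch; the function names are illustrative and not taken from any cell library.

```python
from itertools import product

def nand(*bits):
    """n-input NAND on 0/1 values."""
    return 0 if all(bits) else 1

def voter_and_or(a, b, c):
    """AND-OR majority according to Eq. 5.1: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)

def voter_nand(a, b, c):
    """V-ND structure: three 2-input NANDs feeding one 3-input NAND."""
    return nand(nand(a, b), nand(b, c), nand(a, c))

# Exhaustive equivalence check over all eight input combinations.
equivalent = all(
    voter_and_or(*v) == voter_nand(*v) for v in product((0, 1), repeat=3)
)

def masks_single_error(voter):
    """True if flipping any one of three identical inputs leaves the vote unchanged."""
    for value in (0, 1):
        for faulty in range(3):
            inputs = [value] * 3
            inputs[faulty] ^= 1
            if voter(*inputs) != value:
                return False
    return True
```

Both `equivalent` and `masks_single_error(voter_nand)` evaluate to `True`, which reflects that the NAND-only structure computes the same majority function and masks one single error.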

Every GGI requires only four transistors for implementation. Thus, the inverted majority-voted result is already obtained with only  $3 \times 4 = 12$  transistors. For comparison, the same number of devices is already required in the V-ND configuration just to calculate the intermediate results. Consequently, the GGI-based solutions are more area efficient.
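The behavior of the GGI-based voter can be sketched at a purely logical level. The model below assumes that a GGI drives the inverted input value only when both inputs agree and is otherwise in high impedance (represented by `None`); charge retention on the shared node and all electrical effects are ignored. The 2-transistor output inverter is an assumption (a static CMOS inverter), chosen because it is consistent with the 14 transistors listed for V-GG in Table 5.3.

```python
from itertools import product

def ggi(a, b):
    """Guard-gate inverter: drives NOT a when a == b, otherwise high impedance (None)."""
    return (1 - a) if a == b else None

def voter_ggn(a, b, c):
    """Inverting voter V-GGN: three GGIs with shorted outputs."""
    drivers = {out for out in (ggi(a, b), ggi(b, c), ggi(a, c)) if out is not None}
    # With three binary inputs at least one pair matches, and every matching
    # pair carries the majority value, so exactly one level is driven.
    assert len(drivers) == 1, "contention or floating node"
    return drivers.pop()

def voter_gg(a, b, c):
    """Non-inverting V-GG: V-GGN followed by an inverter."""
    return 1 - voter_ggn(a, b, c)

def majority(a, b, c):
    """Reference majority function of three 0/1 inputs."""
    return 1 if a + b + c >= 2 else 0

GGI_TRANSISTORS = 4        # per GGI, as stated in the text
INVERTER_TRANSISTORS = 2   # assumption: static CMOS inverter as output driver
v_gg_transistors = 3 * GGI_TRANSISTORS + INVERTER_TRANSISTORS

matches_majority = all(
    voter_gg(*v) == majority(*v) for v in product((0, 1), repeat=3)
)
```

The internal assertion never fires for binary inputs, which illustrates why the three GGI outputs can be shorted without contention.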

For evaluation purpose, all four presented concepts are implemented in the 130 nm technology and examined by transistor-level simulations and layout view inspections.



Figure 5.8: Voter solutions: (a) NAND-gate variant V-ND, (b) inverting GG arrangement V-GGN, (c) V-GGN plus driver cell (V-GG)

The internal GGI cells have been additionally developed in order to realize the V-GG voter configuration. All extracted results are listed in Table 5.3 and normalized to the V-NO voter solution.

| Concept | Number of transistors | Normalized average energy | Normalized average area | Normalized average delay |
|---------|-----------------------|---------------------------|-------------------------|--------------------------|
| V-NO    | 30                    | 1.00                      | 1.00                    | 1.00                     |
| V-AO    | 30                    | 0.96                      | 0.87                    | 0.88                     |
| V-ND    | 18                    | 0.47                      | 0.57                    | 0.47                     |
| V-GG    | 14                    | 0.37                      | 0.52                    | 0.58                     |

Table 5.3: Comparison of different majority voter configurations

The results of the V-NO/-AO configurations in this table are based on successive 2-input AND/OR gate implementations. As a consequence, these solutions have a larger propagation delay and higher energy consumption. They also require the most silicon area in comparison. On the other hand, the V-ND demonstrates a considerable improvement in all parameters. Furthermore, V-GG is characterized by a short propagation delay, the lowest energy consumption, as well as the lowest area occupation.

Even though more configurations can be derived with standard cells, this short evaluation already demonstrates the performance trend from the standard voters V-AO/-NO/-ND to the more custom guard-gate-based voter solution V-GG, which may require some additional design effort. Hence, different voter solutions can be selected from this small voter set to obtain complex but efficient  $\Delta$ TMR cells depending on the target requirements.
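A requirement-driven selection from the voter set can be sketched as a weighted-cost comparison. The normalized values below are copied from Table 5.3; the weighting scheme and the function name `select_voter` are assumptions made for illustration and are not part of the proposed methodology.

```python
# Normalized values copied from Table 5.3 (reference: V-NO = 1.00).
VOTERS = {
    "V-NO": {"energy": 1.00, "area": 1.00, "delay": 1.00},
    "V-AO": {"energy": 0.96, "area": 0.87, "delay": 0.88},
    "V-ND": {"energy": 0.47, "area": 0.57, "delay": 0.47},
    "V-GG": {"energy": 0.37, "area": 0.52, "delay": 0.58},
}

def select_voter(weights, candidates=VOTERS):
    """Return the voter with the lowest weighted sum of normalized costs."""
    def cost(name):
        return sum(weights[key] * candidates[name][key] for key in weights)
    return min(candidates, key=cost)

# Hypothetical requirement profiles (the weights are illustrative only).
delay_critical = select_voter({"energy": 0.1, "area": 0.1, "delay": 0.8})
low_power = select_voter({"energy": 0.8, "area": 0.1, "delay": 0.1})
```

With these example weights, the delay-critical profile selects V-ND and the low-power profile selects V-GG, in line with the trends discussed above.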

#### 5.3.4 Driver Section

Finally, an optional *Driver Section* is proposed in the scheme in order to decouple the load connected to the primary output Q of the  $\Delta$ TMR memory cell from the internal voter output. This stage is realized either by an inverter or by a buffer, depending on the desired logical output polarity. Thus, a net with a higher connected fan-out is directly driven by the  $\Delta$ TMR cell itself without undesired additional long wire segments. As early experiments at the place and route stage have shown, such wire segments would increase the total capacitance and resistance of the output net. Moreover, the section can also be made of driver cell chains in order to successively amplify the driver capability. Similarly, depending on the polarity at the voter section's output, this last section may already include the essential inverter of the voter (e.g., the inverter from V-GG).

Nonetheless, with all introduced and discussed details of the four logical sections above, complete hardened-by-design  $\Delta$ TMR memory cells can be obtained and easily derived with the use of available unhardened standard cell library gates.

## 5.4 Special RHBD TMR Cells

The concept of *Logical Sections* for TMR-based memory cells allows the development of further RHBD memory cells with the cell set offered by an unhardened standard cell library. Any new  $\Delta$ TMR cell can be derived by selecting a configuration for each section. The result is always an RHBD- $\Delta$ TMR memory cell with flip-flop-based or latch-based functions.

However, one of the most desirable functional features for VLSI is the DfT support of scan-chain insertion and the respective ATPG. If this is supported, it allows the design tools to remap, e.g., flip-flops to scannable counterparts and to rearrange all flip-flops into shift registers (cf. Section 2.4.1). The resulting chains can later be loaded with decisive patterns to investigate a design for defects after fabrication. One straightforward solution for  $\Delta$ TMR flip-flops might be to select scan-flip-flops as baseline cells for the memory cell section as reported in [16] (see also Section 3.8.4). In this approach, the stored values of the internal flip-flops are masked by the voter and are neither visible nor accessible from the outside. As a consequence, the test patterns used to check the individual flip-flops for defects are not generated correctly.

Another aspect is the support of the *clock-gating* (CG) low-power feature. In this case, the clock activity is stopped by holding the clock signal in an inactive state. Consequently, all connected memory cells retain their last stored value. As a result, the dynamic part of the total power consumption is reduced to almost zero. Nevertheless, when clock-gating is applied to designs operating in radiation environments, SEUs would still accumulate over time and result in MBUs, even if robust RHBD- $\Delta$ TMR flip-flops had been used. When these faults occur in control parts that are evaluated by active (running) modules, they can put a system into a deadlock condition. Moreover, all internal flip-flops of a  $\Delta$ TMR cell are connected to the same, shared clock signal. Therefore, a robust clock-gating cell must additionally be able to handle SETs to prevent MBUs.

As a consequence, additional cell concepts are required to address both aspects, i.e., scan-chain insertion with the corresponding pattern generation, and the support of clock-gating for  $\Delta$ TMR designs. The DfT support is enabled with two  $\Delta$ TMR concept proposals for scannable flip-flops. Moreover, a robust  $\Delta$ TMR-based clock-gate architecture is proposed to enable the use of the clock-gating feature in RHBD applications. In addition, three different approaches for self-correction with  $\Delta$ TMR-based flip-flops are presented in this Section. With these solutions, induced upsets within the  $\Delta$ TMR memory cells are corrected while clock-gating is enabled.

#### 5.4.1 RHBD TMR Scan-Flip-Flops

The use of scannable flip-flops arranged in chains is one DfT design technique to improve the testability of a design by scan-test (see Section 2.4.1). The presented concept in [16] for  $\Delta$ TMR flip-flops always provides a majority-voted output independent of the configured mode. Thus, any defect within an internal flip-flop of the memory cell section is hidden and masked by the voter (see also Figure 3.20 on page 59). Hence, it is mandatory to provide the access to the internal memory cells of the  $\Delta$ TMR gate and to enable the corresponding ATPG.
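Why a voted output hides a defect can be illustrated with a small behavioral model. The sketch below assumes a stuck-at fault in one internal flip-flop; the class `TmrFlipFlop` and its `stuck` parameter are hypothetical names used only for this illustration.

```python
def majority(a, b, c):
    """Majority function of three 0/1 inputs."""
    return 1 if a + b + c >= 2 else 0

class TmrFlipFlop:
    """Three D flip-flops behind one voter; `stuck` optionally models a defect."""

    def __init__(self, stuck=None):
        self.state = [0, 0, 0]
        self.stuck = stuck or {}   # e.g. {2: 0}: flip-flop 2 is stuck at 0

    def clock(self, d):
        """Capture d in all three flip-flops and return the voted output Q."""
        self.state = [self.stuck.get(i, d) for i in range(3)]
        return majority(*self.state)

stimulus = [0, 1, 1, 0, 1]
good = TmrFlipFlop()
bad = TmrFlipFlop(stuck={2: 0})

q_good = [good.clock(d) for d in stimulus]
q_bad = [bad.clock(d) for d in stimulus]
internal_bad = bad.state   # what access to the internal cells would reveal
```

The sequences `q_good` and `q_bad` are identical, so the defect cannot be observed at the voted output, whereas `internal_bad` exposes the stuck flip-flop. This is the observability gap that access to the internal memory cells is meant to close.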

Two concepts are proposed in this Section to realize scannable RHBD- $\Delta$ TMR flip-flops which address exactly this issue. The first concept illustrated in Figure 5.9 (a) is denoted as S- $\Delta$ TMR-I in this work. This solution is similar to the general concept for  $\Delta$ TMR flip-flops. As can be seen, a transient filter is only applied to the data signal in the first section. The memory cell section is realized by triplication of scannable unhardened D-flip-flop standard cells. The individual scan-enable (SE) connections are tied to one primary SE input.



Figure 5.9: Concepts for scannable  $\Delta$ TMR flip-flops with highlighted internal scanpaths: (a) S- $\Delta$ TMR-I with scannable D-flip-flops, (b) S- $\Delta$ TMR-II with the use of standard D-flip-flops and D-SET filter section with multiplexed inputs

Furthermore, an internal scan chain is arranged starting from the primary scan-data input SD, through the first memory cell to the SD input of the second memory cell, and from the second one to the SD pin of the third flip-flop, respectively. However, the hold time requirement for the second and third scan-flip-flop must be met, which may lead to additional delay elements as indicated by the dashed boxes in the figure. Moreover, a special scan-out pin SO is proposed to provide the test output in scan-test mode separately. This can optionally be extended with an AND-gate controlled by the global SE signal. Thus, the signal propagation can be activated in scan-mode (SE=1) and gated in normal operating mode (SE=0) in order to save energy. Otherwise, the SO port and the fan-out (e.g., buffers) connected to it would switch in normal function mode on every clock cycle and consume power unnecessarily. However, this approach is only a suitable solution when the unhardened standard cell library offers scannable D-flip-flops.

The second concept S- $\Delta$ TMR-II depicted in Figure 5.9 (b) is applicable precisely when a library does not provide this kind of memory cell. In this case, the triplication of standard D-flip-flops in the memory cell section together with an integrated DSET-D1D2-D transient filter section realizes a similar function. Therein, the input data to the flip-flops is controlled by the primary SE input. Moreover, the internal 3-bit scan-chain passes all delay elements of the D-SET filter section. As a consequence, the hold time of the last two flip-flops is already maintained and no insertion of extra delay elements is required.

As already mentioned, both proposed concepts consist of unhardened standard components. In the first concept, the D-SET Filter Section is fully decoupled from the memory cell function. Consequently, alternative D-SET filter configurations with smaller insertion delay or area-benefit are also applicable to obtain improved S- $\Delta$ TMR-I variants. On the other side, the SE signal has to be treated as a critical net and requires a robust implementation when this concept is selected. Contrarily, any transient on SE is immediately filtered by the integrated D-SET Filter Section of the S- $\Delta$ TMR-II approach. Moreover, alternative guard-gate-based transient filter configurations are readily applicable to this concept.

Nevertheless, both proposals enable the scan-insertion for circuits with robust  $\Delta$ TMR cells. Access to the internal flip-flops is possible during the scan-test, which increases the test coverage of a design. Their usage and the respective pattern generation are described later in Section 5.7.3. Finally, the concepts and the methodology have resulted in a pending patent application [126], with the application of the  $\Delta$ TMR concept and circuit realization being a contribution of this work.

#### 5.4.2 RHBD TMR Clock-Gating Cells

Since the dynamic power is proportional to the frequency (see Section 2.4.2), switching off the clock activity is one beneficial technique to improve the design's overall power consumption. With clock-gating standard cells, i.e., clock-gates, the clock signals can be stopped and released safely. However, when reliability is increased by additional hardware redundancy concepts such as TMR, the power performance of a design becomes more important. The overhead is significant for the register triplication and a more complex clock network. Therefore, the use of clock-gating in TMR-based designs is beneficial. On the other hand, the robustness against radiation must be considered in parallel. This is essential for the design methodology of this work.

The proposal for RHBD- $\Delta$ TMR integrated clock-gating cells is aligned to the introduced *Logical Section* scheme. In more detail, the memory cell section can be realized by triplication of existing unhardened CG-cells. Alternatively, a modular arrangement, e.g., a latch-AND-gate arrangement, realizes the same functionality. The concept is illustrated in Figure 5.10. For a robust  $\Delta$ TMR solution, the enable control signal EN is shared by all three internal clock-gates/-modules. It is connected to an SET filter module to mitigate transients on this specific path. Following the  $\Delta$ TMR concept, the clock signal is globally shared. Thus, these signals are highly critical nets. In an application, they have to be robustly implemented to cope with SETs and to provide a reliable, glitch-free clock signal. Otherwise, when a transient occurs on an intentionally gated clock signal, the connected flip-flops of an adjacent active clock domain could capture the intermediate (corrupted) data.



Figure 5.10: Logical scheme of the proposed RHBD- $\Delta$ TMR CG-cell

Furthermore, any occurrence of transients inside the voter or driver logic of the clock-gate has to be suppressed to prevent multiple-bit upsets in connected memory cells. Thus, the standard cells used for the third and fourth logical sections should provide enough critical charge  $Q_{crit}$  to maintain a fault-free clock propagation. Obviously, this is in contradiction with a power- and area-efficient design solution.

As a compromise, an inverting robustly-designed GG-based voter (V-GG) is proposed. This voter solution is advantageous due to its lower transistor count. It consumes the least amount of energy and area compared to other configurations, as discussed in Section 5.3.3. Consequently, the V-GG arrangement is the best candidate for a robustly-designed voter solution. The internally inverted majority-voted node QN is generated by guard-gate inverters (GGIs) connected in parallel. The primary output Q is processed by a robust INVX inverter cell interfacing the QN signal. Fortunately, the critical charge  $Q_{crit}$  of this node is thereby further increased, which improves the robustness against SETs.

As can be seen in the figure, the proposed robust  $\Delta$ TMR clock-gate is compliant with the standard digital design flow. Its pin interface and function enable the automatic insertion of robust clock-gating cells during the logic synthesis step. As a result, clock signals can be switched off safely and the power consumption of RHBD- $\Delta$ TMR-based applications can be significantly reduced.
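The logical behavior of the triplicated latch-AND arrangement can be sketched as follows. The model assumes that the enable latch is transparent while the clock is low and that the gated clock is the AND of the clock and the latched enable; the SET filter on EN and the electrical robustness of the voter are not modeled, and all class names are illustrative.

```python
def majority(a, b, c):
    """Majority function of three 0/1 inputs."""
    return 1 if a + b + c >= 2 else 0

class LatchAndClockGate:
    """Latch-AND clock gate: the enable latch is transparent while CK is low."""

    def __init__(self):
        self.en_latched = 0

    def evaluate(self, ck, en):
        if ck == 0:
            self.en_latched = en
        return ck & self.en_latched

class TmrClockGate:
    """Three clock-gate modules sharing CK and EN, followed by a voter."""

    def __init__(self):
        self.gates = [LatchAndClockGate() for _ in range(3)]

    def evaluate(self, ck, en):
        return majority(*(g.evaluate(ck, en) for g in self.gates))

cg = TmrClockGate()
cg.evaluate(ck=0, en=0)                      # clock gated: EN latched as 0
cg.gates[1].en_latched = 1                   # injected upset in one latch
gated_high_phase = cg.evaluate(ck=1, en=0)   # voter masks the single upset
cg.evaluate(ck=0, en=1)                      # low phase refreshes all latches
released_high_phase = cg.evaluate(ck=1, en=1)
```

In this sketch the gated clock stays low during the high phase despite the injected upset, and the clock is released once EN has been latched during a low phase.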

#### 5.4.3 Self-Correcting TMR Flip-Flops

The concept presented in the previous section enables the robust clock-gating feature for designs with hardware redundancy to save power. Nevertheless, memory cells such as D-flip-flops are edge-sensitive gates. They update their internally stored data with a rising or falling edge. When a clock signal is gated, the activity is stopped and the data remains stable. However, when systems are exposed to radiation, induced upsets are not recoverable. Even though a system is realized with TMR, the trigger signal for correcting or refreshing the data is switched off. As a result, logical errors would accumulate and lead to MBUs after a certain amount of time. In this context, it is preferable that memory cells are capable of self-correcting their stored value when the clock signal is gated.

Several state-of-the-art concepts have been introduced in Section 3.8.5. However, this work is focusing on RHBD- $\Delta$ TMR-based single standard cell gates. Consequently, other circuit solutions are required which are applicable to the  $\Delta$ TMR standard cell concept. Regarding this, three different special self-correcting (SC) TMR-based flip-flop architectures are proposed in this Section.

Modern standard cell libraries provide different kinds of flip-flops with distinctive functional features such as scan-test support, data-retention, or e.g., initialization capability. Furthermore, some memory cells provide both, asynchronous reset and set function. For the first concept of self-correcting  $\Delta$ TMR flip-flops, this asynchronous control function is utilized to correct internal errors inside the TMR gate. The concept is denoted as SC-S- $\Delta$ TMR and follows the presented logical scheme of the first scannable RHBD- $\Delta$ TMR flip-flop approach (S- $\Delta$ TMR-I) introduced in Section 5.4.1. The block scheme of the first self-correcting variant without D-SET Filter Section is illustrated in Figure 5.11.



Figure 5.11: Block scheme of the first self-correction concept SC-S- $\Delta$ TMR

As can be seen in the figure, the extension of the baseline  $\Delta$ TMR is a feedback, starting from the voter section to an additional comparator module (SC\_CTRL). This module generates the individual asynchronous reset and set control signals of the internal flip-flops according to the level of the voted signal Q. However, transients in this feedback need to be suppressed, as they would otherwise lead to a simultaneous triple-upset. Thus, the feedback loops are equipped with transient filters with distributed GGs for transient mitigation. The self-correction itself is enabled in normal operating mode while the clock phase is low, i.e., when SE=0 and CK=0. The three SN signals correct the individual flip-flops to logical one, while the RN signals are used to reset the cells to logical zero. Figure 5.12 shows the waveforms of an analog transient simulation as an example.



Figure 5.12: Transistor-level simulation with activated self-correction mode

The SC-S- $\Delta$ TMR concept has one main advantage. The use of scannable flip-flops as baseline cells allows testing the self-correction functionality by programming the internal scan chain of one SC-S- $\Delta$ TMR cell with unequal pattern triples. For instance, when scan-mode is activated (SE=1), a pattern representing an internal fault, e.g., '0,0,1', can be written into the TMR cell with three clock cycles. Afterwards, the SE signal is set to logical zero (SE=0) to enable the functional mode. Thus, the self-correction is executed starting with the next low phase of the clock signal CK. The third flip-flop (stored '1') is then reinitialized to '0' by the SC\_CTRL module. Thereafter, the scan-mode is reactivated. The serial readout data after three clock cycles is now '0,0,0', demonstrating a successful self-correction inside the  $\Delta$ TMR flip-flop.
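The test sequence described above can be traced with a simple behavioral model. The sketch assumes an internal scan order SD -> ff[0] -> ff[1] -> ff[2] -> SO and abstracts SC\_CTRL as forcing all three flip-flops to the voted value during the low clock phase; the class and helper names are hypothetical.

```python
def majority(a, b, c):
    """Majority function of three 0/1 inputs."""
    return 1 if a + b + c >= 2 else 0

class ScSTmrCell:
    """Behavioral model; assumed scan order: SD -> ff[0] -> ff[1] -> ff[2] -> SO."""

    def __init__(self):
        self.ff = [0, 0, 0]

    def scan_clock(self, sd):
        """One clock edge with SE=1: shift by one position, return SO before the edge."""
        so = self.ff[2]
        self.ff = [sd, self.ff[0], self.ff[1]]
        return so

    def clock_low_phase(self):
        """SE=0, CK=0: SC_CTRL sets or resets every flip-flop to the voted value Q."""
        q = majority(*self.ff)
        self.ff = [q, q, q]   # SN active for q == 1, RN active for q == 0
        return q

def scan_load(cell, pattern):
    """Shift `pattern` = (ff[0], ff[1], ff[2]) in; the last element enters first."""
    for bit in reversed(pattern):
        cell.scan_clock(bit)

def scan_unload(cell):
    """Shift three zeros in and return the old contents as [ff[0], ff[1], ff[2]]."""
    out = [cell.scan_clock(0) for _ in range(3)]   # arrives as ff[2], ff[1], ff[0]
    return out[::-1]

cell = ScSTmrCell()
scan_load(cell, [0, 0, 1])        # pattern with an internal fault in ff[2]
loaded = list(cell.ff)
voted = cell.clock_low_phase()    # functional mode, self-correction active
readout = scan_unload(cell)       # scan-mode reactivated
```

After the low clock phase, `readout` contains three zeros, which corresponds to the '0,0,0' readout described in the text. Without the call to `clock_low_phase`, the same unload would return the loaded pattern unchanged.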

However, the design overhead for the controller and the transient filter in the feedback is not negligible. As a teaser, internal sensitive nodes are collected to logically-connected groups (LCGs) to allow for more robust placement by spacing between sensitive regions, i.e., MNCC-aware layout design in this work (see Section 5.6). This leads to twice the area of the standard cell, as initial experiments in the selected 130 nm technology have shown. Nevertheless, this first concept can be realized with low design overhead to obtain RHBD self-correcting  $\Delta$ TMR D-flip-flops.

The second approach addresses the area penalty of the first proposal. Instead of using the asynchronous control function of selected memory cells, a minor custom modification of the original baseline flip-flop circuit scheme is proposed to enable self-correction. The main idea is to modify the baseline slave latches and to create a majority-voted feedback which is enabled when clock phase is low. This concept is denoted as SC- $\Delta$ TMR-V in this work. In more detail, the latch-hold function is formed together with the voter as part of the feedback loops. The self-correction itself is a result of the voted signal provided as an input for the open slave latches. However, transients may occur and propagate through the internal feedback loops. Without modification, any single transient might lead to MBUs of such a self-correcting  $\Delta$ TMR flip-flop. Therefore, a second integrated SET filter block (FB-SET) is proposed for the feedback signals. The SC- $\Delta$ TMR-V concept without D-SET Filter Section is illustrated in Figure 5.13 (a) and (b). It shows the open slave latches of the baseline flip-flops (O\_FF) and the TMR arrangement with the FB-SET transient filter.



Figure 5.13: Block scheme of SC- $\Delta$ TMR-V concept: (a) open slave latch of the modified baseline flip-flops (O\_FF), (b) TMR arrangement with additional FB-SET module

The dashed lines inside the flip-flop blocks of Figure 5.13 (b) mark the separation border of the master and open slave latch. The interface pins of the O\_FF components are the individual Q outputs and the three QI feedback inputs, i.e., the provided output signals by the FB-SET transient filter. Moreover, the primary output Q is derived from the third transient filter output FBD2 in this arrangement. Nevertheless, from functional point of view, when the clock phase is low, the feedback loop is closed. If an SEU occurs in one flip-flop, the upset will be self-corrected by the feedback loop with its majority-voted signal.
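The benefit of the closed feedback loop during a gated clock can be illustrated with a sequence of single-bit upsets. The model below is a logical abstraction only: it assumes that the voted value reaches all open slave latches before the next upset occurs, and it does not model the FB-SET filter delay. The function name `run_gated` is illustrative.

```python
def majority(a, b, c):
    """Majority function of three 0/1 inputs."""
    return 1 if a + b + c >= 2 else 0

def run_gated(stored, upsets, self_correcting):
    """Apply single-bit upsets one after another while the clock stays low.

    `upsets` lists the index of the flip-flop hit at each step. With
    self-correction, the voted value is fed back to all slave latches
    between two consecutive upsets.
    """
    ff = [stored] * 3
    for hit in upsets:
        ff[hit] ^= 1
        if self_correcting:
            ff = [majority(*ff)] * 3
    return majority(*ff)

# Two upsets in different flip-flops while the clock is gated.
plain = run_gated(1, upsets=[0, 2], self_correcting=False)
corrected = run_gated(1, upsets=[0, 2], self_correcting=True)
```

Without self-correction the two accumulated upsets flip the voted value (`plain` is 0), whereas the self-correcting variant still returns the stored value (`corrected` is 1).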

In comparison to the first solution, the proposed SC- $\Delta$ TMR-V concept is advantageous since only slight changes to the baseline memory cell elements are required. There is no need to add extra control logic for self-correction. However, the use of a second SET filter in the feedback occupies additional silicon area and consumes more energy in comparison to a non-self-correcting standard RHBD- $\Delta$ TMR flip-flop. Moreover, an increase in propagation delay due to the selected  $\delta$ -delay for the internal transient filter has to be considered in later design usage. This issue is addressed with the next concept for self-correction in  $\Delta$ TMR flip-flops. As introduced in Section 3.8.5, the combination of open slave latches and a voter logic realized with distributed GGs is used for multi-bit flip-flops (see also Figure 3.22). Distributed GGs do not require an additional transient filter. Thus, this concept is a suitable self-correction solution with shorter propagation delays. Nevertheless, single self-correcting  $\Delta$ TMR flip-flop gates are targeted, which requires some modifications.

This third concept is denoted as SC- $\Delta$ TMR-GG in this work. The respective circuit and block-level schemes are illustrated in Figure 5.14. As can be seen, every opened latch of the baseline flip-flops consists of a three-input clock-controlled GG. It opens or closes the feedback loop depending on the clock phase. The feedback data A is passed through when the loop is closed while CN=0 and CB=1, whereas the feedback is put into high impedance while the master stage is driving node A (when CN=1 and CB=0). Moreover, each slave latch includes an output GGI cell, which implements the first half of a GG-based voter (V\_GG) (see GG in Figure 5.14 (a) in comparison to Figure 3.24 on page 62). This module processes two values in parallel. Its output QI is generated from the data to be latched (A) and the output data (B) of another flip-flop of the TMR arrangement. By triplication as shown in Figure 5.14 (b), shorting the QI node of all three flip-flops forms the inverted majority-voted output signal. Simultaneously, the inverted data of node A is forwarded for self-correction purposes at the SC outputs. These signals interface the SCY/SCZ inputs of the other flip-flops in the TMR arrangement. As a result, every flip-flop stores (or corrects) its data depending on the value of both other flip-flops of the TMR. Finally, the inverted QI signal is driven by an inverter stage to provide the primary output Q.



Figure 5.14: Circuit and block-level scheme of the SC- $\Delta$ TMR-GG approach: (a) internal architecture of a single open latch (the second stage of the open flip-flop with GG-outputs (O\_FFG)), (b) self-correcting TMR flip-flop without D-SET Filter Section

Even though this concept for self-correcting TMR flip-flops requires more effort in cell design, no second SET filter is used for transient mitigation, and therefore, the propagation delay is significantly reduced. Similarly, the area and power overhead is not affected. In particular, the use of GGIs for calculation of two different signals (QI/SC) additionally saves some resources.

Table 5.4 gives an overview of the effort for the three presented self-correction concepts. The impact on the design costs due to modifications, the overall overhead (in power, area, and delay), and the support of special features are indicated. The first concept reuses the asynchronous control function and supports the scan-test feature. The second one uses an integrated voter with a slight modification of the baseline memory cells. Both concepts require a second SET filter to mitigate transients inside the self-correction feedback. The third concept also requires modification of the baseline cells. It uses distributed guard-gates acting as voters for self-correction. Since no additional SET filter is necessary, this solution has the lowest increase in area, power, and delay. Nevertheless, an increase in design effort still has to be considered.

| Concept             | No. of SET filters | Design costs | Overall overhead | Feature          |
|---------------------|--------------------|--------------|------------------|------------------|
| SC-S- $\Delta$ TMR  | 2                  | low          | high             | scan-test        |
| SC- $\Delta$ TMR-V  | 2                  | med          | med              | -                |
| SC- $\Delta$ TMR-GG | 1                  | high         | low              | short cell delay |

Table 5.4: Comparison of the proposed self-correction approaches

However, all proposed concepts are applicable to the  $\Delta$ TMR approach in order to obtain RHBD- $\Delta$ TMR memory cells. The self-correction is always activated when the clock phase is low. Furthermore, these concepts support clock-gating feature in order to save more power of a design while maintaining stable, correct data under radiation. Finally, the SC- $\Delta$ TMR-V and SC- $\Delta$ TMR-GG concepts are applied for patent applications (EP3965294A1/US20220076718A1, and EP201946845) as introduced in Section 1.4.

## 5.5 Characterization of RHBD TMR Cells

The novel  $\Delta$ TMR cells are complex gates and require a proper standard cell characterization (CZ) to provide accurate timing and power values for design generation. The CZ has been introduced in Section 2.2.3 on page 15. One of the resulting views of CZ is the .lib-file, which contains cell-specific information such as function, area, power, and timing. However,  $\Delta$ TMR memory cells are equipped with a D-SET Filter Section, a Memory Cell Section, and a Voter and Driver Section, respectively. Moreover, the voter masks internal errors, although these errors are needed to identify accurate setup and hold windows during characterization. As a consequence, some additional effort is required when complex  $\Delta$ TMR cells have to be modeled.

First of all, the data path D is internally separated by the integrated D-SET filter in a  $\Delta$ TMR arrangement (see Figure 5.4). In a typical CZ-setup for flip-flops, all primary pins such as data D, reset RN, clock CK, and output Q are selected points of interest for probing and result validation purposes. During timing check characterization (e.g., setup/hold), the stimuli transitions of the D and CK pins are shifted in time closer to each other, whereas the output probe Q is measured and compared to the expected value. An offset  $\Delta t_s$  between both input transitions is continuously reduced as long as the measured value is equal to the expected one. If the measured value is different from the predicted value, a timing violation, e.g., a setup violation, has occurred. In this case, the previous offset is added to the LUT of the .lib-file as a constraint for the setup window of the flip-flop.
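The sweep described above can be sketched as a simple loop. In this sketch, `capture_ok(offset)` is a hypothetical stand-in for one transistor-level simulation run that returns `True` when the probed output equals the expected value; the step size and the example flip-flop are assumptions. Real characterization tools typically use more elaborate search strategies and also handle negative timing windows, which are not explored here.

```python
def characterize_setup(capture_ok, start_offset, step):
    """Shrink the D-to-CK offset until the probe mismatches.

    Returns the last passing offset, i.e. the value that would be entered
    as the setup constraint. Offsets below zero are not explored
    (simplification).
    """
    if not capture_ok(start_offset):
        raise ValueError("start offset already violates the timing check")
    offset = start_offset
    while offset - step >= 0 and capture_ok(offset - step):
        offset -= step
    return offset

# Hypothetical flip-flop that needs stable data 80 ps before the clock edge.
TRUE_SETUP_PS = 80
setup_ps = characterize_setup(
    lambda off: off >= TRUE_SETUP_PS, start_offset=500, step=10
)
```

For this example the loop stops at 80 ps, because the next smaller offset of 70 ps would produce a mismatch at the probe.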

However, when this setup is applied to setup and hold window characterization for  $\Delta$ TMR memory cells and the voter masks such internal violation, the primary output Q (the main probe) remains correct. As a result, the flip-flop is improperly modeled, and such a wrongly-characterized cell could be stuck in an internal permanent hidden timing violation in a real application.

The scenario is illustrated in Figure 5.15 for a  $\Delta$ TMR flip-flop. As can be seen, the transition of the adjusted edge of data input D (here D0 of a DSET-D1D2 filter) is propagating. This signal arrives at the data input of the third flip-flop (D2) after the rising clock edge of CK has arrived. Thus, an internal setup violation has occurred and Q2 remains logical zero (highlighted in red). However, the voter already updates its value since two out of three flip-flops have captured equal values ( $Y_{MAJ}(A, B, C)$ ). Thus, no internal timing check violation is seen on the primary output Q, if specified as a probe node during setup/hold-CZ. The timing check is measured poorly and the CZ-result of the flip-flop is incorrect.



Figure 5.15: Hidden timing violation of the third flip-flop during CZ

In conclusion, the timing check setup with its probe definitions has to be set according to the selected D-SET filter configuration for the CZ of  $\Delta$ TMR cells. With regard to the proposed DSET-D1D2 filter architectures compared in Table 5.1 on page 105, the timing windows for these configurations can be expressed as:

$$t_{setup}(\Delta TMR_{DSET-D1D2}) \approx \max(t_{setup}(ff_i)) \approx t_{setup}(ff_2) \tag{5.3}$$

$$t_{hold}(\Delta TMR_{DSET-D1D2}) \approx \max(t_{hold}(ff_i)) \approx t_{hold}(ff_0) \tag{5.4}$$

where $i = 0, 1, 2$ indicates the individual flip-flop of the triplication in TMR.

Similarly, when the D-SET filter is realized with a DSET-D1-GG solution (see Figure 5.6 on page 104), the timing windows are given by:

$$t_{setup}(\Delta TMR_{DSET-D1-GG}) \approx \max(t_{setup}(ff_i)) \approx t_{setup}(ff_{0,2}) \tag{5.5}$$

$$t_{hold}(\Delta TMR_{DSET-D1-GG}) \approx \max(t_{hold}(ff_i)) \approx t_{hold}(ff_1) \tag{5.6}$$

Considering the relations given in Eq. 5.3 to Eq. 5.6, the timing windows of  $\Delta$ TMR cells can be properly derived during setup and hold window characterization. The LUTs are generated and exported in the .lib-file. Thus, internal timing violations are no longer hidden and are therefore considered during STA at the logic synthesis or place and route stage. As a result, a design generation with accurate timing behavior and correct constraining of RHBD circuits with  $\Delta$ TMR memory cells is enabled.
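The hidden violation of Figure 5.15 and the relations of Eq. 5.3 and Eq. 5.4 can be reproduced with a simplified timing model for the DSET-D1D2 case. The sketch assumes ideal path delays of 0, $\delta$, and $2\delta$ towards D0, D1, and D2, identical baseline flip-flops, and windows referenced to the cell pin D; all numeric values are hypothetical.

```python
def majority(a, b, c):
    """Majority of three truth values (booleans count as 0/1)."""
    return 1 if a + b + c >= 2 else 0

def captures(offset_ps, path_delay_ps, ff_setup_ps):
    """True if data launched `offset_ps` before the clock edge meets the setup time."""
    return offset_ps - path_delay_ps >= ff_setup_ps

def min_passing_offset(probe, offsets):
    """Smallest swept offset for which `probe(offset)` still passes."""
    return min(off for off in offsets if probe(off))

FF_SETUP_PS, FF_HOLD_PS = 80, 20               # hypothetical baseline flip-flop
DELTA_PS = 300                                 # hypothetical filter delay
PATH_DELAYS_PS = (0, DELTA_PS, 2 * DELTA_PS)   # assumed delays to D0, D1, D2

def internal_ok(offset):
    """Capture result of each internal flip-flop for a given pin offset."""
    return [captures(offset, d, FF_SETUP_PS) for d in PATH_DELAYS_PS]

offsets = range(0, 1001, 10)
# Probing only the voted output Q hides the violation of the third flip-flop.
setup_probe_q = min_passing_offset(lambda o: majority(*internal_ok(o)) == 1, offsets)
# Probing all internal flip-flop outputs exposes it.
setup_probe_all = min_passing_offset(lambda o: all(internal_ok(o)), offsets)

# Pin-referenced windows following Eq. 5.3 and Eq. 5.4.
setup_eq = max(FF_SETUP_PS + d for d in PATH_DELAYS_PS)   # dominated by ff_2
hold_eq = max(FF_HOLD_PS - d for d in PATH_DELAYS_PS)     # dominated by ff_0
```

With these example numbers, probing Q alone yields a setup window of 380 ps, whereas probing all internal flip-flops yields 680 ps, which agrees with `setup_eq`. The difference equals one filter delay, i.e., the margin by which the third flip-flop would be violated without being noticed.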

## 5.6 Layout Aspects for RHBD TMR Cells

The modification at gate or transistor-level is essential to cope with SEEs by obtaining RHBD- $\Delta$ TMR cells. Whereas the SEU is logically mitigated by triplication as TMR, SETs on the data path are filtered by internal D-SET filter modules in a  $\Delta$ TMR arrangement. In particular, the robustness which is given-by-design at logical level requires a similar radiation-aware implementation at layout-level due to the charge-sharing effect between adjacent nodes. First, a simultaneous bit-flip of two memory cells in different adjacent rows may occur when enough particle energy is deposited between. Second, when the memory cells are placed in the same standard cell row, an MBU may be generated as well. As a consequence, well-known measures such as component spacing and nodal separation can be applied to address these issues (cf. Section 2.7.5). With respect to  $\Delta$ TMR-based standard cell layouts, this kind of measure is denoted in this work as a Multiple Node Charge Collection-aware (MNCC-aware) layout design.

However, the increased complexity of $\Delta$TMR memory cells with different functional features and architectures demands an equally robust layout implementation to obtain RHBD circuits. Moreover, the placeability and routability of larger-sized $\Delta$TMR standard cell gates are an additional concern. Another limitation is that the layout frames of unhardened cells and of the special robust $\Delta$TMR gates have to be compliant with each other to enhance their usability and applicability in later RHBD designs. In this case, the cell layouts can be freely arranged as one-row or multiple-row frames as long as compliance with the technology and with the unhardened standard cell library is maintained in terms of placement and routing grid definitions. Finally, as mentioned before, the components of the RHBD TMR memory cells have to be internally placed in an MNCC-aware manner to preserve the targeted SEE robustness. To ensure this, two spacings are proposed for $\Delta$TMR cells:

1. A global minimum memory cell spacing ($\Delta x_{\rm MC}$), which is defined by the edge-to-edge distance of two sequential, logically-connected memory cells
2. An additional section spacing ($\Delta x_{\rm sec}$) between groups/modules of sensitive nodes, i.e., Logically-Connected Groups (LCGs) at circuit or gate-level

The members of one LCG can be easily derived by investigation of the *Logical Section* scheme in horizontal manner. Every memory cell is added into one group with all D-SET filter components of the same data path. Moreover, parts of the voter solution can also be distributed into the three different LCGs, e.g., the GGs of V-GG voter. Thus, a resulting 1-row layout frame of a general RHBD- $\Delta$ TMR cell as illustrated in Figure 5.16 can be obtained.



Figure 5.16: General cell frame of a RHBD- $\Delta$ TMR flip-flop with spacing

As can be seen therein, all memory cells are distributed in the frame with a global minimum  $\Delta x_{\rm MC}$  spacing. The logic gates of each LCG are placed together in their respective region as indicated by the highlighted rectangles in orange. Moreover, the LCGs are separated by  $\Delta x_{\rm sec}$  in X direction. Obviously, the section spacing is equal for all regions, whereas the memory cell spacing is individual for the flip-flop groups. Any unused area between active logical elements, i.e., *Spare Area* in this work, can be preserved for standard filler gates in order to achieve the required metal density and maintain the technology-related design rules. Alternatively, the insertion of decoupling cells is also applicable in order to stabilize the power and ground nets.
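The two spacing rules can be expressed as a simple geometric check. The following Python sketch assumes a 1-row frame in which every element is described only by its extent in X direction; the function names, the coordinates, and the spacing values are hypothetical and are not taken from the cell layouts of this work.

```python
from itertools import combinations


def edge_gap(a, b):
    """Edge-to-edge distance of two extents (x_left, x_right); negative on overlap."""
    return max(a[0], b[0]) - min(a[1], b[1])


def check_spacing(memory_cells, lcg_regions, dx_mc, dx_sec):
    """Return a list of spacing violations for a 1-row cell frame.

    memory_cells: dict mapping a flip-flop name to its (x_left, x_right) extent
    lcg_regions:  list of (x_left, x_right) extents, one per LCG
    """
    violations = []
    # Every pair of memory cells must keep at least the global minimum spacing.
    for (name_a, a), (name_b, b) in combinations(memory_cells.items(), 2):
        if edge_gap(a, b) < dx_mc:
            violations.append(f"{name_a}/{name_b}: memory cell spacing below dx_mc")
    # Neighbouring LCG regions must be separated by at least the section spacing.
    ordered = sorted(lcg_regions)
    for left, right in zip(ordered, ordered[1:]):
        if edge_gap(left, right) < dx_sec:
            violations.append(f"LCGs at x={left[0]} and x={right[0]}: spacing below dx_sec")
    return violations


# Hypothetical extents in micrometres for ff_0, ff_1, ff_2 and their LCG regions.
cells = {"ff0": (2.0, 6.0), "ff1": (12.0, 16.0), "ff2": (22.0, 26.0)}
regions = [(0.0, 8.0), (10.0, 18.0), (20.0, 28.0)]
print(check_spacing(cells, regions, dx_mc=5.0, dx_sec=2.0))
```

The check treats $\Delta x_{\rm MC}$ and $\Delta x_{\rm sec}$ as lower bounds, which is an assumption of this sketch; a real layout verification would additionally consider the Y direction for multi-row frames.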

Furthermore, RHBD-$\Delta$TMR cells imply a more complicated routing solution due to the memory cell triplication of TMR, the voter, and the integrated transient filter. Thus, upper metal layers are proposed for the pin-shape definitions. This enhances the final pin-access, especially when crossing signal routes from one LCG to another LCG above the spare area are routed with the lowest possible metal layer. Consequently, this approach saves free routing channels for top-level signal routes at the upper design hierarchy. In addition, it is proposed to route the internal signals of $\Delta$TMR gates mainly with the use of the first two or three routing tracks and inside a box defined by the second and the $(n-1)$-th routing track in each direction (see dashed lines in the figure). Moreover, extra-defined routing blockages around the pins and cut-outs (not shown in the figure) also enhance direct routes and simplify the pin-access. As a result, the placement of $\Delta$TMR cells below mandatory, vertical PDN stripes is also supported in later application. Without these modifications, the metal enclosure of the vias of the stripes would overlap with the internal cell routing, which would lead to illegal placement locations of $\Delta$TMR cells in the worst case.

By following these proposals, compliant and ready-to-use cell layouts of complex $\Delta$TMR cells can be obtained and other multi-row layouts can be derived. Their handling is enhanced by special routing layer usage, blockages and cut-out definitions. The MNCC-aware component placement is addressed with two proposed spacings for memory cells and logical groups in order to maintain the SEE robustness. The spacings can be chosen according to the target robustness. Thus, the layouts of these hardened $\Delta$TMR gates can be used together with the respective unhardened standard cell library for layout generation of complex RHBD hardware systems.

## 5.7 Design Methodology for Radiation-Hardening-by-Design

The circuit proposals for robust RHBD-$\Delta$TMR flip-flops address SEE mitigation only at cell level. However, the target applications are robust VLSI hardware systems developed with a conventional, standard cell-based design methodology. Even though the compliance of the new cells with the standard design flow is enabled by the modeling concepts, additional effort is required at methodology level. In particular, maintaining the robustness in complex hardware systems and enabling test pattern generation for $\Delta$TMR designs are quite challenging. This Section discusses the handling of critical nets and the challenges at the logic synthesis level in keeping the design's robustness. Furthermore, flows for test pattern generation and the effort at place and route stage for the development of RHBD circuits are presented.

#### 5.7.1 Handling of Critical Nets

Clocks and asynchronous control signals are critical shared nets in the $\Delta$TMR concept. Inside a $\Delta$TMR cell, they are implemented without any hardening and are therefore sensitive to SETs. Moreover, any additionally inserted unhardened logic on these signal paths is a potential SET victim, in which transients could be more easily induced by high-energy particle hits. As an example, when unhardened bypass logic is integrated on the global asynchronous reset, e.g., controlled by test-mode for scan-test, and a transient is generated inside, the resulting glitch could propagate through the connected reset network and lead to MBUs. A similar scenario that could result in MBUs arises when unhardened multiplexer logic is connected to the clock network. As a consequence, when logic is required on these signals, it is proposed to treat the resulting connections and instances as *ideal, untouchable* objects at logic synthesis stage. Their detailed implementation and hardening are postponed to the PnR stage.

A second challenge is posed by multi-clock designs in RHBD circuits. Digital counters are a sufficient circuit solution to generate several rational clocks for a design. The output clocks are often realized by flip-flops and the divided clock signals are distributed on clock paths forming various clock domains. Although $\Delta$TMR cells can be used in this application, divided clocks are processed by the unhardened voter section. Hence, potentially SET-sensitive combinational logic remains on these clock paths. As a consequence, it is proposed to transform multi-clock domain designs into single clock domain designs whenever applicable with the use of synchronous enable signals. Thus, all previous rational clocks are activated by multi-cycle synchronous control signals in the same clock domain. Moreover, the combinational logic for the enable function is part of the data path, which is protected by the integrated *D-SET Filter Section* of the $\Delta$TMR flip-flops.

As a result, when the clock tree of this single clock domain is robustly implemented at PnR stage, the circuit is less vulnerable against transients. However, the impact on power consumption has to be considered during the design phase, which is not explicitly addressed in this work.
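The proposed transformation from divided clocks to synchronous enable signals can be illustrated with a small behavioral model. The Python sketch below is an assumption-based illustration (function names and the cycle-based abstraction are hypothetical); it only shows that a register updated by a once-per-$n$-cycles enable in the system clock domain captures the same data as a register driven by a divide-by-$n$ clock.

```python
def divided_clock_capture(data, n):
    """Reference: register clocked by a divide-by-n clock (captures every n-th sample)."""
    return [d for cycle, d in enumerate(data) if cycle % n == 0]


def enable_capture(data, n):
    """Single clock domain: register clocked every cycle, updated only when enabled."""
    captured = []
    counter = 0
    for d in data:
        enable = (counter == 0)      # synchronous enable, high once every n cycles
        if enable:
            captured.append(d)       # register update on the system clock edge
        counter = (counter + 1) % n  # the counter logic is part of the data path
    return captured


samples = list(range(12))
print(divided_clock_capture(samples, 4))
print(enable_capture(samples, 4))
```

In the enable-based variant, the counter and the enable decoding are ordinary data path logic, so that transients generated there are subject to the D-SET filtering of the receiving flip-flops.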

Finally, a continuously running system clock is preferable. With every new clock cycle, potential upsets are corrected inside the $\Delta$TMR cells by rewriting. Thus, fault accumulation over time is interrupted by refreshing the register data. Nevertheless, when continuously running clocks are not applicable to an application, the use of the self-correcting $\Delta$TMR flip-flops presented in this work (see Section 5.4.3) is an alternative solution.
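The effect of refreshing on fault accumulation can be sketched with a bit-level model of a voted triplet. The following Python example is a simplified illustration under the assumption of one upset per clock cycle and an ideal majority voter; it does not model the actual $\Delta$TMR circuit.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def voted_outputs(upsets, data=1, refresh_each_cycle=True):
    """Apply one upset per cycle (index of the hit flip-flop) and return the voted outputs."""
    state = [data] * 3
    outputs = []
    for ff in upsets:
        if refresh_each_cycle:
            state = [data] * 3   # the clock edge rewrites all three flip-flops
        state[ff] ^= 1           # a particle hit flips one stored bit
        outputs.append(majority(*state))
    return outputs


# Two upsets in different flip-flops of the same cell.
print(voted_outputs([0, 1], refresh_each_cycle=True))   # each single upset is masked
print(voted_outputs([0, 1], refresh_each_cycle=False))  # upsets accumulate
```

Without refreshing, the second upset hits a triplet that already holds one wrong bit, so the voter output becomes erroneous. This is the situation addressed by a running clock or by self-correction.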

#### 5.7.2 Gate-Level Synthesis

The design methodology for RHBD circuits is aligned with the standard digital design approach. An unhardened standard cell library and an additional library that offers the new RHBD cells are linked together in the logic synthesis tool. Both libraries must be robust against the SEL effect. The RHBD library includes the extra baseline/robust guard-gate inverters, buffers, and transient filter cells, as well as novel $\Delta$TMR cells with different features such as clock-gate function, scan functionality, or self-correction (cf. Section 5.2 - 5.4). Moreover, in order to prevent a mapping to unhardened memory cells, it is proposed to exclude them during this design task.

Each behavioral description of a memory cell in the RTL design is directly mapped to an RHBD-$\Delta$TMR cell during the GLS step. As a result, the design is immediately immune against SEUs at cell level. When the scan feature is desired, the mapping to scannable $\Delta$TMR memory cells can be enabled. As mentioned in the previous section, critical nets such as clock and asynchronous control signals require special handling for a robust implementation. They are declared as *ideal untouchable* signals at this stage to avoid any optimization of these paths with unhardened, SET-sensitive logic. On the other hand, if combinational logic exists on these nets, additional robust SET filters can be added in order to suppress upcoming transients. In particular, it is proposed to protect critical nets with additional RHBD SET filter cells at the input side directly behind the IO-pad electronics. Similarly, transient filter insertion is proposed for the protection of domain-crossing signals, e.g., from the digital to an analog domain or vice versa (D2A or A2D). Furthermore, transients on the input signals from an analog domain are filtered by the integrated D-SET filter of the $\Delta$TMR cells in the digital domain. The critical output signals to the analog domain can be additionally protected with the use of special transient filter cells as driver cells. The filter insertion can also be achieved in advance by direct instantiation at RTL level or by logic synthesis tool-specific TCL commands.

If a hold time violation fix is required at this design stage, no SET-sensitive cells should be selected for this purpose. For example, if the logic paths are additionally buffered with special delay cells (see also Section 5.2.3) to meet the hold time requirement efficiently, induced transients could be broadened beyond the implemented D-SET filter size ($\delta$) of a $\Delta$TMR flip-flop. Such pulses cannot be filtered properly and lead to upsets inside the $\Delta$TMR flip-flops. As a consequence, buffer cells for hold time optimization or delay generation have to be selected such that their concatenation does not widen a pulse beyond the maximum specified transient width $\delta$. However, the definition of a suitable cell set requires a deeper investigation of the logical chains at cell and simulation level, which is not covered within the scope of this work.
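Why pulse broadening is critical can be illustrated with a discrete-time model of a delay-based filter with a guard-gate, whose output only follows the input when the input and its $\delta$-delayed copy agree. The Python sketch below is a simplified, non-inverting abstraction; the linear broadening per buffer stage and all numbers are hypothetical assumptions and not measured properties of the delay cells.

```python
def guard_gate_filter(samples, delta):
    """Discrete-time model of a delay element combined with a guard-gate."""
    out = []
    state = samples[0]
    for t, value in enumerate(samples):
        delayed = samples[t - delta] if t >= delta else samples[0]
        if value == delayed:
            state = value   # both guard-gate inputs agree: the output follows
        out.append(state)   # otherwise the output node keeps its value
    return out


def pulse(width, start=5, length=40):
    """Waveform with a single positive transient of the given width (in time steps)."""
    return [1 if start <= t < start + width else 0 for t in range(length)]


def broadened_width(width, n_buffers, broadening_per_buffer):
    """Assumed linear broadening model: every buffer stage widens the pulse equally."""
    return width + n_buffers * broadening_per_buffer


DELTA = 4                      # filter size in time steps (hypothetical)
original = 3                   # transient narrower than DELTA
widened = broadened_width(original, n_buffers=2, broadening_per_buffer=1)

print(max(guard_gate_filter(pulse(original), DELTA)))  # transient is suppressed
print(max(guard_gate_filter(pulse(widened), DELTA)))   # transient passes the filter
```

In this model, a transient is suppressed as long as its width does not exceed `DELTA`. The same transient passes once the assumed buffer chain has widened it beyond the filter size.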

Following these strategies, a netlist mapped to a standard technology is generated. The design consists of robust $\Delta$TMR memory cells and special transient filter cells. The netlist is partially radiation-hardened-by-design. It still contains *ideal* nets and *dont\_touch* cells; these paths have to be robustly implemented at PnR stage to obtain a fully robust implementation. Hence, it is proposed to export an explicit list of all these objects and of the excluded cells (e.g., delay cells) by scripting to an additional design constraint file. This file can later be used together with the common SDC file at PnR stage.
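The export of such an additional constraint file can be sketched as follows. The Python example assumes Synopsys-style command names (`set_ideal_network`, `set_dont_touch`, `set_dont_use`); the commands accepted by the tools actually used in this work may differ, and all net, instance, and library cell names are hypothetical.

```python
def build_constraints(ideal_nets, dont_touch_cells, excluded_lib_cells):
    """Return the text of an additional constraint file for the PnR stage."""
    lines = ["# Additional RHBD constraints (illustrative sketch)"]
    for net in sorted(ideal_nets):
        lines.append(f"set_ideal_network [get_nets {{{net}}}]")
    for cell in sorted(dont_touch_cells):
        lines.append(f"set_dont_touch [get_cells {{{cell}}}]")
    for lib_cell in sorted(excluded_lib_cells):
        lines.append(f"set_dont_use [get_lib_cells {{{lib_cell}}}]")
    return "\n".join(lines) + "\n"


# Hypothetical object names collected after logic synthesis.
text = build_constraints(
    ideal_nets={"clk", "rst_n"},
    dont_touch_cells={"u_rst_bypass_mux"},
    excluded_lib_cells={"*/DLY*"},
)
print(text)
```

The resulting text could be written to a file and sourced next to the common SDC file. Keeping the list explicit makes it possible to verify at PnR stage that no *ideal* object remains unimplemented.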

#### 5.7.3 Scan-Test – Pattern Generation

As known from the discussions in Sections 2.4.1 and 5.4.1, scan tests are one technique to detect faults in a design. A high test coverage is only obtained when all flip-flops in the chains are replaced by scannable counterparts. Then, the test patterns for different kinds of faults can be properly generated by appropriate ATPG tools. However, in the concept presented in this work, the $\Delta$TMR cells are modeled for compliance purposes as 1-bit memory cells in the .lib-files. As a result, their scan insertion is the same as for common unhardened flip-flops and designs. Independent of the scan concept for $\Delta$TMR, i.e., S-$\Delta$TMR-I/II, their mapping is done during GLS and the scan chains are inserted and connected during the DfT step. However, if the patterns were generated with the use of the same .lib-models, the number of test cycles would be derived based on the 1-bit memory cell representation. Thus, the internal 3-bit scan chain of every S-$\Delta$TMR cell would be hidden and not accessible for the ATPG tool. As a result, a low test coverage would be obtained. Consequently, alternative approaches are required for ATPG of $\Delta$TMR designs.

In the first approach, an alternative second .lib-model file is linked during pattern generation. It can include a behavioral description of the S-$\Delta$TMR cells with an internal 3-bit scan chain. Thus, the test patterns can be properly generated by traversing the three individual flip-flops per $\Delta$TMR cell. In the second approach, the Verilog® gate-level netlist of the S-$\Delta$TMR cells or a simplified TMR model is linked instead. As a consequence, the module description can be easily processed by the ATPG tool and the patterns can be derived. The advantage in this case is that no additional .lib-file is required, as long as the contents of the $\Delta$TMR cells are components of the linked unhardened standard cell library.
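The difference between the 1-bit view and a model with an internal 3-bit scan chain can be made concrete with a behavioral sketch. The Python class below is a hypothetical, simplified TMR model (the internal shift order ff_0 to ff_1 to ff_2 is an assumption) and does not reproduce the S-$\Delta$TMR-I/II circuits.

```python
class ScanTmrCell:
    """Simplified scannable TMR cell with an internal 3-bit scan chain."""

    def __init__(self):
        self.ff = [0, 0, 0]

    def shift(self, scan_in):
        """One scan shift cycle; returns the scan-out value before the clock edge."""
        scan_out = self.ff[2]
        self.ff = [scan_in, self.ff[0], self.ff[1]]
        return scan_out

    def q(self):
        """Voted data output of the cell."""
        a, b, c = self.ff
        return (a & b) | (a & c) | (b & c)


def shift_in(cells, bits):
    """Shift a bit sequence through a chain of cells, one bit per clock cycle."""
    for bit in bits:
        carry = bit
        for cell in cells:
            carry = cell.shift(carry)


chain = [ScanTmrCell(), ScanTmrCell()]
shift_in(chain, [1, 0, 1, 1, 0, 0])   # 2 cells x 3 flip-flops = 6 shift cycles
print(chain[0].ff, chain[1].ff, chain[0].q(), chain[1].q())
```

With two cells, six shift cycles are needed to control every flip-flop individually, whereas a 1-bit representation would suggest only two. Loading the three flip-flops with different values is also what allows the voter itself to be exercised.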

Even though a slight increase in design effort must be considered independent of the selected procedure, both approaches enable a proper test pattern generation for RHBD circuits with $\Delta$TMR memory cells. They are also part of the pending patent application [126].

#### 5.7.4 Place and Route

At the place and route (PnR) stage, the physical implementation of the netlist obtained from the gate-level mapping during the logic synthesis stage is realized (see also Section 2.3). The synthesized RHBD netlist contains $\Delta$TMR flip-flops and a hardened signal implementation based on robust cells or, e.g., special transient filters to mitigate SEEs. It is accompanied by the additional constraint file for the objects of interest. As mentioned in the previous section, the implementation of highly critical nets, such as asynchronous control signals and clocks, is postponed to this stage.

However, when hardware redundancy such as TMR is selected at cell level, the overall power consumption and area occupation are increased. In addition, the internal routing of the complex $\Delta$TMR memory cells has an impact on the routing congestion in comparison to a standard, unhardened design. As a consequence, some extra tasks have to be considered at this stage to obtain efficient RHBD circuits with $\Delta$TMR cells.

The floorplan can be developed similarly to that of an unhardened standard design. The higher power consumption of robust designs demands a more robustly implemented PDN to provide a stable and reliable power and ground supply network. In addition, the signal routing metallization concept of RHBD-$\Delta$TMR in this work utilizes up to three thin metal layers for internal routing. If the PDN were drawn on lower metal layers, more routing channels for signal tracks would be immediately occupied and locally blocked. Thus, the routing congestion would be increased in certain regions. Consequently, it is proposed to create the global power routing on the top thick metal layers if applicable (this includes power and ground rings and PDN stripe generation). Upper, thicker metal layers are often characterized by a higher permissible current density and lower resistance, which contributes to the quality and robustness of the PDN. Afterwards, the placement of the standard cells can be done without any special interaction.

Thanks to the internal MNCC-aware component placement inside the $\Delta$TMR cells, there is no need for special placement constraints such as extra instance spacing or cell padding. Any design can be placed in the standard manner. However, depending on the PnR tool and its optimization behavior, any unintentional remapping to unhardened standard memory cells has to be prevented by adequate setup and tool control. Alternatively, this undesired remapping can be explicitly suppressed by SDC dont\_touch and dont\_use settings for critical objects individually or with the use of the additional constraint file.

When the design is timing clean and does not violate any design rule for standard paths, all critical (*ideal*) data path-related nets, e.g., asynchronous controls, scan-enable, or testmode signals, can be implemented with a local Buffer Tree Synthesis (BTS) or with direct net-specific CTS runs. As a result, these nets (except clocks) are physically implemented by robust buffer trees realized with RHBD inverters with sufficient critical charge ($Q_{crit}$, see Sections 2.7.3 and 5.2.2). Their implementation is additionally controlled by maximum fan-out and transition time constraints. Moreover, the respective objects are declared as fixed and untouchable to prevent any further optimization in area, power, or delay on these paths. In the next step, all clocks defined in the SDC file are similarly implemented with the use of SET-robust inverter cells for the clock tree (CT) implementation. Apart from this restriction, the tree insertion is identical to a standard clock tree implementation procedure.

Since all $\Delta$TMR memory cells share the same clock net internally, the methodology for RHBD design together with the respective concept of $\Delta$TMR cells supports clock stealing techniques such as useful skew for timing closure in timing-critical designs. However, this technique mainly contributes to setup time improvement between connected registers. To meet the hold time of memory cells, on the other hand, one approach is to delay the data paths more aggressively. In this case, only cells without internal inverter stages operating in weak inversion should be specified as usable hold time buffer cells for data path optimization. Thus, the standard delay cells are excluded from this task. Otherwise, transient pulses could be widened beyond the maximum $\delta$-delay and lead to potential upsets in $\Delta$TMR cells. The alternative solution for hold time fixing is clock path optimization by shortening the clock insertion delay of the sinks instead. Since shortening delays may also imply a mapping to alternative cell variants, it is proposed to use only robust driver cells (see Section 5.2.2) to maintain the clock tree robustness.
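The statement that useful skew mainly helps the setup time, whereas the hold time has to be treated separately, follows from the common textbook slack relations. The Python sketch below uses these generic relations with the skew defined as capture clock arrival minus launch clock arrival; all delay values are hypothetical and in ns.

```python
def setup_slack(t_clk, t_cq, t_logic_max, t_setup, skew=0.0):
    """Setup slack of a register-to-register path (positive means met)."""
    return t_clk + skew - (t_cq + t_logic_max + t_setup)


def hold_slack(t_cq, t_logic_min, t_hold, skew=0.0):
    """Hold slack of a register-to-register path (positive means met)."""
    return t_cq + t_logic_min - (t_hold + skew)


# Hypothetical path: the setup check fails without skew.
print(setup_slack(2.0, 0.3, 1.5, 0.4))              # negative slack
print(setup_slack(2.0, 0.3, 1.5, 0.4, skew=0.25))   # met with useful skew

# The same skew reduces the hold margin of a short path to the same sink.
print(hold_slack(0.3, 0.1, 0.2))
print(hold_slack(0.3, 0.1, 0.2, skew=0.25))         # hold violation
```

In the sketch, delaying the capture clock turns the setup violation into a small positive slack but consumes the hold margin of the short path, which then has to be recovered by data path delay or by shortening the clock insertion delay as described above.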

As a result, all clocks and critical nets are properly implemented and the design is completely free of *ideal* nets. It can be extracted afterwards to estimate the timing margin or to derive the power consumption. Mandatory implementation steps such as standard filler insertion, signal routing, and design verification can be done thereafter. Overall, the layout generation for RHBD circuits with $\Delta$TMR cells is done in the same manner as for unhardened standard designs. Nevertheless, some additional constraints must be met and interaction is required in order not to compromise the robustness of the application, as described in this Section.

## 5.8 Summary

With the presented standard cell concepts, the robust TMR-based memory cells, transient filters, and the respective design methodology, the design of RHBD circuits is enabled with the use of an unhardened technology. The SEE mitigation is obtained by new $\Delta$TMR flip-flops. The low-power aspect and the DfT are addressed with concepts for robust $\Delta$TMR clock-gates, self-correcting $\Delta$TMR flip-flops, and variants with scan-test support. Moreover, concepts for standard cell characterization and an MNCC-aware layout proposal are given in this Chapter. Following the principle of a construction kit, the architecture of $\Delta$TMR cells is divided into *Logical Sections* in order to derive further, alternative configurations that differ in area, power, or delay overhead. The concepts of each section are implemented in a 130 nm technology and evaluated by simulation. The presented concepts are transferable to further lower-scaled standard technology nodes.

"Nature will tell you a direct lie if she can." — Charles Darwin

## Chapter 6

## **Evaluation of the Concepts**

Reliable and robust hardware systems (RnRS) can be developed with the standard cell-based concepts and design methodologies proposed in this Thesis. Two systems in standard technology are addressed: the differential logic design and the development of RHBD circuits. The approaches of both systems have to be evaluated in practice and validated by experimental results.

This Chapter starts with an example of a standard cell-based bipolar differential CML design to demonstrate the applicability of the proposed design approach. In this context, the demonstrator circuit and the respective design flow have already been partially published in [17].

For the field of RHBD circuits, the  $\Delta$ TMR concept is applied to several memory cell architectures in the 130 nm standard technology. The implementation details of the new standard cells and the corresponding test vehicles for radiation experiments are presented afterwards. Some of the presented cell architectures and the obtained simulation results have already been published in [20, 21]. In particular, the irradiation test results which confirm the robustness of proposed novel  $\Delta$ TMR memory cells are published in [20] and [22], respectively.

Finally, this Chapter concludes with two realistic design examples for RHBD applications to which the cell concepts and methodology were applied. The first design is a robust implementation of the digital part of an Analog-to-Digital Converter (ADC) core for space applications. The second example discusses the overhead that arises when hardware redundancy is implemented using $\Delta$TMR on a microcontroller design. Two different standard cell libraries have been used for this purpose and comparison.

## 6.1 A Reliable Digitally-Designed Differential Logic Design

The first RnR-System example is a digitally-designed differential CML-based logic design. It is developed with the proposed extended standard digital design flow using the presented cell concepts. The circuit development is achieved in digital manner with an RTL behavioral design description, over logic synthesis, to final standard cell-based layout generation. The implementation is done in IHP's standard 250 nm BiCMOS technology.

#### 6.1.1 Architecture Overview

The selected demonstrator design is an M/N+1 counter which is part of the feedback loop of a PLL circuitry. The registers of the counter are configurable in order to set the desired output frequency of this PLL. The counter itself acts as a clock divider and can be integrated later. The PLL is intended to be used in a high-speed data transmitter with operating speeds around 12.5 Gbps. Since this speed requirement is well above the maximum frequencies of CMOS designs, logic cells with faster bipolar devices are required to implement the counter design. Moreover, PLL circuits are especially sensitive to digital switching noise. Consequently, the use of differential logic design style and signaling is a suitable circuit solution.

Figure 6.1 shows an abstract block diagram of the integrated M/N+1 counter design in PLL application. The dashed lines in the figure mark the counter module to be realized with differential logic gates.



Figure 6.1: Block diagram of the M/N+1-counter in a PLL application

As can be seen therein, an additional multiplexer and prescaler are added in the feedback loop to select an input frequency. Furthermore, the bias of the differential logic can be configured additionally in order to increase the voltage swing for reliability aspects. The M/N configuration itself is accessible through a Serial Peripheral Interface (SPI) module mapped to single-ended low-power CMOS standard cell gates.

The high-speed clock (HSCLK) is the buffered output clock of the PLL. The frequency  $f_{\text{PLL}}$  can be adjusted when the prescaler is disabled as follows:

$$f_{\rm PLL} = f_{\rm REF} \cdot (4M_{[+1]} + S). \tag{6.1}$$

As an example, a system with a reference clock frequency of 100 MHz and a setting for M=30, and S=1 would synthesize an output clock  $f_{PLL}$  with a frequency of 12.5 GHz.
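The example can be reproduced numerically with Eq. 6.1. The following Python sketch assumes that the notation $M_{[+1]}$ denotes the optional increment of the M/N+1 counter, i.e., that either $M$ or $M+1$ enters the equation; this reading is an interpretation and the function name is hypothetical.

```python
def pll_frequency_hz(f_ref_hz, m, s, plus_one=False):
    """Evaluate Eq. 6.1 with the prescaler disabled.

    plus_one selects M + 1 instead of M (assumed meaning of the [+1] index).
    """
    m_eff = m + 1 if plus_one else m
    return f_ref_hz * (4 * m_eff + s)


f_ref = 100_000_000  # 100 MHz reference clock
print(pll_frequency_hz(f_ref, m=30, s=1, plus_one=True) / 1e9, "GHz")
print(pll_frequency_hz(f_ref, m=30, s=1, plus_one=False) / 1e9, "GHz")
```

Under this assumption, the stated output frequency of 12.5 GHz corresponds to the setting with the increment active, since $4 \cdot 31 + 1 = 125$. Without the increment, the same setting would yield a factor of 121.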

#### 6.1.2 Implementation

The behavior of the counter is described in VHDL without differential signaling. The design generation follows the proposed design methodology for differential logic standard cell-based design (see Section 4.4) with the use of a commercial logic synthesis tool [30] and a PnR tool [26].

The intermediate netlist is generated from the behavioral design mapped to SEPG elements. The SEPG library is linked during this logic synthesis run. Afterwards, the pseudo layout design, i.e., the fat-wire layout of the SEPG netlist is developed.

In this design phase, the PnR tool is set up with the fat-wire libraries, i.e., the fat-wire technology (tech.lef) and the layout representation of the SEPGs. Both libraries provide the essential grid and routing layer definitions and the pin-shape information of each cell to enable the fat-wire layout generation. The floorplan and placement are done in the typical manner. Since this design example has fewer than 12 clock sinks, there is no need to implement a fully synthesized complex clock tree. The external clock buffer has enough driver strength to provide a stable and proper clock signal. Consequently, the data paths are optimized for timing and design rule targets only. Nevertheless, the correct mapping to the corresponding level-shifters depending on the required output voltage level (see $V_{BE}$-level in Section 4.2.6 and Figure 4.5) requires some manual optimization interactions.

The route engine is configured such that only preferred routing directions are considered for each metal layer. As a result, every change in routing direction from vertical to horizontal track definitions (and vice versa) is forced to be realized by dedicated signal vias. Thus, shorts after the design conversion step are prevented. Since only three of the five routing layers provided by the selected technology are thin layers, only these three layers are considered and specified for signal routing at the fat-wire PnR stage. Even though this is a hard limitation and might lead to an increase in routing congestion, the design converter strictly requires identical width and pitch definitions for the wires to be split. Furthermore, all signals that are not excluded from routing are finally treated and routed in fat-wire manner. Their segments interface with the virtual fat-wire pin-shapes located close to the boundary in the fat-wire layout representation of the SEPGs.

Afterwards, the design is converted by the custom Design Converter tool as presented in Section 4.4. All pre-routed fat-wire segments are split and replaced by thinner differential wire-pair counterparts. The connected boundary pin-shapes are split, doubled, and relabeled in differential manner (e.g., fat-wire pin $D \rightarrow Dp Dn$). The resulting intermediate layout is afterwards streamed in again to the PnR tool for the final pin-access and pin-connection realization in the second route phase (2-Phase routing solution). For the counter design example, 122 differential signal pairs are finally routed. The wires are drawn in parallel with the use of the fat-wire approach and the design conversion tool. Nevertheless, some minor manual routing corrections are necessary in order to clean up generated DRVs. In this Section, the focus is set on the obtained routing result rather than on power and area optimization, which has already been addressed in Section 4.2.5. Nonetheless, the circuit is realized with more than 120 differential logic standard cell gates, among them 16 flip-flops in total. The overall area frame of this M/N+1-counter IP is 0.70 mm × 0.64 mm, and a static power consumption of about 250 mW can be estimated.
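The principle of the wire splitting can be sketched geometrically. The Python example below is not the Design Converter tool of Section 4.4; it assumes that a fat wire covers exactly two thin wires plus their spacing, that the p-wire is placed on the lower-coordinate side, and that differential pins are named by appending `p` and `n`. All dimensions are hypothetical.

```python
def split_fat_wire(center, thin_width, thin_spacing):
    """Return (fat_width, center_p, center_n) for one fat-wire segment.

    center is the centerline coordinate perpendicular to the routing direction.
    """
    fat_width = 2 * thin_width + thin_spacing
    offset = (thin_width + thin_spacing) / 2
    return fat_width, center - offset, center + offset


def relabel_pin(name):
    """Relabel a fat-wire pin in differential manner, e.g. D -> (Dp, Dn)."""
    return f"{name}p", f"{name}n"


# Hypothetical dimensions in micrometres.
print(split_fat_wire(center=10.0, thin_width=0.5, thin_spacing=0.5))
print(relabel_pin("D"))
```

Because both thin wires are derived from the same fat-wire centerline, they run in parallel over the full segment length. This illustrates why the fat-wire and thin-wire width and pitch definitions have to match.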

#### 6.1.3 Results and Discussion

Figure 6.2 shows a routing result of the first phase, i.e., the fat-wire design of the counter. As can be seen at the bottom edge in this figure, the layout consists of an open (not-routed) FW-pin, which is therefore not considered for wire splitting at conversion stage. Moreover, the design is explicitly specified with additional padding for placement of the differential logic standard cells. This preserves enough free routing channels for later signal routing. The area between has been filled with standard filler cells before fat-wire routing has been executed.



Figure 6.2: Fat-wire routing result (left) with region of interest after second routing phase (right) with routing artifacts due to signal inversion

Furthermore, the power, ground and vbias core rings are special nets and are routed on the upper thick metal layers as indicated by the colors in the figure. All logical signals treated as differential pairs in the later design stage are routed in fat-wire manner, where blue is the first, black is the second and red is the third routing layer. The highlighted box in the image surrounds the region of interest in this section. The picture to the right shows the same region after design conversion and the second route phase. As can be seen there, the differential signal pairs are routed in parallel. Only in case of direct pin-access, the detailed routes are slightly detoured (e.g., short routing at the bottom edge). This happens either when the signal is logically inverted, leading to a crossing of the differential signal pair, or when the orientation of the pin of the sink cell is different from that of the driver cell pin. In the second case, the locations of the p- and n-pin-shapes of a differential pin pair are flipped. As a consequence, the pre-routed nets need to be rerouted slightly, leading to tiny routing artifacts as shown in the picture of Figure 6.2.

However, in order to evaluate the benefit of the proposed methodology for standard cell-based differential logic design, the obtained 2-Phase routing solution needs to be compared to a regular routing solution. To accomplish this, the PnR tool is provided with the differential netlist together with the cell placement and the pin-shape locations generated by the conversion tool. In the next step, all nets are routed in the standard manner by the route engine with thin width and pitch settings but without differential pair definitions. After the routing has succeeded, parameters such as signal wire length and total wire capacitance can be extracted for every differential pin pair. Thus, both solutions can be compared.

The results obtained for the *Regular* and the 2-Phase routing solutions are illustrated in Figure 6.3 (a) and (b). The left figure shows the wire length ratio of the differential signal pairs. As can be derived from this histogram, the total wire lengths of the positive and negative wire segments of the differential pairs match more closely in the case of the 2-Phase routing approach. Even though the regular routing result is already quite well-distributed, a slight improvement can still be achieved. A standard deviation of the wire length ratio of 0.15% can be noted for the regular routing case compared to 0.08% for the 2-Phase routing solution.



Figure 6.3: Comparison of the routing results of the regular routing and the proposed 2-Phase routing: (a) histogram of wire length, (b) X-Y plot of the total wire capacitance

Similarly, Figure 6.3 (b) on the right depicts the extracted total wire capacitance of the differential signal pairs of both routing experiments. The coordinates of each marker represent the respective capacitance values of one differential pair. The closer these markers are to the dashed main diagonal in this X-Y plot, the more equal the capacitances of the positive ($C_p$) and negative ($C_n$) wire segments of the signal pair. As can be seen in the figure, the imbalance of the total wire capacitances is already quite low in the regular routing case. However, one of the key benefits of FW routing is to preserve a parallel routing result in order to balance the parasitic resistance and capacitance of the differential signal pairs. The variation stays closer to the equal case (dashed line). As a result, the proposed fat-wire approach with virtual boundary pin-shapes achieves a slight improvement of 10% in the standard deviation of the capacitance ratio compared to the regular routing result.
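One plausible way to write this figure of merit is given below. The definition of the capacitance ratio $r_{C,i}$ of pair $i$ and the symbols $\sigma_{\text{Regular}}$ and $\sigma_{\text{2-Phase}}$ are assumptions introduced for illustration; the text only states the resulting value.

$$
r_{C,i} = \frac{C_{p,i}}{C_{n,i}}, \qquad \text{improvement} = 1 - \frac{\sigma_{\text{2-Phase}}(r_C)}{\sigma_{\text{Regular}}(r_C)} \approx 10\,\%
$$

A marker on the dashed diagonal corresponds to $r_{C,i} = 1$.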

Finally, it is also worth noting that a design rule-correct routing solution is only found using the standard cell layout abstracts with the proposed fat-wire boundary pins. An alternative fat-wire abstract representation with pins at the original differential pin-shape location led to many routing congestion overflows. One of the reasons is the low number of available thin metal layers in the selected technology. A second one is the internal routing of the bipolar differential logic gates. Consequently, a satisfying routing result could not be generated, whereas a design rule-correct solution is found with virtual fat-wire boundary pin abstracts as demonstrated here.

The quality of a routing result always depends on the correctness and accuracy of the abstract layout information of the standard cell gates. For bipolar differential logic designs, the fat-wire layout representation with its virtual fat-wire boundary pins is selected during the first routing phase at the PnR stage. The final pin access to the pin pairs of the real-existing cell counterparts is done in a second routing step. This may lead to slight routing artifacts in the final design. Nevertheless, the development of differential logic standard cell-based designs with a design flow-compliant layout generation has been successfully demonstrated.

## 6.2 Development and Evaluation of Radiation-Hardened △TMR Standard Cells

The proposed concepts for radiation-hardened TMR-based standard cells are evaluated on a 130 nm standard technology of IHP. This technology offers seven routing layers in total and provides a small SEL-robust CMOS standard cell library (see Section 2.1). Various $\Delta$TMR configurations are selected for implementation based on the overall performance and function (see also Section 5.3). This subset of cells under test (CUTs) is developed at gate and transistor level for concept validation. The layouts of all configurations are drawn in an MNCC-aware manner to maintain the SEU and SET robustness after physical implementation. This section gives an overview and comparison of the implemented $\Delta$TMR cells and their obtained results. In addition, test vehicles of selected CUTs are arranged as shift registers. Their measurement results from heavy-ion irradiation campaigns conclude this section.

#### 6.2.1 Development of robust $\Delta$ TMR cells

One of the most frequently mapped memory cells in digital designs is a D-flip-flop with asynchronous control function (i.e., DFF in Table 5.2). Thus, this type of memory cell is the natural choice as a baseline cell for the memory cell section of the first $\Delta$TMR arrangement. All configurations developed first are identical in memory cell selection. They only differ in the filter option for the D-SET filter section and in the component spacing for the MNCC-aware layout design. An internal memory cell spacing ($\Delta x_{MC}$) and a section spacing ($\Delta x_{sec}$) between the logically connected groups (LCGs) are maintained for robustness. The spacings reported in recent works range from around 2.0 µm to 8.5 µm (cf. also Section 2.7.5). Thus, a spacing of 10.0 µm has been selected as the initial spacing for an MNCC-aware implementation of $\Delta$TMR cells.

More than 15  $\Delta$ TMR configurations have been developed and six variants are selected for comparison and contribution in this work. All of them consist of the well-known D1D2-D D-SET-filter and a NOR/OR-based voter architecture (V-NO). The primary output of all first  $\Delta$ TMR cells is driven by the smallest available driver type (X1). The internal components of the flip-flops are distributed in three LCGs. Moreover, one of these baseline  $\Delta$ TMR flip-flops is drawn with an extra deep N-well region for further possible latchup improvement capability.

The arrangement of the cells into LCGs is illustrated in Figure 6.4 (without asynchronous control nodes). As can be seen in this figure, the upper flip-flop together with the essential gates of the voter and driver section forms the first LCG (LCG1), whereas the second and third flip-flops with their respective delay elements form LCG0 and LCG2.



Figure 6.4: Logical sections of  $\Delta$ TMR flip-flop with highlighted logical grouping

An example of a layout frame of these $\Delta$TMR standard cell gates is illustrated in Figure 6.5. To cope with the charge-sharing effect, a minimum $\Delta x_{sec}$ is maintained between all three LCG regions. The resulting spare area between these groups is routed with the lowest possible layer (here, the first metal layer, highlighted in blue).



Figure 6.5: Placement and signal routing of a baseline  $\Delta$ TMR flip-flop
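The two spacing rules described above can be checked on an abstract one-row placement as in the following minimal sketch. It assumes a purely one-dimensional model in which $\Delta x_{sec}$ applies between any two components of different LCGs and $\Delta x_{MC}$ between memory cells of different LCGs. The `Component` type, the component names, and all coordinates are hypothetical and do not describe the actual cell layouts.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Component:
    name: str
    lcg: int            # index of the logically connected group (LCG)
    x_left: float       # left edge in micrometres
    x_right: float      # right edge in micrometres
    is_memory: bool = False


def gap(a, b):
    """Horizontal edge-to-edge distance between two components (0 if they overlap)."""
    return max(0.0, max(a.x_left, b.x_left) - min(a.x_right, b.x_right))


def spacing_violations(components, min_dx_mc, min_dx_sec):
    """List all pairs from different LCGs that violate one of the two spacing rules."""
    violations = []
    for a, b in combinations(components, 2):
        if a.lcg == b.lcg:
            continue  # components of the same group may be placed close together
        d = gap(a, b)
        if a.is_memory and b.is_memory and d < min_dx_mc:
            violations.append((a.name, b.name, "dx_MC", d))
        if d < min_dx_sec:
            violations.append((a.name, b.name, "dx_sec", d))
    return violations


# Hypothetical one-row placement with three LCGs (coordinates in micrometres).
placement = [
    Component("FF0", 0, 0.0, 6.0, is_memory=True),
    Component("DEL0", 0, 6.0, 8.0),
    Component("VOT1", 1, 18.0, 20.0),
    Component("FF1", 1, 20.0, 26.0, is_memory=True),
    Component("FF2", 2, 36.0, 42.0, is_memory=True),
]

print(spacing_violations(placement, min_dx_mc=10.0, min_dx_sec=10.0))
```

In this model the memory cell spacing is never smaller than the section spacing, because other components of the same LCG may sit between a memory cell and the group boundary.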

Finally, the pins are placed at the third metal layer and the signals are routed with double cuts (vias) to address the yield for the design for manufacturing aspects.

A short comparison of these different $\Delta$TMR implementations is given in Table 6.1. It lists the integrated $\delta$-size of the D-SET filter, the memory cell spacing $\Delta_{MC}$, and the minimum section spacing $\Delta_{sec}$. The results are normalized values for area occupation $A$, average propagation delay $t_{pg}$, and energy $E$. The values are obtained by transistor-level simulations under nominal conditions. The normalization is done with respect to the unhardened single-gate D-flip-flop (DFF) offered in the standard cell library as a reference cell (Ref.). In the second row of the table, an implementation of a $\Delta$TMR is listed as DTMR\_R with a total $\delta$-delay of only 0.18 ns. This is a second reference cell, which is one of the first applications of $\Delta$TMR to the 130 nm technology, with a layout frame similar to the one shown in Figure 3.21. This cell occupies nearly twelve times the area of the standard DFF and has 1.7 times its propagation delay. Furthermore, it requires more than 14 times the energy. This is mainly due to the D-SET filter realization with inverter chains (DSET-D1D2) in this reference $\Delta$TMR. On the other hand, a large memory cell spacing of 15 µm is assured in the layout. However, as listed in the table, only a small section spacing of 0.5 µm is maintained.

| Type                | Cell        | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Normalized $A$ | Normalized $t_{pg}$ | Normalized $E$ |
|---------------------|-------------|--------------------|----------------------------|-----------------------------|----------------|---------------------|----------------|
| D-flip-flop (Ref.)  | DFF         | n/a                | n/a                        | n/a                         | 1.0            | 1.0                 | 1.0            |
| $\Delta$TMR (Ref.)  | DTMR_R      | 0.18               | 15                         | 0.5                         | 11.9           | 1.7                 | 14.4           |
| $\Delta$TMR         | DTMRAR00    | –                  | 10                         | 10                          | 6.7            | 1.9                 | 4.6            |
| $\Delta$TMR         | DTMRAR05\|N | 0.5                | 14                         | 10                          | 8.1            | 1.9                 | 7.4            |
| $\Delta$TMR         | DTMRBR05    | 0.5                | 19                         | 15                          | 9.3            | 1.9                 | 7.4            |
| $\Delta$TMR         | DTMRBR10    | 1.0                | 19                         | 15                          | 10.7           | 1.9                 | 10.3           |
| $\Delta$TMR         | DTMRCR05    | 0.5                | 9                          | 5                           | 6.9            | 1.9                 | 7.4            |
| $\Delta$TMR         | DTMRDR05    | 0.5                | 4                          | 0                           | 5.8            | 1.9                 | 7.4            |

Table 6.1: Novel  $\Delta$ TMR cells compared to reference DFF and reference  $\Delta$ TMR flip-flop

In contrast to this first DTMR\_R reference implementation, all proposed $\Delta$TMR cells have an MNCC-aware internal component placement ensured by special spacing. Moreover, the size of the $\delta$-delay is increased to mitigate wider transients. In addition, the wider but more area-efficient DSET-D1D2-D filter configuration is selected with $\delta$-sizes of 0.5 and 1.0 ns. For the first four $\Delta$TMR variants, a minimum $\Delta x_{MC}$ and $\Delta x_{sec}$ of at least 10 µm is implemented. Furthermore, two cells with reduced spacing are developed for further investigations. All new $\Delta$TMR gates except DTMRAR00 (w/o D-SET filter) are expected to be more robust due to the larger D-SET filter size, as can be derived from the table. They occupy less silicon area and consume less energy than the DTMR\_R implementation. A slight increase in propagation delay is seen due to the use of an alternative 2-input logic gate inside the voter section. In particular, the DTMRAR05 cell and the DTMRAR05N cell with deep N-well regions are among the most promising candidates for robust $\Delta$TMR memory cells for RHBD circuits.
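The area and energy savings relative to DTMR\_R follow directly from the normalized values in Table 6.1. The short sketch below recomputes them; the helper names `saving_percent` and `savings_table` are illustrative, and the numbers are copied from the table.

```python
# Normalized area (A) and energy (E) values copied from Table 6.1.
REFERENCE = {"A": 11.9, "E": 14.4}  # reference cell DTMR_R
CELLS = {
    "DTMRAR00": {"A": 6.7, "E": 4.6},
    "DTMRAR05|N": {"A": 8.1, "E": 7.4},
    "DTMRBR05": {"A": 9.3, "E": 7.4},
    "DTMRBR10": {"A": 10.7, "E": 10.3},
    "DTMRCR05": {"A": 6.9, "E": 7.4},
    "DTMRDR05": {"A": 5.8, "E": 7.4},
}


def saving_percent(value, reference):
    """Relative saving of `value` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - value / reference)


def savings_table(cells=CELLS, reference=REFERENCE):
    """Area and energy savings of every cell relative to the reference cell."""
    return {
        name: {key: saving_percent(val, reference[key]) for key, val in values.items()}
        for name, values in cells.items()
    }


for name, saving in savings_table().items():
    print(f"{name:11s} area: {saving['A']:5.1f} %   energy: {saving['E']:5.1f} %")
```

All six variants show a positive saving in both columns, which matches the statement that they need less area and energy than DTMR\_R.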

#### 6.2.2 Development of robust L- $\Delta$ TMR and LM- $\Delta$ TMR cells

$\Delta$TMR memory cells in which the D-flip-flops are decomposed into master-slave latch compositions are referred to as L-$\Delta$TMR flip-flops in this work. If the latch outputs are additionally selectable by a multiplexer (MUX), this approach is called LM-$\Delta$TMR (see also Figure 5.7 and Table 5.2). However, the baseline flip-flop decomposition (LDFF) has a larger cell delay and requires more silicon area, as discussed in Section 5.3.2. At first glance, an efficient $\Delta$TMR implementation based on latch cells therefore does not seem reasonable. Nevertheless, if an unhardened standard cell library does not offer suitable D-flip-flops for $\Delta$TMR cell development but provides latch gates with better performance in terms of area, power, or delay, an L-/LM-$\Delta$TMR solution may be more attractive.

In contrast to the baseline $\Delta$TMR approach with flip-flops, L-$\Delta$TMR has twice the number of memory cells. Moreover, when the phase of the slave latch group has to be inverted by a robust inverter (INVX), the area and the internal signal routing density increase further. As early experiments have shown, no single-row cell frame could be successfully implemented with fewer than four metal layers in this technology. Furthermore, no design-rule clean 1-row L-$\Delta$TMR cell layout was obtainable when double cut usage was enabled during routing. In general, the longer a cell frame, the fewer placement solutions can be offered in later applications due to the routing of internal connections. As a consequence, a compromise is to use a multiple-row arrangement for the L-$\Delta$TMR cell layout frame. The routing congestion is thus relaxed since more horizontal tracks are offered when the cell height is increased. An example of such a layout frame of a three-row L-$\Delta$TMR cell arrangement is shown in Figure 6.6.



Figure 6.6: Abstract component placement of an L- $\Delta$ TMR flip-flop

Unlike the 1-row $\Delta$TMR layouts, the L-$\Delta$TMR layouts are intentionally not equipped with dedicated spare area to maintain a section spacing ($\Delta x_{sec}$). Due to the high-density layouts and the 3-row placement, the memory cell spacing ($\Delta x_{MC}$) is the only spacing constraint that is fulfilled. As can be seen in the figure, the robust clock inverter INVX is placed in the center of the cell. Moreover, the master latches, i.e., M-/S-latches, are distributed diagonally in the standard cell frame to keep the $\Delta x_{MC}$ spacing. The slave stages are placed in a similar manner.

However, the decomposition into latches may increase the cell delay, as the comparison in Section 5.3.2 has already shown. The use of multiplexers behind the latch outputs is an attractive solution to reduce this overhead. As a consequence, this single-cell concept, referred to as the LMDFF flip-flop in this work, is applied to $\Delta$TMR. It consists of two latches and one multiplexer cell for each memory stage of the TMR arrangement. Moreover, instead of using one robust clock inverter (INVX), the correspondingly sensitive latch type counterpart from the unhardened library is selected for the slave stages. Table 6.2 shows some LM-$\Delta$TMR flip-flop experiments with different multiplexer and voter configurations in comparison to the baseline L-$\Delta$TMR counterpart with the same functionality. The circuits are implemented in the 130 nm technology. The normalized results are extracted from transistor-level simulations.

| LTMR     | $\mathbf{t}_{\mathbf{pg}}$ | I <sub>avg</sub> | Voter Type | MUX Arch. |
|----------|----------------------------|------------------|------------|-----------|
| LDTMRR05 | 1.00                       | 1.00             | V-AO       | n/a       |
| LMDTMR_A | 0.78                       | 1.02             | V-AO       | Std. MUX  |
| LMDTMR_B | 0.62                       | 0.79             | V-ND       | Std. MUX  |
| LMDTMR_C | 0.39                       | 0.59             | V-GG       | T-Gate    |

Table 6.2: LM- $\Delta$ TMR flip-flop experiments in comparison to reference L- $\Delta$ TMR

As can be seen in the second row of the table, an improvement in average propagation delay ($t_{pg}$) of 22 % can be obtained in an LM-$\Delta$TMR configuration when the multiplexer is realized with standard cells (i.e., Std. MUX). In addition, when the voter section is realized with NAND gates (V-ND), the speed improvement rises by a further 16 percentage points to 38 %. Finally, the shortest propagation delay is observed if the voter is realized with GGs (V-GG). The average current $I_{avg}$ decreases as well; this is achieved in particular with the alternative D-SET filter architecture DSET-D1-GG and the transmission gate (T-Gate) multiplexer arrangement. Moreover, the decision to use the latch type counterpart for the slave stages instead of a robust clock inverter for global clock signal inversion additionally saves silicon area (not listed in the table). This space is partially occupied by the multiplexer logic.
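As a quick check of these percentages against the normalized $t_{pg}$ column of Table 6.2, the relative delay reduction with respect to LDTMRR05 is one minus the normalized value. The 61 % figure for LMDTMR\_C is not stated in the text; it is derived here from the table entry only.

$$
1 - 0.78 = 0.22 \;(\text{LMDTMR\_A}), \qquad 1 - 0.62 = 0.38 \;(\text{LMDTMR\_B}), \qquad 1 - 0.39 = 0.61 \;(\text{LMDTMR\_C})
$$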

All these architectural changes allow single-row layouts to be arranged for the novel LM-$\Delta$TMR cells with fewer than four metal layers. Furthermore, an additional $\Delta x_{\rm MC}$ spacing between the master and slave latch gates is still possible. Two experimental configurations with different spacings are developed. In the first configuration, the master stages and their corresponding slave latches are placed directly next to each other without any spacing. This placement is quite similar to the $\Delta$TMR layouts presented in the previous section. The difference is that this LM-$\Delta$TMR cell has a high-density layout since no unused inactive silicon area remains.

In the second configuration, the master and the corresponding slave latches are separated by 6 µm in X direction. The area between the latches is not left unoccupied. It is used for the transient filter components, voter, driver, and multiplexer logic. The inner slave latches are moved closer to the next master counterpart. As a result, all latches are placed with an interleaved $\Delta x_{\rm MC}$ spacing as depicted in Figure 6.7. It shows the abstract of the one-row layout of an LM-$\Delta$TMR cell with the interleaved memory cell spacing. These variants have a placement density of 100% and are realized with only three metal layers for internal signal routing.



Figure 6.7: One-row layout frame of a novel LM- $\Delta$ TMR flip-flop

As for the D-flip-flop-based $\Delta$TMR configurations presented in the previous section, the developed L-$\Delta$TMR and LM-$\Delta$TMR flip-flops are compared. All variants are equipped with an asynchronous reset function. The L-$\Delta$TMR cells contain a V-AO voter section, whereas the LM-$\Delta$TMR architectures are designed with V-ND voters and a driver section with the next higher driver strength. Moreover, the candidates presented here also differ in their D-SET transient filter configuration. Either no filter or the DSET-D1D2-D filter is used in L-$\Delta$TMR, whereas the DSET-D1-GG filter is integrated in the LM-$\Delta$TMR solutions. Besides these variants, two L-$\Delta$TMR cells with voter modules after both the master and the slave stages (LDTMR2) are developed. This introduces an additional single point of failure inside the TMR gate. Therefore, both LDTMR2 variants and the variant without D-SET filter are expected to be less robust. They are considered as error devices for comparison purposes only.
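The filtering principle behind a guard-gate-based transient filter with a single delay element can be illustrated with a small behavioral model. The sketch below is a generic discrete-time abstraction and not the DSET-D1-GG circuit itself: it assumes a non-inverting guard gate that updates its output only when the input and its $\delta$-delayed copy agree and holds the previous value otherwise. The function name and the sample-based time grid are illustrative.

```python
def guard_gate_filter(samples, delta, initial=0):
    """Behavioral model of a delay-plus-guard-gate transient filter.

    samples: list of 0/1 input values on a uniform time grid
    delta:   delay of the delay element, in samples
    initial: assumed value of input and output before the first sample
    """
    out = []
    state = initial
    for t, value in enumerate(samples):
        delayed = samples[t - delta] if t >= delta else initial
        if value == delayed:
            state = value  # both guard-gate inputs agree: output follows them
        out.append(state)  # otherwise the previous output value is held
    return out


short_pulse = [0] * 5 + [1] * 2 + [0] * 5   # transient shorter than delta
long_pulse = [0] * 5 + [1] * 4 + [0] * 5    # pulse longer than delta

print(guard_gate_filter(short_pulse, delta=3))
print(guard_gate_filter(long_pulse, delta=3))
```

In this model a pulse shorter than `delta` never reaches the output, whereas a longer pulse passes with its leading edge delayed by `delta`. This is why a larger $\delta$-size mitigates wider transients at the cost of additional latency.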

| Type                | Cell      | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Normalized $A$ | Normalized $t_{pg}$ | Normalized $E$ |
|---------------------|-----------|--------------------|----------------------------|-----------------------------|----------------|---------------------|----------------|
| D-flip-flop (Ref.)  | DFF       | n/a                | n/a                        | n/a                         | 1.0            | 1.0                 | 1.0            |
| $\Delta$TMR (Ref.)  | DTMR_R    | 0.18               | 15                         | 0.5                         | 11.9           | 1.7                 | 14.4           |
| L-$\Delta$TMR       | LDTMRR00  | –                  | 5                          | n/a                         | 7.9            | 1.9                 | 5.5            |
| L-$\Delta$TMR       | LDTMRR05  | 0.5                | 5                          | n/a                         | 7.9            | 1.9                 | 8.4            |
| L-$\Delta$TMR       | LDTMR2R00 | –                  | 5                          | n/a                         | 7.9            | 1.9                 | 7.5            |
| L-$\Delta$TMR       | LDTMR2R05 | 0.5                | 5                          | n/a                         | 9.4            | 1.9                 | 10.3           |
| LM-$\Delta$TMR      | LMDTMRA05 | 0.5                | 10                         | n/a                         | 7.2            | 1.2                 | 7.3            |
| LM-$\Delta$TMR      | LMDTMRB05 | 0.5                | 14                         | n/a                         | 7.2            | 1.2                 | 7.3            |

Table 6.3: L- $\Delta$ TMR and LM- $\Delta$ TMR cells in comparison to reference flip-flops

In short, all L- and LM-$\Delta$TMR cells are characterized by a lower overhead in terms of area and energy consumption than the DTMR\_R reference. The most promising candidates are LDTMRR05, LMDTMRA05, and LMDTMRB05. All of them offer the widest transient filter size. In particular, a significant improvement in propagation delay can additionally be noted for both LM-$\Delta$TMR configurations.

#### 6.2.3 Development of robust S- $\Delta$ TMR and SC- $\Delta$ TMR cells

In the previous sections, $\Delta$TMR architectures with basic memory function were presented. Neither a scan function (S) with access to the internal flip-flops nor self-correction (SC) is supported therein. For these features, the S-$\Delta$TMR and the SC-$\Delta$TMR concepts are proposed in this work.

Three implementations of the S-$\Delta$TMR-I concept illustrated in Figure 5.9 of Section 5.4.1 are realized. They use unhardened scannable D-flip-flops for the memory cell section and provide internal access to the flip-flops during scan test. The S-$\Delta$TMR configurations are implemented with and without asynchronous initialization control function. The layouts of all scannable $\Delta$TMR flip-flops are arranged in a one-row standard cell frame (see the layout frame illustrated in Figure 5.16). The spacing requirements for $\Delta x_{sec}$ and $\Delta x_{MC}$ are fulfilled to maintain the robustness against the charge-sharing effect. Due to the extra complexity of the scan function, the S-$\Delta$TMR cell width is slightly increased in comparison to the common $\Delta$TMR flip-flops. One of the reasons is the size of a scannable D-flip-flop cell, which is larger than the baseline FF counterpart. The internal multiplexer for data path and scan path selection requires additional resources. Moreover, the concept proposes internal buffers in order to meet the hold time requirements of the second and third flip-flop of the TMR arrangement (in scan mode).

In addition to the S-$\Delta$TMR cells, the scan variant with reset and set control is extended with a self-correction function following the SC-S-$\Delta$TMR concept. In this case, the increased complexity due to the additional logic requires an alternative layout arrangement, which improves the placement capability of the cell and keeps the number of required routing layers low. In contrast to all other aforementioned layout proposals, a two-row cell frame is selected. The reason for this is the amount of additional active area for the control logic of the self-correction blocks (cf. SC\_CTRL in Section 5.4.3). Moreover, the self-correction logic of each of the three internal flip-flops is always put together with that flip-flop in one LCG for an MNCC-aware component placement. Consequently, all components of one group are placed into the same region. To reduce the width of the cell, these groups are placed in two rows as illustrated in Figure 6.8.



Figure 6.8: Two-row layout frame of a novel SC-S- $\Delta$ TMR flip-flop

Even though a lot of silicon area is spare area and filled with standard cell fillers or decoupling cells in this proposal, the distribution of the three LCGs into a two-row layout allows for freeing routing tracks at the lowest metal layer. Furthermore, finding legal placement locations for these complex cells is additionally enhanced when the cells are less wide. As can also be seen in the figure, both spacings are maintained in both rows.

For comparison, all four implementations are evaluated at transistor level by simulation. The extracted results are listed in Table 6.4 and compared to the reference unhardened flip-flop DFF and the reference $\Delta$TMR realization DTMR\_R. The three S-$\Delta$TMR circuits differ in the supported asynchronous control function, i.e., reset only (infix R), both set and reset (infix B), and no asynchronous control (infix N).

| Type                | Cell      | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Normalized $A$ | Normalized $t_{pg}$ | Normalized $E$ |
|---------------------|-----------|--------------------|----------------------------|-----------------------------|----------------|---------------------|----------------|
| D-flip-flop (Ref.)  | DFF       | n/a                | n/a                        | n/a                         | 1.0            | 1.0                 | 1.0            |
| $\Delta$TMR (Ref.)  | DTMR_R    | 0.18               | 15                         | 0.5                         | 11.9           | 1.7                 | 14.4           |
| S-$\Delta$TMR       | SDTMRBR05 | 0.5                | 19                         | 9                           | 10.8           | 2.1                 | 7.9            |
| S-$\Delta$TMR       | SDTMRRR05 | 0.5                | 19                         | 9                           | 10.8           | 2.1                 | 7.8            |
| S-$\Delta$TMR       | SDTMRNR05 | 0.5                | 19                         | 9                           | 10.8           | 2.0                 | 7.8            |
| SC-S-$\Delta$TMR    | SCSDTMR05 | 0.5                | 16                         | 9                           | 16.0           | 1.9                 | 6.1            |

Table 6.4: Novel S- $\Delta$ TMR and SC-S- $\Delta$ TMR cells compared to reference flip-flops

As shown in Table 6.4, even though the flip-flops are equipped with scan function or self-correction, the overhead in delay and energy consumption is at an acceptable level. Self-correction, however, is an expensive feature with respect to area occupation: the SC-S-$\Delta$TMR cell is 16 times larger than the unhardened D-flip-flop. The use of the V-GG voter and the DSET-D1-GG filter configuration for this kind of $\Delta$TMR flip-flop is an attempt to address this increase in overhead. Nevertheless, all of the presented scannable RHBD flip-flops enable the scan test of the internal flip-flops inside the $\Delta$TMR cells and are therefore an important contribution of this work.

#### 6.2.4 Development of robust T- $\Delta$ TMR cells

All  $\Delta$ TMR cells presented above use baseline flip-flop gates which are offered in most unhardened standard cell libraries. As known from the comparison of baseline cell candidates in Table 5.2, dynamic single-phase flip-flops, i.e., the TDFF TSPC flip-flops, are advantageous for short gate delays with a low area occupation. As a consequence, a  $\Delta$ TMR flip-flop made of TSPC memory cells, a fast voter configuration, and a D-SET filter configuration with low latency is a promising architecture to reduce the introduced overhead by hardware redundancy in  $\Delta$ TMR.

The resulting T- $\Delta$ TMR cells consist of similar TSPC standard cell gates presented in Section 5.3.2. In  $\Delta$ TMR, the internal TSPC cells are inverting gates of three stages as illustrated in Figure 3.6. This saves three obligatory output inverters per memory cell. The voter section is realized with the NAND-based solution (V-ND) to further reduce the delay overhead (cf. Table 5.3). Moreover, the DSET-D1-GG transient filter architecture is selected to achieve a shorter latency due to the use of one  $\delta$ -delay element. This improves the setup time of the entire T- $\Delta$ TMR. The output of these cells is driven by an inverter, to decouple the external load from the voter logic and to provide the correct signal polarity. Based on this approach, further drive type configurations can be easily derived.
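The signal polarity along this path can be verified with a small truth-table check. The sketch assumes the common two-level NAND realization of a majority voter, which the text does not spell out for V-ND, and models the inverting memory cells and the output inverter as ideal logic. All function names are illustrative.

```python
from itertools import product


def nand(*inputs):
    """Logical NAND of an arbitrary number of 0/1 inputs."""
    return 0 if all(inputs) else 1


def majority(a, b, c):
    """Reference 2-out-of-3 majority function."""
    return 1 if a + b + c >= 2 else 0


def nand_voter(a, b, c):
    """Two-level NAND-NAND majority voter (assumed structure of a NAND-based voter)."""
    return nand(nand(a, b), nand(b, c), nand(a, c))


def t_dtmr_output(d0, d1, d2):
    """Inverting memory cells, NAND-based voter, then an inverting output driver."""
    inverted = [1 - d for d in (d0, d1, d2)]  # each memory cell outputs inverted data
    voted = nand_voter(*inverted)             # majority of the inverted values
    return 1 - voted                          # output inverter restores the polarity


for bits in product((0, 1), repeat=3):
    assert t_dtmr_output(*bits) == majority(*bits)
print("output equals the majority of the stored data for all 8 input combinations")
```

The check relies on the majority function being self-dual: voting on three inverted values yields the inverted majority, so a single output inverter is sufficient to provide the correct polarity.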

The standard flip-flop (DFF), several TSPC (TDFF) and T- $\Delta$ TMR cells are implemented in the 130 nm technology. They are arranged in chains in a test chip for electrical validation. Moreover, an additional ASIC with one T- $\Delta$ TMR cell candidate is fabricated for experimental irradiation measurement campaigns. The post-layout cell propagation delay is extracted from the test chip as an average of all 200 elements of each chain. The results for the DFF, for the novel TSPC flip-flop (TDFF), and for one T- $\Delta$ TMR cell are illustrated in Figure 6.9. The concept of T- $\Delta$ TMR, the test chip and the post-layout simulation results have also been published in [21].



Figure 6.9: Extracted propagation delays of DFF, TDFF, and a T- $\Delta$ TMR cell

As can be seen, the propagation delay of the TSPC flip-flop with output inverter (TDFF) is nearly half of the delay of the baseline counterpart DFF for each digital corner min (-40°C at 1.32 V), typ (25°C at 1.20 V), and max (125°C at 1.08 V). Moreover, a T- $\Delta$ TMR configuration has an increase in delay of only 130 ps for the slowest corner (max) in comparison to the unhardened baseline D-flip-flop DFF (indicated by the dashed line). In addition, a saving of 22% in silicon area is also obtained compared to a similar  $\Delta$ TMR arrangement with the use of DFF as baseline cells inside memory cell section.

Since TDFF has a smaller cell frame and the TMR arrangement itself is less complex, the layouts of T-$\Delta$TMR cells are arranged in a single row. The robust implementation is maintained by fulfilling the requirements for both spacings, i.e., a section spacing $\Delta x_{sec}$ of at least 10 µm and the resulting memory cell spacing $\Delta x_{MC}$ given by the respective LCG arrangement.

Finally, the same cell-related results are extracted from pure transistor-level simulations and listed in Table 6.5 for comparison. Even though the results might be too optimistic due to the lack of parasitic effects, they show the same trend. The $\Delta$TMR arrangement with TSPC cells has the shortest propagation delay. Moreover, its energy consumption is similar to that of other $\Delta$TMR configurations. Furthermore, it has the lowest area occupation. As a result, this configuration is an ideal candidate for designs with tight area and delay requirements.

| Type                | Cell        | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Normalized $A$ | Normalized $t_{pg}$ | Normalized $E$ |
|---------------------|-------------|--------------------|----------------------------|-----------------------------|----------------|---------------------|----------------|
| D-flip-flop (Ref.)  | DFF         | n/a                | n/a                        | n/a                         | 1.0            | 1.0                 | 1.0            |
| $\Delta$TMR (Ref.)  | DTMR_R      | 0.18               | 15                         | 0.5                         | 11.9           | 1.7                 | 14.4           |
| $\Delta$TMR         | DTMRAR05\|N | 0.5                | 14                         | 10                          | 8.1            | 1.9                 | 7.4            |
| T-$\Delta$TMR       | TDTMRAN05   | 0.5                | 18                         | 10                          | 5.8            | 0.9                 | 7.4            |

Table 6.5: T- $\Delta$ TMR in comparison to  $\Delta$ TMR cells and standard flip-flop (DFF)

#### 6.2.5 Development of robust clock-gate CG- $\Delta$ TMR

The last example for $\Delta$TMR applicability is the $\Delta$TMR clock-gate (CG-$\Delta$TMR). It is mainly realized with standard cell library gates following the concept presented in Section 5.4.2. Therein, the enable signal is fed through the D-SET filter section. The memory cell section consists of three unhardened standard clock-gates. Each of them is internally arranged as a latch/AND-gate combination (see Figure 5.10). In contrast to all previously proposed $\Delta$TMR flip-flops, the voter is made of robust cells only. This is mandatory, since a transient on the shared clock path could otherwise lead to multiple bit upsets.
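The behavior of such a triplicated latch/AND clock-gate can be sketched at a purely logical level. The model below assumes the usual latch-based clock-gate in which the enable latch is transparent while the clock is low, and it uses an ideal combinational majority function as a stand-in for the robust voter. The guard-gate-based voter of the actual cell, the D-SET filter, and all timing are not modeled, and the class and function names are illustrative.

```python
class ClockGate:
    """Latch/AND clock-gate: the enable latch is transparent while clk is low."""

    def __init__(self):
        self.latched_enable = 0

    def step(self, clk, enable):
        if clk == 0:
            self.latched_enable = enable  # latch follows enable during the low phase
        return clk & self.latched_enable  # gated clock output


def majority(a, b, c):
    """Ideal 2-out-of-3 vote (behavioral stand-in for the robust voter)."""
    return 1 if a + b + c >= 2 else 0


def voted_gated_clock(gates, clk, enable):
    """Drive three clock-gates with the same inputs and vote on their outputs."""
    return majority(*(gate.step(clk, enable) for gate in gates))


gates = [ClockGate() for _ in range(3)]
voted_gated_clock(gates, clk=0, enable=1)   # low phase: enable is captured
gates[1].latched_enable = 0                 # emulate an upset in one enable latch
print(voted_gated_clock(gates, clk=1, enable=1))
```

With two of the three latches still holding the correct enable value, the voted gated clock is unaffected by the emulated upset.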

Combinational voters such as V-ND or V-OA consist of many internal devices. Consequently, they would lead to more area occupation and, more importantly, to higher energy consumption during clock signal processing. On the other hand, any transient occurring inside unhardened voter and driver logic has to be suppressed in an RHBD CG-$\Delta$TMR gate. To address both aspects, the voter is made of robust feedback-less guard-gate inverters (GGI) arranged as an inverting V-GGN voter, as depicted in Figure 5.8 (b). The advantage of this solution is the lower number of active transistors that have to be scaled to obtain a large enough critical charge.

Moreover, the LCGs are logically separated in the horizontal direction in the logical section scheme. The respective connected components of an unhardened clock-gate are set as members of the same LCG (see Figure 6.10). The first and the third LCG each consist of one clock-gate, a GG of the DSET-D1-GG module, and one robust GGI as part of the V-GGN module. The second LCG is implemented with the delay element of the D-SET filter section, the second clock-gate, and the respective GGI of the voter. The final output is processed by a robust inverter cell (INVX) to implement the entire V-GG voter. The placement of all components is arranged similarly to that of the other $\Delta$TMR cells, with spacing constraints for memory cells and sections above 10 µm. Last but not least, the cell layout follows the well-known one-row proposal with the use of only three routing layers and double cuts.



Figure 6.10: Block scheme of the proposed RHBD- $\Delta$ TMR CG with highlighted LCGs
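
The triplicated structure described above can be illustrated with a small behavioral sketch. It is not the transistor-level design: the D-SET filter on the enable path is omitted, the V-GGN voter is idealized as a plain two-out-of-three majority function, and the names `ClockGate`, `CgDtmrModel`, and `upset` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote (idealized voter)."""
    return (a & b) | (a & c) | (b & c)


@dataclass
class ClockGate:
    """Latch/AND clock gate: the latch is transparent while clk is low."""
    latched_en: int = 0

    def evaluate(self, clk: int, en: int) -> int:
        if clk == 0:
            self.latched_en = en  # latch follows the enable in the low phase
        return clk & self.latched_en


class CgDtmrModel:
    """Three unhardened clock gates followed by a majority voter."""

    def __init__(self) -> None:
        self.gates = [ClockGate(), ClockGate(), ClockGate()]

    def evaluate(self, clk: int, en: int) -> int:
        outs = [gate.evaluate(clk, en) for gate in self.gates]
        return majority(*outs)

    def upset(self, index: int) -> None:
        """Flip the stored enable of one gate to mimic an SEU in its latch."""
        self.gates[index].latched_en ^= 1


cg = CgDtmrModel()
cg.evaluate(clk=0, en=1)                     # enable is captured in the low phase
gated_before = cg.evaluate(clk=1, en=1)      # gated clock is high
cg.upset(1)                                  # single upset in the second latch
gated_after = cg.evaluate(clk=1, en=1)       # voter masks the upset
```

In this simplified model a single flipped latch changes only one of the three gate outputs, so the voted clock is unaffected.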

The obtained transistor-level simulation results for the CG- $\Delta$ TMR cell are listed in Table 6.6. Even though the  $\Delta$ TMR clock-gate is compared to flip-flop counterparts, the trend for the overhead can still be derived.

| Type                | Cell         | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Norm. $A$ | Norm. $t_{pg}$ | Norm. $E$ |
|---------------------|--------------|--------------------|----------------------------|-----------------------------|-----------|----------------|-----------|
| D-flip-flop (Ref.)  | DFF          | n/a                | n/a                        | n/a                         | 1.0       | 1.0            | 1.0       |
| $\Delta$TMR (Ref.)  | DTMR_R       | 0.18               | 15                         | 0.5                         | 11.9      | 1.7            | 14.4      |
| $\Delta$TMR         | DTMRAR05\|N  | 0.5                | 14                         | 10                          | 8.1       | 1.9            | 7.4       |
| CG-$\Delta$TMR      | CGDTMRA05    | 0.5                | 22                         | 10                          | 8.0       | (0.9)          | 6.3       |

Table 6.6: CG- $\Delta$ TMR in comparison to standard and  $\Delta$ TMR D-flip-flops

As can be seen, the area penalty of a robust CG-$\Delta$TMR cell is quite similar to that of the $\Delta$TMR flip-flops DTMRAR05|N. This is due to the larger-sized voter section. On the other hand, a slightly reduced energy overhead can still be noted. Nevertheless, the solution enables the design of robust RHBD-$\Delta$TMR-based clock-gates made of partially unhardened components.

#### 6.2.6 Shift Register Test Vehicles

Shift registers are ideal test vehicles for SEE irradiation measurement experiments. The shifted output data of a register chain can easily be compared to the expected values during the irradiation campaign. The following test patterns (TPs) are applied for different faults:

- Solid-0, to expose faults forcing the memory cell outputs to logical *high*;
- Solid-1, to expose faults forcing the memory cell outputs to logical *low*;
- Alternate 0-1-0 TP, with changes in logic state for the detection of clock tree failures;
- Combined TP, for a more realistic scenario in which the logic state changes with some probability.

With the use of these TPs, upsets in the bit sequence of the internal data stream of each shift register are classified. Consequently, every promising $\Delta$TMR cell candidate presented in the previous sections is selected as a cell under test (CUT) for heavy-ion irradiation campaigns. The resulting test vehicles with the CUTs are arranged as single 1024-bit shift registers. The layout design of these ASICs follows the proposed design methodology without any extra design effort. No extra optimization of the design rules or netlist is done. Neighboring $\Delta$TMR cells of the chain are placed directly next to each other. The SET mitigation of critical signals such as clock, asynchronous reset, or set is addressed using robust inverter trees only, whereas signals are not additionally protected when traversing a D-SET filter section inside a $\Delta$TMR cell. Furthermore, the placement of the proposed RHBD-$\Delta$TMR flip-flops is similar for all shift register implementations of one $\Delta$TMR family, i.e., for the baseline, the L- and the S-$\Delta$TMR test vehicles, respectively. The positions of clock tree and reset tree driver cells are also similar within each family. As a result, the final shift register layouts of one $\Delta$TMR family differ only in the baseline cell selection and a slightly modified signal routing.
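
The pattern generation and the comparison of shifted-out data against expected values can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the "combined" pattern is modeled as a seeded pseudo-random bit stream (the exact statistics of the real pattern are not specified here), and the classification only distinguishes the direction of each bit flip.

```python
import random
from typing import Dict, List


def make_pattern(kind: str, length: int, seed: int = 0) -> List[int]:
    """Return one of the four test patterns as a list of bits."""
    if kind == "solid0":
        return [0] * length
    if kind == "solid1":
        return [1] * length
    if kind == "alternate":
        return [i % 2 for i in range(length)]  # 0,1,0,1,...
    if kind == "combined":
        rng = random.Random(seed)  # assumption: uniformly random bits
        return [rng.randint(0, 1) for _ in range(length)]
    raise ValueError(f"unknown pattern kind: {kind}")


def classify_upsets(expected: List[int], observed: List[int]) -> Dict[str, int]:
    """Count mismatches between expected and observed bits by flip direction."""
    if len(expected) != len(observed):
        raise ValueError("streams must have equal length")
    zero_to_one = sum(1 for e, o in zip(expected, observed) if e == 0 and o == 1)
    one_to_zero = sum(1 for e, o in zip(expected, observed) if e == 1 and o == 0)
    return {"0->1": zero_to_one, "1->0": one_to_zero}


# Example: one bit of a solid-0 stream is flipped to 1 during the test.
expected_bits = make_pattern("solid0", 1024)
observed_bits = list(expected_bits)
observed_bits[5] = 1
upsets = classify_upsets(expected_bits, observed_bits)
```

With a solid-0 pattern only `0->1` flips can be observed, which is why each solid pattern exposes one fault direction.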

Nevertheless, the test chip for CG-$\Delta$TMR cells requires an alternative design concept. Therein, a chain of 1024 robust $\Delta$TMR flip-flops is arranged in the same way as in a common $\Delta$TMR cell shift register design. In this case, however, an individual instance of the CG-$\Delta$TMR gate is connected and placed directly in front of the clock input of each of the 1024 $\Delta$TMR flip-flop elements. The buffer tree for the enable signal is not specifically hardened, since any transient should be filtered by the internal transient filter section inside the CG-$\Delta$TMR cell itself.



Figure 6.11: Implementation of test vehicles for electrical measurements and irradiation tests: (a) layout view of  $\Delta$ TMR shift register, (b) a micro-photograph

Figure 6.11 (a) illustrates the layout of the most promising $\Delta$TMR candidate with highlighted cell placement. The dimensions of the shift register test vehicle are 1.25 mm × 1.29 mm. A micro-photograph of an L-$\Delta$TMR test vehicle is shown in Figure 6.11 (b).

On-wafer measurements are carried out with the Advantest 93k SoC tester for electrical validation and device pre-selection for future heavy-ion tests. The scannable $\Delta$TMR cells (S- and SC-S-$\Delta$TMR) allow direct tests of the voter function. As an example, a test pattern 0,1,0 can be written continuously in series to the shift registers during scan-in mode for $3 \times 1024$ clock cycles. Afterwards, when the entire $\Delta$TMR chain is programmed, the scan-enable input is released, bringing the cells into normal operation mode for one additional clock cycle. Thus, the entire shift register content is shifted by one position to the right. If all voters work correctly, every single $\Delta$TMR flip-flop should finally hold the value 0 selected by its majority voter. The readout data in scan mode is $1024 \times (0,0,0)$ (for all $\Delta$TMR flip-flops) in this example. With this approach, any permutation of the triple can be used to test the internal TMR structures. Moreover, it also allows the self-correction function of the SC-S-$\Delta$TMR flip-flops to be checked. In this case, the clock signal is pulled to logical zero while the scan function is disabled. As a result, the self-correction mode is started, and an unequal pattern, e.g., 1,1,0, is majority-voted and written back internally as 1,1,1 by the self-correction mechanism.
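
The voter test procedure can be mimicked with a purely behavioral sketch. The model below is an assumption-laden simplification: each cell is a triple of stored bits, one normal-mode clock makes every cell capture the voted output of its predecessor, and `data_in` (a hypothetical parameter) is the value at the serial input of the first cell.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]


def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


def normal_mode_clock(chain: List[Triple], data_in: int) -> List[Triple]:
    """One clock in normal mode: each cell stores its predecessor's voted output."""
    voted = [majority(*cell) for cell in chain]
    inputs = [data_in] + voted[:-1]
    return [(d, d, d) for d in inputs]


def self_correct(cell: Triple) -> Triple:
    """Self-correction: the voted value is written back to all three bits."""
    v = majority(*cell)
    return (v, v, v)


SCAN_IN_CYCLES = 3 * 1024                      # three bits per cell, 1024 cells
chain = [(0, 1, 0)] * 1024                     # content after scan-in
readout = normal_mode_clock(chain, data_in=0)  # expected: all (0, 0, 0)
corrected = self_correct((1, 1, 0))            # expected: (1, 1, 1)
```

Any other triple can be passed in the same way to exercise the remaining voter input combinations.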

#### 6.2.7 Irradiation Campaigns

After successful on-wafer measurements, selected devices were integrated in 68-pin ceramic PGA packages. Depending on the campaign, up to four ICs of one family are mounted in one package for heavy-ion radiation tests. For illustration purposes, a photograph of four shift register ICs integrated in one package is shown in Figure 6.12. Therein, the ASICs are placed as close as possible to the center of the package to avoid being shadowed by the package during angled irradiation tests.



Figure 6.12: Integration of four shift register CUTs in one package

Table 6.7 lists the subset of fabricated shift registers with the corresponding $\Delta$TMR CUTs and their respective irradiation campaign numbers. It serves as an overview for the heavy-ion test results of the next section. The performance and the comparison with the reference DFF and the $\Delta$TMR DTMR\_R cells have already been discussed in the previous sections.

| Camp. No. | CUT | $\delta$-size [ns] | Min $\Delta_{\rm MC}$ [µm] | Min $\Delta_{\rm sec}$ [µm] | Norm. $A$ | Norm. $t_{pg}$ | Norm. $E$ |
|-----------|-----|--------------------|----------------------------|-----------------------------|-----------|----------------|-----------|
| **Reference Flip-Flop** | | | | | | | |
| 1    | DFF          | n/a  | n/a | n/a | 1.0  | 1.0 | 1.0  |
| **$\Delta$TMR** | | | | | | | |
| 1    | DTMR_R       | 0.18 | 15  | 0.5 | 11.9 | 1.7 | 14.4 |
| 2    | DTMRAR00     | –    | 10  | 10  | 6.7  | 1.9 | 4.6  |
| 2, 3 | DTMRAR05\|N  | 0.5  | 14  | 10  | 8.1  | 1.9 | 7.4  |
| 2    | DTMRBR05     | 0.5  | 19  | 15  | 9.3  | 1.9 | 7.4  |
| 2    | DTMRBR10     | 1.0  | 19  | 15  | 10.7 | 1.9 | 10.3 |
| **L-$\Delta$TMR and LM-$\Delta$TMR** | | | | | | | |
| 2    | LDTMRR00     | –    | 5   | n/a | 7.9  | 1.9 | 5.5  |
| 2    | LDTMRR05     | 0.5  | 5   | n/a | 7.9  | 1.9 | 8.4  |
| 2    | LDTMR2R00    | –    | 5   | n/a | 7.9  | 1.9 | 7.5  |
| 2    | LDTMR2R05    | 0.5  | 5   | n/a | 9.4  | 1.9 | 10.3 |
| **Scannable $\Delta$TMR (S-$\Delta$TMR)** | | | | | | | |
| 3    | SDTMRBR05    | 0.5  | 19  | 9   | 10.8 | 2.1 | 7.9  |
| 3    | SDTMRRR05    | 0.5  | 19  | 9   | 10.8 | 2.1 | 7.8  |
| 3    | SDTMRNR05    | 0.5  | 19  | 9   | 10.8 | 2.0 | 7.8  |

Legend: $\delta$-size: size of D-SET filter delay; $\Delta_{\rm MC}$: memory cell spacing; $\Delta_{\rm sec}$: section spacing (in LCGs); $A$: norm. area; $t_{pg}$: norm. delay; $E$: norm. energy.

Table 6.7: Overview of CUTs in test vehicle implementations and campaigns

These circuits are integrated in opened packages for heavy-ion tests. The devices were exposed to different ions with a fluence of at least $2 \cdot 10^7 \text{ cm}^{-2}$ and an LET between 3 and 67 MeV cm<sup>2</sup> mg<sup>-1</sup> at the cyclotron of the Université catholique de Louvain (UCL) in Belgium. Several test patterns were applied for different purposes while the devices were under irradiation. All errors were counted externally.

#### 6.2.8 Results and Discussion

Within the scope of this work, three different campaigns were performed considering the CUTs listed in Table 6.7. The shift registers of the standard flip-flops DFF and the initial $\Delta$TMR solution DTMR\_R were irradiated in a first campaign. Test vehicles with the novel $\Delta$TMR concept, with its improved internal component spacing and increased D-SET filter, and the alternative L-$\Delta$TMR variants were evaluated in a second campaign. Finally, the first variants of the scannable S-$\Delta$TMR cells and an improved $\Delta$TMR configuration with deeper N-well drawings were measured in campaign no. 3. All radiation test results are given as cross-sections as a function of the effective LET (LET<sub>eff</sub>) (see Section 2.7.3 for more details). The cross-section $\sigma$ per bit is calculated as follows:

$$\sigma = \frac{\text{Errors}}{\text{Fluence} \times \#\text{flip-flops}} \tag{6.2}$$

The radiation test results of the $\Delta$TMR CUTs with novel $\Delta$TMR gates are shown in diagram (a) of Figure 6.13. For easier comparison and better illustration, the results of the first campaign, i.e., of the unhardened standard flip-flop DFF and the reference $\Delta$TMR flip-flop, are additionally depicted. Most of the tests were performed with a fluence of $3 \cdot 10^7$ cm<sup>-2</sup>. Some tests at lower LET used a fluence of $2 \cdot 10^7$ cm<sup>-2</sup>. Moreover, a global worst-case detection limit of $4.88 \cdot 10^{-11}$ cm<sup>2</sup>, corresponding to the minimum fluence, is defined to improve the readability of all illustrations.
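
As a short worked example, the per-bit cross-section of Equation 6.2 and the quoted worst-case detection limit can be reproduced numerically. The sketch assumes that the detection limit corresponds to a single counted error at the minimum fluence of $2 \cdot 10^7$ cm<sup>-2</sup> on a 1024-bit shift register; the function names are hypothetical.

```python
def cross_section_per_bit(errors: int, fluence_cm2: float, n_flip_flops: int) -> float:
    """Per-bit cross-section in cm^2: errors / (fluence * number of flip-flops)."""
    if fluence_cm2 <= 0 or n_flip_flops <= 0:
        raise ValueError("fluence and flip-flop count must be positive")
    return errors / (fluence_cm2 * n_flip_flops)


def detection_limit(fluence_cm2: float, n_flip_flops: int) -> float:
    """Smallest resolvable cross-section, assuming one counted error."""
    return cross_section_per_bit(1, fluence_cm2, n_flip_flops)


limit_min_fluence = detection_limit(2e7, 1024)   # about 4.88e-11 cm^2
limit_typ_fluence = detection_limit(3e7, 1024)   # about 3.26e-11 cm^2
print(f"{limit_min_fluence:.3e} cm^2, {limit_typ_fluence:.3e} cm^2")
```

Under this assumption the lower fluence yields the larger, i.e., worst-case, limit, which matches the value stated in the text.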



Figure 6.13: Cross-section as a function of effective LET (LET<sub>eff</sub>)

As can be seen in Figure 6.13 (a), SEUs were already detected at the lowest LET for the standard, unhardened D-flip-flop (DFF). Interestingly, two errors were observed for the $\Delta$TMR implementation without D-SET filter (DTMRAR00) at a low LET of 16.1 MeV cm<sup>2</sup> mg<sup>-1</sup>. The reference $\Delta$TMR approach (DTMR\_R), with a small integrated D-SET filter and a memory cell section spacing, already had an increased LET threshold. However, the most robust variant is the DTMRAR05 implementation with an integrated D-SET filter size of $\delta = 0.5$ ns. For this device, not a single error was detected at 62.5 MeV cm<sup>2</sup> mg<sup>-1</sup>, as indicated by the horizontal black curve. For the sake of clarity, the results of the $\Delta$TMR cells with larger spacing (DTMRBR05) and with an additional larger transient filter of 1 ns (DTMRBR10) are not shown in the figure. For both devices, only one error was counted at the highest LET of 62.5 MeV cm<sup>2</sup> mg<sup>-1</sup>. Consequently, the course of these graphs would be similar to the result of the most robust DTMRAR05 configuration. As an intermediate result, the importance of the D-SET filter size for robust TMR-based memory cells can already be deduced. All variants with integrated filter structures have a higher LET threshold. In contrast, first SEU/SET errors were already seen at 16.1 MeV cm<sup>2</sup> mg<sup>-1</sup> for circuits without a transient filter or with a smaller one, e.g., DTMRAR00 and DTMR\_R.

With respect to the results of the L-$\Delta$TMR variants shown in Figure 6.13 (b), no SEU was detected at an LET<sub>eff</sub> of 46.1 MeV cm<sup>2</sup> mg<sup>-1</sup> for the LDTMRR05 device (with 0.5 ns $\delta$-delay). As a reminder, the layout of the latch cell-based L-$\Delta$TMR has a high utilization (density) without maintaining any section spacing ($\Delta x_{sec}=0$), in comparison to the common $\Delta$TMR cells made of flip-flop gates. Instead, the layouts use a small memory cell spacing $\Delta x_{MC}$ between the latches. Furthermore, and as expected, the number of errors is increased for the devices with two voter sections (LDTMR2R00). As can be seen in the figure, the cross-section for the two-voter L-$\Delta$TMR configuration with a D-SET filter section for the master stages is slightly decreased, since the number of detected errors is lower. As a result, the importance of an integrated D-SET filter inside TMR flip-flops is confirmed again.

The results of the third irradiation campaign are given in Figure 6.14. This time, an improved  $\Delta$ TMR configuration with deeper N-well for better SEL robustness and the three scannable S- $\Delta$ TMR cells were integrated in one package and exposed to radiation.



Figure 6.14: Cross-section as a function of effective LET of the third campaign

The curves in the diagram indicate the results of all test patterns, whereas all solid blue lines and the red-dashed curve show the resulting cross-sections of the $45^{\circ}$-angled experiments. The detection limit was determined to be $2.44 \cdot 10^{-11}$ cm<sup>2</sup> for the non-angled tests and $6.21 \cdot 10^{-11}$ cm<sup>2</sup> for the angled experiments, respectively. As can be derived from the curves, angled strikes increase the cross-section, since longer trajectories in the silicon result in more deposited charge. The results of the SDTMRRR05 device, indicated by the red curves, contradict this expectation. As can be seen, the first errors occur above an LET of 40 MeV cm<sup>2</sup> mg<sup>-1</sup>, whereas, unexpectedly, no error is seen for the angled irradiation tests. It can be deduced that the SDTMRRR05 IC in the cavity was in the shadow of the ceramic package during irradiation, leading to these implausible results. As a consequence, future devices under test are integrated as close as possible to each other in the middle of the cavity to prevent such side effects.
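
The influence of the tilt angle on the deposited charge can be made concrete with the commonly used first-order relation $\text{LET}_{\text{eff}} = \text{LET} / \cos\theta$. The sketch below assumes this relation and a thin sensitive volume; it is only an illustration and not necessarily the exact definition used in Section 2.7.3. The function name is hypothetical.

```python
import math


def effective_let(let_normal: float, tilt_deg: float) -> float:
    """First-order effective LET for a beam tilted by tilt_deg from normal incidence."""
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


# A 45-degree tilt lengthens the path through a thin layer by a factor of sqrt(2).
let_eff_45 = effective_let(30.0, 45.0)
print(f"LET_eff at 45 degrees: {let_eff_45:.1f} MeV cm^2/mg")
```

Under this approximation a $45^{\circ}$ tilt raises the effective LET by roughly 41 %, which is why a higher cross-section is expected for the angled runs.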

Nevertheless, based on these experimental results, it can be observed that the proposed flip-flop variants provide a high level of robustness to soft errors. The cross-section for  $45^{\circ}$ -angled strikes does not increase for LET below 30 MeV cm<sup>2</sup> mg<sup>-1</sup> and most particles are below an effective LET of 30 MeV cm<sup>2</sup> mg<sup>-1</sup>. Hence, the  $\Delta$ TMR concepts are still suitable solutions for most radiation-hard applications. Their robustness performance is comparable to some well-known flip-flop architectures introduced in Section 3.7.

However, the robustness of the $\Delta$TMR cells is achieved by circuit redundancy, additional integrated filter structures, and an MNCC-aware layout design. Future radiation campaigns of already-designed test vehicles will demonstrate alternative, more suitable candidates for key criteria, such as lower area occupation, for designs with tighter speed or low-power requirements. Consequently, it is worth testing the novel DTMRC/D candidates with smaller spacings, the LM-$\Delta$TMR variants, and the TSPC T-$\Delta$TMR cell with shorter propagation delay in future irradiation experiments. Moreover, the robustness of the novel clock-gating cell CG-$\Delta$TMR and of the developed architectures for self-correction has to be confirmed by tests as well.

### 6.3 Evaluation of the Design Flow for Realistic Applications

In this Section, two realistic applications are presented, to which the proposed design methodology with the corresponding library concept for RHBD circuits is applied. The first one focuses on the digital part of an AMS ADC design which has been published [19]. The second one is a more complex microcontroller architecture for space applications.

#### 6.3.1 A 7.5-15.5 MSPS 14-bit ADC Core

The first example is a 14-bit ADC whose main target applications are telemetry, tracking, and control. The selected architecture in this case is a digitally assisted MASH 1-1 $\Delta\Sigma$ ADC. The entire IP is divided into an analog core (aIP) and a digital core (dIP). Both can be combined and later integrated as a hard macro in more complex SoCs [127].

The block diagram of the prototype ADC architecture with its two sections is shown in Figure 6.15. The analog core digitizes the analog input into a multi-bit stream. The digital section is responsible for filtering, configuration, and control. It is equipped with an SPI for the configuration of digital core registers and for the test configuration of the analog core. Both are accessible via an analog test bus (ATB) interface. Moreover, Advanced Peripheral Bus (APB) registers for the ADC core configuration, such as over-sampling or mode selection, are additionally provided in order to bypass the digital logic at different data processing stages.

The digital filter occupies most of the resources of the digital core. The output is a 14-bit bus (DOUT) and a data ready signal (DREADY) indicating the arrival of new data. The over-sampling ratio of the prototype ADC can be configured by appropriate settings of the aforementioned configuration registers. In addition, an on-chip signal generator (DSIGGEN) is integrated in the digital core for self-testing purposes. It provides a Delta-Sigma-modulated sine wave as reference input for the filter module.

Figure 6.15: Block scheme of the ADC architecture

As can be seen in Figure 6.16, the analog core IP is located at the left side of the die, as close as possible to the digital core. The prototype ADC additionally consists of special APB test registers labeled as reg0, reg1, and reg2 in this work. Two of them are realized with SC- $\Delta$ TMR self-correction flip-flops introduced and discussed in detail in Section 5.4.3. Register reg1 is mapped to novel  $\Delta$ TMR flip-flops implementing the SC- $\Delta$ TMR -V self-correction concept with a voter and transient filter as part of the internal feedback loop. Second, register reg2 consists of cells implementing the SC- $\Delta$ TMR -GG concept with its distributed guard-gate-based voter architecture inside the  $\Delta$ TMR cell. Finally, the reference circuit APB register reg0 is mapped to unhardened standard D-flip-flops (DFF) of the baseline library.



Figure 6.16: Sections of the ADC core with module and test register distribution

The implementation of the digital core is generated in line with the proposed RHBD design methodology, starting with an HDL design description. Since the $\Delta$TMR cells are modeled as standard flip-flops, there is no impact on the regular design flow for digital standard cell-based design. The netlist is generated with a commercial logic synthesis tool. All flip-flops in the data-processing chain are mapped to the most robust candidate of the $\Delta$TMR configurations (see Sections 6.2.1 and 6.2.8). Almost 33k standard cells are required for logic realization. In particular, the mapping to the target flip-flop variant for reg0 to reg2 is controlled module-wise. These registers consist of 530 unhardened flip-flops and 2 × 530 SC-$\Delta$TMR cells. Moreover, new $\Delta$TMR clock-gating cells (CG-$\Delta$TMR) are instantiated for each of the special APB registers to control the clock activity. The rest of the memory cells of the digital part of the ADC core is directly mapped to more than 2500 RHBD-$\Delta$TMR flip-flops. The resulting sequential cell area is nearly eight times higher than for an unhardened design. A reduction of more than 1.5 ns in the timing budget of critical timing paths due to the use of $\Delta$TMR flip-flops has been considered during design generation.

On the other hand, the increase in local routing congestion overflows is solved by adequate cell padding (spacing) of $\Delta$TMR cells during placement. This frees more routing tracks and leads to better routing results. Critical nets such as clock and reset are implemented with robust inverter cells (see Section 5.2.2). More than 2800 sinks are driven on the clock path. The clock trees for the system clock and SPI clock require 217 robust driver cells in total. Additional radiation-tolerant transient filter cells are directly connected to critical input signals at the core side after the input pad connection. Furthermore, sensitive domain-crossing signals, i.e., ADC configuration nets, reset, and clock, are driven by robust guard-gate buffers (GGBs) acting as transient filter cells (see Section 3.5.2). In total, 54 robust GGBs are used for a transient-free data transmission to the analog core, including the forwarded system clock and reset signal.

The ADC prototype is fabricated in IHP's 130 nm BiCMOS technology. The micro-photograph of the ASIC is depicted in Figure 6.17. The overall chip area is $15 \text{ mm}^2$ in a square layout.



Figure 6.17: Micro-photograph of the ADC prototype chip

The functional correctness is verified by electrical on-wafer measurements for pre-selection purpose under different operating conditions. The operating current (OPC) in full function dynamic mode of the digital core is about 30 mA at a system frequency of 125 MHz under nominal condition (1.2 V digital core power supply and room temperature). Good devices are integrated into a 68-pin ceramic quad flat package (CQFP) and remeasured afterwards. A subset of the ICs is prepared for TID and SEU irradiation test campaigns. No ADC device has shown any degradation up to the total dose of 500 krad(Si). Moreover, some preliminary results of heavy ion tests can already be mentioned in this work. As first irradiation measurements of the ADC prototype have shown, no SEL has been detected. Interestingly, both self-correction APB registers reg1 and reg2 show an LET threshold around 52 MeV cm<sup>2</sup> mg<sup>-1</sup> with zero and only two errors, respectively. The reference APB register made of unhardened standard DFF flip-flops (reg0) has a lower LET threshold of around 2 MeV cm<sup>2</sup> mg<sup>-1</sup> as expected. Hence, the robustness for two of the proposed cell concepts for self-correcting  $\Delta$ TMR flip-flops has been confirmed with these early measurement results.

#### 6.3.2 Microcontroller Chip

The example above has already demonstrated the success of the proposed design methodology and its respective cell design concepts. Nevertheless, the overhead in terms of power, area, and timing introduced to obtain the desired robustness has not been discussed so far. This is addressed in this section using the example of a more complex microcontroller system with 512 KB on-chip memory and a core with Floating Point Unit (FPU), Memory Protection Unit (MPU), and Digital Signal Processing (DSP) unit. The architecture also provides several IPs and interfaces such as DACs, ADCs, CAN, SpaceWire, and MIL-STD-1553C (see Figure 6.18).



Figure 6.18: Microcontroller architecture block scheme after [128]

The design exploration of the microcontroller chip is done in IHP's 130 nm technology with the use of the standard digital design flow. For that purpose, two different baseline standard cell libraries are selected for implementation. Moreover, cells such as transient filter cells,  $\Delta$ TMR flip-flops and clock-gates have been additionally developed for both baseline libraries for the design of RHBD implementations. Thus, robust but suitable design solutions are obtained with a reasonable overhead.

In the first run, a rich radiation-hardened standard cell library  $(L_R)$  is used, similar to the library used in [24]. All transistors in these special standard cells consist of ELT layouts for both n- and p-channel devices. Critical internal nodes are additionally protected by SEE-enhanced Miller capacitors. Moreover, all ELT devices are surrounded by Enhanced Guard Rings (EGR) in order to improve the SEL robustness [49]. In comparison to often used standard cell frames, the ground and power supply rails are located inside – in the middle – of the cells. The bottom and the top region of the cells are consequently occupied by well areas of the n- and p-channel transistors.

The library set for the second design exploration, L\_RT, is a combination of the aforementioned baseline ELT standard cell library L\_R and an extension of specially developed $\Delta$TMR flip-flops, robust transient filters, and $\Delta$TMR clock-gates. The baseline standard cell library (here: L\_I) of IHP's 130 nm technology is selected and evaluated as a third design case. This is a library without any radiation-hardened cells. Nevertheless, this library already contains larger buffers and inverters for high-fanout net optimization and clock tree implementation. Finally, the extension L\_IT with additional RHBD-$\Delta$TMR cells is selected in a last run.

Based on these four library setups, all designs are explored in a preliminary manner. All designs have been constrained with the same timing and design constraints for a fair comparison. The most important parameters are extracted after clock tree implementation. Moreover, independent of the selected library set, the same hard macros, such as SRAM, DAC, and ADC, are linked in all four design solutions. As an example, an initial floorplan of the microcontroller chip is depicted in Figure 6.19. As can be seen, a lot of area is already occupied by hard macros.



Figure 6.19: Early floorplan of the microcontroller

Table 6.8 lists the obtained results of the microcontroller design exploration for all four library sets. All parameters are extracted during the design with commercial logic synthesis and PnR tools. For simplification, the results of the static analysis of the total power ( $P_{tot}$ ) are calculated with a 0.2 setting for the input activity. Furthermore, the timing constraints and clock definitions are also considered during power estimation runs.

According to Table 6.8, the lowest occupied area after the synthesis step, $51.3 \text{ mm}^2$, is obtained with the unhardened non-ELT library setup L\_I. When the cell set with additional cells for radiation-hardened design is linked in parallel (L\_IT), an increase of 16.2% to $59.6 \text{ mm}^2$ is observed. This overhead is mainly driven by the mapping to the robust but larger-sized $\Delta$TMR cells. In fact, more than 29k flip-flop cells are mapped during the GLS step in order to realize the functional behavior of the microcontroller. On the other hand, when the design is mapped to the richer ELT standard cell libraries (L\_R and L\_RT), the area is significantly increased due to the larger cell frame size with its ELT devices and EGRs.

Moreover, the area after clock tree insertion is increased by more than 34% with the use of the robust L\_RT, and by less than 18% when L\_IT is selected. In particular, the complexity of the clock tree (CT) differs depending on the selected library setup. Interestingly, the number of clock driver cells (CT-INV) is doubled for the $\Delta$TMR approach in L\_IT and more than tripled for the L\_RT case. Thus, the intuitive expectation that the clock tree triples when TMR triples the number of sinks cannot be derived in general. However, an important fact is that if the ELT standard cell library without additional cells is selected for implementation (i.e., L\_R), robustness at lower LETs is implicitly given by the library design.

| Parameter | L\_R | L\_RT | L\_I | L\_IT | Overhead L\_R → L\_RT [%] | Overhead L\_I → L\_IT [%] |
|-----------|------|-------|------|-------|---------------------------|---------------------------|
| **Logic Synthesis Stage** | | | | | | |
| Area [$\text{mm}^2$] | 70.6 | 93.6 | 51.3 | 59.6 | 32.5 | 16.2 |
| **Optimized Place and Route Stage (with Clock Tree)** | | | | | | |
| Area [$\text{mm}^2$] | 94.6 | 127.4 | 67.1 | 78.9 | 34.6 | 17.6 |
| CT-INV | | 7.1k | 0.7k | 1.4k | 222.0 | 97.8 |
| $P_{tot}$ [mW] | 898.4 | 1321.5 | 472.9 | 763.8 | 47.1 | 61.5 |

Table 6.8: Results with different library sets at different design stages

Nevertheless, the design needs to be optimized for timing after clock-tree insertion. With respect to the total power $P_{tot}$, the introduced overhead is 47.1 % for the transformation from the unhardened non-TMR design to the hardened $\Delta$TMR design with the ELT standard cell library (L\_R to L\_RT). An overhead of 61.5 % is observed for the same transformation when the standard L\_I design is implemented with the additional $\Delta$TMR cells, forming an L\_IT design solution.
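The overhead columns of Table 6.8 express the relative increase of the hardened variant over its baseline. As a minimal sketch, the following Python snippet recomputes them from the tabulated area and power values; the helper name `overhead_percent` is chosen here for illustration only. Since the inputs are already rounded, the recomputed figures may deviate from the tabulated ones by about 0.1 percentage points.

```python
def overhead_percent(baseline: float, hardened: float) -> float:
    """Relative increase of `hardened` over `baseline`, in percent."""
    return (hardened - baseline) / baseline * 100.0


# (baseline, hardened) pairs copied from Table 6.8
table_6_8 = {
    "synthesis area, L_R -> L_RT [mm^2]": (70.6, 93.6),
    "synthesis area, L_I -> L_IT [mm^2]": (51.3, 59.6),
    "PnR area, L_R -> L_RT [mm^2]": (94.6, 127.4),
    "PnR area, L_I -> L_IT [mm^2]": (67.1, 78.9),
    "P_tot, L_R -> L_RT [mW]": (898.4, 1321.5),
    "P_tot, L_I -> L_IT [mW]": (472.9, 763.8),
}

for label, (baseline, hardened) in table_6_8.items():
    print(f"{label}: {overhead_percent(baseline, hardened):.1f} %")
```

The same relation also shows why the relative power overhead is larger for L\_IT than for L\_RT: the absolute power increase is divided by a much smaller baseline value in the L\_I case.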

As demonstrated by the design comparison, the most robust solution (L\_RT) requires the most resources, whereas the unhardened solution with setup L\_I requires the fewest. Consequently, a suitable alternative is a compromise between robustness and overhead. Thus, a selective-hardening approach for critical modules would be more beneficial for such a complex microcontroller design. Nonetheless, as these comparisons confirm, the overhead of $\Delta$TMR design solutions depends strongly on the characteristics of the selected baseline library.

"Sicher ist, dass nichts sicher ist. Selbst das nicht." — Joachim Ringelnatz

## Chapter 7

# Conclusion

Reliability and robustness are the most important criteria in applications such as medicine, security, automotive, aviation, and space. The respective systems must operate properly and correctly under harsh operating conditions. In this Thesis, a system for such applications is referred to as a reliable and robust hardware system (RnRS). Two types of systems are addressed therein: the standard cell-based differential logic design and the design of RHBD circuits. Both can be developed with standard semiconductor technologies.

However, the circuit realization of an RnRS requires additional design effort. The overall design costs and overheads increase due to the alternative structures or redundancy required for a reliable and robust design. In general, almost all design measures have an impact on the total area occupation, power, and performance. Furthermore, compliance with the digital design flow is another hard requirement when standard cell-based design is targeted. As a result, all these factors have to be considered during the development of new cell and library concepts and of a design methodology for such systems.

This is exactly where this work comes in. It proposes cell concepts and a design methodology to obtain reliable standard cell-based differential logic designs starting from an RTL behavioral description. The digital design flow is extended using the well-known fat-wire design approach for compliance purposes. The place and route (PnR) stage is split into two phases, separated by an additional design conversion step. The netlist generation is enabled by the proposed single-ended pseudo gate views. An intermediate fat-wire layout with placement and routing is generated in a first PnR design phase. Afterwards, the resulting pseudo design is transformed to real technology-related differential standard cell gates by a custom design converter tool. As a result, a differential logic design with unfinished parallel differential routes is obtained. Its detailed pin-access is finalized in a second routing phase. The routing congestion introduced by the layouts of bipolar standard cells is relaxed by introducing and interfacing novel virtual fat-wire boundary pins during the first routing phase. The proposed standard cell and library concepts follow a modular approach to reduce the design costs and to ease the development of further logic gates that address the power, area, and delay overhead. The concepts are applied to an example design together with the proposed design methodology. A clock counter of a PLL feedback has been implemented in IHP's 250 nm standard technology. The standard cell-based generation of a differential logic design is successfully demonstrated with a resulting in-parallel-routed design solution. It shows a closer wire-length and capacitance match of the differential signal pairs than a regular routing technique.

Moreover, cell concepts and a design methodology for RHBD circuits are proposed in this Thesis. Among the most important results of this work are the novel cell concepts for hardware redundancy-based $\Delta$TMR flip-flops. Several cell configurations with different functional features and performances are proposed, developed, and compared to each other. In particular, scan-test support, self-correction of TMR flip-flops, and the clock-gating feature are covered and enabled for the first time by new cell concepts for $\Delta$TMR. The new cell architectures are implemented and evaluated in IHP's 130 nm technology.

Another focus is the reduction of the overhead introduced by triplication and by the essential control inside the new cells. As an example, a $\Delta$TMR flip-flop made of TSPC flip-flops shows a speed performance similar to that of the unhardened standard D-flip-flop. Interestingly, a reduction in delay overhead of nearly 50 % in comparison to the first $\Delta$TMR flip-flop solutions is achievable. All layout views of these cells follow an MNCC-aware layout design by fulfilling additional spacing constraints to maintain the robustness and cope with the charge-sharing effect. Promising circuit solutions are evaluated in test structures and test vehicles for irradiation SEU test campaigns. The robustness of the novel flip-flops is shown by high LET thresholds of 46.1 MeV cm<sup>2</sup> mg<sup>-1</sup> for latch-based $\Delta$TMR (L-$\Delta$TMR) and 62.5 MeV cm<sup>2</sup> mg<sup>-1</sup> for $\Delta$TMR, respectively. Scannable configurations also demonstrate suitable robustness up to an LET threshold of 30.0 MeV cm<sup>2</sup> mg<sup>-1</sup>. Finally, early measurement results of two self-correction $\Delta$TMR structures confirm a radiation hardness up to 52.0 MeV cm<sup>2</sup> mg<sup>-1</sup>.

The concepts and methodology have been applied to the digital part of an ADC core for space applications. As also shown here, even though a standard technology is used for implementation, the use of $\Delta$TMR cells and additional RHBD logic gates enables a robust design. Moreover, the overhead of that strategy is evaluated on a complex microcontroller example. The impact of the proposed hardening concept is compared for four design cases with the use of two different baseline libraries.

However, new challenges remain to be addressed in future work. For the design flow for differential logic design, an additional custom verification stage can be added to verify the logical equivalence between an SEPG design and the finally generated DLG design.

One of the tasks for RHBD circuit development is the extension of the portfolio with more $\Delta$TMR concepts offering further features. The cells could additionally be equipped with local power-gating or data-retention functionality in order to enhance the low-power performance of RHBD $\Delta$TMR designs. Alternatively, adaptable performance of $\Delta$TMR cells may also be an interesting feature. The cell configuration may offer different operating modes, e.g., a high-performance mode or a full-redundant mode. The behavior could be implemented locally inside the $\Delta$TMR memory cells, whereas the mode control can be executed by multi-processor systems to bring their cores into different modes. Thus, a system can become more adaptive to the conditions it is exposed to. It may adjust the performance or increase the reliability depending on the current situation.

This Thesis proposes design methodologies and describes the mentioned concepts in mature but relevant technologies for relevant applications. The concepts have been implemented and evaluated in 130 nm and 250 nm technologies. The approaches for both types of hardware systems are compatible with each other. Thus, they are also applicable to the same technology node. This allows a hybrid reliable and robust hardware system to be obtained, such as a robust RHBD circuit with, e.g., local reliable differential logic modules. Moreover, the proposed concepts and methodologies are also applicable to other, scaled technologies. However, due to complexity and costs beyond the scope of this work, this is left for future work as well.

# Appendix

The following listing is the ASCII representation of the abstract layout information illustrated in Figure 2.2 on page 14. As can be seen in lines two and seven, the cell is declared as a core cell (CLASS CORE) with a site definition of CoreSite. Layer shapes are defined for every pin. Moreover, the VSS and VDD pins carry the special ABUTMENT attribute, which declares their shapes as rails in order to force a different strategy for the connection of these nets during implementation by the PnR tool.

Listing 7.1: LEF representation of inverter cell

```
MACRO INVX4
  CLASS CORE ;
  ORIGIN 0 0 ;
  FOREIGN INVX4 0 0 ;
  SIZE 2.5 BY 3.78 ;
  SYMMETRY X Y ;
  SITE CoreSite ;
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    SHAPE ABUTMENT ;
    PORT
      LAYER Metal1 ;
        RECT 0 3.52 2.5 4.04 ;
    END
  END VDD
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    SHAPE ABUTMENT ;
    PORT
      LAYER Metal1 ;
        RECT 0 -0.26 2.5 0.26 ;
    END
  END VSS
  PIN A
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER Metal1 ;
        RECT 0.09 1.47 0.395 2.035 ;
    END
  END A
  PIN Q
    DIRECTION OUTPUT ;
    USE SIGNAL ;
    PORT
      LAYER Metal1 ;
        RECT 0.59 0.915 0.895 2.865 ;
        RECT 0.59 1.745 1.91 2.035 ;
        RECT 1.605 0.915 1.91 2.865 ;
    END
  END Q
  DENSITY
    ...
  END
END INVX4
```
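To illustrate how the pin information of such an abstract view can be read programmatically, the following Python sketch collects the direction, use, shape attribute, and rectangles of every pin. It is not a complete LEF parser: the function name `parse_lef_pins` is hypothetical, and only the small statement subset that appears in Listing 7.1 is handled (the embedded snippet is a shortened copy of that listing).

```python
LEF_SNIPPET = """
MACRO INVX4
  CLASS CORE ;
  SIZE 2.5 BY 3.78 ;
  SITE CoreSite ;
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    SHAPE ABUTMENT ;
    PORT
      LAYER Metal1 ;
        RECT 0 3.52 2.5 4.04 ;
    END
  END VDD
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    SHAPE ABUTMENT ;
    PORT
      LAYER Metal1 ;
        RECT 0 -0.26 2.5 0.26 ;
    END
  END VSS
  PIN A
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER Metal1 ;
        RECT 0.09 1.47 0.395 2.035 ;
    END
  END A
END INVX4
"""


def parse_lef_pins(text):
    """Collect direction, use, shape and rectangles for every PIN block."""
    pins = {}
    current = None  # pin record being filled, or None outside a PIN block
    layer = None    # most recent LAYER statement inside the current pin
    for raw_line in text.splitlines():
        tokens = raw_line.replace(";", " ").split()
        if not tokens:
            continue
        keyword = tokens[0]
        if keyword == "PIN":
            current = {"direction": None, "use": None, "shape": None, "rects": []}
            pins[tokens[1]] = current
            layer = None
        elif current is None:
            continue  # statements outside of PIN blocks are ignored here
        elif keyword == "END" and len(tokens) == 2:
            current = None  # "END <name>" closes the pin; a bare "END" closes PORT
        elif keyword in ("DIRECTION", "USE", "SHAPE"):
            current[keyword.lower()] = tokens[1]
        elif keyword == "LAYER":
            layer = tokens[1]
        elif keyword == "RECT":
            coords = tuple(float(t) for t in tokens[1:5])
            current["rects"].append((layer, coords))
    return pins


pins = parse_lef_pins(LEF_SNIPPET)
rails = sorted(name for name, pin in pins.items() if pin["shape"] == "ABUTMENT")
print(rails)  # ['VDD', 'VSS']
```

In this sketch, the pins carrying the ABUTMENT attribute are exactly the supply rails, which mirrors the distinction the PnR tool makes between rail connections and regular signal pins.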

The listing below shows the .lib-file content of an abstract D-flip-flop. As can be seen, the pin directions and characteristics are described. The timing constraints, delays, and power values are given in LUT format in relation to the input properties. Here, the indices index\_1 and index\_2 stand for, e.g., input transition times and/or connected loads. Depending on the connectivity, the corresponding values for power and timing are selected during design extraction.

Listing 7.2: A pseudo D-flip-flop in .lib syntax

```
cell (DFF) {
  area : 200 ;
  cell_footprint : "DFF";
  cell_leakage_power : 1000.0;
  pin (Q) {
    direction : "output";
    function : "IQ";
    max_capacitance : 0.1;
    timing () {
      related_pin : "C";
      timing_sense : non_unate;
      timing_type : rising_edge;
      cell_rise (tdly_2x2) {
        index_1 ("0.010, 0.1");
        index_2 ("0.005, 0.1");
        values ( \
          "0.371, 0.492", \
          "0.401, 0.053" \
        );
      }
      rise_transition (tdly_2x2) {
        ...
      }
      cell_fall (tdly_2x2) {
        ...
      }
      fall_transition (tdly_2x2) {
        ...
      }
    }
    internal_power () {
      related_pin : "C";
      rise_power (pwrt_2x2) {
        index_1 ("0.010, 0.1");
        index_2 ("0.005, 0.1");
        ...
      }
      fall_power (pwrt_2x2) {
        ...
      }
    }
  }
  pin (C) {
    clock : true;
    direction : "input";
    max_transition : 2.5;
    capacitance : 0.015;
  }
  pin (D) {
    direction : "input";
    nextstate_type : "data";
    max_transition : 2.5;
    capacitance : 0.015;
    timing () {
      related_pin : "C";
      timing_type : hold_rising;
      rise_constraint (constrt_2x2) {
        index_1 ("0.01, 0.500");
        index_2 ("0.01, 0.500");
        values ( \
          "-0.0489795, -0.0263327", \
          "-0.144534, -0.100758" \
        );
      }
      fall_constraint (constrt_2x2) {
        ...
      }
    }
    timing () {
      related_pin : "C";
      timing_type : setup_rising;
      rise_constraint (constrt_2x2) {
        ...
      }
      fall_constraint (constrt_2x2) {
        ...
      }
    }
  }
  ff (IQ,IQN) {
    clocked_on : "C";
    next_state : "D";
  }
}
```
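How a value is selected from such a LUT can be sketched with a bilinear interpolation over the `cell_rise` table of Listing 7.2. This is a minimal illustration in Python, not the algorithm of any particular tool. Two assumptions are made here: the rows of `values` correspond to `index_1` and the columns to `index_2`, and `index_1` is taken as the input transition time while `index_2` is taken as the connected load (the template `tdly_2x2` that would fix this assignment is not part of the listing). Queries outside the table range are clamped to the border, whereas real tools may extrapolate instead. The function names are hypothetical.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Return (i, t): lower grid index and fractional position in [0, 1]."""
    x = min(max(x, axis[0]), axis[-1])  # clamp the query to the table range
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def lut_lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation in a 2-D table (rows: index_1, columns: index_2)."""
    i, t = _segment(index_1, x1)
    j, u = _segment(index_2, x2)
    v00, v01 = values[i][j], values[i][j + 1]
    v10, v11 = values[i + 1][j], values[i + 1][j + 1]
    return (
        (1 - t) * (1 - u) * v00
        + (1 - t) * u * v01
        + t * (1 - u) * v10
        + t * u * v11
    )


# cell_rise table copied from Listing 7.2
INDEX_1 = [0.010, 0.1]   # assumed here: input transition time
INDEX_2 = [0.005, 0.1]   # assumed here: connected load
CELL_RISE = [
    [0.371, 0.492],
    [0.401, 0.053],
]

# Delay for a query point in the middle of the characterized range
delay = lut_lookup(INDEX_1, INDEX_2, CELL_RISE, x1=0.055, x2=0.0525)
print(f"interpolated cell_rise: {delay:.5f}")
```

At the four grid points the function returns the tabulated entries unchanged, so the characterized values are reproduced exactly and only the points in between are estimated.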

# **List of Figures**

| 1.1  | Possible effects and acting roles general ICs have to deal with                                                     | 4  |
|------|---------------------------------------------------------------------------------------------------------------------|----|
| 1.2  | Key actions of an RnR-System (RnRS)                                                                                 | 4  |
| 2.1  | Chart for standard cell library model/view generation                                                               | 13 |
| 2.2  | Layout views of an inverter: (a) mask layout, (b) abstract layout                                                   | 14 |
| 2.3  | Important timing windows of a rising-edge-triggered D-flip-flop                                                     | 16 |
| 2.4  | Standard digital design flow in a vertical arrangement                                                              | 17 |
| 2.5  | Main stages of the physical implementation flow                                                                     | 19 |
| 2.6  | Register-to-register path with combinational logic                                                                  | 21 |
| 2.7  | Scan path and mapped scannable flip-flops                                                                           | 21 |
| 2.8  | The clock-gate: (a) standard clock-gate scheme, (b) waveform of a clock-gate                                        | 22 |
| 2.9  | Generated Xtalk and impact in a system configuration                                                                | 23 |
| 2.10 | Differential signaling: (a) a differential buffer, (b) the respective differential                                  |    |
|      | signaling, (c) the difference of the differential signal pair $\ldots \ldots \ldots \ldots$                         | 25 |
| 2.11 | Differential amplifier CML stage                                                                                    | 25 |
| 2.12 | Differential CML amplifier with emitter-follower (ECL)                                                              | 26 |
| 2.13 | CML gates at transistor-level: (a) a differential CML 2:1 multiplexer, (b) an                                       |    |
|      | 8-input wired-OR CML gate                                                                                           | 27 |
| 2.14 | Comparison of CMOS and bipolar CML inverter                                                                         | 28 |
| 2.15 | Layout views of transistors: (a) NMOS transistor, (b) npn-HBT                                                       | 29 |
|      | Abstract layout of an NMOS ELT transistor                                                                           | 30 |
| 2.17 | An energetic particle strike which may cause an SEE                                                                 | 30 |
| 2.18 | Particle hit in digital circuits: (a) SET generation and resulting SEU, (b)                                         |    |
|      | generated SEU by direct hit of a memory cell                                                                        | 32 |
|      | Cross-section and Weibull fit                                                                                       | 33 |
|      | Masking effects of resulting transient by single particle hit                                                       | 35 |
| 2.21 | Scheme and transient response of a TMR gate: (a) baseline TMR scheme                                                |    |
|      | applied to flip-flops, (b) transient response                                                                       | 36 |
| 2.22 | Reliabilities of a single system S (red), a fault-free TMR arrangement (blue),<br>and TMR with faulty voter (black) | 38 |
| 2.23 | Temporal redundancy applied to: (a) the data path, (b) the clock path                                               | 38 |
|      | Waveforms of TR approaches: (a) single error correction by data path-related                                        |    |
|      | TR, (b) single error correction by clock path-related TR                                                            | 39 |
| 3.1  | Different logic cells: (a) a static CMOS logic cell, (b) the WDDL counterpart                                       | 42 |

| 3.2  | Main changes on the standard digital design flow proposed in [76]                                                                                             | 42 |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.3  | Differences of the design flow after [11, 77]                                                                                                                 | 43 |
| 3.4  | Wire structures with pin-access and centerlines: (a) fat-wire signal, (b) the                                                                                 |    |
|      | differential (split) counterpart                                                                                                                              | 44 |
| 3.5  | Transistor-level scheme of a master-slave D-flip-flop                                                                                                         | 46 |
| 3.6  | An inverting TSPC flip-flop after [85]                                                                                                                        | 47 |
| 3.7  | SET characterization results of inverter cells                                                                                                                | 48 |
| 3.8  | Transient mitigation: (a) block scheme of transient filter with voter stage, (b) waveform shows properly-filtered transient (green) and propagating transient |    |
|      | which exceeds the implemented $\delta$ -delay of the filter (red)                                                                                             | 49 |
| 3.9  | Guard-gate circuits: (a) guard-gate inverter (GGI), (b) GGI with driver and optional feedback inverter (GG/GGB), (c) truth table of a GG/GGB                  | 50 |
| 3.10 |                                                                                                                                                               |    |
|      | scheme, (b) waveform in error-free and error scenario                                                                                                         | 50 |
| 3.11 | DSET-D1D2 transient filter concept with inverter cells as delay elements: (a) the gate-level scheme, (b) the waveform in the error-free case (green) and a    |    |
|      | captured SET with resulting upset (red)                                                                                                                       | 51 |
| 3.12 | Transient filter concepts: (a) transient suppressor with AND-OR-MUX struc-                                                                                    |    |
|      | ture [93], (b) concept with guard-gates and one delay element [94] $\ldots$ .                                                                                 | 52 |
| 3.13 | TMR distributed skewed clocks in multi-bit registers designs after $[96]$                                                                                     | 52 |
| 3.14 | The DICE storage cell scheme after $[104]$                                                                                                                    | 54 |
| 3.15 | Transistor scheme of a Quatro cell $\ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots$                                                    | 54 |
| 3.16 | Block schemes: (a) the TSPC-Quatro, and (b), the TSPC-DICE $\ . \ . \ . \ .$                                                                                  | 56 |
| 3.17 | BISER in a latch-based flip-flop design                                                                                                                       | 56 |
| 3.18 | Scheme of FTMR with shared clock network $\hfill \ldots \ldots \ldots \ldots \ldots \ldots$                                                                   | 57 |
| 3.19 | Circuit scheme of a $\Delta {\rm TMR}$ flip-flop with transient filter elements on the data                                                                   |    |
|      | path and a single majority voter (MAJ) $\ldots$                                                                                                               | 58 |
| 3.20 | Concept for scan-test after $[16]$                                                                                                                            | 59 |
|      | Cell frame of a $\Delta$ TMR flip-flop in 130 nm technology                                                                                                   | 59 |
| 3.22 | Self-correcting flip-flop without mode control after $[122]$                                                                                                  | 60 |
| 3.23 | Standard cell-based voter solutions: (a) $3 \ge 1000$ x AND2 and OR3 composition, (b)                                                                         |    |
|      | a single OA222 gate, (c) XOR-multiplexer-based voter of [73]                                                                                                  | 61 |
| 3.24 | Guard-gate inverter-based (GGI-based) voter architecture with driver                                                                                          | 62 |
| 4.1  | Overview of the contribution to obtain reliable differential logic designs $\ . \ .$ .                                                                        | 67 |
| 4.2  | Bias concept for two domains of differential logic                                                                                                            | 69 |
| 4.3  | Logical sections of a differential CML latch $\hfill \ldots \ldots \ldots \ldots \ldots \ldots$ .                                                             | 70 |
| 4.4  | Derivable configurations with one npn-Section A $\ldots \ldots \ldots \ldots \ldots \ldots$                                                                   | 71 |
| 4.5  | Level-shifter: (a) transistor-level scheme with both output levels, (b) level-<br>shifter in application for the reset and clock signals of a CML latch       | 75 |
| 4.6  | Gate transformation of an original differential logic gate to SEPG variants:<br>(a) the baseline differential gate with OR-function, (b) the derived single-  |    |
|      | ended OR/NOR-variants, and (c), the AND/NAND gates                                                                                                            | 77 |

| 4.7   | Two approaches for SEPG-model generation from different baseline libraries . $\ 77$                                                                                                                                                                          |
|-------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 4.8   | Illustration of fat-wire and differential wire definitions                                                                                                                                                                                                   |
| 4.9   | A fat-wire compatible standard cell layout frame of a differential CML latch . $$ 80 $$                                                                                                                                                                      |
| 4.10  | Layout views: (a) fat-wire layout with fat-wire boundary pin-shapes and additional routing blockages, (b) layout of the differential counterpart with depicted preserved area for the final pin-access                                                       |
| 4.11  | Encapsulation of a differential CML cell with wrappers for CZ                                                                                                                                                                                                |
|       | Two approaches for model generation                                                                                                                                                                                                                          |
|       | Forming the fat-wire pins: (a) procedure for fat-wire pin generation, (b) the respective LEF syntax of the differential layout, and (c), the generated FW-                                                                                                   |
| 1 1 1 |                                                                                                                                                                                                                                                              |
|       |                                                                                                                                                                                                                                                              |
|       | Proposed design flow for standard cell-based differential logic design 86<br>SEPG scheme of a div4/div5 clock divider                                                                                                                                        |
|       | Design conversion: from SEPG to DIFF design                                                                                                                                                                                                                  |
|       | Routing of a design, (a) fat-wire design, (b) the split variant after conversion 92                                                                                                                                                                          |
|       | Detailed pin-access generated in the second routing phase                                                                                                                                                                                                    |
| 4.19  | Detailed phi-access generated in the second routing phase                                                                                                                                                                                                    |
| 5.1   | Overview of the contribution to obtain robust RHBD circuits                                                                                                                                                                                                  |
| 5.2   | Proposed pure standard cell-based transient filter cells: (a) AND-OR-based,                                                                                                                                                                                  |
|       | (b) the NAND-based optimization                                                                                                                                                                                                                              |
| 5.3   | Waveform of NAND-based transient filter in action                                                                                                                                                                                                            |
| 5.4   | General concept of logical sections of RHBD- $\Delta$ TMR cell                                                                                                                                                                                               |
| 5.5   | Transient filter concept applied to D-SET Filter Section                                                                                                                                                                                                     |
| 5.6   | Scheme and waveforms of the proposed DSET-D1-GG configuration: (a) gate-level scheme, (b) properly-filtered transient, (c) propagation of large transient to all three internal nodes, (d) one directly-induced internal transient104                        |
| 5.7   | Different D-flip-flop configurations with standard cell gates: (a) single D-flip-<br>flop gate (DFF), (b) master-slave D-latch arrangements (LDFF), (c) decom-<br>position with multiplexed D-latch outputs (LMDFF)                                          |
| 5.8   | Voter solutions: (a) NAND-gate variant V-ND, (b) inverting GG arrangement V-GGN, (c) V-GGN plus driver cell (V-GG)                                                                                                                                           |
| 5.9   | Concepts for scannable $\Delta$TMR flip-flops with highlighted internal scan-paths: (a) S-$\Delta$TMR-I with scannable D-flip-flops, (b) S-$\Delta$TMR-II with the use of standard D-flip-flops and D-SET filter section with multiplexed inputs |
| 5.10  | Logical scheme of the proposed RHBD-$\Delta$TMR CG-cell |
| 5.11  | Block scheme of the first self-correction concept SC-S-$\Delta$TMR |
| 5.12  | Transistor-level simulation with activated self-correction mode |
| 5.13  | Block scheme of SC-$\Delta$TMR-V concept: (a) open slave latch of the modified baseline flip-flops (O_FF), (b) TMR arrangement with additional FB-SET module |

| 5.14 | Circuit and block-level scheme of the SC-$\Delta$TMR-GG approach: (a) internal architecture of a single open latch (the second stage of the open flip-flop with GG-outputs (O_FFG)), (b) self-correcting TMR flip-flop without D-SET Filter Section |
|------|------|
| 5.15 | Hidden timing violation of the third flip-flop during CZ |
| 5.16 | General cell frame of a RHBD-$\Delta$TMR flip-flop with spacing |
| 6.1  | Block diagram of the M/N+1-counter in a PLL application |
| 6.2  | Fat-wire routing result (left) with region of interest after second routing phase (right) with routing artifacts due to signal inversion |
| 6.3  | Comparison of the routing results of the regular routing and the proposed 2-Phase routing: (a) histogram of wire length, (b) X-Y plot of the total wire capacitance |
| 6.4  | Logical sections of $\Delta$TMR flip-flop with highlighted logical grouping |
| 6.5  | Placement and signal routing of a baseline $\Delta$TMR flip-flop |
| 6.6  | Abstract component placement of an L-$\Delta$TMR flip-flop |
| 6.7  | One-row layout frame of a novel LM-$\Delta$TMR flip-flop |
| 6.8  | Two-row layout frame of a novel SC-S-$\Delta$TMR flip-flop |
| 6.9  | Extracted propagation delays of DFF, TDFF, and a T-$\Delta$TMR cell |
| 6.10 | Block scheme of the proposed RHBD-$\Delta$TMR CG with highlighted LCGs |
| 6.11 | Implementation of test vehicles for electrical measurements and irradiation tests: (a) layout view of $\Delta$TMR shift register, (b) a micro-photograph |
| 6.12 | Integration of four shift register CUTs in one package |
| 6.13 | Cross-section as a function of effective LET ($\mathrm{LET}_{\mathrm{eff}}$) |
| 6.14 | Cross-section as a function of effective LET of the third campaign |
| 6.15 | Block scheme of the ADC architecture |
| 6.16 | Sections of the ADC core with module and test register distribution |
| 6.17 | Micro-photograph of the ADC prototype chip |
| 6.18 | Microcontroller architecture block scheme after [128] |
| 6.19 | Early floorplan of the microcontroller |

## List of Tables

| 4.1 | CML/ECL library specification |
|-----|-----|
| 4.2 | Bias current configurations and resulting voltage swing and NM |
| 4.3 | Characteristics of initial speed-classes of a 3.3 V-compatible CML latch |
| 4.4 | Speed-class extensions of a 3.3 V CML latch |
| 4.5 | Extracted results for some ISCAS89 benchmark circuits |
| 4.6 | Placement and routing specification for the selected 250 nm technology |
| 5.1 | Comparison of different D-SET filter section configurations |
| 5.2 | Comparison of different baseline memory cells proposed for triplication |
| 5.3 | Comparison of different majority voter configurations |
| 5.4 | Comparison of the proposed self-correction approaches |
| 6.1 | Novel $\Delta$TMR cells compared to reference DFF and reference $\Delta$TMR flip-flop |
| 6.2 | LM-$\Delta$TMR flip-flop experiments in comparison to reference L-$\Delta$TMR |
| 6.3 | L-$\Delta$TMR and LM-$\Delta$TMR cells in comparison to reference flip-flops |
| 6.4 | Novel S-$\Delta$TMR and SC-S-$\Delta$TMR cells compared to reference flip-flops |
| 6.5 | T-$\Delta$TMR in comparison to $\Delta$TMR cells and standard flip-flop (DFF) |
| 6.6 | CG-$\Delta$TMR in comparison to standard and $\Delta$TMR D-flip-flops |
| 6.7 | Overview of CUTs in test vehicle implementations and campaigns |
| 6.8 | Results with different library sets at different design stages |

## Listings

| A 1-row standard cell SITE definition            |
|--------------------------------------------------|
| Asynchronous initialization of an n-bit register |
| XML configuration for the design converter       |
| LEF representation of inverter cell              |
| A pseudo D-flip-flop in .lib syntax              |

## Bibliography

- [1] R. Ecoffet, "Overview of In-Orbit Radiation Induced Spacecraft Anomalies," *IEEE Transactions on Nuclear Science*, vol. 60, pp. 1791–1815, June 2013.
- [2] M. Sheetz, "SpaceX to lose as many as 40 Starlink satellites due to space storm." https://www.cnbc.com/2022/02/09/spacex-losing-starlink-satellites-due-to-geomagnetic-space-storm.html, 2022. [Available online, last access: June 6th, 2022].
- [3] Microsemi, "Neutron-Induced Single Event Upset (SEU) FAQ." https://www.microsemi.com, August 2011. [Available online, last access: May 2022].
- [4] Komite Nasional Keselamatan Transportasi, "Aircraft Accident Investigation Report," 2018. [Available online, last access: May 2022].
- [5] Ministry of Transport, "Interim Investigation Report on Accident to the B737-8 (MAX)," 2019. [Available online, last access: April 2022].
- [6] National Transportation Safety Board, "Assumptions used in the safety assessment process and the effects of multiple alerts and indications on pilot performance." https://www.ntsb.gov, 2019. [Available online, last access: April 2022].
- [7] L. Cooke, M. Goossens, P. Hoxey, T. Inoue, D. Overhauser, P. Saxena, and R. Singh, "Signal Integrity Effects in System-on-Chip Designs - A Designer's Perspective," Signal Integrity Effects in Custom IC and ASIC Designs, 2001.
- [8] R. Singh, Signal Integrity Effects in Custom IC and ASIC Designs. IEEE Press, Wiley-Interscience, 2002.
- [9] F. Vater, "Entwicklung von Designkonzepten zur Verbesserung der Seitenattacken-Resistenz von Krypto-Beschleunigern," Master's thesis, Technische Universität Cottbus, 2007.
- [10] P. Kocher, J. Jaffe, and B. Jun, "Differential Power Analysis," in Advances in Cryptology — CRYPTO' 99 (M. Wiener, ed.), (Berlin, Heidelberg), pp. 388–397, Springer Berlin Heidelberg, 1999.

- [11] S. Badel, I. Hatirnaz, Y. Leblebici, and E. J. Brauer, "Implementation of Structured ASIC Fabric Using Via-Programmable Differential MCML Cells," in 2006 IFIP International Conference on Very Large Scale Integration, pp. 234–238, Oct 2006.
- [12] F. Herzel, S. Osmany, K. Hu, K. Schmalz, U. Jagdhold, J. Scheytt, O. Schrape, W. Winkler, R. Follmann, D. Köther, T. Kohl, O. Kersten, T. Podrebersek, H.-V. Heyer, and F. Winkler, "An integrated 8-12 GHz fractional-N frequency synthesizer in SiGe BiCMOS for satellite communications," *Analog Integrated Circuits and Signal Processing*, vol. 65, pp. 21–32, 2010. 10.1007/s10470-010-9454-z.
- [13] STMicroelectronics, "Data brief rad hard 65nm CMOS technology platform for space applications." https://www.st.com, February 2015. [Available online, last access: February 2022].
- [14] Atmel, "Rad-Hard 150 nm SOI CMOS Cell-based ASIC for Space Use." https://ww1.microchip.com, 2020. [Available online, last access: May 2022].
- [15] R. C. Lacoe, "Improving Integrated Circuit Performance Through the Application of Hardness-by-Design Methodology," *IEEE Transactions on Nuclear Science*, vol. 55, no. 4, pp. 1903–1925, 2008.
- [16] V. Petrovic and M. Krstic, "Design Flow for Radhard TMR Flip-Flops," in 2015 IEEE 18th International Symposium on Design and Diagnostics of Electronic Circuits Systems, pp. 203–208, April 2015.
- [17] O. Schrape, M. Herrmann, F. Winkler, and M. Krstic, "Routing Approach for Digital, Differential bipolar Designs using Virtual fat-wire Boundary Pins," in 20th IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems, DDECS 2017, Dresden, Germany, April 19-21, 2017, pp. 122–126, 2017.
- [18] O. Schrape, A. Breitenreiter, S. Zeidler, and M. Krstic, "Aspects on Timing Modeling of Radiation-Hardness by Design Standard Cell-Based ΔTMR Flip-Flops," in 2019 22nd Euromicro Conference on Digital System Design (DSD), pp. 639–642, Aug 2019.
- [19] O. Schrape, A. Breitenreiter, L. Lu, E. P. Garcia, and M. Krstić, "Mit konventioneller Technologie zum strahlungsharten AMS-Design," in Workshop Testmethoden und Zuverlässigkeit von Schaltungen und Systemen, 2021.
- [20] O. Schrape, A. Breitenreiter, C. Schulze, S. Zeidler, and M. Krstić, "Radiation-Hardness-by-Design Latch-based Triple Modular Redundancy Flip-Flops," in 2021 12th IEEE Latin American Symposium on Circuits and Systems, 2021.

- [21] O. Schrape, M. Andjelković, A. Breitenreiter, A. Balashov, and M. Krstic, "Design Concept for Radiation-Hardening of Triple Modular Redundancy TSPC Flip-Flops," in 2020 23rd Euromicro Conference on Digital System Design (DSD), Aug 2020.
- [22] O. Schrape, M. Andjelković, A. Breitenreiter, S. Zeidler, A. Balashov, and M. Krstić, "Design and Evaluation of Radiation-Hardened Standard Cell Flip-Flops," *IEEE Transactions on Circuits and Systems I: Regular Papers*, pp. 1–14, 2021.
- [23] M. Krstic, J. Schmidt, A. Breitenreiter, F. Teply, and R. Sorge, "Evaluierung einer strahlungsharten Bibliothek in 0.13µm BiCMOS," in *DLR Bauteilekonferenz*, 2018.
- [24] A. Simevski, P. Skoncej, C. Calligaro, and M. Krstic, "Scalable and Configurable Multi-Chip SRAM in a Package for Space Applications," in 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1–6, Oct 2019.
- [25] Cadence, "LEF/DEF 5.8 Language Reference." http://www.cadence.com, 2018. [Available online, last access: May 2019].
- [26] Cadence, "Innovus Implementation System." http://www.cadence.com, 2021. [Available online, last access: May 2021].
- [27] Synopsys, "PrimeLib: Unified Library Characterization and Validation." https://www.synopsys.com, 2022. [Available online, last access: March 2022].
- [28] Cadence, "Liberate Characterization Solution." http://www.cadence.com, 2020. [Available online, last access: August 2020].
- [29] E. Brunvand, Digital VLSI Chip Design with Cadence and Synopsys CAD Tools. USA: Addison-Wesley Publishing Company, 1st ed., 2009.
- [30] Synopsys, "Design Compiler." https://www.synopsys.com, 2019. [Available online, last access: May 2020].
- [31] Cadence, "Genus Synthesis Solution." http://www.cadence.com, 2022. [Available online, last access: March 2022].
- [32] T. Kobayashi, T. Matsue, and H. Shiba, "Flip-Flop Circuit with FLT Capability," in *Proc. IECEO Conference*, p. 962, 1968.
- [33] F. Vater, Secure Scan Chain and Debug Interface. PhD thesis, Brandenburgische Technische Universität Cottbus-Senftenberg, 2017.
- [34] Synopsys, "TestMAX ATPG." https://www.synopsys.com, 2021. [Available online, last access: May 2021].

- [35] Cadence, "Cadence Modus DFT Software Solution." https://www.cadence.com, 2022. [Available online, last access: March 2022].
- [36] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low Power Methodology Manual for System-on-Chip Design. Springer Publishing Company, Incorporated, 2007.
- [37] K. Pokhrel, "Physical and Silicon Measures of Low Power Clock Gating Success: An Apple to Apple Case Study," *SNUG*, 2007.
- [38] Intel/Altera, "Basic Principles of Signal Integrity." https://www.intel.com, December 2007. [Available online, last access: January 2021].
- [39] M. Alioto and G. Palumbo, Model and Design of Bipolar and MOS Current-Mode Logic: CML, ECL and SCL Digital Circuits. Springer, 2005.
- [40] J. Ward, "Early Transistor History at IBM Hannon S. Yourke." http://semiconductormuseum.com/Transistors/IBM/OralHistories/Yourke/Yourke_Index.htm, 2004. [Available online, last access: November 2021].
- [41] D. Lewine, J. Guyer, B. Baxter, and C. Moriondo, "An ECL implementation of the Motorola 88000," in COMPCON Spring '89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage, Digest of Papers, pp. 27–31, Feb 1989.
- [42] E. W. Brown, A. Agrawal, T. Creary, M. F. Klein, D. Murata, and J. Petolino, "Implementing Sparc in ECL," *IEEE Micro*, vol. 10, pp. 10–22, 1990.
- [43] P. E. Dodd, M. R. Shaneyfelt, J. R. Schwank, and J. A. Felix, "Current and Future Challenges in Radiation Effects on CMOS Electronics," *IEEE Transactions on Nuclear Science*, vol. 57, no. 4, pp. 1747–1763, 2010.
- [44] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo, and T. Toba, "Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule," *IEEE Transactions on Electron Devices*, vol. 57, no. 7, pp. 1527–1538, 2010.
- [45] J. R. Schwank, M. R. Shaneyfelt, and P. E. Dodd, "Radiation Hardness Assurance Testing of Microelectronic Devices and Integrated Circuits: Radiation Environments, Physical Mechanisms, and Foundations for Hardness Assurance," *IEEE Transactions on Nuclear Science*, vol. 60, no. 3, pp. 2074–2100, 2013.
- [46] H. J. Barnaby, "Total-Ionizing-Dose Effects in Modern CMOS Technologies," IEEE Transactions on Nuclear Science, vol. 53, no. 6, pp. 3103–3121, 2006.
- [47] W. J. Snoeys, T. A. P. Gutierrez, and G. Anelli, "A new NMOS layout structure for radiation tolerance," in 2001 IEEE Nuclear Science Symposium Conference Record (Cat. No.01CH37310), vol. 2, pp. 822–826 vol.2, 2001.

- [48] L. Chen and D. M. Gingrich, "Study of N-channel MOSFETs with an enclosed-gate layout in a 0.18 µm CMOS technology," *IEEE Transactions on Nuclear Science*, vol. 52, no. 4, pp. 861–867, 2005.
- [49] C. Calligaro and U. Gatti, Rad-hard Semiconductor Memories. River Publishers, 2018.
- [50] R. R. Troutman, Latchup in CMOS Technology: The Problem and Its Cure. The Springer International Series in Engineering and Computer Science, Springer US, 1986.
- [51] S. Weidling and M. Goessel, "Fault tolerant linear state machines," in 2014 15th Latin American Test Workshop - LATW, pp. 1–6, 2014.
- [52] J. S. Kauppila, L. W. Massengill, D. R. Ball, M. L. Alles, R. D. Schrimpf, T. D. Loveless, J. A. Maharrey, R. C. Quinn, and J. D. Rowe, "Geometry-aware single-event enabled compact models for sub-50 nm partially depleted silicon-on-insulator technologies," *IEEE Transactions on Nuclear Science*, vol. 62, pp. 1589–1598, Aug 2015.
- [53] M. Andjelković, A Methodology for Characterization, Modeling and Mitigation of Single Event Transient Effects in CMOS Standard Combinational Cells. PhD thesis, University of Potsdam, 2021.
- [54] L. T. Clark, "Microprocessors and SRAM for Space: Basics, Radiation Effects and Design," *IEEE Nuclear and Space Radiation Effects Conference (NSREC)*, Short Course, 2010.
- [55] C. Zhao and S. Dey, "Evaluating and Improving Transient Error Tolerance of CMOS Digital VLSI Circuits," in 2006 IEEE International Test Conference, pp. 1–10, Oct 2006.
- [56] J. D. Black, P. E. Dodd, and K. M. Warren, "Physics of Multiple-Node Charge Collection and Impacts on Single-Event Characterization and Soft Error Rate Prediction," *IEEE Transactions on Nuclear Science*, vol. 60, pp. 1836–1851, June 2013.
- [57] M. Ebrahimi, H. Asadi, R. Bishnoi, and M. B. Tahoori, "Layout-Based Modeling and Mitigation of Multiple Event Transients," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, pp. 367–379, March 2016.
- [58] O. A. Amusan, L. W. Massengill, M. P. Baze, B. L. Bhuva, A. F. Witulski, S. DasGupta, A. L. Sternberg, P. R. Fleming, C. C. Heath, and M. L. Alles, "Directional Sensitivity of Single Event Upsets in 90 nm CMOS Due to Charge Sharing," *IEEE Transactions on Nuclear Science*, vol. 54, pp. 2584–2589, Dec 2007.

- [59] M. Haghi and J. Draper, "The 90 nm Double-DICE Storage Element to reduce Single-Event Upsets," in 2009 52nd IEEE International Midwest Symposium on Circuits and Systems, pp. 463–466, Aug 2009.
- [60] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," in *Proceedings International Conference on Dependable Systems and Networks*, pp. 389–398, 2002.
- [61] J. R. Ahlbin, L. W. Massengill, B. L. Bhuva, B. Narasimham, M. J. Gadlage, and P. H. Eaton, "Single-Event Transient Pulse Quenching in Advanced CMOS Logic Circuits," *IEEE Transactions on Nuclear Science*, vol. 56, no. 6, pp. 3050–3056, 2009.
- [62] V. Ferlet-Cavrois, P. Paillet, D. McMorrow, N. Fel, J. Baggio, S. Girard, O. Duhamel, J. S. Melinger, M. Gaillardin, J. R. Schwank, P. E. Dodd, M. R. Shaneyfelt, and J. A. Felix, "New Insights Into Single Event Transient Propagation in Chains of Inverters-Evidence for Propagation-Induced Pulse Broadening," *IEEE Transactions on Nuclear Science*, vol. 54, pp. 2338–2346, Dec 2007.
- [63] P. E. Dodd, M. R. Shaneyfelt, J. A. Felix, and J. R. Schwank, "Production and Propagation of Single-Event Transients in High-Speed Digital Logic ICs," *IEEE Transactions on Nuclear Science*, vol. 51, no. 6, pp. 3278–3284, 2004.
- [64] B. Narasimham, B. L. Bhuva, R. D. Schrimpf, L. W. Massengill, M. J. Gadlage, O. A. Amusan, W. T. Holman, A. F. Witulski, W. H. Robinson, J. D. Black, J. M. Benedetto, and P. H. Eaton, "Characterization of Digital Single Event Transient Pulse-Widths in 130-nm and 90-nm CMOS Technologies," *IEEE Transactions on Nuclear Science*, vol. 54, pp. 2506–2511, Dec 2007.
- [65] M. J. Gadlage, J. R. Ahlbin, B. L. Bhuva, L. W. Massengill, and R. D. Schrimpf, "Single event transient pulse width measurements in a 65-nm bulk CMOS technology at elevated temperatures," in 2010 IEEE International Reliability Physics Symposium, pp. 763–767, May 2010.
- [66] S. Jagannathan, N. N. Mahatme, N. J. Gaspard, T. D. Loveless, B. L. Bhuva, and L. W. Massengill, "Hardware based empirical model for predicting logic soft error cross-section," in 2016 IEEE International Reliability Physics Symposium (IRPS), pp. 3B-3-1-3B-3-5, April 2016.
- [67] V. Petrovic, Design Methodology for highly Reliable Digital ASIC Designs Applied to Network-Centric System Middleware Switch Processor. PhD thesis, Brandenburgische Technische Universität Cottbus, 2013.
- [68] J. von Neumann, "Probabilistic Logics and Synthesis of Reliable Organisms from Unreliable Components," in *Automata Studies* (C. Shannon and J. McCarthy, eds.), pp. 43–98, Princeton University Press, 1956.

- [69] V. Srinivasan, A. Sternberg, A. Duncan, W. Robinson, B. Bhuva, and L. Massengill, "Single-event Mitigation in Combinational Logic using Targeted Data Path Hardening," *IEEE Transactions on Nuclear Science*, vol. 52, pp. 2516–2523, Dec 2005.
- [70] F. Brglez, D. Bryan, and K. Kozminski, "Combinational Profiles of Sequential Benchmark Circuits," in 1989 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1929–1934 vol.3, May 1989.
- [71] K. Al-Kofahi, *Reliability Analysis of Triple Modular Redundancy System with Spare*. PhD thesis, Rochester Institute of Technology, 1993.
- [72] I. Koren and S. Y. H. Su, "Reliability Analysis of N-Modular Redundancy Systems with Intermittent and Permanent Faults," *IEEE Transactions on Computers*, vol. C-28, pp. 514–520, July 1979.
- [73] T. Ban and L. A. de Barros Naviner, "A simple Fault-tolerant Digital Voter Circuit in TMR Nanoarchitectures," in *Proceedings of the 8th IEEE International NEWCAS Conference 2010*, pp. 269–272, June 2010.
- [74] D. G. Mavis and P. H. Eaton, "Soft Error Rate Mitigation Techniques for Modern Microcircuits," in 2002 IEEE International Reliability Physics Symposium. Proceedings. 40th Annual (Cat. No.02CH37320), pp. 216–225, April 2002.
- [75] V. Morgan and D. Gregory, "An ECL Logic Synthesis System," Proceedings of the 28th ACM/IEEE Design Automation Conference, no. Paper 6.3, pp. 106–111, 1991.
- [76] K. Tiri and I. Verbauwhede, "A VLSI Design Flow for Secure Side-Channel Attack resistant ICs," in *Design, Automation and Test in Europe, 2005. Proceedings*, pp. 58–63 Vol. 3, March 2005.
- [77] S. Badel, I. Hatirnaz, Y. Leblebici, and E. Brauer, "Via-Programmable Structured ASIC Fabric Based on MCML Cells: Design Flow and Implementation," in 2006 49th IEEE International Midwest Symposium on Circuits and Systems, vol. 1, pp. 85–88, Aug. 2006.
- [78] K. Tiri and I. Verbauwhede, "Place and Route for Secure Standard Cell Design," in *CARDIS* (J.-J. Quisquater, P. Paradinas, Y. Deswarte, and A. A. E. Kalam, eds.), pp. 143–158, Kluwer, 2004.
- [79] K. Tiri and I. Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 25, pp. 1197–1208, July 2006.
- [80] S. Badel, E. Guleyupoglu, O. Inac, A. Martinez, P. Vietti, F. Gurkaynak, and Y. Leblebici, "A Generic Standard Cell Design Methodology for Differential Circuit Styles," in Design, Automation and Test in Europe, 2008. DATE '08, pp. 843–848, Mar. 2008.

- [81] G. Kell, "IECL cells description SBG25VD Technology," 2006.
- [82] G. Kell, D. Schulz, and M. Müller, "Digital cell library Common\_ECL," 2015.
- [83] G. Kell and J. Wiesböck, "Ultraschneller RISC-basierter Operationsknoten in bipolarer SiGe-Technologie," *ElektronikPraxis*, 2016. [Available online, last access: May 2022].
- [84] Y. Ji-Ren, I. Karlsson, and C. Svensson, "A True Single-Phase-Clock Dynamic CMOS Circuit Technique," *IEEE Journal of Solid-State Circuits*, vol. 22, pp. 899–901, Oct 1987.
- [85] J. Yuan and C. Svensson, "High-speed CMOS Circuit Technique," *IEEE Journal of Solid-State Circuits*, vol. 24, pp. 62–70, Feb 1989.
- [86] B. Razavi, "TSPC Logic [A Circuit for All Seasons]," IEEE Solid-State Circuits Magazine, vol. 8, pp. 10–13, Fall 2016.
- [87] S. M. Jahinuzzaman, D. J. Rennie, and M. Sachdev, "Soft Error robust Impulse and TSPC flip-flops in 90nm CMOS," in 2009 2nd Microsystems and Nanoelectronics Research Conference, pp. 45–48, Oct 2009.
- [88] Cadence, "Spectre Simulation Platform." http://www.cadence.com, 2021. [Available online, last access: November 2021].
- [89] J. S. Kauppila, A. L. Sternberg, M. L. Alles, A. M. Francis, J. Holmes, O. A. Amusan, and L. W. Massengill, "A Bias-Dependent Single-Event Compact Model Implemented Into BSIM4 and a 90 nm CMOS Process Design Kit," *IEEE Transactions on Nuclear Science*, vol. 56, pp. 3152–3157, Dec 2009.
- [90] B. Narasimham, M. J. Gadlage, B. L. Bhuva, R. D. Schrimpf, L. W. Massengill, W. T. Holman, A. F. Witulski, and K. F. Galloway, "Test Circuit for Measuring Pulse Widths of Single-Event Transients Causing Soft Errors," *IEEE Transactions on Semiconductor Manufacturing*, vol. 22, pp. 119–125, Feb 2009.
- [91] R. Shuler, A. Balasubramanian, B. Narasimham, B. Bhuva, P. Neill, and C. Kouba, "The Effectiveness of TAG or Guard-Gates in SET Suppression Using Delay and Dual-Rail Configurations at 0.35 μm," *IEEE Transactions on Nuclear Science*, vol. 53, pp. 3428–3431, Jan 2007.
- [92] R. Naseer and J. Draper, "The DF-DICE Storage Element for Immunity to Soft Errors," in 48th Midwest Symposium on Circuits and Systems, 2005., pp. 303–306 Vol. 1, Aug 2005.
- [93] F. Smith, "A new Methodology for Single Event Transient Suppression in Flash FPGAs," *Microprocessors and Microsystems*, vol. 37, no. 3, pp. 313–318, 2013.

- [94] S. Rezgui, J. J. Wang, E. C. Tung, B. Cronquist, and J. McCollum, "New Methodologies for SET Characterization and Mitigation in Flash-Based FPGAs," *IEEE Transactions on Nuclear Science*, vol. 54, pp. 2512–2524, Dec 2007.
- [95] R. Weigand, "Single Event Effect Mitigation in Digital Integrated Circuits for Space," 2010. Topical Workshop on Electronics for Particle Physics.
- [96] A. Gujja, S. Chellappa, C. Ramamurthy, and L. T. Clark, "Redundant Skewed Clocking of Pulse-Clocked Latches for Low Power Soft Error Mitigation," in 2015 15th European Conference on Radiation and Its Effects on Components and Systems (RADECS), pp. 1–7, Sept 2015.
- [97] S. Kumar, S. Chellappa, and L. T. Clark, "Temporal Pulse-clocked Multi-bit Flip-Flop mitigating SET and SEU," in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 814–817, May 2015.
- [98] B. Narasimham, R. L. Shuler, J. D. Black, B. L. Bhuva, R. D. Schrimpf, A. F. Witulski, W. T. Holman, and L. W. Massengill, "Quantifying the Reduction in Collected Charge and Soft Errors in the Presence of Guard Rings," *IEEE Transactions on Device and Materials Reliability*, vol. 8, no. 1, pp. 203–209, 2008.
- [99] J. Furuta, K. Kobayashi, and H. Onodera, "Impact of Cell Distance and Well-Contact Density on Neutron-Induced Multiple Cell Upsets," in 2013 IEEE International Reliability Physics Symposium (IRPS), pp. 6C.3.1–6C.3.4, 2013.
- [100] J. Chen, S. Chen, Y. He, J. Qin, B. Liang, B. Liu, and P. Huang, "Novel Layout Technique for Single-Event Transient Mitigation Using Dummy Transistor," *IEEE Transactions on Device and Materials Reliability*, vol. 13, no. 1, pp. 177–184, 2013.
- [101] T. Aoki, "A Practical High-Latchup Immunity Design Methodology for Internal Circuits in the Standard Cell-Based CMOS/BiCMOS LSIs," *IEEE Transactions on Electron Devices*, vol. 40, no. 8, pp. 1432–1436, 1993.
- [102] M. Nicolaidis, "A Low-Cost Single-Event Latchup Mitigation Scheme," in 12th IEEE International On-Line Testing Symposium (IOLTS'06), 5 pp., 2006.
- [103] V. Petrović, G. Schoof, and Z. Stamenković, "Fault-Tolerant TMR and DMR Circuits with Latchup Protection Switches," *Microelectronics Reliability*, vol. 54, no. 8, pp. 1613–1626, 2014.
- [104] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," *IEEE Transactions on Nuclear Science*, vol. 43, no. 6, pp. 2874–2878, 1996.
- [105] O. A. Amusan, L. W. Massengill, M. P. Baze, A. L. Sternberg, A. F. Witulski, B. L. Bhuva, and J. D. Black, "Single Event Upsets in Deep-Submicrometer Technologies Due to Charge Sharing," *IEEE Transactions on Device and Materials Reliability*, vol. 8, no. 3, pp. 582–589, 2008.

- [106] H.-H. K. Lee, K. Lilja, M. Bounasser, P. Relangi, I. R. Linscott, U. S. Inan, and S. Mitra, "LEAP: Layout Design through Error-Aware Transistor Positioning for Soft-error resilient Sequential Cell Design," in 2010 IEEE International Reliability Physics Symposium, pp. 203–212, May 2010.
- [107] S. M. Jahinuzzaman, D. J. Rennie, and M. Sachdev, "A Soft Error Tolerant 10T SRAM Bit-Cell With Differential Read Capability," *IEEE Transactions on Nuclear Science*, vol. 56, no. 6, pp. 3768–3773, 2009.
- [108] S. Jagannathan, T. D. Loveless, B. L. Bhuva, S. Wen, R. Wong, M. Sachdev, D. Rennie, and L. W. Massengill, "Single-Event Tolerant Flip-Flop Design in 40nm Bulk CMOS Technology," *IEEE Transactions on Nuclear Science*, vol. 58, no. 6, pp. 3033–3037, 2011.
- [109] D. Bessot and R. Velazco, "Design of SEU-hardened CMOS memory cells: the HIT cell," in *RADECS 93. Second European Conference on Radiation and its Effects on Components and Systems* (Cat. No.93TH0616-3), pp. 563–570, Sep. 1993.
- [110] S. Redant, "The DARE library family," DARE User Day, ESA/ESTEC, 2011.
- [111] J. E. Knudsen and L. T. Clark, "An Area and Power Efficient Radiation Hardened by Design Flip-Flop," *IEEE Transactions on Nuclear Science*, vol. 53, pp. 3392–3399, Dec 2006.
- [112] A. Maru, H. Shindou, T. Ebihara, A. Makihara, T. Hirao, and S. Kuboyama, "DICE-based Flip-Flop with SET Pulse Discriminator on a 90 nm Bulk CMOS Process," *IEEE Transactions on Nuclear Science*, vol. 57, pp. 3602–3608, Dec 2010.
- [113] S. M. Jahinuzzaman and R. Islam, "TSPC-DICE: A single phase clock high performance SEU hardened flip-flop," in 2010 53rd IEEE International Midwest Symposium on Circuits and Systems, pp. 73–76, Aug 2010.
- [114] S. Gupta and J. Mekie, "Soft Error Resilient and Energy Efficient Dual Modular TSPC Flip-Flop," in 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID), pp. 341–346, Jan 2019.
- [115] S. Mitra, M. Zhang, N. Seifert, T. Mak, and K. S. Kim, "Soft Error Resilient System Design through Error Correction," in 2006 IFIP International Conference on Very Large Scale Integration, pp. 332–337, Oct 2006.
- [116] G. L. Jaya, S. Chen, and S. Liter, "A Dual Redundancy Radiation-hardened Flip-Flop based on C-element in 65nm Process," in 2016 International Symposium on Integrated Circuits (ISIC), pp. 1–4, Dec 2016.
- [117] J. Teifel, "Self-Voting Dual-Modular-Redundancy Circuits for Single-Event-Transient Mitigation," *IEEE Transactions on Nuclear Science*, vol. 55, pp. 3435–3439, Dec 2008.

- [118] S. A. Aketi, J. Mekie, and H. Shah, "Single-Error Hardened and Multiple-Error Tolerant Guarded Dual Modular Redundancy Technique," in 2018 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID), pp. 250–255, Jan 2018.
- [119] L. Cassano, A. Bosio, and G. D. Natale, "A Novel Adaptive Fault Tolerant Flip-Flop Architecture Based on TMR," in 2014 19th IEEE European Test Symposium (ETS), pp. 1–2, 2014.
- [120] A. J. Drake, A. Kleinosowski, and A. K. Martin, "A Self-Correcting Soft Error Tolerant Flip-Flop," in 12th NASA Symposium on VLSI Design, Coeur d'Alene, pp. 4–5, 2005.
- [121] N. D. Hindman, D. E. Pettit, D. W. Patterson, K. E. Nielsen, X. Yao, K. E. Holbert, and L. T. Clark, "High speed Redundant Self-correcting Circuits for Radiation Hardened by Design Logic," in 2009 European Conference on Radiation and Its Effects on Components and Systems, pp. 465–472, Sep. 2009.
- [122] C. Ramamurthy, A. Gujja, V. Vashishtha, S. Chellappa, and L. T. Clark, "Muller C-element Self-corrected Triple Modular Redundant Logic with Multithreading and Low Power Modes," in 2017 17th European Conference on Radiation and Its Effects on Components and Systems (RADECS), pp. 1–4, Oct 2017.
- [123] P. Balasubramanian and N. E. Mastorakis, "Power, Delay and Area Comparisons of Majority Voters relevant to TMR Architectures," ArXiv, vol. abs/1603.07964, 2016.
- [124] C. Albrecht, "IWLS 2005 Benchmarks." https://iwls.org/iwls2005/benchmarks.html, 2005. [Available online, last access: March 2021].
- [125] S. Rezgui, J. McCollum, and J.-J. Wang, "Single Event Transient Mitigation and Measurement in Integrated Circuits," 2010.
- [126] S. Zeidler, O. Schrape, A. Breitenreiter, and M. Krstic, "Fehlertolerante sequentielle Zelle und dessen Testverfahren" [Fault-tolerant sequential cell and its test method], 2021.
- [127] E. Garcia, M. Vallejo, L. Puerto, A. Breitenreiter, O. Schrape, and M. Krstic, "Research on ADC Architectures Suitable for Space Applications and Technology Scaling," in *International Workshop on Analogue and Mixed-Signal Integrated Circuits for Space Applications (AMICSA)*, 2021.
- [128] RCD and IHP, "System Specification Document." 2021.