Implementing record and refinement for debugging timing-dependent communication
- Distributed applications are hard to debug because timing-dependent network communication is a source of non-deterministic behavior. Current approaches to debug non deterministic failures include post-mortem debugging as well as record and replay. However, the first impairs system performance to gather data, whereas the latter requires developers to understand the timing-dependent communication at a lower level of abstraction than they develop at. Furthermore, both approaches require intrusive core library modifications to gather data from live systems. In this paper, we present the Peek-At-Talk debugger for investigating non-deterministic failures with low overhead in a systematic, top-down method, with a particular focus on tool-building issues in the following areas: First, we show how our debugging framework Path Tools guides developers from failures to their root causes and gathers run-time data with low overhead. Second, we present Peek-At-Talk, an extension to our Path Tools framework to record non-deterministic communicationDistributed applications are hard to debug because timing-dependent network communication is a source of non-deterministic behavior. Current approaches to debug non deterministic failures include post-mortem debugging as well as record and replay. However, the first impairs system performance to gather data, whereas the latter requires developers to understand the timing-dependent communication at a lower level of abstraction than they develop at. Furthermore, both approaches require intrusive core library modifications to gather data from live systems. In this paper, we present the Peek-At-Talk debugger for investigating non-deterministic failures with low overhead in a systematic, top-down method, with a particular focus on tool-building issues in the following areas: First, we show how our debugging framework Path Tools guides developers from failures to their root causes and gathers run-time data with low overhead. Second, we present Peek-At-Talk, an extension to our Path Tools framework to record non-deterministic communication and refine behavioral data that connects source code with network events. Finally, we scope changes to the core library to record network communication without impacting other network applications.…
Author details: | Tim FelgentreffORCiDGND, Michael Perscheid, Robert HirschfeldORCiDGND |
---|---|
DOI: | https://doi.org/10.1016/j.scico.2015.11.006 |
ISSN: | 0167-6423 |
ISSN: | 1872-7964 |
Title of parent work (English): | Science of computer programming |
Publisher: | Elsevier |
Place of publishing: | Amsterdam |
Publication type: | Article |
Language: | English |
Date of first publication: | 2016/11/30 |
Publication year: | 2017 |
Release date: | 2022/07/04 |
Tag: | Distributed debugging; Dynamic analysis; Record and refinement; Record and replay |
Volume: | 134 |
Number of pages: | 15 |
First page: | 4 |
Last Page: | 18 |
Organizational units: | An-Institute / Hasso-Plattner-Institut für Digital Engineering gGmbH |
DDC classification: | 0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 000 Informatik, Informationswissenschaft, allgemeine Werke |
Peer review: | Referiert |