Migol: A Fault-Tolerant Service Framework for MPI Applications in the Grid
In a distributed, inherently dynamic Grid environment, the reliability of individual resources cannot be guaranteed. The more resources and components are involved, the more error-prone the system becomes. It is therefore important to enhance the dependability of the system with fault-tolerance mechanisms. In this paper, we present Migol, a fault-tolerant, self-healing Grid service infrastructure for MPI applications. The benefit of the Grid is that, in case of a failure, an application may be migrated to another site and restarted there from a checkpoint file. This approach requires a service infrastructure that handles the necessary activities transparently for the application. However, a migration framework cannot support fault-tolerant applications if it is not fault-tolerant itself.
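The checkpoint/restart migration described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Migol's implementation: the names `migrate_and_run`, `run_on_site`, `Checkpoint`, and `SiteFailure` are hypothetical, sites are plain strings, failures are injected at a chosen step, and the "application" is a running sum. It models only restarting from the latest checkpoint on the next available site. It does not model how Migol makes its own services fault-tolerant.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical snapshot of application progress."""
    step: int
    state: int


class SiteFailure(Exception):
    """Hypothetical signal that the current Grid site has failed."""


def run_on_site(site, start, total_steps, fail_at, checkpoint_every, store):
    """Run the toy application on `site`, resuming from checkpoint `start`.

    The computation is a running sum 1 + 2 + ... + total_steps.
    A failure is injected when `fail_at[site]` equals the current step.
    """
    step, state = start.step, start.state
    while step < total_steps:
        if fail_at.get(site) == step:
            raise SiteFailure(site)
        step += 1
        state += step
        if step % checkpoint_every == 0:
            # Persist progress so another site can resume from here.
            store.append(Checkpoint(step, state))
    return state


def migrate_and_run(sites, total_steps, fail_at, checkpoint_every=2):
    """Try each site in turn, always restarting from the latest checkpoint.

    Returns (result, list of sites that were used).
    """
    store = [Checkpoint(0, 0)]  # initial "checkpoint": nothing computed yet
    used = []
    for site in sites:
        used.append(site)
        try:
            result = run_on_site(site, store[-1], total_steps,
                                 fail_at, checkpoint_every, store)
            return result, used
        except SiteFailure:
            continue  # migrate: move on to the next site
    raise RuntimeError("no site left to migrate to")
```

For example, `migrate_and_run(["A", "B"], 5, {"A": 3})` fails on site `A` after step 3, restarts on `B` from the checkpoint taken at step 2, and still returns `(15, ["A", "B"])`: the work between the last checkpoint and the failure is repeated, but the result is unaffected.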
Author details: | André Luckow, Bettina Schnor |
---|---|
ISBN: | 978-3-540-29009-4 |
Publication type: | Article |
Language: | English |
Year of first publication: | 2005 |
Publication year: | 2005 |
Release date: | 2017/03/24 |
Source: | Recent Advances in Parallel Virtual Machine and Message Passing Interface: 12th European PVM/MPI Users' Group Meeting, Sorrento, Italy, September 18-21, 2005: Proceedings / Eds.: K. Selcuk Canda; Augusto Celentano. Berlin: Springer, 2005. ISBN 978-3-540-29009-4. (Lecture Notes in Computer Science; 3666). pp. 258-267 |
Organizational units: | Mathematisch-Naturwissenschaftliche Fakultät / Institut für Mathematik |