Fault-Tolerance Techniques for High-Performance Computing

eBook - Computer Communications and Networks

Published 1 July 2015, 1st edition 2015
€111.95
(incl. VAT)

E-book download
Bibliographic data
ISBN/EAN: 9783319209432
Language: English
Size: 8.62 MB
E-book
Format: PDF
DRM: Digital watermark

Description

This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge posed by the energy consumption of fault-tolerance methods in extreme-scale systems.

Contents

Part I: General Overview.- Fault-Tolerance Techniques for High-Performance Computing.- Part II: Technical Contributions.- Errors and Faults.- Fault-Tolerant MPI.- Using Replication for Resilience on Exascale Systems.- Energy-Aware Checkpointing Strategies.

Information on e-books

When purchasing an e-book, please make sure you select the correct format (EPUB or PDF). Note that the order can no longer be cancelled once the download link has been clicked.