RMIT University
Evaluating recovery aware components for grid reliability

conference contribution
posted on 2024-10-31, 09:51 authored by I Yusuf, Heinrich Schmidt, Ian Peake
Failure in grids is costly and inevitable. Existing fault tolerance (FT) mechanisms are typically defensive and reactive, thus unnecessarily costly. In this paper we propose a hybrid FT approach, recovery aware component (RAC), combining reactive and proactive FT, with failure recovery or aversion of user-defined granularity, by component-orientation and architecture-level reasoning about FT, to increase reliability and availability without needless performance sacrifices. We model and analyse a parameterised RAC implementation combining prediction, proactive rejuvenation and reactive restarting to varying extents, calculating cost savings, reliability improvements and cost-benefit, under parameters such as prediction frequency and accuracy.
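The trade-off described in the abstract can be illustrated with a small expected-cost calculation. The sketch below is not the model from the paper; it is a minimal illustration under assumed, hypothetical parameters (`failure_prob`, `true_positive_rate`, `false_positive_rate`, `rejuvenation_cost`, `restart_cost`, `prediction_cost`), comparing a purely reactive strategy with a hybrid one in which correctly predicted failures are averted by a cheaper proactive rejuvenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RacParams:
    """Hypothetical per-interval parameters; none of these names come from the paper."""
    failure_prob: float         # chance that a failure occurs in one interval
    true_positive_rate: float   # share of real failures the predictor catches
    false_positive_rate: float  # share of healthy intervals flagged by mistake
    rejuvenation_cost: float    # cost of one proactive rejuvenation
    restart_cost: float         # cost of one reactive restart after a failure
    prediction_cost: float      # cost of running the predictor once per interval


def reactive_cost(p: RacParams) -> float:
    """Expected cost per interval when every failure is handled by a restart."""
    return p.failure_prob * p.restart_cost


def hybrid_cost(p: RacParams) -> float:
    """Expected cost per interval when prediction triggers rejuvenation.

    Assumes a correctly predicted failure is fully averted by rejuvenation,
    a missed failure still needs a restart, and a false alarm costs one
    unnecessary rejuvenation.
    """
    averted = p.failure_prob * p.true_positive_rate * p.rejuvenation_cost
    missed = p.failure_prob * (1.0 - p.true_positive_rate) * p.restart_cost
    false_alarms = (1.0 - p.failure_prob) * p.false_positive_rate * p.rejuvenation_cost
    return p.prediction_cost + averted + missed + false_alarms


def saving(p: RacParams) -> float:
    """Expected saving of the hybrid strategy over the reactive one (may be negative)."""
    return reactive_cost(p) - hybrid_cost(p)


if __name__ == "__main__":
    example = RacParams(
        failure_prob=0.1,
        true_positive_rate=0.8,
        false_positive_rate=0.05,
        rejuvenation_cost=1.0,
        restart_cost=10.0,
        prediction_cost=0.05,
    )
    print(f"reactive: {reactive_cost(example):.3f}")
    print(f"hybrid:   {hybrid_cost(example):.3f}")
    print(f"saving:   {saving(example):.3f}")
```

In this toy setting the hybrid strategy pays off only while the restarts avoided outweigh the combined cost of prediction and false alarms. An inaccurate or expensive predictor makes the saving negative, which is the kind of sensitivity to prediction frequency and accuracy the abstract refers to.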

History

Start page

277

End page

280

Total pages

4

Outlet

Proceedings of the Seventh European Software Engineering Conference and Seventeenth ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC-FSE)

Editors

Hans van Vliet; Valérie Issarny

Name of conference

The Seventh European Software Engineering Conference and Seventeenth ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC-FSE)

Publisher

ACM

Place published

New York, United States

Start date

2009-08-24

End date

2009-08-28

Language

English

Copyright

Copyright 2009 ACM

Former Identifier

2006017310

Esploro creation date

2020-06-22

Fedora creation date

2011-06-10
