Failure in long-running grid applications is arguably inevitable and costly. Therefore, fault tolerance (FT) support for grid applications is needed. This paper evaluates an extension of our prior work on Recovery Aware Components (RAC), a component-based FT approach. Our extension exploits the architecture of a grid application by assigning it to one of a small number of architectural classes. In this paper, we evaluate only the MapReduce architecture and analyze the reliability improvement that MapReduce applications would gain by adopting the RAC approach. Our analysis shows that significant increases in reliability are possible at moderate extra cost. The cost of FT depends on the failure rate of the managed system (i.e., the system to be protected from faults) and on the FT strategy chosen. Our work aims to give High Performance Computing (HPC) software architects the tools to control these factors for different grid application architectures.
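To make the trade-off between failure rate, FT strategy and cost concrete, the sketch below uses a deliberately simple model. It is not the RAC approach or the paper's analysis. It assumes a hypothetical MapReduce job of independent tasks of equal duration, exponentially distributed failures with a fixed per-task rate, and a plain "retry a failed task up to k times" strategy. The functions `job_reliability` and `expected_attempts` are illustrative names introduced here.

```python
import math


def task_success_prob(failure_rate: float, duration: float) -> float:
    """Probability that one task attempt finishes without failing.

    Assumes exponentially distributed failures (hypothetical model).
    """
    return math.exp(-failure_rate * duration)


def job_reliability(num_tasks: int, failure_rate: float,
                    duration: float, max_attempts: int = 1) -> float:
    """Probability that every task eventually succeeds.

    Each task is retried up to max_attempts times; tasks fail independently.
    max_attempts=1 means no fault tolerance at all.
    """
    p = task_success_prob(failure_rate, duration)
    per_task = 1.0 - (1.0 - p) ** max_attempts  # at least one attempt succeeds
    return per_task ** num_tasks                # all tasks must succeed


def expected_attempts(failure_rate: float, duration: float,
                      max_attempts: int) -> float:
    """Expected number of attempts per task, a rough proxy for FT cost.

    Attempt i (0-based) is made only if the first i attempts all failed.
    """
    q = 1.0 - task_success_prob(failure_rate, duration)
    return sum(q ** i for i in range(max_attempts))


if __name__ == "__main__":
    # Hypothetical parameters: 100 tasks, 10-hour tasks, 1 failure per 1000 hours.
    n, lam, t = 100, 1e-3, 10.0
    for k in (1, 2, 3):
        r = job_reliability(n, lam, t, k)
        overhead = expected_attempts(lam, t, k) - 1.0
        print(f"max_attempts={k}: reliability={r:.4f}, extra work per task={overhead:.2%}")
```

With these made-up numbers, the job without retries succeeds with probability e^-1 (about 0.37). Allowing a single retry raises this to about 0.99 for roughly 1% extra work per task. This mirrors, qualitatively, the kind of reliability-versus-cost relationship the abstract describes. Raising the failure rate increases both the reliability gap and the retry overhead.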
Published in: ISBN 9781450307246 (urn:isbn:9781450307246)