Posted on 2024-11-03, 12:54. Authored by Lakshmi J Mohan, Renji Harold, Pablo Caneleo, Udaya Parampalli, Aaron Harwood
Large-scale distributed storage systems play a vital role in maintaining data across storage locations worldwide. These systems use replication as the default mechanism for providing fault tolerance. Erasure codes have recently emerged as a viable alternative to replication, since they provide the same fault tolerance at a lower storage overhead. However, their performance in a geographically diverse distributed storage system is unclear. This paper compares the performance of triple replication with that of the erasure coding (Reed-Solomon codes) used in Apache Hadoop's implementation of a distributed file system, on a cluster distributed across Australia that runs on the NeCTAR research cloud. Our results show that using erasure coding does not degrade read performance in such a setting. We also compare Hadoop's code with a local reconstruction code, implemented in the XORBAS version of Hadoop. These codes perform well on our cluster, but the performance gain we observe does not match the previously reported results. Hence, new codes are needed that perform better under geographical diversity. We believe that our framework can readily be used to test the range of novel erasure codes being introduced in the literature.
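The storage trade-off between replication and erasure coding mentioned above can be illustrated with a little arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the Reed-Solomon parameters (10 data blocks, 4 parity blocks) are an assumption chosen for illustration, since the abstract does not state the configuration used in the experiments.

```python
from fractions import Fraction


def replication_profile(copies):
    """Return (storage overhead, tolerated block losses) for n-way replication.

    Every block is stored `copies` times, so the raw storage used is
    `copies` times the data size, and any `copies - 1` losses are survivable.
    """
    return Fraction(copies), copies - 1


def reed_solomon_profile(data_blocks, parity_blocks):
    """Return (storage overhead, tolerated block losses) for an RS(k, m) stripe.

    A stripe of k data blocks is extended with m parity blocks; any m of the
    k + m blocks may be lost and the data can still be rebuilt.
    """
    return Fraction(data_blocks + parity_blocks, data_blocks), parity_blocks


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not reported in the abstract).
    schemes = {
        "triple replication": replication_profile(3),
        "Reed-Solomon (10 data, 4 parity)": reed_solomon_profile(10, 4),
    }
    for name, (overhead, losses) in schemes.items():
        print(f"{name}: {float(overhead):.1f}x storage, "
              f"tolerates {losses} lost blocks per stripe")
```

Under these assumed parameters, the erasure-coded stripe uses less than half the raw storage of triple replication while tolerating at least as many block losses. The cost is that a read or repair of a lost block must fetch several other blocks, which is why placement across geographically distant sites matters.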