RMIT University

Relative debugging for a highly parallel hybrid computer system

conference contribution
posted on 2024-10-31, 20:53 authored by Luiz DeRose, Andrew Gontarek, Aaron Vose, Robert Moench, David Abramson, Minh Dinh, Chao Jin
Relative debugging traces software errors by concurrently comparing two executions of a program: one a reference version and the other faulty. It is particularly effective when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures that combine CPUs with accelerators or coprocessors. In this paper we extend relative debugging to support porting stencil computations to a hybrid computer. We describe a generic data model that allows programmers to examine the global state across different types of applications, including MPI/OpenMP, MPI/OpenACC, and UPC programs. We present case studies using a hybrid version of the 'stellarator' particle simulation DELTA5D on Titan at ORNL, and the UPC version of Shallow Water Equations on Crystal, an internal Cray supercomputer. The case studies used up to 5,120 GPUs and 32,768 CPU cores and illustrate that the debugger is effective and practical.
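
The core idea in the abstract, comparing corresponding data from a reference run and a suspect run, can be illustrated with a minimal sketch. This is not the paper's debugger or its data model. The function name `compare_states`, the tolerance value, and the sample arrays are all hypothetical, chosen only to show the comparison step on two small in-memory arrays.

```python
import math

def compare_states(reference, suspect, tolerance=1e-9):
    """Return the indices where the two runs differ by more than `tolerance`.

    Hypothetical helper for illustration only; a real relative debugger
    compares live data structures across two running programs.
    """
    if len(reference) != len(suspect):
        raise ValueError("arrays must have the same length")
    return [
        i
        for i, (r, s) in enumerate(zip(reference, suspect))
        if not math.isclose(r, s, rel_tol=0.0, abs_tol=tolerance)
    ]

# Made-up values standing in for the same variable captured at the same
# point in the reference program and in the ported (suspect) program.
reference_run = [1.0, 2.0, 3.0, 4.0]
suspect_run = [1.0, 2.0, 3.5, 4.0]

mismatches = compare_states(reference_run, suspect_run)
print(mismatches)  # prints [2]
```

An absolute tolerance is used here because a ported code can legitimately differ from the reference by floating-point rounding, so an exact equality test would likely report spurious mismatches.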

Related Materials

  1. DOI - Is published in 10.1145/2807591.2807605
  2. ISBN - Is published in 9781509002733 (urn:isbn:9781509002733)

Start page

133

End page

144

Total pages

12

Outlet

Proceedings of the IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 15)

Name of conference

SC 15

Publisher

IEEE

Place published

United States

Start date

2015-11-15

End date

2015-11-20

Language

English

Copyright

© 2015 ACM. All Rights Reserved

Former Identifier

2006094376

Esploro creation date

2020-06-22

Fedora creation date

2019-10-23