RMIT University

Measurements of errors in large-scale computational simulations at runtime

conference contribution
posted on 2024-11-03, 14:10 authored by Minh Dinh, Quang Nguyen
Verification of simulation codes often involves comparing the simulation output behavior to a known model using graphical displays or statistical tests. Such a process is challenging for large-scale scientific codes at runtime because they often involve thousands of processes and generate very large data structures. In our earlier work, we proposed a statistical framework for testing the correctness of large-scale applications using their runtime data. This paper studies the concept of 'distribution distance' and establishes the requirements for measuring the runtime differences between a verified stochastic simulation system and its larger-scale counterpart. The paper discusses two types of distribution distance: the χ² distance and the histogram distance. We prototype the verification methodology and evaluate its performance on two production simulation programs. All experiments were conducted on a 20,000-core Cray XE6.
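To illustrate the two measures named in the abstract, the following is a minimal Python sketch (not the paper's implementation) of a χ² distance and an L1-style histogram distance between two normalized histograms; the bin counts are hypothetical stand-ins for runtime data from a verified run and a larger-scale run.

```python
def normalize(counts):
    """Convert raw bin counts into a normalized histogram (sums to 1)."""
    total = sum(counts)
    return [c / total for c in counts]

def chi_squared_distance(p, q):
    """Chi-squared distance between two normalized histograms p and q.

    Bins where both histograms are empty contribute nothing.
    """
    return 0.5 * sum((pi - qi) ** 2 / (pi + qi)
                     for pi, qi in zip(p, q) if pi + qi > 0)

def histogram_distance(p, q):
    """Bin-wise L1 distance between two normalized histograms."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical bin counts: a verified small-scale run vs. a large-scale run.
reference = normalize([120, 340, 290, 180, 70])
candidate = normalize([115, 350, 280, 185, 70])

print(chi_squared_distance(reference, candidate))
print(histogram_distance(reference, candidate))
```

Both distances are zero for identical distributions and grow as the runtime distribution of the larger-scale run drifts from the verified baseline, which is the property the verification framework relies on.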

History

Start page

291

End page

296

Total pages

6

Outlet

Proceedings of the 2020 RIVF International Conference on Computing and Communication Technologies

Name of conference

IEEE International Conference on Research, Innovation and Vision for the Future

Publisher

IEEE

Place published

United States

Start date

2020-10-14

End date

2020-10-15

Language

English

Copyright

© 2020 IEEE

Former Identifier

2006106049

Esploro creation date

2021-06-01
