Verification of simulation codes often involves comparing the simulation's output behavior to a known model using graphical displays or statistical tests. Such a process is challenging for large-scale scientific codes at runtime because they often involve thousands of processes and generate very large data structures. In our earlier work, we proposed a statistical framework for testing the correctness of large-scale applications using their runtime data. This paper studies the concept of 'distribution distance' and establishes the requirements for measuring the runtime differences between a verified stochastic simulation system and its larger-scale counterpart. The paper discusses two types of distribution distance: the χ² distance and the histogram distance. We prototype the verification methodology and evaluate its performance on two production simulation programs. All experiments were conducted on a 20,000-core Cray XE6.
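The abstract does not give the exact formulas used in the paper, but the two distances it names are commonly defined over a pair of normalized histograms built from runtime samples. The sketch below, a hypothetical illustration using the standard symmetric χ² distance and the L1 (bin-wise) histogram distance, shows how such a comparison might be computed; the function names, bin counts, and stand-in data are assumptions, not the paper's implementation.

```python
import numpy as np

def chi2_distance(p, q, eps=1e-12):
    # Symmetric chi-squared distance between two normalized histograms
    # (common definition; the paper's exact variant may differ).
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def l1_histogram_distance(p, q):
    # Bin-wise L1 distance between two normalized histograms.
    return np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float)))

# Stand-in data: samples from a verified small-scale run and a
# larger-scale run, histogrammed over a shared set of bins.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 10_000)  # verified system (illustrative)
b = rng.normal(0.0, 1.0, 10_000)  # larger-scale counterpart (illustrative)
bins = np.linspace(-4.0, 4.0, 33)
p, _ = np.histogram(a, bins=bins)
q, _ = np.histogram(b, bins=bins)
p = p / p.sum()
q = q / q.sum()

print(chi2_distance(p, q), l1_histogram_distance(p, q))
```

In a runtime verification setting, each distance would be compared against a threshold calibrated on the verified system; small values indicate the larger-scale run's output distribution matches the reference.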
Start page: 291
End page: 296
Total pages: 6
Outlet: Proceedings of the 2020 RIVF International Conference on Computing and Communication Technologies
Name of conference: IEEE International Conference on Research, Innovation and Vision for the Future