Evaluating static analysis techniques to accelerate data race detection for MPI RMA

  • Evaluation statischer Analysemethoden zur Performanceoptimierung von Data-Race-Erkennung für MPI RMA

Oraji, Yussur Mustafa; Müller, Matthias S. (Thesis advisor); Noll, Thomas (Thesis advisor); Schwitanski, Simon (Consultant)

Aachen : RWTH Aachen University (2023)
Bachelor Thesis

Bachelor's thesis, RWTH Aachen University, 2023

Abstract

Most high-performance computing systems use distributed memory, so a message-passing specification such as MPI is required to communicate data across processes. In particular, MPI supports one-sided communication (MPI RMA), in which only one process has to start the communication and the other process does not need to perform a corresponding MPI call. However, both standard MPI and MPI RMA are prone to data races, which take significant effort to find and fix. While data race detectors for MPI RMA exist, they often slow down program execution significantly. This is especially true of dynamic analysis tools, which perform race detection at runtime: MUST-RMA, one such tool, can cause a slowdown of up to a factor of 16. In contrast, static tools can run cheaply at compile time with minimal overhead. Combining dynamic and static analysis may therefore prove useful. This thesis presents three static optimization approaches for MPI RMA data race detection based on MUST-RMA. The first approach generates a whitelist of the values and instructions that the dynamic tool needs to inspect; all others may be ignored. Though similar to the approach used in MC-Checker, the implementation is more generally applicable and extensible, for example to additional programming languages such as Fortran. This whitelist may also be extended with additional information, specifically the type that each stored value corresponds to. By checking whether the code performs only remote reads, only remote writes, or both, the whitelist can be filtered further for a potential speed gain. Finally, the race detection itself may be delayed until the moment it is required, which is when the MPI RMA window is created, and turned off again when this window is destroyed.
These optimization approaches were built on top of the LLVM framework as compile-time passes, and the implementation is currently general enough to support both C and C++. All optimizations support interprocedural analysis and, through a modified compilation pipeline, may also be used across translation units. Although they introduce some false negatives, applying these optimizations yields a 2x speedup over normal MUST execution in most cases, with best-case scenarios reaching a speedup of 4x.
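
As a rough illustration of two of the ideas in the abstract (the whitelist filter and enabling detection only while an RMA window exists), the following minimal C++ sketch models a tracker that decides per memory access whether the expensive analysis is needed. It is not taken from the thesis or from MUST-RMA: the type AccessTracker, its member functions, and the value names win_buf and local_tmp are hypothetical, and the use of a window counter to handle several windows is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the bookkeeping of a dynamic race detector.
// None of these names come from MUST-RMA; they only illustrate the idea.
struct AccessTracker {
    std::unordered_set<std::string> whitelist; // values a static pass marked as relevant
    int open_windows = 0;                      // number of currently live RMA windows
    std::size_t tracked = 0;                   // accesses passed on to the expensive analysis
    std::size_t skipped = 0;                   // accesses filtered out cheaply

    // Would be called where the program creates an RMA window.
    void on_window_create() { ++open_windows; }

    // Would be called where the program destroys an RMA window.
    void on_window_free() {
        assert(open_windows > 0 && "window freed without matching creation");
        --open_windows;
    }

    // Would be called for each instrumented memory access.
    void on_access(const std::string& value_name) {
        const bool active = open_windows > 0;                   // detection only while a window exists
        const bool relevant = whitelist.count(value_name) > 0;  // static whitelist filter
        if (active && relevant) {
            ++tracked;
        } else {
            ++skipped;
        }
    }
};

// Replays a fixed access sequence and returns the resulting tracker state.
AccessTracker run_demo() {
    AccessTracker t;
    t.whitelist = {"win_buf"};
    t.on_access("win_buf");    // before window creation: skipped
    t.on_window_create();
    t.on_access("win_buf");    // whitelisted while a window is live: tracked
    t.on_access("local_tmp");  // not on the whitelist: skipped
    t.on_window_free();
    t.on_access("win_buf");    // after window destruction: skipped
    return t;
}
```

In this sketch only one of the four replayed accesses reaches the expensive path. In a real tool the whitelist would be produced by a compile-time pass rather than written by hand, and the hooks would wrap the corresponding MPI calls.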

Institutions

  • Department of Computer Science [120000]
  • Chair of Computer Science 12 (High Performance Computing) [123010]