Using hardware performance footprints of HPC benchmarks for job embedding

  • Job Embedding mithilfe von Hardware Performance Charakteristika verschiedener HPC Benchmarks

Eckhardt, Jonas; Müller, Matthias S. (Thesis advisor); Schulz, Martin (Thesis advisor); Schürhoff, Daniel (Consultant)

Aachen : RWTH Aachen University (2023)
Master Thesis

Master's thesis, RWTH Aachen University, 2022


Hardware performance counters are already collected on modern high-performance computing systems, but they are currently evaluated only manually, by experts, in a mostly time-consuming process. Automatic evaluation of performance counters can speed up this process and enables new applications such as job classification, automatic user feedback, system health monitoring and automatic job tagging, among others. Automatic evaluation is hampered by the lack of labelled data and by classification rules that are hard to define. It is further limited by the high temporal dimensionality of the data, even at a low temporal resolution, and by the high dimensionality caused by the number of cores, nodes and collected hardware performance counters. Therefore, in this work, labelled data is collected and several statistical dimensionality reduction methods, as well as autoencoders, principal component analysis and feature agglomeration, are applied. The quality of each reduction is evaluated on a supervised learning task: predicting which of a given set of benchmarks was run, based on the measured hardware performance indicators, using neural networks, naive Bayes classifiers, decision trees and support vector machines. Comparing the quality metrics of this task yields a justified choice of the best dimensionality reduction. Since the dimensionality reduction depends on the underlying architecture, a framework for recalculating the best reduction for future architectures and other tasks is presented. The presented embedding uses statistical methods and feature agglomeration to reduce the data to 20 dimensions while still predicting the executed workload with precision, accuracy and recall above 99%. A precision above 90% can still be achieved with an embedding into five-dimensional space.
In addition, the results show that the embedding chosen in this way improves unsupervised k-means clustering quality by a factor of seven to twelve.
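
To illustrate the kind of statistical reduction the abstract refers to, the following is a minimal sketch of collapsing the time axis of each counter into a few summary statistics, so that a job's vector length no longer depends on its runtime. It is not the method of the thesis: the choice of statistics (mean, standard deviation, minimum, median, maximum), the counter names and the function names are all assumptions made for illustration, and the sketch assumes that samples have already been aggregated over cores and nodes.

```python
from statistics import fmean, median, pstdev


def summarize_counter(samples):
    """Collapse one counter's time series into five summary statistics.

    The selection of statistics is an assumption for illustration only.
    """
    return [fmean(samples), pstdev(samples), min(samples), median(samples), max(samples)]


def embed_job(job):
    """Build a fixed-length vector from a job's counter time series.

    `job` maps a counter name to its list of samples over time
    (hypothetical layout, already aggregated over cores and nodes).
    Counters are visited in sorted order so the layout is stable.
    """
    vector = []
    for name in sorted(job):
        vector.extend(summarize_counter(job[name]))
    return vector


# Hypothetical job: two counters, four time steps each.
job = {
    "flops": [1.0, 2.0, 3.0, 4.0],
    "mem_bandwidth": [10.0, 10.0, 10.0, 10.0],
}
embedding = embed_job(job)
print(len(embedding))  # 2 counters x 5 statistics
```

With this layout, a job of any duration maps to a vector of five values per counter. Such a vector could then be passed to a further reduction step or to a classifier, as the abstract outlines.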