Transfer learning workflow for I/O bandwidth prediction

  • Transfer Learning Workflow zur I/O-Leistungsvorhersage der Bandbreite

Povaliaiev, Dmytro; Müller, Matthias S. (Thesis advisor); Kunkel, Julian (Thesis advisor); Liem, Radita Tapaning Hesti (Consultant)

Aachen : RWTH Aachen University (2023)
Master Thesis

Master's thesis, RWTH Aachen University, 2023

Abstract

As the new generation of high-performance computing (HPC) systems reaches exascale performance for the first time, preventing underutilization due to I/O bottlenecks becomes even more critical. However, accurately predicting I/O performance remains a challenging problem. Existing approaches [29] [37] [92] use a significant amount of data from a particular HPC cluster to create a suitable machine learning model. This is problematic due to the required timescale and I/O instrumentation infrastructure, especially for new filesystems that have not yet gained widespread adoption. To address this issue, I propose a transfer-learning-based workflow for I/O bandwidth prediction that requires less data from the target cluster than the existing methods to produce a model of equivalent quality. As a proof-of-concept (POC), I use it to predict the I/O performance of CLAIX, the supercomputing cluster at RWTH Aachen University, employing data collected at the Blue Waters system of the University of Illinois for the initial training. Even in the POC form, the models produced by the workflow show a slight improvement of 1.08% average residual error over the current state of the art of 10% in bandwidth prediction on HPC clusters [37]. I further verify these results using cross-validation and analyze the models with the help of nine interpretable machine learning (also called explainable AI) techniques to provide insight into the features they consider most important.

Institutions

  • Department of Computer Science [120000]
  • Chair of Computer Science 12 (High Performance Computing) [123010]
