Publications

Varbench: an Experimental Framework to Measure and Characterize Performance Variability

Abstract

Performance variability is a major problem for extreme-scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it increasingly affects other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench's capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how …

Metadata

publication: International Conference on Parallel Processing, 2018
year: 2018
publication date: 2018/8/13
authors: Brian Kocoloski, John R Lange
link: https://dl.acm.org/doi/abs/10.1145/3225058.3225125
resource_link: https://dl.acm.org/doi/pdf/10.1145/3225058.3225125
conference: International Conference on Parallel Processing