Publications
Varbench: an Experimental Framework to Measure and Characterize Performance Variability
Abstract
Performance variability is a major problem for extreme scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it is increasingly impacting other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench's capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how …
Metadata
- publication: International Conference on Parallel Processing, 2018
- year: 2018
- publication date: 2018/8/13
- authors: Brian Kocoloski, John R Lange
- link: https://dl.acm.org/doi/abs/10.1145/3225058.3225125
- resource_link: https://dl.acm.org/doi/pdf/10.1145/3225058.3225125
- conference: International Conference on Parallel Processing