Description:
Architecture, Networks, and Storage -- Microarchitecture of a Configurable High-radix Router for Exascale Interconnect -- BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs -- Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How that’s Different from CPU -- A Hierarchical Task Scheduler for Heterogeneous Computing
Machine Learning, AI, and Emerging Technologies -- Auto-Precision Scaling for Distributed Deep Learning -- FPGA Acceleration of Number Theoretic Transform -- Designing a ROCm-aware MPI Library for AMD GPUs: Early Experiences -- A Tunable Implementation of Quality-of-Service Classes for HPC Networks -- Scalability of Streaming Anomaly Detection in an Unbounded Key Space using Migrating Threads -- HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads -- Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems
HPC Algorithms and Applications -- COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling -- Enabling AI-Accelerated Multiscale Modeling of Thrombogenesis at Millisecond and Molecular Resolutions on Supercomputers -- Evaluation of the NEC Vector Engine for Legacy CFD Codes -- Distributed Sparse Block Grids on GPUs -- iPUG: Accelerating Breadth-First Graph Traversals using Manycore Graphcore IPUs
Performance Modeling, Evaluation, and Analysis -- Optimizing GPU-enhanced HPC System and Cloud Procurements for Scientific Workloads -- A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application -- Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact -- Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark -- Under the Hood of SYCL - An Initial Performance Analysis With an Unstructured-mesh CFD Application -- Characterizing Containerized HPC Application Performance at Petascale on CPU and GPU Architectures -- Ubiquitous Performance Analysis
Programming Environments and Systems Software -- Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning.
This book constitutes the refereed proceedings of the 36th International Conference on High Performance Computing, ISC High Performance 2021, held virtually in June/July 2021. The 24 full papers presented were carefully reviewed and selected from 74 submissions. The papers cover a broad range of topics such as architecture, networks, and storage; machine learning, AI, and emerging technologies; HPC algorithms and applications; performance modeling, evaluation, and analysis; and programming environments and systems software.