Beschreibung:
SummaryComputational systems are nowadays composed of basic computational components that share multiprocessors and coprocessors of different types, typically several graphics processing units (GPUs) or many integrated cores (MICs), and those computational components are combined in heterogeneous clusters of nodes with different characteristics, including coprocessors of different types, with varying numbers of nodes at different speeds. The software previously developed and optimized for simpler system needs to be redesigned and reoptimized for these new, more complex systems. The adaptation to hybrid multicore + multiGPU and multicore + multiMIC of autotuning techniques for basic linear algebra routines is analyzed. The matrix‐matrix multiplication kernel, which is optimized for different computational system components through guided experimentation, is studied. The routine is installed for each node in the cluster, and the information generated from individual installations may be used for a hierarchical installation in a cluster. The basic matrix‐matrix multiplication may, in turn, be used inside higher level routines, which delegate their efficient execution to the optimization of the lower level routine. Experimental results are satisfactory in different multicore + multiGPU and multicore + multiMIC systems. So the guided search of execution configurations for satisfactory execution times proves to be a useful tool for heterogeneous systems, where the complexity of the system means a correct use of highly efficient routines and libraries is difficult.