Description:
<jats:title>Abstract</jats:title>
<jats:p>The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronization-reducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called <jats:italic>s</jats:italic>-step algorithms are based on grouping dot products for joint execution and replacing time-consuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the <jats:italic>s</jats:italic>-step Lanczos algorithm, introduce <jats:italic>s</jats:italic>-step BiCG and QMR variants, and compare the parallel performance of these new <jats:italic>s</jats:italic>-step versions with previous algorithms.</jats:p>