• Media type: Electronic article
  • Title: Large-scale distributed linear algebra with tensor processing units
  • Contributors: Lewis, Adam G. M.; Beall, Jackson; Ganahl, Martin; Hauru, Markus; Mallick, Shrestha Basu; Vidal, Guifre
  • Published: Proceedings of the National Academy of Sciences, 2022
  • Published in: Proceedings of the National Academy of Sciences, vol. 119 (2022), no. 33
  • Language: English
  • DOI: 10.1073/pnas.2122762119
  • ISSN: 1091-6490; 0027-8424
  • Keywords: Multidisciplinary
  • Description: We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs’ fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size N = 2^20 = 1,048,576 in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization. (Illustrative code sketches of the distributed multiplication and the polar iteration follow this record.)
  • Access status: Open access
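
The abstract's central point is that distributed matrix multiplication becomes compute-bound once each core spends its time on large local multiplications. The paper's actual pod-scale algorithms over the 2D ICI network are not reproduced here; the following is only a minimal, hypothetical JAX sketch of splitting one matrix product into per-core row-block multiplications. The matrix size, the block layout, and the use of `jax.pmap` are illustrative assumptions, not the authors' implementation.

```python
import functools

import jax
import jax.numpy as jnp

n_dev = jax.device_count()   # number of available accelerator cores/devices
n = 4096                     # illustrative size; assumed divisible by n_dev

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (n, n), dtype=jnp.float32)
b = jax.random.normal(key_b, (n, n), dtype=jnp.float32)

# Shard A into row blocks, one block per core; replicate B on every core.
a_blocks = a.reshape(n_dev, n // n_dev, n)

@functools.partial(jax.pmap, in_axes=(0, None))
def row_block_matmul(a_block, b_full):
    # Each core performs one large, single-core matrix multiplication.
    return a_block @ b_full

c = row_block_matmul(a_blocks, b).reshape(n, n)
# Relative deviation from a single-device product; should be tiny.
print(jnp.linalg.norm(c - a @ b) / jnp.linalg.norm(c))
```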
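
The abstract's third example, computing matrix functions by polynomial iteration demonstrated on the polar factorization, maps naturally onto the same matmul-dominated pattern, since each iteration step consists only of matrix multiplications. The sketch below uses the standard Newton-Schulz polynomial iteration in JAX as a stand-in; the Frobenius-norm scaling and the fixed iteration count are illustrative assumptions and are not taken from the paper.

```python
import jax
import jax.numpy as jnp

def polar_newton_schulz(a, num_iters=50):
    """Approximate the polar factor U (A = U P) by Newton-Schulz iteration.

    Assumes A is square and full rank; num_iters is an illustrative choice
    that depends on the conditioning of A.
    """
    n = a.shape[-1]
    eye = jnp.eye(n, dtype=a.dtype)
    # Scale so all singular values lie in the iteration's convergence region;
    # the Frobenius norm bounds the spectral norm from above.
    x = a / jnp.linalg.norm(a)

    def body(x, _):
        # Each step is built from matrix multiplications only.
        return 0.5 * x @ (3.0 * eye - x.T @ x), None

    x, _ = jax.lax.scan(body, x, xs=None, length=num_iters)
    return x

a = jax.random.normal(jax.random.PRNGKey(0), (256, 256), dtype=jnp.float32)
u = polar_newton_schulz(a)
# The polar factor should be (approximately) orthogonal.
print(jnp.linalg.norm(u.T @ u - jnp.eye(256)))
```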