• Media type: Report; E-book
  • Title: Intrinsic Fault Tolerance of Multi Level Monte Carlo Methods
  • Contributors: Pauli, Stefan [Author]; Arbenz, Peter [Author]; ORCID 0000-0002-1501-3176 [Author]; Schwab, Christoph [Author]
  • Published: Seminar for Applied Mathematics, ETH Zurich, 2012-08
  • Published in: SAM Research Report, 2012-24
  • Language: English
  • DOI: https://doi.org/20.500.11850/571352; https://doi.org/10.3929/ethz-a-010387066
  • Keywords: Multilevel Monte Carlo; Fault tolerance; Failure resilience; Mathematics; Exascale parallel computing
  • Notes: This data source also contains holdings records that do not link to a full text.
  • Description: Monte Carlo (MC) and Multilevel Monte Carlo (MLMC) methods applied to solvers for partial differential equations with random input data are shown to exhibit intrinsic failure resilience. Sufficient conditions are given under which the non-recoverable loss of a random fraction of samples does not fatally damage the asymptotic accuracy-versus-work behavior of an MC simulation. Specifically, the convergence behavior of MLMC methods on massively parallel hardware is analyzed mathematically and computationally under general assumptions on node failures and on the sample failure statistics at the different MC levels, in the absence of checkpointing; that is, sample failures are assumed to be irrecoverable, with complete loss of data. Modifications of MLMC with enhanced resilience are proposed. The theoretical results are obtained under general statistical models of CPU failure at runtime. In particular, node failures following so-called Weibull failure models are discussed for massively parallel stochastic Finite Volume computational fluid dynamics simulations.
  • Access status: Open access
  • Rights/usage notes: Protected by copyright; non-commercial use permitted
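The description above turns on a simple mechanism: if sample failures are independent of the sample values, averaging only the surviving samples on each level keeps each level estimator unbiased, at the cost of a larger variance. The Python sketch below illustrates that idea on a toy problem; it is not the authors' implementation. Everything in it is a hypothetical assumption: the level model `P_l(u) = u**2 + 2**-l * u` with `u ~ U(0, 1)`, per-sample run times `durations` that double per level, and a fresh node per sample whose failure time is Weibull-distributed with parameters `shape` and `scale`, so that a sample is lost if its node fails before the sample finishes.

```python
import random
from typing import List, Sequence, Tuple


def level_difference(level: int, u: float) -> float:
    """Hypothetical toy model P_l(u) = u**2 + 2**-l * u, coupled via the same u.

    Returns Y_0 = P_0(u) and Y_l = P_l(u) - P_{l-1}(u) for l >= 1.
    """
    if level == 0:
        return u * u + u
    return (2.0 ** -level - 2.0 ** -(level - 1)) * u


def sample_survives(duration: float, shape: float, scale: float,
                    rng: random.Random) -> bool:
    """Assumed failure model: the sample is lost if its node's
    Weibull-distributed failure time falls before the sample finishes."""
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    return rng.weibullvariate(scale, shape) > duration


def mlmc_with_failures(num_samples: Sequence[int], durations: Sequence[float],
                       shape: float, scale: float,
                       seed: int = 0) -> Tuple[float, List[int]]:
    """MLMC estimate of E[P_L] that simply discards lost samples (no checkpointing).

    Failures are drawn independently of u, so the mean over the survivors of
    each level is still unbiased for E[Y_l], given at least one survivor.
    """
    rng = random.Random(seed)
    estimate = 0.0
    survivors: List[int] = []
    for level, (m, duration) in enumerate(zip(num_samples, durations)):
        kept = []
        for _ in range(m):
            u = rng.random()  # random input of this sample
            if sample_survives(duration, shape, scale, rng):
                kept.append(level_difference(level, u))
        survivors.append(len(kept))
        if kept:
            estimate += sum(kept) / len(kept)
        # If a level loses every sample, its correction is dropped (biased).
    return estimate, survivors


if __name__ == "__main__":
    M = [20000, 5000, 2000, 800, 300]
    durations = [1.0, 2.0, 4.0, 8.0, 16.0]
    est, surv = mlmc_with_failures(M, durations, shape=0.7, scale=20.0)
    exact = 1 / 3 + 2 ** -5  # E[P_4] = 1/3 + 2**-4 / 2
    print(f"estimate={est:.4f} exact={exact:.4f} survivors={surv}")
```

With these illustrative parameters, the survival probability on the finest level is about `exp(-(16/20)**0.7) ≈ 0.43`, so more than half of those samples are lost in expectation, yet the estimate should stay within a few standard errors of the exact value. The drop-and-average strategy is only one possible way to handle losses; when a whole level is wiped out, the sketch's estimator becomes biased.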