• Media type: Text; E-Article; Electronic Conference Proceeding
  • Title: Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network
  • Contributor: Zhang, Qinzi [Author]; Tseng, Lewis [Author]
  • Published: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021
  • Language: English
  • DOI: https://doi.org/10.4230/LIPIcs.OPODIS.2020.7
  • Keywords: Wireless Communication ; Byzantine Fault ; Communication Complexity ; Distributed Machine Learning ; Single-hop Radio Network ; Parameter Server
  • Origination:
  • Footnote: Diese Datenquelle enthält auch Bestandsnachweise, die nicht zu einem Volltext führen.
  • Description: In the past few years, many Byzantine-tolerant distributed machine learning (DML) algorithms have been proposed in the point-to-point communication model. In this paper, we focus on a popular DML framework - the parameter server computation paradigm and iterative learning algorithms that proceed in rounds, e.g., [Gupta and Vaidya, 2020; El-Mhamdi et al., 2020; Chen et al., 2017]. One limitation of prior algorithms in this domain is the high communication complexity. All the Byzantine-tolerant DML algorithms that we are aware of need to send n d-dimensional vectors from worker nodes to the parameter server in each round, where n is the number of workers and d is the number of dimensions of the feature space (which may be in the order of millions). In a wireless network, power consumption is proportional to the number of bits transmitted. Consequently, it is extremely difficult, if not impossible, to deploy these algorithms in power-limited wireless devices. Motivated by this observation, we aim to reduce the communication complexity of Byzantine-tolerant DML algorithms in the single-hop radio network [Alistarh et al., 2010; Bhandari and Vaidya, 2005; Koo, 2004]. Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020 [Gupta and Vaidya, 2020], we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the broadcast properties of the radio network to avoid transmitting the raw gradients (full d-dimensional vectors). In the radio network, each worker is able to overhear previous gradients that were transmitted to the parameter server. Roughly speaking, in Echo-CGC, if a worker "agrees" with a combination of prior gradients, it will broadcast the "echo message" instead of the its raw local gradient. The echo message contains a vector of coefficients (of size at most n) and the ratio of the magnitude between two gradients (a float). In comparison, the traditional approaches need to send n local gradients in each round, where each gradient is typically a vector in a ...
  • Access State: Open Access