• Medientyp: E-Artikel; Elektronischer Konferenzbericht; Sonstige Veröffentlichung
  • Titel: Space-Optimal Profile Estimation in Data Streams with Applications to Symmetric Functions
  • Beteiligte: Chen, Justin Y. [VerfasserIn]; Indyk, Piotr [VerfasserIn]; Woodruff, David P. [VerfasserIn]
  • Erschienen: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024
  • Sprache: Englisch
  • DOI: https://doi.org/10.4230/LIPIcs.ITCS.2024.32
  • Schlagwörter: Sublinear Algorithms ; Streaming and Sketching Algorithms
  • Entstehung:
  • Anmerkungen: Diese Datenquelle enthält auch Bestandsnachweise, die nicht zu einem Volltext führen.
  • Beschreibung: We revisit the problem of estimating the profile (also known as the rarity) in the data stream model. Given a sequence of m elements from a universe of size n, its profile is a vector ϕ whose i-th entry ϕ_i represents the number of distinct elements that appear in the stream exactly i times. A classic paper by Datar and Muthukrishan from 2002 gave an algorithm which estimates any entry ϕ_i up to an additive error of ± ε D using O(1/ε² (log n + log m)) bits of space, where D is the number of distinct elements in the stream. In this paper, we considerably improve on this result by designing an algorithm which simultaneously estimates many coordinates of the profile vector ϕ up to small overall error. We give an algorithm which, with constant probability, produces an estimated profile ϕˆ with the following guarantees in terms of space and estimation error: b) For any constant τ, with O(1 / ε² + log n) bits of space, ∑_{i = 1}^τ |ϕ_i - ϕˆ_i| ≤ ε D. c) With O(1/ ε²log (1/ε) + log n + log log m) bits of space, ∑_{i = 1}^m |ϕ_i - ϕˆ_i| ≤ ε m. In addition to bounding the error across multiple coordinates, our space bounds separate the terms that depend on 1/ε and those that depend on n and m. We prove matching lower bounds on space in both regimes. Application of our profile estimation algorithm gives estimates within error ± ε D of several symmetric functions of frequencies in O(1/ε² + log n) bits. This generalizes space-optimal algorithms for the distinct elements problems to other problems including estimating the Huber and Tukey losses as well as frequency cap statistics.
  • Zugangsstatus: Freier Zugang