• Media type: E-Book
  • Title: Adaptive Sequential Experiments with Unknown Information Arrival Processes
  • Contributor: Gur, Yonatan [Author]; Momeni, Ahmadreza [Author]
  • imprint: [S.l.]: SSRN, [2021]
  • Published in: Stanford University Graduate School of Business Research Paper
  • Extent: 1 online resource (65 p.)
  • Language: English
  • DOI: 10.2139/ssrn.3892631
  • Keywords: Sequential experiments ; online learning ; multi-armed bandits ; transfer learning ; minimax complexity ; adaptive algorithms ; product recommendations
  • Footnote: According to SSRN, the original version of the document was created on July 23, 2021
  • Description: Sequential experiments are deployed in a variety of settings, including for optimizing product recommendations and pricing in online platforms. Such experiments are often characterized by an exploration-exploitation tradeoff that is well understood when, at each time period, feedback is received only on the action that was selected in that period. However, in many practical settings additional data may become available between decision epochs. We study the performance gain one may achieve by leveraging such auxiliary data, and the design of algorithms that effectively do so without prior information on the underlying information arrival process. We introduce a generalized formulation, which considers a broad class of distributions that are informative about rewards from actions, and allows observations from these distributions to arrive according to an arbitrary and a priori unknown process. When it is known how to map auxiliary data to reward estimates, we obtain matching bounds that characterize the best achievable performance as a function of the information arrival process. In terms of achieving optimal performance, we establish that upper confidence bound and Thompson sampling policies possess natural robustness with respect to the information arrival process, which uncovers a novel property of these popular algorithms and further lends credence to their appeal. When the mappings connecting auxiliary data and rewards are unknown, we characterize a necessary and sufficient condition under which auxiliary information allows performance improvement, and devise a new policy that is near-optimal in that setting. We use data from a large media site to analyze the value that may be captured by leveraging auxiliary data for designing content recommendations.
  • Access State: Open Access
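The description's point about upper confidence bound (UCB) policies absorbing auxiliary data can be illustrated with a minimal sketch. It assumes the "known mapping" case in its simplest form: each auxiliary observation is already an unbiased sample of a specific arm's reward, so it is pooled with that arm's reward observations. The class `UCBWithAuxiliary`, the helper `simulate`, the Bernoulli rewards, and the per-epoch arrival probability `aux_prob` are all hypothetical illustrations, not the authors' policy or experimental setup, and the near-optimal policy for unknown mappings is not reproduced here.

```python
import math
import random


class UCBWithAuxiliary:
    """UCB1-style index policy that also pools auxiliary observations.

    Hypothetical sketch: assumes each auxiliary observation is already mapped
    to an unbiased reward sample for a known arm (the "known mapping" case).
    """

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms   # reward + auxiliary samples per arm
        self.sums = [0.0] * n_arms   # running sum of those samples
        self.t = 0                   # number of decision epochs so far
        self.c = c                   # exploration coefficient

    def observe_auxiliary(self, arm, value):
        # Auxiliary data arriving between epochs is pooled with reward data,
        # which shrinks the arm's confidence bonus without pulling the arm.
        self.counts[arm] += 1
        self.sums[arm] += value

    def select(self):
        self.t += 1
        # Any arm with no information at all is tried first.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(self.c * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def simulate(means, horizon, aux_prob, seed=0):
    """Run the policy on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    policy = UCBWithAuxiliary(len(means))
    pulls = [0] * len(means)
    for _ in range(horizon):
        # Hypothetical arrival process: between epochs, each arm independently
        # receives one auxiliary sample with probability aux_prob.
        for arm, mu in enumerate(means):
            if rng.random() < aux_prob:
                policy.observe_auxiliary(arm, float(rng.random() < mu))
        arm = policy.select()
        pulls[arm] += 1
        policy.update(arm, float(rng.random() < means[arm]))
    return pulls


if __name__ == "__main__":
    for p in (0.0, 0.1, 1.0):
        print(p, simulate([0.3, 0.7], horizon=2000, aux_prob=p))
```

Running it for several values of `aux_prob` shows the qualitative effect the description refers to: the more auxiliary samples arrive, the fewer exploratory pulls the inferior arm receives, even though the policy itself is unchanged and never needs to know the arrival process in advance.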