Footnote:
According to SSRN, the original version of the document was written on December 16, 2020.
Description:
We study the fundamental model of joint pricing and inventory replenishment control under the learning-while-doing framework, with T consecutive review periods and a firm that does not know the demand curve a priori. At the beginning of each period, the retailer chooses both a price and an inventory order-up-to level, then collects revenue from consumers' realized demand while incurring costs from either holding unsold inventory or losing sales to unsatisfied customer demand. We make the following contributions to this problem:
1. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices. This directly tackles the fundamental challenge posed by censored demand, without raising the order-up-to levels to unnaturally high values in order to collect more demand information. Based on this technical innovation, we design bisection and trisection search methods that attain O(T^{1/2}) regret, assuming the reward function is concave and only twice continuously differentiable.
2. In the more general case of non-concave reward functions, we design an active tournament elimination method that attains O(T^{3/5}) regret, again based on consistent estimates of reward differences at two prices.
3. We complement the O(T^{3/5}) regret upper bound with a matching \Omega(T^{3/5}) regret lower bound. The lower bound is established by a novel information-theoretic argument based on a generalized squared Hellinger distance, which differs significantly from conventional arguments based on Kullback-Leibler divergence. This lower bound shows that no learning-while-doing algorithm can achieve O(T^{1/2}) regret without assuming the reward function is concave, even if the sales revenue as a function of demand rate or price is concave.
Both the upper bound technique based on the "difference estimator" and the lower bound technique based on the generalized Hellinger distance are new in the literature, and can potentially be applied to other inventory or censored-demand problems that involve learning.
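
The per-period trade-off described in the abstract (revenue from realized sales, set against holding costs for unsold stock and lost-sales costs for unmet demand, with demand observed only up to the stock level) can be illustrated with a minimal sketch. This is not the paper's estimator or algorithm; the function name `period_outcome`, the unit holding cost `h`, and the unit lost-sales penalty `b` are hypothetical and chosen only for illustration.

```python
def period_outcome(price, order_up_to, demand, h=1.0, b=2.0):
    """Return (profit, observed_sales) for a single review period.

    h and b are hypothetical unit holding and lost-sales costs;
    they are illustrative assumptions, not values from the paper.
    """
    # The firm only sees sales, which are capped by available stock:
    # this is the demand censoring the abstract refers to.
    sales = min(demand, order_up_to)
    leftover = max(order_up_to - demand, 0.0)  # unsold inventory
    unmet = max(demand - order_up_to, 0.0)     # lost sales
    profit = price * sales - h * leftover - b * unmet
    return profit, sales


# Demand below the order-up-to level: fully observed.
print(period_outcome(5.0, 10.0, 7.0))
# Demand above the order-up-to level: observed sales stop at 10.0.
print(period_outcome(5.0, 10.0, 14.0))
```

Whenever demand exceeds the order-up-to level, the observed sales equal the stock level regardless of how large demand actually was, which is why rewards at different prices cannot be estimated naively from sales data alone.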