Skip to yearly menu bar Skip to main content


Online Learning and Pricing for Network Revenue Management with Reusable Resources

Huiwen Jia · Cong Shi · Siqian Shen

Hall J (level 1) #942

Keywords: [ queueing network ] [ network ] [ mixing times ] [ pricing ] [ bandits ] [ stochastic systems ]

Abstract: We consider a price-based network revenue management problem with multiple products and multiple reusable resources. Each randomly arriving customer requests a product (service) that needs to occupy a sequence of reusable resources (servers). We adopt an incomplete information setting where the firm does not know the price-demand function for each product and the goal is to dynamically set prices of all products to maximize the total expected revenue of serving customers. We propose novel batched bandit learning algorithms for finding near-optimal pricing policies, and show that they admit a near-optimal cumulative regret bound of $\tilde{O}(J\sqrt{XT})$, where $J$, $X$, and $T$ are the numbers of products, candidate prices, and service periods, respectively. As part of our regret analysis, we develop the first finite-time mixing time analysis of an open network queueing system (i.e., the celebrated Jackson Network), which could be of independent interest. Our numerical studies show that the proposed approaches perform consistently well.

Chat is not available.