Safe Online Bid Optimization with Uncertain Return-On-Investment and Budget Constraints
Giulia Romano
Event URL: https://gather.town/MLECON2021
In online advertising, the advertiser's goal is usually a tradeoff between achieving high volumes and high profitability. Companies' business units customarily address this tradeoff by maximizing the volumes while guaranteeing a minimum Return On Investment (ROI). This paper investigates combinatorial bandit algorithms for the bid optimization of advertising campaigns subject to uncertain budget and ROI constraints. We show that the problem is inapproximable within any factor unless $P = NP$, even without uncertainty, and we provide a pseudo-polynomial-time algorithm that computes an optimal solution. Furthermore, we show that no online learning algorithm can guarantee a sublinear pseudo-regret while violating the (budget or ROI) constraints only a sublinear number of times during the learning process. We provide the $GCB_{safe}$ algorithm, which guarantees, w.h.p., a constant upper bound on the number of constraint violations at the cost of a linear pseudo-regret bound. However, a simple adaptation of $GCB_{safe}$ achieves a sublinear pseudo-regret when the constraints are allowed to be satisfied up to a fixed tolerance. Finally, we experimentally evaluate $GCB_{safe}$ in terms of pseudo-regret/constraint-violation tradeoff in settings generated from real-world data.
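To make the constrained optimization problem behind the abstract concrete, here is a minimal, illustrative sketch: choose one bid per sub-campaign so as to maximize total volume subject to a budget cap and a minimum-ROI requirement. This is not the paper's pseudo-polynomial algorithm or the $GCB_{safe}$ learner; the data layout, function name, and `tol` parameter (modeling the fixed tolerance on constraint satisfaction) are hypothetical, and exhaustive enumeration is used only to make the constraints explicit.

```python
from itertools import product

def solve_bid_optimization(campaigns, budget, roi_min, tol=0.0):
    """Return (best_volume, best_bids) maximizing total volume subject to
    total cost <= budget and total revenue / total cost >= roi_min - tol.

    `campaigns`: list of sub-campaigns; each sub-campaign is a list of
    candidate bids given as (volume, revenue, cost) triples.
    """
    best_volume, best_bids = float("-inf"), None
    for choice in product(*campaigns):          # one candidate bid per sub-campaign
        volume = sum(v for v, _, _ in choice)
        revenue = sum(r for _, r, _ in choice)
        cost = sum(c for _, _, c in choice)
        if cost > budget:                       # budget constraint
            continue
        if cost > 0 and revenue / cost < roi_min - tol:  # ROI constraint (with tolerance)
            continue
        if volume > best_volume:
            best_volume, best_bids = volume, choice
    return best_volume, best_bids

# Toy usage: two sub-campaigns, three candidate bids each (volume, revenue, cost).
campaigns = [
    [(100, 10.0, 4.0), (140, 14.0, 8.0), (160, 16.0, 12.0)],
    [(60, 6.0, 2.0), (90, 9.0, 5.0), (110, 11.0, 9.0)],
]
print(solve_bid_optimization(campaigns, budget=12.0, roi_min=1.5))
```

In the online setting studied in the paper, the per-bid volumes, revenues, and costs are not known in advance and must be estimated from bandit feedback, which is where the tension between pseudo-regret and constraint violations arises.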
Author Information
Giulia Romano (Politecnico di Milano)
More from the Same Authors
- 2022: Multi-Armed Bandit Problem with Temporally-Partitioned Rewards
  Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli
- 2022: A Unifying Framework for Online Safe Optimization
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti
- 2022 Poster: A Unifying Framework for Online Optimization with Long-Term Constraints
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti