NeurIPS Poster Optimal Rates for Bandit Nonstochastic Control

Poster

Optimal Rates for Bandit Nonstochastic Control

Y. Jennifer Sun · Stephen Newman · Elad Hazan

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

2023 Poster

Abstract: Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control are foundational and extensively researched problems in optimal control. We investigate LQR and LQG problems with semi-adversarial perturbations and time-varying adversarial bandit loss functions. The best-known sublinear regret algorithm~\cite{gradu2020non} has a

T^{\frac{3}{4}}

time horizon dependence, and its authors posed an open question about whether a tight rate of

\sqrt{T}

could be achieved. We answer in the affirmative, giving an algorithm for bandit LQR and LQG which attains optimal regret, up to logarithmic factors. A central component of our method is a new scheme for bandit convex optimization with memory, which is of independent interest.

Chat is not available.