Workshop
Learning in Presence of Strategic Behavior
Omer BenPorat · Nika Haghtalab · Annie Liang · Yishay Mansour · David Parkes
In recent years, machine learning has been called upon to solve increasingly more complex tasks and to regulate many aspects of our social, economic, and technological world. These applications include learning economic policies from data, prediction in financial markets, learning personalize models across population of users, and ranking qualified candidates for admission, hiring, and lending. These tasks take place in a complex social and economic context where the learners and objects of learning are often people or organizations that are impacted by the learning algorithm and, in return, can take actions that influence the learning process. Learning in this context calls for a new vision for machine learning and economics that aligns the incentives and interests of the learners and other parties and is robust to the evolving social and economic needs. This workshop explores a view of machine learning and economics that considers interactions of learning systems with a wide range of social and strategic behaviors. Examples of these problems include: multiagent learning systems, welfareaware machine learning, learning from strategic and economic data, learning as a behavioral model, and causal inference for learning impact of strategic choices.
Schedule
Mon 8:50 a.m.  9:00 a.m.

Opening remarks
(
Remarks
)
SlidesLive Video 
🔗 
Mon 9:00 a.m.  9:40 a.m.

Keynote: Michael I. Jordan (On DynamicsInformed Blending of Machine Learning and Game Theory)
(
Keynote
)
SlidesLive Video Statistical decisions are often given meaning in the context of other decisions, particularly when there are scarce resources to be shared. Managing such sharing is one of the classical goals of microeconomics, and it is given new relevance in the modern setting of large, humanfocused datasets, and in dataanalytic contexts such as classifiers and recommendation systems. I'll discuss several recent projects that aim to explore the interface between machine learning and microeconomics, including the study of explorationexploitation tradeoffs for bandit learning algorithms that compete over a scarce resource, leader/follower dynamics in strategic classification, and the robust learning of optimal auctions. 
🔗 
Mon 9:40 a.m.  10:20 a.m.

Keynote: Susan Athey (Machine Learning with Strategic Agents: Lessons from Incentive Theory and Econometrics)
(
Keynote
)
SlidesLive Video This talk will apply insights from classic economic incentive theory to problems in machine learning and artificial intelligence. In particular, it will apply multitask theory to the design of A/B testing platforms and bandit objective functions, and it will analyze humanAI interaction as a problem of optimal delegation. We will also consider how informational asymmetries and observability problems can lead to systematic challenges for learning by bidders at auctions. 
🔗 
Mon 10:20 a.m.  10:50 a.m.

Discussion with Michael Jordan and Susan Athey, moderated by Kevin LeytonBrown
(
Discussion Panel
)
SlidesLive Video Learning and Economics: A Vision for Future Research and Development 
🔗 
Mon 11:10 a.m.  11:17 a.m.

Spotlight 1: Exploration and Incentives in Reinforcement Learning
(
Spotlights
)
SlidesLive Video 
Max Simchowitz · Aleksandrs Slivkins 🔗 
Mon 11:17 a.m.  11:20 a.m.

Q&A for Spotlight 1
(
Q&A
)

🔗 
Mon 11:20 a.m.  11:27 a.m.

Spotlight 2: Models of fairness in federated learning
(
Spotlights
)
SlidesLive Video 
Kate Donahue · Jon Kleinberg 🔗 
Mon 11:27 a.m.  11:30 a.m.

Q&A for Spotlight 2
(
Q&A
)

🔗 
Mon 11:30 a.m.  11:37 a.m.

Spotlight 3: Efficient Competitions and Online Learning with Strategic Forecasters
(
Spotlights
)
SlidesLive Video 
Anish Thilagar · Rafael Frongillo · Bo Waggoner · Robert Gomez 🔗 
Mon 11:37 a.m.  11:40 a.m.

Q&A for Spotlight 3
(
Q&A
)

🔗 
Mon 11:40 a.m.  11:47 a.m.

Spotlight 4: Estimation of Standard Asymmetric Auction Models
(
Spotlights
)
SlidesLive Video 
Yeshwanth Cherapanamjeri · Constantinos Daskalakis · Andrew Ilyas · Emmanouil Zampetakis 🔗 
Mon 11:47 a.m.  11:50 a.m.

Q&A for Spotlight 4
(
Q&A
)

🔗 
Mon 11:50 a.m.  11:57 a.m.

Spotlight 5: Strategic clustering
(
Spotlights
)
SlidesLive Video 
AnaAndreea Stoica · Christos Papadimitriou 🔗 
Mon 11:57 a.m.  12:00 p.m.

Q&A for Spotlight 5
(
Q&A
)

🔗 
Mon 12:00 p.m.  1:00 p.m.

Poster Session ( Poster Session ) link  🔗 
Mon 1:00 p.m.  1:40 p.m.

Keynote: Dorsa Sadigh (Theory and Practice of PartnerAware Algorithms in MultiAgent Coordination)
(
Keynote
)
SlidesLive Video Today I will be discussing some of the challenges and lessons learned in partner modeling in decentralized multiagent coordination. We will start with discussing the role of representation learning in learning effective latent partner strategies and how one can leverage the learned representations within a reinforcement learning loop for achieving coordination, collaboration, and influencing. We will then extend the notion of influencing beyond optimizing for longhorizon objectives, and analyze how strategies that stabilize latent partner representations can be effective in reducing nonstationarity and achieving a more desirable learning outcome. Finally, we will formalize the problem of decentralized multiagent coordination as a collaborative multiarmed bandit with partial observability, and demonstrate that partner modeling strategies are effective approaches for achieving logarithmic regret. 
🔗 
Mon 1:40 p.m.  2:20 p.m.

Keynote: Vince Conitzer (Automated Mechanism Design for Strategic Classification)
(
Keynote
)
SlidesLive Video AI is increasingly making decisions, not only for us, but also about us  from whether we are invited for an interview, to whether we are proposed as a match for someone looking for a date, to whether we are released on bail. Often, we have some control over the information that is available to the algorithm; we can selfreport some information, and other information we can choose to withhold. This creates a potential circularity: the classifier used, mapping submitted information to outcomes, depends on the (training) data that people provide, but the (test) data depend on the classifier, because people will reveal their information strategically to obtain a more favorable outcome. This setting is not adversarial, but it is also not fully cooperative. Mechanism design provides a framework for making good decisions based on strategically reported information, and it is commonly applied to the design of auctions and matching mechanisms. However, the setting above is unlike these common applications, because in it, preferences tend to be similar across agents, but agents are restricted in what they can report. This creates both new challenges and new opportunities. I will discuss both our theoretical work and our initial experiments. (This is joint work with Hanrui Zhang, Andrew Kephart, Yu Cheng, Anilesh Krishnaswamy, Haoming Li, and David Rein. Papers on these topics can be found at: https://users.cs.duke.edu/~conitzer/bytopic.html#automated%20mechanism%20design) 
🔗 
Mon 2:20 p.m.  2:50 p.m.

Discussion with Dorsa Sadigh and Vincent Conitzer, moderated by Peter Stone
(
Discussion Panel
)
SlidesLive Video Achieving Agent Coordination in Theory and Practice 
🔗 
Mon 2:50 p.m.  3:00 p.m.

Concluding Remarks
(
Remarks
)
SlidesLive Video 
🔗 


Efficient Competitions and Online Learning with Strategic Forecasters
(
Poster
)
link
Winnertakeall competitions in forecasting and machinelearning suffer from distorted incentives.Witkowski et al. identified this problem and proposed ELF, a truthful mechanism to select a winner.We show that, from a pool of $n$ forecasters, ELF requires $\Theta(n\log n)$ events or test data points to select a nearoptimal forecaster with high probability.We then show that standard online learning algorithms select an $\epsilon$optimal forecaster using only $O(\log(n) / \epsilon^2)$ events, by way of a strong approximatetruthfulness guarantee.This bound matches the best possible even in the nonstrategic setting.We then apply these mechanisms to obtain the first noregret guarantee for nonmyopic strategic experts.

Anish Thilagar · Rafael Frongillo · Bo Waggoner · Robert Gomez 🔗 


Alternative Microfoundations for Strategic Classification
(
Oral
)
link
When reasoning about strategic behavior in a machine learning context it is tempting to combine standard microfoundations of rational agents with the statistical decision theory underlying classification. In this work, we argue that a direct combination of these standard ingredients leads to brittle solution concepts of limited descriptive and prescriptive value. First, we show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule that we often do not observe empirically. Second, when any positive fraction of agents is not perfectly strategic, desirable stable pointswhere the classifier is optimal for the data it entailscease to exist. Third, optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden within a broad class of possible assumptions about agent behavior.Recognizing these limitations we explore alternatives to standard microfoundations for binary classification. To navigate the space of possible assumptions about how agents respond to a decision rule we specify a set of desiderata our model should satisfy and that help mitigate the limitations of the standard model. We propose noisy response as a promising candidate model. Inspired by smoothed analysis and empirical observations, noisy response incorporates natural imperfection in the agent responses. This model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality. 
Meena Jagadeesan · Celestine MendlerDünner · Moritz Hardt 🔗 


Estimation of Standard Asymmetric Auction Models
(
Oral
)
link
We provide efficient methods for learning strategic agents' underlying bid and value distributions by observing only the outcomes of their repeated interaction in a variety of standard auction models. In particular, given a finite set of observationseach only comprising the identity of the winner and the price they paidin a sequence of auctions involving the same set of ex ante asymmetric bidders with independent private values, we provide algorithms for nonparametrically estimating the bid distribution of each bidder to within Wasserstein, Kolmogorov, or total variation distance on the effective support of these distributions. We provide convergence bounds for the attained distance in terms of the number of observations, number of bidders, and other relevant parameters of the problem, which are uniform in that they do not depend on the bid distributions being estimated. For firstprice auctions (where bid distributions and equilibrium value distributions do not coincide) we also show provide finitesample estimation results for agents' value distributions at BayesNash equilibrium. Our estimation guarantees advance a body of work at the intersection of machine learning and econometrics with partial sample observability wherein only identification results have been previously obtained in our setting. 
Yeshwanth Cherapanamjeri · Constantinos Daskalakis · Andrew Ilyas · Emmanouil Zampetakis 🔗 


NearOptimal NoRegret Learning in General Games
(
Oral
)
link
We show that Optimistic Hedge  a common variant of multiplicativeweightsupdates with recency bias  attains ${\rm poly}(\log T)$ regret in multiplayer generalsum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard noregret learners in games, the $O(T^{1/4})$ regret attainable by noregret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of twoplayer games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.

Constantinos Daskalakis · Maxwell Fishelson · Noah Golowich 🔗 


Bounded Rationality for MultiAgent Motion Planning and Behavior Learning
(
Oral
)
link
When a robot shares the same workspace with other intelligent agents (e.g., other robots or humans), the robot must be able to reason the behaviors of its neighboring agents while accomplishing its designated task. We assume other agents do not explicitly share their decisions or reasoning processes with the robot. Thus the robot needs to reason itself for the consequences of all other agents, the process of which demands prohibitive computational resources. We observe that in practice, oftentimes predicting the absolutely optimal agent behaviors is not necessary. Instead, it is more desirable that the solution can be computed with a closetooptimal (suboptimal) but affordable (tractable) computation time. We propose to incorporate the concept of bounded rationality (from an informationtheoretic viewpoint) into the generalsum stochastic game setting. Specifically, the robot computes a policy under the information processing constraint, represented as KLdivergence between the default and optimized stochastic policies. The solution to the boundedoptimal policy can be obtained by an importance sampling approach. We show pilot results that this framework allows the robot to (1) effectively tradeoff between solution optimality and limited computational time; (2) efficiently learn the suboptimal behaviors of other agents. 
Junhong Xu · Kai Yin · Lantao Liu 🔗 


Information Discrepancy in Strategic Learning
(
Oral
)
link
We study the effects of information discrepancy across subpopulations on their ability to simultaneously improve their features in strategic learning settings. Specifically, we consider a game where a principal deploys a decision rule in an attempt to optimize the whole population's welfare, and agents strategically adapt to it to receive better scores. Inspired by reallife settings, such as loan approvals and college admissions, we remove the typical assumption made in the strategic learning literature that the decision rule is fully known to the agents, and focus on settings where it is inaccessible. In their lack of knowledge, individuals try to infer this rule by learning from their peers (e.g., friends and acquaintances who previously applied for a loan), naturally forming groups in the population, each with possibly different type and level of information about the decision rule. In our equilibrium analysis, we show that the principal's decision rule optimizing the welfare across subgroups} may cause a surprising negative externality; the true quality of some of the subgroups can actually deteriorate. On the positive side, we show that in many natural cases, optimal improvement is guaranteed simultaneously for all subgroups in equilibrium. We also characterize the disparity in improvements across subgroups via a measure of their informational overlap. Finally, we complement our theoretical analysis with experiments on realworld datasets. 
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani 🔗 


When to Call Your Neighbor? Strategic Communication in Cooperative Stochastic Bandits
(
Oral
)
link
In cooperative bandits, a framework that captures essential features of collective sequential decision making, agents can minimize group regret, and thereby improve performance, by leveraging shared information. However, sharing information can be costly, which motivates developing policies that minimize group regret while also reducing the number of messages communicated by agents. Existing cooperative bandit algorithms obtain optimal performance when agents share information with their neighbors at \textit{every time step}, i.e., full communication. This requires $\Theta(T)$ number of messages, where $T$ is the time horizon of the decision making process. We propose \textit{ComEx}, a novel costeffective communication protocol in which the group achieves the same order of performance as full communication while communicating only $O(\log T)$ number of messages. Our key step is developing a method to identify and only communicate the information crucial to achieving optimal performance. Further we propose novel algorithms for several benchmark cooperative bandit frameworks and show that our algorithms obtain \textit{stateoftheart} performance while consistently incurring a significantly smaller communication cost than existing algorithms.

Udari Madhushani · Naomi Leonard 🔗 


Gaming Helps! Learning from Strategic Interactions in Natural Dynamics
(
Oral
)
link
We consider an online regression setting in which individuals adapt to the regression model: arriving individuals are aware of the current model, and invest strategically in modifying their own features so as to improve the predicted score that the current model assigns to them. Such feature manipulation has been observed in various scenariosfrom credit assessment to school admissionsposing a challenge for the learner. Surprisingly, we find that such strategic manipulations may in fact help the learner recover the meaningful variablesthat is, the features that, when changed, affect the true label (as opposed to nonmeaningful features that have no effect). We show that even simple behavior on the learner's part allows her to simultaneously i) accurately recover the meaningful features, and ii) incentivize agents to invest in these meaningful features, providing incentives for improvement. 
Yahav Bechavod · Katrina Ligett · Steven Wu · Juba Ziani 🔗 


Interactive Robust Policy Optimization for MultiAgent Reinforcement Learning
(
Oral
)
link
As machine learning is applied more to realworld problems like robotics, control of autonomous vehicles, drones, and recommendation systems, it becomes essential to consider the notion of agency where multiple agents with local observations start impacting each other and interact to achieve their goals. Multiagent reinforcement learning (MARL) is concerned with developing learning algorithms that can discover effective policies in multiagent environments. In this work, we develop algorithms for addressing two critical challenges in MARL  nonstationarity and robustness. We show that naive independent reinforcement learning does not preserve the strategic gametheoretic interaction between the agents, and we present a way to realize the classical infinite order recursion reasoning in a reinforcement learning setting. We refer to this framework as Interactive Policy Optimization (IPO) and derive four MARL algorithms using centralizedtrainingdecentralizedexecution that generalize the widely used singleagent policy gradient methods to multiagent settings. Finally, we provide a method to estimate opponent's parameters in adversarial settings using maximum likelihood and integrate IPO with an adversarial learning framework to train agents robust to destabilizing disturbances from the environment/adversaries and for better sim2real transfer from simulated multiagent environments to the real world. 
Videh Nema · Balaraman Ravindran 🔗 


Negotiating networks in oligopoly markets for price sensitive products
(
Oral
)
link
We present a novel framework to learn functions that estimate decisions of sellers and buyers simultaneously in a oligopoly market for a pricesensitive product. In this setting, the aim of the seller network is to come up with a price for a given context such that the expected revenue is maximised by considering the buyer's satisfaction as well. On the other hand, the aim of the buyer network is to assign probability of purchase to the offered price to mimic the real world buyers' responses while also showing price sensitivity through its action. In other words, rejecting the unnecessarily high priced products. Similar to generative adversarial networks, this framework corresponds to a minimax twoplayer game. In our experiments with simulated and realworld transaction data, we compared our framework with the baseline model and demonstrated its potential through proposed evaluation metrics. 
naman shukla 🔗 


Strategic clustering
(
Oral
)
link
This paper studies the problem of clustering through the lens of utility and welfare. In particular, we ask, how much does the quality of the clustering  typically measured by the conductance, or by the number of edges cut, or the average distance to the centers  deteriorate if the nodes are strategic and can change clusters? And among reasonable utilities for the nodes, which one hurts quality the least? We investigate these questions both theoretically, by studying the equilibria of hedonic games (simplified clustering games with unconstrained number of clusters), and experimentally, by measuring the quality of pure Nash equilibria of more realistic clustering games. We introduce a new utility function for the nodes which we call closeness, and which we believe is an attractive alternative to previously studied node utilities. We study the properties of the closeness utility theoretically and demonstrate experimentally its advantages over other established utilities such as the modified fractional utility. Finally, we present a polynomialtime algorithm which, given a clustering with optimal quality, finds another clustering with better average utility, and in fact the one that maximizes the ratio of the gain in average utility over the loss in quality. This submission and the topics covered are particularly related to the theme of the Strategic ML workshop, especially to welfareaware machine learning and modeling strategic behavior. In the case of clustering, this work aims to cover gaps between the classical ML algorithms for clustering that optimize certain connectivity metrics (such as conductance) and incentives that people have in creating or changing groups. With an increasing demand for algorithmic tools in identifying and facilitating groups, understanding the incentives people may have in group formation become important in order to ensure retention, stability, and success of groups. This work aims to open avenues of research for investigating tradeoffs between robustness and welfare in designing algorithms for clustering. 
AnaAndreea Stoica · Christos Papadimitriou 🔗 


One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning
(
Oral
)
link
Federated learning is highly studied as an approach to collaboration across large populations, however, little has been explored in how to coordinate resources to maintain such data collaborations over time. Inspired by game theoretic notions, this paper introduces a framework for incentiveaware learning and data sharing in federated learning. Our notions of stable and envyfree equilibria formalize notions of fairness and stability for selfinterested agents. For example, in an envyfree equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. Furthermore, we show that, in the worst case, a $\Omega(\sqrt{k})$ gap exists between the sampleminimizing and equilibrium solutions across realistic learning models.

Richard Phillips · Han Shao · Avrim Blum · Nika Haghtalab 🔗 


Regret, stability, and fairness in matching markets with bandit learners
(
Oral
)
link
We consider the twosided matching market with bandit learners. In a matching market, there are two sets of agentsusers and providersthat seek to be matched onetoone. Contrary to a core assumption in the literature, users and providers often do not know their true matching preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multiarmed bandit problems. One recent work finds that it is possible to assign matchings that are stable (i.e., no pair of agents is incentivized to defect) at every time step while also allowing agents to learn enough so that the system converges to matchings that are stable under the agents' true preferences. However, the authors also uncover an impossibility result: under their setup, one cannot simultaneously guarantee stability and low optimal regret. In this work, we study the twosided matching market with bandit learners under costs and transfers in order to faithfully model competition between agents. We show that, in contrast to previous work, it is possible to simultaneously guarantee four desiderata: (1) stability, (2) low optimal regret, (3) fairness in the distribution of regret, and (4) high social welfare. 
Sarah Cen · Devavrat Shah 🔗 


On classification of strategic agents who can both game and improve
(
Oral
)
link
In this work, we consider classification of agents who can both game and improve. For example, people wishing to get a loan may be able to take some actions that increase their perceived creditworthiness and others that also increase their true creditworthiness. A decisionmaker would like to define a classification rule with few falsepositives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each.For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partialinformation learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finitepoint version of our linear model. We also show that maximizing the number of true positives subject to no false positive is NPhard in our full linear model. We additionally provide an algorithm that determines whether there exists a linear classifier that classifies all agents accurately and causes all improvable agents to become qualified, and give additional results for lowdimensional data. 
Saba Ahmadi · Hedyeh Beyhaghi · Avrim Blum · Keziah Naggita 🔗 


The Strategic Perceptron
(
Oral
)
link
The classical Perceptron algorithm provides a simple and elegant procedure for learning a linear classifier. In each step, the algorithm observes the sample’s position and label and updates the current predictor accordingly if it makes a mistake. However, in presence of strategic agents that desire to be classified as positive and that are able to modify their position by a limited amount, the classifier may not be able to observe the true position of agents but rather a position where the agent pretends to be. Unlike the original setting with perfect knowledge of positions, in this situation the Perceptron algorithm fails to achieve its guarantees, and we illustrate examples with the predictor oscillating between two solutions forever, making an unbounded number of mistakes even though a perfect largemargin linear classifier exists. Our main contribution is providing a modified Perceptronstyle algorithm which makes a bounded number of mistakes in presence of strategic agents with both $\ell_2$ and weighted $\ell_1$ manipulation costs. In our baseline model, knowledge of the manipulation costs (i.e., the extent to which an agent may manipulate) is assumed. In our most general model, we relax this assumption and provide an algorithm which learns and refines both the classifier and its cost estimates to achieve good mistake bounds even when manipulation costs are unknown.

Saba Ahmadi · Hedyeh Beyhaghi · Avrim Blum · Keziah Naggita 🔗 


Price Discovery and Efficiency in Waiting Lists: A Connection to Stochastic Gradient Descent
(
Oral
)
link
Waiting lists allocate items by offering agents a choice among items with associated waiting times. These waiting times serve as prices that are formed by the stochastic arrivals and departures of agents. We study the allocative efficiency under such dynamically adjusting prices by drawing a connection between this price adjustment process and the stochastic gradient descent optimization algorithm. We show that the loss due to price fluctuations is bounded by the granularity of price changes. Additional conditions allow us to identify markets where the loss is close to the bound or exponentially small. Our results show that a simple price adjustment heuristic can perform well, but may be slow to adjust to changes in demand because of a tradeoff between the speed of adaptation and loss from price fluctuations. 
Itai Ashlagi · Jacob Leshno · Pengyu Qian · Amin Saberi 🔗 


Unfairness Despite Awareness: GroupFair Classification with Strategic Agents
(
Oral
)
link
The use of algorithmic decision making systems in domains which impact the financial, social, and political wellbeing of people has created a demand for these decision making systems to be ``fair'' under some accepted notion of equity. This demand has in turn inspired a large body of work focused on the development of fair learning algorithms which are then used in lieu of their conventional counterparts. Most analysis of such fair algorithms proceeds from the assumption that the people affected by the algorithmic decisions are represented as immutable feature vectors. However, strategic agents may possess both the ability and the incentive to manipulate this observed feature vector in order to attain a more favorable outcome. We explore the impact that strategic agent behavior could have on fair classifiers and derive conditions under which this behavior leads to fair classifiers becoming less fair than their conventional counterparts under the same measure of fairness that the fair classifier takes into account. These conditions are related to the the way in which the fair classifier remedies unfairness on the original unmanipulated data: fair classifiers which remedy unfairness by becoming more selective than their conventional counterparts are the ones that become less fair than their counterparts when agents are strategic. We further demonstrate that both the increased selectiveness of the fair classifier, and consequently the loss of fairness, arises when performing fair learning on domains in which the advantaged group is overrepresented in the region near (and on the beneficial side of) the decision boundary of conventional classifiers. Finally, we observe experimentally, using several datasets and learning methods, that this \emph{fairness reversal} is common, and thatour theoretical characterization of the fairness reversal conditions indeed holds in most such cases. 
Andrew Estornell · Sanmay Das · Yang Liu · Yevgeniy Vorobeychik 🔗 


Learning Losses for Strategic Classification
(
Oral
)
link
Strategic classification, i.e. classification under possible strategic manipulations of features, has received a lot of attention from both the machine learning and the game theory community. Most works focus on analysing the optimal decision rule under such manipulations. In our work we take a learning theoretic perspective, focusing on the sample complexity needed to learn a good decision rule which is robust to strategic manipulation. We perform this analysis under a strategic manipulation loss that takes into account both the accuracy of the final decision and the vulnerability to manipulation. We analyse the sample complexity for a known graph of possible manipulations in terms of the complexity of the function class and the manipulation graph. Additionally, we address the problem of unknown manipulation capabilities of the involved agents. Using techniques from transfer learning theory, we define a similarity measure for manipulation graphs and show that learning outcomes are robust with respect to small changes in the manipulation graph. Lastly we analyse the (sample complexity of) learning of the manipulation capability of agents with respect to this similarity measures, providing a way to learn strategic classification with respect to an unknown manipulation graph. 
Tosca Lechner · Ruth Urner 🔗 


Normative disagreement as a challenge for Cooperative AI
(
Oral
)
link
Cooperation in settings where agents have both common and conflicting interests (mixedmotive environments) has recently received considerable attention in multiagent learning. However, the mixedmotive environments typically studied have a single cooperative outcome on which all agents can agree. Many realworld multiagent environments are instead bargaining problems (BPs): they have several Paretooptimal payoff profiles over which agents have conflicting preferences. We argue that typical cooperationinducing learning algorithms fail to cooperate in BPs when there is room for \textit{normative disagreement} resulting in the existence of multiple competing cooperative equilibria, and illustrate this problem empirically. To remedy the issue, we introduce the notion of \textit{normadaptive} policies. Normadaptive policies are capable of behaving according to different norms in different circumstances, creating opportunities for resolving normative disagreement. We develop a class of normadaptive policies and show in experiments that these significantly increase cooperation. However, normadaptiveness cannot address residual bargaining failure arising from a fundamental tradeoff between exploitability and cooperative robustness. 
Julian Stastny · Maxime Riché · Aleksandr Lyzhov · Johannes Treutlein · Allan Dafoe · Jesse Clifton 🔗 


Learning through Recourse under Censoring
(
Oral
)
link
Machine learning models are often deployed in dynamic systems that involve data collection and model updates. In consumer finance, for example, lenders deploy a model to automate lending decisions, collect data from applicants, and then use it to update the model for future applicants. In such systems, data collection suffers from \emph{selective labeling}, as we only observe outcomes for applicants who are approved. When these systems are initialized with models deny loans to a specific group, they exhibit \emph{censoring} whereby they deny loans to a group of consumers in perpetuity.In this work, we identify conditions when machine learning systems exhibit censoring. We study the ability to resolve censoring by providing \emph{recourse}  i.e., by providing applicants who are denied with actions that guarantee approval. We develop a method to learn linear classifiers with recourse guarantees, which allows model owners to update their models while providing prior applicants with a guarantee of approval. We benchmark our method and other strategies for exploration in their ability to resolve censoring. Our results illustrate the costs of each strategy on key stakeholders, provide insight into failure modes that lead to censoring, and highlight the importance of the feasibility of recourse. 
Jennifer Chien · Berk Ustun · Margaret Roberts 🔗 


Pessimistic Offline Reinforcement Learning with Multiple Agents
(
Oral
)
link
We study offline multiagent reinforcement learning (RL), which aims to learn an optimal policy based on historical data for multiple agents. Unlike online RL, accidental overestimation errors arising from function approximation can cumulatively arise and affect future iterations in the offline RL. In this paper, we extend the pessimistic value iteration algorithm to the multiagent setting: after obtaining the lower bound of the value function of each agent, we compute the optimistic policy by solving a generalsum matrix game with these lower bounds as the payoff matrices. Instead of finding the Nash Equilibrium of such a generalsum game, our algorithm solves for a Coarse Correlated Equilibrium (CCE) for the sake of computational efficiency. In the end, we prove an informationtheoretic lower bound of the regret for any algorithm on an offline dataset in the twoagent zerosum game. To the best of our knowledge, such a CCE scheme for pessimismbased algorithm has not appeared in the literature and can be of interest in its own right. We hope that our work can shed light on the future analysis of the equilibrium point of a multiagent Markov decision process given an offline dataset. 
Yihang Chen 🔗 


Promoting Resilience of MultiAgent Reinforcement Learning via ConfusionBased Communication
(
Oral
)
link
Agents operating in realworld settings are often faced with the need to adapt to unexpected changes in their environment. Recent advances in multiagent reinforcement learning (MARL) provide a variety of tools to support the ability of RL agents to deal with the dynamic nature of their environment, which may often be increased by the presence of other agents. In this work, we measure the resilience of a group of agents as the group’s ability to adapt to unexpected perturbations in the environment. To promote resilience, we suggest facilitating collaboration within the group, and offer a novel confusionbased communication protocol that requires an agent to broadcast its local observations that are least aligned with its previous experience. We present empirical evaluation of our approach on a set of simulated multitaxi settings. 
Ofir Abu · Sarah Keren · Matthias Gerstgrasser · Jeffrey S Rosenschein 🔗 


Game Redesign in Noregret Game Playing
(
Oral
)
link
We study the game redesign problem in which an external designer has the ability to change the payoff function in each round, but incurs a design cost for deviating from the original game. The players apply noregret learning algorithms to repeatedly play the changed games with limited feedback. The goals of the designer are to (i) incentivize all players to take a specific target action profile frequently; and (ii) incur small cumulative design cost. We present game redesign algorithms with the guarantee that the target action profile is played in T−o(T) rounds while incurring only o(T) cumulative design cost. Game redesign describes both positive and negative applications: a benevolent designer who incentivizes players to take a target action profile with better social welfare compared to the solution of the original game, or a malicious attacker whose target action profile benefits themselves but not the players. Simulations on four classic games illustrate our proposed redesign algorithms. 
Yuzhe Ma · Young Wu · Jerry Zhu 🔗 


PseudoCompetitive Games and Algorithmic Pricing
(
Oral
)
link
We study a new game of price competition amongst firms selling identical goods, whose defining property is that the revenue of a firm at a price is independent of any competing prices that are strictly lower. This game is motivated by a variety of customer choice models that induce this property, prominently those stemming from the wellknown satisficing heuristic for decisionmaking. Under a mild condition, we show that every purestrategy local Nash equilibrium in this game corresponds to a set of prices generated by the firms sequentially setting local bestresponse prices in a fixed order. In other words, despite being simultaneousmove games, equilibria have a sequentialmove equilibrium structure. We thus call these games pseudocompetitive games. We find that these games are often plagued by strictlylocal Nash equilibria, in which the price of a firm is only a local bestresponse to the competitor's price, when a globally optimal response with a potentially unboundedly higher payoff is available. We show that price dynamics resulting from the firms utilizing gradientbased pricing algorithms may often converge to such undesirable outcomes. We propose a new online learning approach to address this concern under certain regularity assumptions on the revenue curves. 
Chamsi Hssaine · Vijay Kamble · Siddhartha Banerjee 🔗 


Learning in Matrix Games can be Arbitrarily Complex
(
Oral
)
link
A growing number of machine learning architectures, such as Generative Adversarial Networks, rely on the design of games which implement a desired functionality via a Nash equilibrium. In practice these games have an implicit complexity (e.g.~from underlying datasets and the deep networks used) that makes directly computing a Nash equilibrium impractical or impossible. For this reason, numerous learning algorithms have been developed with the goal of iteratively converging to a Nash equilibrium. Unfortunately, the dynamics generated by the learning process can be very intricate and instances of training failure hard to interpret. In this paper we show that, in a strong sense, this dynamic complexity is inherent to games. Specifically, we prove that replicator dynamics, the continuoustime analogue of Multiplicative Weights Update, even when applied in a very restricted class of gamesknown as finite matrix gamesis rich enough to be able to approximate arbitrary dynamical systems. Our results are positive in the sense that they show the nearly boundless dynamic modelling capabilities of current machine learning practices, but also negative in implying that these capabilities may come at the cost of interpretability. As a concrete example, we show how replicator dynamics can effectively reproduce the wellknown strange attractor of Lonrenz dynamics (the "butterfly effect") while achieving no regret. 
Gabriel Andrade · Rafael Frongillo · Georgios Piliouras 🔗 


Testoptional Policies: Overcoming Strategic Behavior and Informational Gaps
(
Oral
)
link
Due to the Covid19 pandemic, more than 500 USbased colleges and universities went ``testoptional'' for admissions and promised that they would not penalize applicants for not submitting test scores, part of a longer trend to rethink the role of testing in college admissions. However, it remains unclear how (and whether) a college can simultaneously use test scores for those who submit them, while not penalizing those who do notand what that promise even means. We formalize these questions, and study how a college can overcome two challenges with optional testing: \textit{strategic applicants} (when those with low test scores can pretend to not have taken the test), and \textit{informational gaps} (it has more information on those who submit a test score than those who do not). We find that colleges can indeed do so, if and only if they are able to use information on who has test access and are willing to randomize admissions. 
Zhi Liu · Nikhil Garg 🔗 


Models of fairness in federated learning
(
Oral
)
link
In many realworld situations, data is distributed across multiple locations and can't be combined for training. For example, multiple hospitals might have comparable data related to patient outcomes. Federated learning is a novel distributed learning approach that allows multiple federating agents to jointly learn a model. While this approach might reduce the error each agent experiences, it also raises questions of fairness: to what extent can the error experienced by one agent be significantly lower than the error experienced by another agent? In this work, we consider two notions of fairness that each may be appropriate in different circumstances: \emph{egalitarian fairness} (which aims to bound how dissimilar error rates can be) and \emph{proportional fairness} (which aims to reward players for contributing more data). For egalitarian fairness, we obtain a tight multiplicative bound on how widely error rates can diverge between agents federating together. For proportional fairness, we show that subproportional error (relative to the number of data points contributed) is guaranteed for any individually rational federating coalition. 
Kate Donahue · Jon Kleinberg 🔗 


Improving Robustness of Malware Classifiers using Adversarial Strings Generated from Perturbed Latent Representations
(
Oral
)
link
In malware behavioral analysis, the list of accessed and created files very often indicates whether the examined file is malicious or benign. However, malware authors are trying to avoid detection by generating random filenames and/or modifying used filenames with new versions of the malware. These changes represent realworld adversarial examples. The goal of this work is to generate realistic adversarial examples and improve the classifier's robustness against these attacks. Our approach learns latent representations of input strings in an unsupervised fashion and uses gradientbased adversarial attack methods in the latent domain to generate adversarial examples in the input domain. We use these examples to improve the classifier's robustness by training on the generated adversarial set of strings. Compared to classifiers trained only on perturbed latent vectors, our approach produces classifiers that are significantly more robust without a large tradeoff in standard accuracy. 
Marek Galovic · Branislav Bosansky · Viliam Lisy 🔗 


Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization
(
Oral
)
link
In an effort to improve the accuracy of credit lending decisions, many financial intuitions are now using predictions from machine learning models. While such predictions enjoy many advantages, recent research has shown that the predictions have the potential to be biased and unfair towards certain subgroups of the population. To combat this, several techniques have been introduced to help remove the bias and improve the overall fairness of the predictions. We introduce a new fairness technique, called Subgroup Threshold Optimizer (STO), that does not require any alternations to the input training data nor does it require any changes to the underlying machine learning algorithm, and thus can be used with any existing machine learning pipeline. STO works by optimizing the classification thresholds for individual subgroups in order to minimize the overall discrimination score between them. Our experiments on a realworld credit lending dataset show that STO can reduce gender discrimination by over 90%. 
Cecilia Ying · Stephen Thomas 🔗 


The Platform Design Problem
(
Oral
)
link
Online firms deploy suites of software platforms, where each platform is designed to interact with users during a certain activity, such as browsing, chatting, socializing, emailing, driving, etc. The economic and incentive structure of this exchange, as well as its algorithmic nature, have not been explored to our knowledge. We model this interaction as a Stackelberg game between a Designer and one or more Agents. We model an Agent as a Markov chain whose states are activities; we assume that the Agent’s utility is a linear function of the steadystate distribution of this chain. The Designer may design a platform for each of these activities/states; if a platform is adopted by the Agent, the transition probabilities of the Markov chain are affected, and so is the objective of the Agent. The Designer’s utility is a linear function of the steady state probabilities of the accessible states (that is, the ones for which the platform has been adopted), minus the development cost of the platforms. The underlying optimization problem of the Agent — that is, how to choose the states for which to adopt the platform — is an MDP. If this MDP has a simple yet plausible structure (the transition probabilities from one state to another only depend on the target state and the recurrent probability of the current state) the Agent’s problem can be solved by a greedy algorithm. The Designer’s optimization problem (designing a custom suite for the Agent so as to optimize, through the Agent’s optimum reaction, the Designer’s revenue), is in general NPhard to approximate within any finite ratio; however, in the special case, while still NPhard, has an FPTAS. These results generalize, under mild additional assumptions, from a single Agent to a distribution of Agents with finite support, as well as to the setting where other Designers have already created platforms, and the Designer must find the best response to the strategies of the other Designers. We discuss other implications of our results and directions of future research. 
Christos Papadimitriou · Kiran Vodrahalli · Mihalis Yannakakis 🔗 


Strategic Classification in the Dark
(
Oral
)
link
Strategic classification studies the interaction between a classification rule and the strategic agents it governs. Under the assumption that the classifier is known, rational agents respond to it by manipulating their features. However, in many reallife scenarios of highstake classification (e.g., credit scoring), the classifier is not revealed to the agents, which leads agents to attempt to learn the classifier and game it too. In this paper we generalize the strategic classification model to such scenarios. We define the ``price of opacity'' as the difference in prediction error between opaque and transparent strategyrobust classifiers, characterize it, and give a sufficient condition for this price to be strictly positive, in which case transparency is the recommended policy. Our experiments show how Hardt et al.’s robust classifier is affected by keeping agents in the dark. 
Ganesh Ghalme · Vineet Nair · Itay Eilat · Inbal TalgamCohen · Nir Rosenfeld 🔗 


Scoring Rules for Performative Binary Prediction
(
Oral
)
link
We construct a model of expert prediction where predictions can influence the state of the world. Under this model, we show through theoretical and numerical results that proper scoring rules can incentivize experts to manipulate the world with their predictions. We also construct a simple class of scoring rules that avoids this problem. 
Alan Chan 🔗 


Approximating Bayes Nash Equilibria in Auction Games via Gradient Dynamics
(
Oral
)
link
Auctions are modeled as Bayesian games with continuous type and action spaces.Computing equilibria in games is computationally hard in general and no exactsolution theory is known for auction games. Recent research aimed at learningpure strategy Bayes Nash equilibria in symmetric auction games. In contrast, weintroduce algorithms computing distributional strategies on a discretized versionof the game via convex online optimization. First, existence results for purestrategy Bayes Nash equilibria are more restricted than those for equilibria indistributional strategies. Second, the expected utility of agents is linear indistributional strategies. We show that if regularized convex online optimizationalgorithms converge in such games, then they converge to a distributional εequilibrium of the discretized game. Importantly, we prove that a distributional εequilibrium of the discretized game approximates an equilibrium in the continuousgame. In a large number of experiments, we show that the method approximatesthe analytical (pure) Bayes Nash equilibrium arbitrarily closely in a wide varietyof auction games. The method allows for interdependent valuations and differenttypes of utility functions and provides a foundation for broadly applicableequilibrium solvers. For small environments with only a few players and items,the techniques is much faster, and can solve equilibria in symmetric auction gamesin seconds. 
Maximilian Fichtl · Matthias Oberlechner · Martin Bichler 🔗 


Strategic Classification Made Practical
(
Oral
)
link
Strategic classification regards the problem of learning in settings where users can strategically modify their features to improve outcomes. This setting applies broadly and has received much recent attention. But despite its practical significance, work in this space has so far been predominantly theoretical. In this paper we present a learning framework for strategic classification that is practical. Our approach directly minimizes the “strategic” empirical risk, achieved by differentiating through the strategic response of users. This provides flexibility that allows us to extend beyond the original problem formulation and towards more realistic learning scenarios. A series of experiments demonstrates the effectiveness of our approach on various learning settings. 
Nir Rosenfeld · Sagi Levanon 🔗 


Nash Convergence of MeanBased Learning Algorithms in First Price Auctions
(
Oral
)
link
We consider repeated first price auctions where each bidder, having a deterministic type, learns to bid using a meanbased learning algorithm. We completely characterize the Nash convergence property of the bidding dynamics in two senses: (1) timeaverage: the fraction of rounds where bidders play a Nash equilibrium approaches to 1 in the limit; (2) lastiterate: the mixed strategy profile of bidders approaches to a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value:if the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in timeaverage and in lastiterate. if the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in timeaverage but not necessarily in lastiterate. if the number is one, the bidding dynamics may not converge to a Nash equilibrium in timeaverage nor in lastiterate. Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms. 
Xiaotie Deng · Xinyan Hu · Tao Lin · Weiqiang Zheng 🔗 


Bayesian Persuasion for Algorithmic Recourse
(
Oral
)
link
When faced with (automated) assessment rules, individuals can modify their observable features strategically to obtain better decisions. In many situations, decisionmakers deliberately keep the underlying assessment rule secret to avoid gaming. This forces the decision subjects to rely on incomplete information when making strategic feature modifications. We capture such settings as a game of Bayesian persuasion, in which the decisionmaker sends a signal, i.e., an action recommendation, to the decision subject to incentivize them to take desirable actions. We formulate the principal's problem of finding the optimal Bayesian incentivecompatible signaling policy as an optimization problem and characterize it via a linear program. Through this characterization, we observe that while finding a BIC strategy can be simplified dramatically, the computational complexity of solving this linear program is closely tied to (1) the relative size of the agent's action space, and (2) the number of features utilized by the underlying decision rule. 
Keegan Harris · Valerie Chen · Joon Kim · Ameet S Talwalkar · Hoda Heidari · Steven Wu 🔗 


RewardFree Attacks in MultiAgent Reinforcement Learning
(
Oral
)
link
We investigate how effective an attacker can be when it only learns from its victim's actions, without access to the victim's reward. In this work, we are motivated by the scenario where the attacker wants to disrupt realworld RL applications, such as autonomous vehicles or delivery drones, without knowing the victim's precise goals or reward function. We argue that one heuristic approach an attacker can use is to strategically maximize the entropy of the victim's policy. The policy is generally not obfuscated, which implies it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a rewardfree exploration algorithm that maximizes the attacker's entropy during the exploration phase, and it then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying an attacker might not need access to the victim’s reward to succeed. Hence, even if the victim's reward information is protected, rewardfree attacks, based only on observing behavior, underscore the need to better understand policy obfuscation when preparing to deploy reinforcement learning in real world applications. 
Ted Fujimoto · Tim Doster · Adam Attarian · Jill Brandenberger · Nathan Hodas 🔗 


Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
(
Oral
)
link
We study blackbox reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimum assumptions on the prior knowledge of the adversary: it has no initial knowledge of the environment or the learner, and neither does it observe the learner's internal mechanism except for its performed actions. We design a novel blackbox attack, U2, that can provably achieve a nearmatching performance to the stateoftheart whitebox attack, demonstrating the feasibility of reward poisoning even in the most challenging blackbox setting. 
Amin Rakhsha · Xuezhou Zhang · Jerry Zhu · Adish Singla 🔗 


Global Convergence of MultiAgent Policy Gradient in Markov Potential Games
(
Oral
)
link
Potential games are arguably one of the most important and widely studied classes of normal form games. They define the archetypal setting of multiagent coordination as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted in the setting of Markov Games? What are the similarities and differences between multiagent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPG) that generalizes prior attempts at capturing complex stateful multiagent coordination. Counterintuitively, insights from normalform potential games do not carry over as MPGs can consist of settings where stategames can be zerosum games. In the opposite direction, Markov games where every stategame is a potential game are not necessarily MPGs. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient dominance property arguments developed for single agent MDPs to multiagent learning settings. 
Stefanos Leonardos · Will Overman · Ioannis Panageas · Georgios Piliouras 🔗 


ExplorationExploitation in MultiAgent Competition: Convergence with Bounded Rationality
(
Oral
)
link
The interplay between exploration and exploitation in competitive multiagent learning is still far from being well understood. Motivated by this, we study smooth Qlearning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Qlearning always converges to the unique quantalresponse equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zerosum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games [15,34], we show that fast convergence of Qlearning in competitive settings obtains regardless of the number of agents and without any need for parameter finetuning. As showcased by our experiments in network zerosum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multiagent settings. 
Stefanos Leonardos · Kelly Spendlove · Georgios Piliouras 🔗 


Exploration and Incentives in Reinforcement Learning
(
Oral
)
link
How do you incentivize selfinterested agents to $\text{\emph{explore}}$ when they prefer to $\text{\emph{exploit}}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously.

Max Simchowitz · Aleksandrs Slivkins 🔗 


Timing is Money: The Impact of Arrival Order in BetaBernoulli Prediction Markets
(
Oral
)
link
Prediction markets are incentivebased mechanisms for eliciting and combining the diffused, private beliefs of traders about a future uncertain event such as a political election. Typically prediction markets maintain point estimates of forecast variables; however, exponentialfamily prediction markets define a class of cost functionbased marketmaking algorithms that maintain a complete, collective belief distribution over the underlying generative process of the event of interest (e.g. the probability density of the winner's voteshare). In this paper, we focus on concretizing a special case of this abstract framework we call the betaBernoulli market. We set up a multiagent simulation of the market ecosystem to experimentally investigate the interaction of this marketmaking algorithm with a heterogeneous trading population. We design a Bayesian trader model with explicit characterization of this heterogeneity with respect to two independent attributes: how rich a trader's private information is and how much wealth they initially have at their disposal. We are interested in gauging the interplay of the above attributes with the arrival order of traders, particularly in terms of the net profit accrued by different trader types. Our results strongly suggest that early arrival can dominate both wealth and informativeness as a factor in determining trader compensation under a variety of experimental conditions. 
Blake Martin · Mithun Chakraborty · Sindhu Kutty 🔗 


The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
(
Oral
)
link
We consider \emph{incentivized exploration}: a version of multiarmed bandits where the choice of arms is controlled by selfinterested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors, and scale exponentially in the number of arms. A more basic problem of sampling each arm once runs into similar factors.We focus on the \emph{price of incentives}: the loss in performance, broadly construed, incurred for the sake of incentivecompatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentivecompatible if initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.The problem is largely reduced to that of sample complexity: how many rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the ``strength of beliefs". 
Mark Sellke · Aleksandrs Slivkins 🔗 


Coopetition Against an Amazon
(
Oral
)
link
This paper studies cooperative datasharing between competitors vying to predict a consumer's tastes. We design optimal datasharing schemes both for when firms compete only with each other, and for when they additionally compete with an Amazona company with more, better data. We derive conditions under which competitors benefit from coopetition, and under which full datasharing is optimal. 
Ronen Gradwohl · Moshe Tennenholtz 🔗 