Political Economy of Reinforcement Learning Systems (PERLS)
Thomas Gilbert · Stuart J Russell · Tom O Zick · Aaron Snoswell · Michael Dennis

Tue Dec 14 04:00 AM -- 01:45 PM (PST)
Event URL: https://perls-workshop.github.io/

Sponsored by the Center for Human-Compatible AI at UC Berkeley, and with support from the Simons Institute and the Center for Long-Term Cybersecurity, we are convening a cross-disciplinary group of researchers to examine the near-term policy concerns of Reinforcement Learning (RL). RL is a rapidly growing branch of AI research, with the capacity to learn to exploit our dynamic behavior in real time. From YouTube’s recommendation algorithm to post-surgery opioid prescriptions, RL algorithms are poised to permeate our daily lives. The ability of the RL system to tease out behavioral responses, and the human experimentation inherent to its learning, motivate a range of crucial policy questions about RL’s societal implications that are distinct from those addressed in the literature on other branches of Machine Learning (ML).

Tue 4:00 a.m. - 5:00 a.m.
Pre-show meet and greet (Gather town session)
Tue 5:00 a.m. - 5:20 a.m.
(Brief introduction)

Brief opening remarks from the workshop organizers

Aaron Snoswell · Thomas Gilbert · Michael Dennis · Tom O Zick
Tue 5:20 a.m. - 5:40 a.m.
[Plenary] Culturing PERLS (Mark Nitzberg) (Plenary presentation)   
Mark Nitzberg
Tue 5:40 a.m. - 5:55 a.m.
Audience Q+A for plenary presentation (Live Q+A)
Mark Nitzberg
Tue 5:55 a.m. - 6:00 a.m.
5 minute break (Break)
Tue 6:00 a.m. - 6:05 a.m.
V&S | Theme and first speaker introduction (Brief introduction)
Thomas Gilbert
Tue 6:05 a.m. - 6:25 a.m.
V&S | [Presentation] Title TBA (Stuart Russell) (Presentation)
Stuart J Russell
Tue 6:25 a.m. - 6:30 a.m.
V&S | Audience Q+A for first speaker (Live Q+A)
Thomas Gilbert · Stuart J Russell
Tue 6:30 a.m. - 6:32 a.m.
V&S | Second speaker introduction (Brief introduction)
Thomas Gilbert
Tue 6:32 a.m. - 6:52 a.m.
V&S | [Presentation] Title TBA (Mireille Hildebrandt) (Presentation)
Mireille Hildebrandt
Tue 6:52 a.m. - 6:57 a.m.
V&S | Audience Q+A for second speaker (Live Q+A)
Thomas Gilbert · Mireille Hildebrandt
Tue 6:57 a.m. - 7:40 a.m.
V&S | [Panel discussion] (Live panel discussion)
Thomas Gilbert · Stuart J Russell · Mireille Hildebrandt · Salome Viljoen · Natasha Jaques
Tue 7:40 a.m. - 7:50 a.m.
10 minute break (Break)
Tue 7:50 a.m. - 8:00 a.m.
LAF | Theme and speaker introductions (Brief introduction)
Jakob Foerster
Tue 8:00 a.m. - 8:10 a.m.
LAF | [Presentation] "Legitimacy" in the Computational Elicitation of Preferences in Mechanism Design (Jake Goldenfein) (Short presentation)   
Jake Goldenfein
Tue 8:10 a.m. - 8:20 a.m.
LAF | [Presentation] The Role of Explanation in RL Legitimacy, Accountability, and Feedback (Finale Doshi-Velez) (Short presentation)   
Finale Doshi-Velez
Tue 8:20 a.m. - 8:30 a.m.
LAF | [Presentation] Evaluating Reinforcement Learners (Michael Littman) (Short presentation)
Michael Littman
Tue 8:30 a.m. - 9:15 a.m.
LAF | [Panel discussion] (Live panel discussion)
Jakob Foerster · Jake Goldenfein · Finale Doshi-Velez · Michael Littman · Jessica Forde · Evi Micha · Ivana Dusparic
Tue 9:15 a.m. - 10:00 a.m.
45 minute lunch break (Break)
Tue 10:00 a.m. - 11:55 a.m.
Poster session for accepted papers (Gather town session)
Tue 11:55 a.m. - 12:00 p.m.
5 minute break (Break)
Tue 12:00 p.m. - 12:10 p.m.
TD | Theme and speaker introductions (Brief introduction)
Michael Dennis
Tue 12:10 p.m. - 12:20 p.m.
TD | [Presentation] Antimonopoly as a Tool for Democratization? (Ayse Yasar) (Short presentation)   
Ayse Yasar
Tue 12:20 p.m. - 12:30 p.m.
TD | [Presentation] Reinforcement of What? Shaping the Digitization of Judgement by Reinforcement Learning (Frank Pasquale) (Short presentation)   
Frank Pasquale
Tue 12:30 p.m. - 12:40 p.m.
TD | [Presentation] Metrics are Tricky (Rachel Thomas) (Short presentation)   
Rachel Thomas
Tue 12:40 p.m. - 1:30 p.m.
TD | [Panel Discussion] (Live panel discussion)
Michael Dennis · Ayse Yasar · Rachel Thomas · Mason Kortz
Tue 1:30 p.m. - 1:45 p.m.
Closing remarks (Brief conclusion)
Thomas Gilbert
(Poster)

Reinforcement learning (RL) techniques offer computer scientists and marketplace designers a versatile and powerful extension to their toolkit for online marketplaces. As the use of these techniques continues to expand, their application in online marketplaces raises questions about their appropriate use, particularly around issues of fairness and market transparency. I argue that the use of RL techniques, alongside similar calls in domains such as automated vehicle systems, is a problem of sociotechnical specification that faces a set of normative and regulatory challenges unique to marketplaces. I provide a selective overview of the RL literature as applied to markets to illustrate challenges associated with the use of RL techniques in online marketplaces. I conclude with a discussion of capacity-building in research and institutions that is required in order for benefits from algorithmically managed marketplaces to be realized for stakeholders and broader society.

Andrew Chong
(Poster)

Machine learning (ML) methods already permeate environmental decision-making, from processing high-dimensional data on earth systems to monitoring compliance with environmental regulations. Of the ML techniques available to address pressing environmental problems (e.g., climate change, biodiversity loss), Reinforcement Learning (RL) may both hold the greatest promise and present the most pressing perils. This paper explores how RL-driven policy refracts existing power relations in the environmental domain while also creating unique challenges to ensuring equitable and accountable environmental decision processes. We focus on how RL technologies shift the distribution of decision-making, agenda-setting, and ideological power between resource users, governing bodies, and private industry.

Melissa Chapman · Carl Boettiger
(Poster)

The paper is not complete. We provide an extended abstract.

Philipp Zahn
(Poster)

With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine two of them. The first is to clarify what demands stakeholders might usefully make on the designers of AI systems, useful because the technology exists to implement them. We introduce the framing of cognitive architectures to make this technical topic more accessible. The second issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.

Koen Holtman
(Poster)

Public policies that supply public goods, especially those that require collaboration by limiting individual liberty, always give rise to controversies over governance legitimacy. Multi-Agent Reinforcement Learning (MARL) methods are appropriate for supporting the legitimacy of public policies that supply public goods at the cost of individual interests. Among these policies, inter-regional collaborative pandemic control is a prominent example, which has become much more important for an increasingly inter-connected world facing a global pandemic like COVID-19. Different patterns of collaborative strategies have been observed among different systems of regions, yet an analytical process for reasoning about the legitimacy of those strategies is lacking. In this paper, we use inter-regional collaboration for pandemic control as an example to demonstrate the necessity of MARL in reasoning about, and thereby legitimizing, policies enforcing such inter-regional collaboration. Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for collaborative supply of public goods. Different optimal policies are learned by our MARL agents under different collaboration levels, which change in an interpretable pattern of collaboration that helps to balance the losses suffered by regions of different types, and consequently promotes the overall welfare. Meanwhile, policies learned with higher collaboration levels yield higher global rewards, which illustrates the benefit of, and thus provides a novel justification for the legitimacy of, promoting inter-regional collaboration. Therefore, our method shows the capability of MARL in computationally modeling and supporting the theory of the calculus of consent, developed by Nobel Prize winner J. M. Buchanan.

Yang Hu · Xue (Steve) Liu

Author Information

Thomas Gilbert (UC Berkeley)
Stuart J Russell (UC Berkeley)
Tom O Zick (Harvard)

Tom Zick earned her PhD from UC Berkeley and is a current fellow at the Berkman Klein Center for Internet and Society at Harvard. Her research bridges AI ethics and law, with a focus on how to craft safe and equitable policy surrounding the adoption of AI in high-stakes domains. In the past, she has worked as a data scientist at the Berkeley Center for Law and Technology, evaluating the capacity of regulations to promote open government data. She has also collaborated with graduate students across social science and engineering to advocate for pedagogy reform focused on infusing social context into technical coursework. Outside of academia, Tom has crafted digital policy for the City of Boston as a fellow for the Mayor’s Office for New Urban Mechanics. Her current research centers on the near-term policy concerns surrounding reinforcement learning.

Aaron Snoswell (Queensland University of Technology)

Aaron is a research fellow in computational law at the Australian Research Council Centre of Excellence for Autonomous Decision Making and Society. With a background in cross-disciplinary mechatronic engineering, Aaron’s Ph.D. research developed new theory and algorithms for Inverse Reinforcement Learning in the maximum conditional entropy and multiple intent settings. Aaron’s ongoing work investigates technical measures for achieving value alignment for autonomous decision making systems, and legal-theoretic models for AI accountability.

Michael Dennis (University of California Berkeley)

Michael Dennis is a fifth-year graduate student at the Center for Human-Compatible AI. With a background in theoretical computer science, he is working to close the gap between decision-theoretic and game-theoretic recommendations and the current state-of-the-art approaches to robust RL and multi-agent RL. The overall aim of this work is to ensure that our systems behave in a way that is robustly beneficial. In the single-agent setting, this means making decisions and managing risk in the way the designer intends. In the multi-agent setting, this means ensuring that the concerns of the designer and those of others in the society are fairly and justly negotiated to the benefit of all involved.
