

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

Diogo Cruz · Edoardo Pona · Alex Holness-Tofts · Elias Schmied · Víctor Abia Alonso · Charlie J Griffin · Bogdan-Ionut Cirstea

Abstract
