Workshop: Workshop on Machine Learning Safety

Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety

Allen Schmaltz · Danielle Rasooly


When deploying Transformer networks, we seek the ability to introspect predictions against instances with known labels, to update the model without full re-training, and to provide reliable uncertainty quantification over the predictions. We demonstrate that these properties are achievable via recently proposed approaches for approximating deep neural networks with instance-based metric learners, at varying resolutions of the input, and the associated Venn-ADMIT Predictor for constructing prediction sets. We consider a challenging (but non-adversarial) task: zero-shot sequence labeling (i.e., feature detection) in a low-accuracy, class-imbalanced, covariate-shifted setting while requiring a high confidence level.
