As machine learning models find increasing use in the real world, their safe and reliable deployment depends on their robustness to distribution shift. This is especially true for sequential data, which arises naturally in domains such as natural language processing, healthcare, computational biology, and finance. However, building models for sequence data that are robust to distribution shifts presents a unique challenge. Sequential data are often discrete rather than continuous, exhibit difficult-to-characterize distributions, and can display a much greater range of distribution shifts. Although many methods for improving model robustness exist for image or tabular data, extending these methods to sequential data is a challenging research direction that often requires fundamentally different techniques.
This workshop aims to facilitate progress towards improving the distributional robustness of models trained on sequential data by bringing together researchers to tackle a wide variety of research questions including, but not limited to:
(1) How well do existing robustness methods work on sequential data, and why do they succeed or fail?
(2) How can we leverage the sequential nature of the data to develop novel and distributionally robust methods?
(3) How do we construct and utilize formalisms for distribution shifts in sequential data?
We hope that this workshop provides a first step towards improving the robustness, and ultimately safety and reliability, of models in sequential data domains.
Fri 7:00 a.m. - 7:15 a.m.
Opening Remarks
Fri 7:15 a.m. - 8:00 a.m.
Invited Talk: Behnam Neyshabur
Fri 8:00 a.m. - 8:10 a.m.
An Invariant Learning Characterization of Controlled Text Generation (Spotlight)
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to building a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In this paper, we show that the performance of controlled generation may be poor if the target distribution of text differs from the distribution the predictor was trained on. Instead, we take inspiration from causal representation learning and cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. Experiments demonstrate the promise and difficulty of adapting invariant learning methods, which have been primarily developed for vision, to text.
Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei
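As background for the invariant-learning framing in the abstract above, the following is a minimal sketch of an IRMv1-style penalty applied to an attribute predictor trained across several text "environments". The environment construction, the toy linear model, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an IRMv1-style penalty that encourages an
# attribute predictor to be simultaneously optimal across several text "environments".
import torch
import torch.nn as nn
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Gradient of the risk w.r.t. a dummy scale; a small norm means a near-invariant predictor.
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def invariant_risk(model: nn.Module, envs, lam: float = 1.0) -> torch.Tensor:
    # envs: iterable of (features, attribute_labels) pairs, one per text environment.
    erm, penalty = 0.0, 0.0
    for x, y in envs:
        logits = model(x).squeeze(-1)
        erm = erm + F.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irm_penalty(logits, y)
    return erm / len(envs) + lam * penalty / len(envs)

# Toy usage: two environments with 64 examples of 100-dimensional text features each.
model = nn.Linear(100, 1)
envs = [(torch.randn(64, 100), torch.randint(0, 2, (64,)).float()) for _ in range(2)]
loss = invariant_risk(model, envs, lam=10.0)
loss.backward()
```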
Fri 8:10 a.m. - 8:20 a.m.
Exploiting Variable Correlation with Masked Modeling for Anomaly Detection in Time Series (Spotlight)
Online anomaly detection in multi-variate time series is a challenging problem, particularly when there is no supervision information. Autoregressive predictive models are often used for this task, but such detection methods often overlook correlations between variables observed in the most recent step and thus miss some anomalies that violate normal variable relations. In this work, we propose a masked modeling approach that captures variable relations and temporal relations in a single predictive model. Our method can be combined with a wide range of predictive models. Our experiments show that our new masked modeling method improves detection performance over pure autoregressive models when the time series itself is not very predictable.
Panagiotis Lymperopoulos · Yukun Li · Liping Liu
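To make the masked-modeling idea above concrete, here is an illustrative sketch: mask a subset of variables at the current step, predict them from the remaining variables plus a short history window, and use the reconstruction error on the masked variables as the anomaly score. The architecture, window size, and masking scheme are assumptions for illustration, not the authors' model.

```python
# Illustrative sketch (assumed, not the authors' architecture): score anomalies by masking
# some variables at the current step and predicting them from the unmasked variables plus
# a short history window.
import torch
import torch.nn as nn

class MaskedStepPredictor(nn.Module):
    def __init__(self, n_vars: int, window: int, hidden: int = 64):
        super().__init__()
        # Input: flattened history window plus the masked current step (zeros at masked vars).
        self.net = nn.Sequential(
            nn.Linear(n_vars * (window + 1), hidden), nn.ReLU(), nn.Linear(hidden, n_vars)
        )

    def forward(self, history: torch.Tensor, current: torch.Tensor, mask: torch.Tensor):
        # history: (B, window, n_vars); current: (B, n_vars); mask: (B, n_vars), 1 = masked.
        visible = current * (1 - mask)
        x = torch.cat([history.flatten(1), visible], dim=-1)
        return self.net(x)

def anomaly_score(model, history, current, mask):
    # Reconstruction error on the masked variables only; higher means more anomalous.
    pred = model(history, current, mask)
    return ((pred - current) ** 2 * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)

# Toy usage: 8 variables, a history window of 5 steps, roughly half the variables masked.
model = MaskedStepPredictor(n_vars=8, window=5)
history, current = torch.randn(4, 5, 8), torch.randn(4, 8)
mask = (torch.rand(4, 8) < 0.5).float()
print(anomaly_score(model, history, current, mask))
```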
Fri 8:20 a.m. - 8:30 a.m.
Online Policy Optimization for Robust MDP (Spotlight)
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbation of the environment. The robust Markov decision process (MDP) framework---in which the transition probabilities belong to an uncertainty set around a nominal model---provides one way to develop robust models. While previous analysis shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDP by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.
Jing Dong · Jingwei Li · Baoxiang Wang · Jingzhao Zhang
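For readers unfamiliar with the robust MDP formalism referenced above, the standard robust Bellman operator (background material, not the paper's specific online algorithm) replaces the nominal expectation with a worst case over the uncertainty set:

```latex
% Standard robust Bellman operator for a robust MDP with (rectangular) uncertainty set
% \mathcal{P}(s,a) around the nominal transition kernel; background formalism only.
(\mathcal{T}V)(s) \;=\; \max_{a \in \mathcal{A}}
  \Big[\, r(s,a) \;+\; \gamma \min_{P \in \mathcal{P}(s,a)}
  \mathbb{E}_{s' \sim P}\big[V(s')\big] \Big]
```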
Fri 8:30 a.m. - 9:30 a.m.
Poster Session #1
Fri 9:30 a.m. - 10:00 a.m.
Invited Talk: Dan Hendrycks
Fri 10:00 a.m. - 10:30 a.m.
Invited Talk: Suchi Saria
Fri 10:30 a.m. - 11:30 a.m.
Lunch Break
Fri 11:30 a.m. - 12:15 p.m.
Invited Talk: Yejin Choi
Fri 12:15 p.m. - 12:25 p.m.
Comparison of Uncertainty Quantification in Time Series Regression (Spotlight)
Increasingly, high-stakes decisions are made using predictions from neural networks; meteorologists and hedge funds, for example, apply these techniques to time series data. Machine learning models have certain limitations for prediction (such as lack of expressiveness, vulnerability to domain shifts, and overconfidence) which can be addressed using uncertainty estimation. There is a set of expectations regarding how uncertainty should "behave". For instance, a wider prediction horizon should lead to more uncertainty, and a model's confidence should be proportional to its accuracy. In this paper, different uncertainty estimation methods are compared on forecasting meteorological time series data and evaluated against these expectations. The results show how each uncertainty estimation method performs on the forecasting task, which partially evaluates the robustness of the predicted uncertainty.
Levente Foldesi · Matias Valdenegro-Toro
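As an illustration of one uncertainty-estimation baseline such comparisons typically include, here is a minimal Monte Carlo dropout forecaster with a Gaussian output head, separating epistemic spread from the model's predicted (aleatoric) variance. The forecaster architecture, horizon, and data shapes are placeholders, not the paper's exact setup.

```python
# Minimal sketch (one common baseline, not necessarily the paper's exact setup):
# Monte Carlo dropout over a forecaster with a Gaussian head, yielding a mean forecast
# and a total predictive standard deviation per horizon step.
import torch
import torch.nn as nn

class MCDropoutForecaster(nn.Module):
    def __init__(self, n_features: int, horizon: int, hidden: int = 64, p: float = 0.1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(p)
        self.mean = nn.Linear(hidden, horizon)
        self.logvar = nn.Linear(hidden, horizon)

    def forward(self, x):
        h, _ = self.lstm(x)          # (B, T, hidden)
        h = self.drop(h[:, -1])      # dropout stays active at test time for MC sampling
        return self.mean(h), self.logvar(h)

@torch.no_grad()
def predict_with_uncertainty(model, x, n_samples: int = 30):
    model.train()                    # keeps dropout on; no BatchNorm layers are used here
    means, variances = [], []
    for _ in range(n_samples):
        mu, logvar = model(x)
        means.append(mu)
        variances.append(logvar.exp())
    means = torch.stack(means)       # (S, B, horizon)
    mu = means.mean(0)
    # Total variance = epistemic (spread of the means) + aleatoric (mean predicted variance).
    var = means.var(0) + torch.stack(variances).mean(0)
    return mu, var.sqrt()

x = torch.randn(8, 48, 4)            # batch of 8 series, 48 past steps, 4 features
model = MCDropoutForecaster(n_features=4, horizon=12)
mu, sigma = predict_with_uncertainty(model, x)
```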
Fri 12:25 p.m. - 12:35 p.m.
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (Spotlight)
Large language models often err during deployment due to non-representative training data or distribution shift in the test set. Recently, model editors have been proposed to fix errors by adjusting a pre-trained model's weights. However, these approaches quickly decay a model's performance on upstream data, and forget how to fix previous errors. We propose and study a novel Lifelong Model Editing setting, where errors stream into a deployed model and we update the model to correct its predictions without influencing it for unrelated inputs. We propose General Retrieval Adaptors for Continual Editing, or GRACE, which learns and caches a particular layer's activations in a codebook as edits stream in, while the original model weights remain frozen. This ensures similar edits are treated similarly without altering the model's performance on unrelated instances.  Experimentally, we show that GRACE substantially improves over recent model editors.
Thomas Hartvigsen · Swami Sankaranarayanan · Hamid Palangi · Yoon Kim · Marzyeh Ghassemi
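A schematic sketch of the retrieval-adaptor idea described in the abstract above: edits are cached as key-value pairs at one layer, and at inference a hidden state within a distance threshold of a cached key is replaced by the cached value, leaving the frozen weights untouched. The threshold handling, value construction, and class names here follow the abstract only loosely and are not the authors' implementation.

```python
# Schematic sketch (loosely following the abstract, not the authors' implementation):
# a frozen layer wrapped with a codebook of (key, value) edits; hidden states close to a
# cached key get the cached value, so unrelated inputs are left untouched.
import torch
import torch.nn as nn

class KeyValueAdaptor(nn.Module):
    def __init__(self, layer: nn.Module, epsilon: float = 1.0):
        super().__init__()
        self.layer = layer                       # frozen pre-trained layer
        for p in self.layer.parameters():
            p.requires_grad_(False)
        self.keys, self.values = [], []          # codebook grows as edits stream in
        self.epsilon = epsilon

    def add_edit(self, key: torch.Tensor, value: torch.Tensor):
        self.keys.append(key.detach())
        self.values.append(value.detach())

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        out = self.layer(h)
        if not self.keys:
            return out
        keys = torch.stack(self.keys)             # (n_edits, d)
        dists = torch.cdist(h, keys)              # (batch, n_edits)
        nearest = dists.argmin(dim=-1)
        hit = dists.gather(1, nearest.unsqueeze(1)).squeeze(1) < self.epsilon
        values = torch.stack(self.values)         # (n_edits, d_out)
        out[hit] = values[nearest[hit]]           # apply cached edits only near a stored key
        return out

# Toy usage: wrap a frozen linear layer, cache one edit, then query near and far inputs.
adaptor = KeyValueAdaptor(nn.Linear(16, 16), epsilon=0.5)
key = torch.randn(16)
adaptor.add_edit(key, value=torch.ones(16))
queries = torch.stack([key + 0.01 * torch.randn(16), torch.randn(16)])
print(adaptor(queries))
```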
Fri 12:35 p.m. - 12:45 p.m.
Out-of-Distribution Detection and Selective Generation for Conditional Language Models (Spotlight)
Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.
Jie Ren · Jiaming Luo · Yao Zhao · Kundan Krishna · Mohammad Saleh · Balaji Lakshminarayanan · Peter Liu
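To illustrate the flavor of lightweight, embedding-based OOD scoring and selective generation described above, here is a sketch using a Mahalanobis distance to the fitted in-domain embedding distribution, with abstention above a percentile threshold. The specific score, choice of embeddings, and threshold rule here are assumptions for illustration; the paper's method may differ.

```python
# Sketch of one lightweight embedding-based OOD score (Mahalanobis distance to the fitted
# in-domain embedding distribution); assumed for illustration, the paper's exact score and
# choice of embeddings may differ.
import torch

class MahalanobisOODScorer:
    def fit(self, embeddings: torch.Tensor):
        # embeddings: (N, d) pooled encoder representations of in-domain training inputs.
        self.mean = embeddings.mean(dim=0)
        centered = embeddings - self.mean
        cov = centered.T @ centered / (embeddings.shape[0] - 1)
        self.precision = torch.linalg.inv(cov + 1e-4 * torch.eye(cov.shape[0]))
        return self

    def score(self, embeddings: torch.Tensor) -> torch.Tensor:
        # Higher score = farther from the training distribution = more likely OOD.
        diff = embeddings - self.mean
        return torch.einsum("nd,de,ne->n", diff, self.precision, diff)

# Selective generation: abstain from generating when the OOD score exceeds a threshold
# chosen on held-out in-domain data, e.g. its 99th percentile.
train_emb, test_emb = torch.randn(1000, 32), torch.randn(5, 32) + 3.0
scorer = MahalanobisOODScorer().fit(train_emb)
threshold = torch.quantile(scorer.score(train_emb), 0.99)
abstain = scorer.score(test_emb) > threshold
print(abstain)
```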
Fri 12:45 p.m. - 1:45 p.m.
Poster Session #2
Fri 1:45 p.m. - 2:15 p.m.
Invited Talk: Byron Wallace
Fri 2:15 p.m. - 2:45 p.m.
Invited Talk: He He
Fri 2:45 p.m. - 3:00 p.m.
Closing Remarks
Perturbation Augmentation for Fairer NLP (Poster)
Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models. We collect a large dataset of human-annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i) language models (LMs) pre-trained on demographically perturbed corpora are typically more fair, (ii) LMs finetuned on perturbed GLUE datasets exhibit less demographic bias on downstream tasks, and (iii) fairness improvements do not come at the expense of performance on downstream tasks. Lastly, we discuss outstanding questions about how best to evaluate the (un)fairness of large language models. We hope that this exploration of neural demographic perturbation will help drive more improvement towards fairer NLP.
Rebecca Qian · Candace Ross · Jude Fernandes · Eric Michael Smith · Douwe Kiela · Adina Williams
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (Poster)
Large language models often err during deployment due to non-representative training data or distribution shift in the test set. Recently, model editors have been proposed to fix errors by adjusting a pre-trained model's weights. However, these approaches quickly decay a model's performance on upstream data, and forget how to fix previous errors. We propose and study a novel Lifelong Model Editing setting, where errors stream into a deployed model and we update the model to correct its predictions without influencing it for unrelated inputs. We propose General Retrieval Adaptors for Continual Editing, or GRACE, which learns and caches a particular layer's activations in a codebook as edits stream in, while the original model weights remain frozen. This ensures similar edits are treated similarly without altering the model's performance on unrelated instances. Experimentally, we show that GRACE substantially improves over recent model editors.
Thomas Hartvigsen · Swami Sankaranarayanan · Hamid Palangi · Yoon Kim · Marzyeh Ghassemi
Feature Restricted Group Dropout for Robust Electronic Health Record Predictions (Poster)
Recurrent neural networks are commonly applied to electronic health records to capture complex relationships and model clinically relevant outcomes. However, it is commonplace for the covariates in electronic health records to change distributions. This work extends restricted feature interactions in recurrent neural networks to address foreseeable and unexpected covariate shifts. We extend the previous work by 1) introducing a deterministic feature rotation so that hyperparameter tuning can search through all combinations of features, 2) introducing a sub-network-specific dropout to ablate the influence of entire features at the output of the hidden network, and 3) extending the feature restrictions to the GRU-D network, which has been shown to be a stronger baseline for covariate shift recovery. We show that feature-restricted GRU-D models may be more robust to certain perturbations. Manual intervention was not needed to confer robustness. Despite this, the LSTM was still the best model in nearly 50% of the cases.
Bret Nestor · Anna Goldenberg · Marzyeh Ghassemi
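A toy sketch of the "sub-network-specific dropout" idea from the abstract above: each feature group feeds its own sub-network, and entire groups are zeroed out during training so the model learns not to rely on any single group. The grouping, feed-forward encoder, and hyperparameters are illustrative assumptions; the paper applies the idea to GRU-D-style recurrent EHR models.

```python
# Toy sketch of group-level dropout over feature-specific sub-networks (illustrative only;
# the paper's version operates on recurrent GRU-D-style models over EHR time series).
import torch
import torch.nn as nn

class GroupDropoutEncoder(nn.Module):
    def __init__(self, group_sizes, hidden: int = 16, p_drop_group: float = 0.2):
        super().__init__()
        self.group_sizes = group_sizes
        self.subnets = nn.ModuleList([nn.Linear(g, hidden) for g in group_sizes])
        self.p = p_drop_group
        self.head = nn.Linear(hidden * len(group_sizes), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, sum(group_sizes)); split into per-group slices.
        outs, start = [], 0
        for net, g in zip(self.subnets, self.group_sizes):
            h = torch.relu(net(x[:, start:start + g]))
            if self.training and torch.rand(()) < self.p:
                h = torch.zeros_like(h)   # ablate the whole feature group's contribution
            outs.append(h)
            start += g
        return self.head(torch.cat(outs, dim=-1))

# Usage: three feature groups (e.g. vitals, labs, demographics) of sizes 4, 6, and 2.
model = GroupDropoutEncoder(group_sizes=[4, 6, 2])
y = model(torch.randn(8, 12))
```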
CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain (Poster)
This paper introduces CLIFT (Clinical Shift), a new testbed for the clinical-domain question answering task. The testbed includes 25k high-quality question-answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several deep-learning models under the proposed testbed. Despite impressive results on the original test set with no adaptive overfitting, performance degrades when models are applied to new test sets that introduce a distribution shift. Our findings emphasise the need for, and the potential of, increasing the robustness of clinical-domain models under distributional shift. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shift. The test sets and code to reproduce the experiments and evaluate new models against CLIFT are available at anonymous.github.io
Ankit Pal
Probabilistic thermal stability prediction through sparsity promoting transformer representation (Poster)
Pre-trained protein language models have demonstrated significant applicability in different protein engineering tasks. A common use of these pre-trained transformer models' latent representations is to mean-pool across residue positions to reduce the feature dimension for downstream tasks such as predicting bio-physical properties or other functional behaviours. In this paper we provide a two-fold contribution to machine learning (ML) driven drug design. Firstly, we demonstrate the power of sparsity-promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments, with a mean absolute error of 0.23°C. Secondly, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate for the need to adopt probabilistic frameworks, especially in the context of ML-driven drug design.
Yevgen Zainchkovskyy · Jesper Ferkinghoff-Borg · Anja Bennett · Thomas Egebjerg · Nikolai Lorenzen · Per Greisen · Søren Hauberg · Carsten Stahlhut
On the Abilities of Sequence Extrapolation with Implicit Models (Poster)
Deep neural networks excel on a variety of different tasks, often surpassing human intelligence. However, when presented with out-of-distribution data, these models tend to break down even on the simplest tasks. In this paper, we compare robustness in sequence modeling of implicitly-defined and classical deep learning models on a series of extrapolation tasks where the models are tested with out-of-distribution samples during inference time. Throughout our experiments, implicit models greatly outperform classical deep learning networks that overfit the training distribution. We showcase implicit models' unique advantages for sequence extrapolation thanks to their flexible and selective framework. Implicit models, with potentially unlimited depth, not only adapt well to out-of-distribution inputs but also understand the underlying structure of inputs much better.
Juliette Decugis · Alicia Tsai · Ashwin Ganesh · Max Emerling · Laurent El Ghaoui
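For readers new to implicitly-defined models, here is a minimal sketch of an equilibrium layer: the output is the fixed point z* of z = f(z, x) found by iteration, rather than the output of a fixed-depth stack. The plain forward iteration and tanh cell are illustrative assumptions (convergence is not guaranteed in general; practical implicit models use root-finding and well-conditioned parameterizations), not the paper's models or training procedure.

```python
# Minimal sketch of an implicitly-defined (equilibrium) layer: the output is the fixed
# point z* of z = f(z, x), found by iteration, rather than a fixed-depth stack of layers.
import torch
import torch.nn as nn

class ImplicitLayer(nn.Module):
    def __init__(self, dim: int, max_iters: int = 50, tol: float = 1e-4):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.U = nn.Linear(dim, dim)
        self.max_iters, self.tol = max_iters, tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_iters):   # plain forward iteration toward the fixed point
            z_next = torch.tanh(self.W(z) + self.U(x))
            if (z_next - z).norm() < self.tol:
                break
            z = z_next
        return z

layer = ImplicitLayer(dim=8)
out = layer(torch.randn(4, 8))
```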
An Invariant Learning Characterization of Controlled Text Generation (Poster)
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to building a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In this paper, we show that the performance of controlled generation may be poor if the target distribution of text differs from the distribution the predictor was trained on. Instead, we take inspiration from causal representation learning and cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. Experiments demonstrate the promise and difficulty of adapting invariant learning methods, which have been primarily developed for vision, to text.
Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei
Robustness of Neural Networks used in Electrical Motor Time-Series (Poster)
Electrical motors are widely used in industrial and emerging applications such as electrical automotive. Industry 4.0 has led to the use of neural networks for electrical motor tasks like fault detection, monitoring, and control. The growing use of neural networks in safety-critical systems requires an in-depth analysis of their robustness and stability. This paper studies the robustness of neural networks used in time-series tasks like system modeling, signal denoising, speed-torque estimation, temperature estimation, and fault detection. The data collected for these problems contain all types of noise from the operating environment, sensors, and the system itself, which affects the performance of different network architectures during training and inference. We train, and analyze under perturbations, several different architectures ranging from simple linear, convolutional, and sequential networks to complex networks like 1D ResNets and Transformers.
Sagar Verma · Kavya Gupta
Quantifying Uncertainty in Foundation Models via Ensembles (Poster)
As large-scale foundation models begin to have increasing impact in real-world applications, guaranteeing reliability and trustworthiness requires these models to "know what they don't know": to be capable of quantifying uncertainty about their own outputs. In this work, we propose disagreement of model ensembles as an effective and compute-efficient method to quantify uncertainty. We also conduct a systematic study of uncertainty quantification spanning multiple tasks (a synthetic string task, and natural language arithmetic and question-answering tasks) over a progression of increasingly out-of-distribution inputs. We find that considering ensemble disagreement results in improved uncertainty prediction over only considering a single model's likelihood. We hope that our investigation and results encourage more research in the area of uncertainty quantification in foundation models and the use of model ensembles.
Meiqi Sun · Wilson Yan · Pieter Abbeel · Igor Mordatch
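As a small illustration of ensemble disagreement as an uncertainty signal, the sketch below decodes one answer per ensemble member and uses the entropy of the empirical answer distribution as the uncertainty estimate. The stand-in "models" and the entropy formulation are assumptions for illustration, not the paper's exact formulation.

```python
# Small sketch (assumed formulation): quantify uncertainty as disagreement among ensemble
# members' decoded answers, here via the entropy of the empirical answer distribution.
import math
from collections import Counter

def ensemble_disagreement(answers):
    # answers: one decoded answer string per ensemble member for the same prompt.
    counts = Counter(answers)
    n = len(answers)
    probs = [c / n for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)   # 0 = full agreement; higher = more disagreement

def predict_with_uncertainty(models, prompt):
    # models: callables mapping a prompt to a decoded answer (e.g. differently fine-tuned LMs).
    answers = [m(prompt) for m in models]
    majority = Counter(answers).most_common(1)[0][0]
    return majority, ensemble_disagreement(answers)

# Toy usage with stand-in "models".
models = [lambda p: "42", lambda p: "42", lambda p: "41"]
answer, uncertainty = predict_with_uncertainty(models, "What is 6 * 7?")
print(answer, round(uncertainty, 3))
```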
An Adaptive Temporal Attention Mechanism to Address Distribution Shifts (Poster)
With the goal of studying robust sequence modeling via time series, we propose a robust multi-horizon forecasting approach that adaptively reacts to distribution shifts on relevant time scales. It is common in many forecasting domains to observe slow or fast forecasting signals at different times; for example, wind and river forecasts change slowly during droughts but quickly during storms. Our approach is based on the transformer architecture, which, across many domains, has demonstrated significant improvements over other architectures. Several works benefit from integrating a temporal context to enhance the attention mechanism's understanding of the underlying temporal behavior. In this work, we propose an adaptive temporal attention mechanism that is capable of dynamically adapting the temporal observation window as needed. Our experiments on several real-world datasets demonstrate significant performance improvements over existing state-of-the-art methodologies.
Sepideh Koohfar · Laura Dietz
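A toy illustration of restricting attention to an adaptive local window, in the spirit of the abstract above: each query position predicts a window length, and keys outside that window are masked out before the softmax. This simplified stand-in (single head, sigmoid-parameterized window, causal masking) is an assumption for illustration, not the paper's mechanism.

```python
# Toy illustration (simplified stand-in, not the paper's mechanism): attention restricted
# to a per-query learned window, so the model can look further back when the signal changes
# slowly and focus on recent steps when it changes quickly.
import torch
import torch.nn as nn

class AdaptiveWindowAttention(nn.Module):
    def __init__(self, dim: int, max_window: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.window_head = nn.Linear(dim, 1)   # predicts a window length per query position
        self.max_window = max_window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / D ** 0.5                     # (B, T, T)
        window = torch.sigmoid(self.window_head(x)) * self.max_window   # (B, T, 1)
        # Mask out keys further in the past than each query's window, and all future keys.
        idx = torch.arange(T, device=x.device)
        dist = idx.view(1, T, 1) - idx.view(1, 1, T)                    # query index - key index
        mask = (dist < 0) | (dist > window)                             # broadcasts to (B, T, T)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

attn = AdaptiveWindowAttention(dim=16, max_window=8)
out = attn(torch.randn(2, 20, 16))
```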
Defend Against Textual Backdoor Attacks By Token Substitution (Poster)
Backdoor attacks are a type of malicious threat to deep neural networks (DNNs): the attacker injects a trigger into the model during the training process, and the victim model behaves normally on data without the trigger but predicts the attacker-specified target whenever the trigger is present. Backdoor attacks were first investigated in computer vision, and their study has recently emerged in natural language processing (NLP) as well. However, the study of defense methods against textual backdoor attacks is still insufficient; in particular, few methods are available to protect against backdoor attacks that use syntax as the trigger. In this paper, we propose a novel method that can effectively defend against syntactic backdoor attacks. Experiments show the effectiveness of our method on BERT for syntactic backdoor attacks when choosing five different syntaxes as triggers.
Xinglin Li · Yao Li · Minhao Cheng
Behavioral Classification of Sequential Neural Activity Using Time Varying Recurrent Neural Networks (Poster)
Shifts in data distribution across time can strongly affect early classification of time-series data. When decoding behavior from neural activity, early detection of behavior may help in devising corrective neural stimulation before the onset of behavior. Recurrent Neural Networks (RNNs) are commonly used to model sequence data, but standard RNNs cannot handle data with temporal distribution shifts in a way that guarantees robust classification across time. To enable the network to utilize all temporal features of the neural input data, and to enhance the memory of an RNN, we propose a novel approach: RNNs with time-varying weights, here termed Time-Varying RNNs (TV-RNNs). These models not only predict the class of the time sequence correctly but also lead to accurate classification earlier in the sequence than standard RNNs. In this work, we focus on early, robust sequential classification of brain-wide neural activity across time using TV-RNNs as subjects perform a motor task.
Yongxu Zhang · Shreya Saxena
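A compact sketch of the time-varying-weights idea from the abstract above: instead of one shared recurrent weight matrix, the cell indexes a separate weight set per time step and emits a prediction at every step, which supports early classification. This per-step parameterization is the simplest possible variant, assumed for illustration; the paper's TV-RNN parameterization may differ.

```python
# Compact sketch of an RNN with time-varying weights: a separate recurrent weight set per
# time step, with a class prediction at every step (simplest variant, illustrative only).
import torch
import torch.nn as nn

class TimeVaryingRNN(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, seq_len: int, n_classes: int):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.Linear(input_dim + hidden_dim, hidden_dim) for _ in range(seq_len)]
        )
        self.readout = nn.Linear(hidden_dim, n_classes)
        self.hidden_dim = hidden_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, input_dim); returns per-step class logits for early classification.
        B, T, _ = x.shape
        h = x.new_zeros(B, self.hidden_dim)
        logits = []
        for t in range(T):
            h = torch.tanh(self.cells[t](torch.cat([x[:, t], h], dim=-1)))
            logits.append(self.readout(h))   # a prediction is available at every time step
        return torch.stack(logits, dim=1)     # (B, T, n_classes)

model = TimeVaryingRNN(input_dim=30, hidden_dim=64, seq_len=50, n_classes=2)
logits = model(torch.randn(4, 50, 30))
```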
Strategy-Aware Contextual Bandits (Poster)
Algorithmic tools are often used to make decisions about people in high-stakes domains. In the presence of such automated decision making, there is incentive for strategic agents to modify their input to the algorithm in order to receive a more desirable outcome. While previous work on strategic classification attempts to capture this phenomenon, these models fail to take into account the multiple actions a decision maker usually has at their disposal, and the fact that they often have access only to bandit feedback. In contrast, we capture this setting as a contextual bandit problem, in which a decision maker must take actions based on a sequence of strategically modified contexts. We provide a low-strategic-regret algorithm for the two-action setting, and prove that sublinear strategic regret is generally not possible for settings in which the number of actions is greater than two. Along the way, we obtain impossibility results for multi-class strategic classification which may be of independent interest.
Keegan Harris · Chara Podimata · Steven Wu
Revealing the Bias in Large Language Models via Reward Structured Questions (Poster)
The success of large language models has been amply demonstrated in recent years, and fine-tuning these models for a specific task yields strong performance. However, these models also learn biased representations from the data they have been trained on. In particular, several studies recently showed that language models can learn to be biased towards certain genders. Several recent studies have tried to eliminate this bias by incorporating human feedback into fine-tuning. In our study we show that changing the question asked to the language model dramatically changes the log probabilities of the bias measured in the responses; furthermore, in several cases the language model ends up providing a completely opposite response. Recent language models fine-tuned on prior gender-bias datasets do not resolve the underlying problem, but rather alleviate it only for the dataset on which the model is fine-tuned. We believe our results might lay the foundation for further work on alignment and safety problems in large language models.
Ezgi Korkmaz
Online Policy Optimization for Robust MDP (Poster)
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbation of the environment. The robust Markov decision process (MDP) framework---in which the transition probabilities belong to an uncertainty set around a nominal model---provides one way to develop robust models. While previous analysis shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDP by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.
Jing Dong · Jingwei Li · Baoxiang Wang · Jingzhao Zhang
Exploiting Variable Correlation with Masked Modeling for Anomaly Detection in Time Series (Poster)
Online anomaly detection in multi-variate time series is a challenging problem, particularly when there is no supervision information. Autoregressive predictive models are often used for this task, but such detection methods often overlook correlations between variables observed in the most recent step and thus miss some anomalies that violate normal variable relations. In this work, we propose a masked modeling approach that captures variable relations and temporal relations in a single predictive model. Our method can be combined with a wide range of predictive models. Our experiments show that our new masked modeling method improves detection performance over pure autoregressive models when the time series itself is not very predictable.
Panagiotis Lymperopoulos · Yukun Li · Liping Liu
Comparison of Uncertainty Quantification in Time Series Regression (Poster)
Increasingly, high-stakes decisions are made using predictions from neural networks; meteorologists and hedge funds, for example, apply these techniques to time series data. Machine learning models have certain limitations for prediction (such as lack of expressiveness, vulnerability to domain shifts, and overconfidence) which can be addressed using uncertainty estimation. There is a set of expectations regarding how uncertainty should "behave". For instance, a wider prediction horizon should lead to more uncertainty, and a model's confidence should be proportional to its accuracy. In this paper, different uncertainty estimation methods are compared on forecasting meteorological time series data and evaluated against these expectations. The results show how each uncertainty estimation method performs on the forecasting task, which partially evaluates the robustness of the predicted uncertainty.
Levente Foldesi · Matias Valdenegro-Toro
Out-of-Distribution Detection and Selective Generation for Conditional Language Models (Poster)
Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.
Jie Ren · Jiaming Luo · Yao Zhao · Kundan Krishna · Mohammad Saleh · Balaji Lakshminarayanan · Peter Liu
Are Deep Sequence Classifiers Good at Non-Trivial Generalization? (Poster)
Recent advances in deep learning models for sequence classification have greatly improved their classification accuracy, especially when large training sets are available. However, several works have suggested that under some settings the predictions made by these models are poorly calibrated. In this work we study binary sequence classification problems and we look at model calibration from a different perspective by asking the question: are deep learning models capable of learning the underlying target class distribution? We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep learning sequence classification models. We develop an evaluation that measures how well a classifier is learning the target class distribution. In addition, our evaluation disentangles good performance achieved by mere compression of the training sequences from performance achieved by proper model generalization. Our results suggest that in this binary setting the deep learning models are indeed able to learn the underlying class distribution in a non-trivial manner, i.e. by proper generalization beyond data compression.
Francesco Cazzaro · Ariadna Quattoni · Xavier Carreras
Conditional COT-GAN for Video Prediction with Kernel Smoothing (Poster)
Causal Optimal Transport (COT) results from imposing a temporal causality constraint on classic optimal transport problems. Relying on recent work on COT-GAN optimized for sequential learning, the contribution of the present paper is twofold. First, we develop a conditional version of COT-GAN suitable for sequence prediction; the dataset is now used to learn how a sequence will evolve given the observation of its past evolution. Second, we improve on the convergence results by working with modifications of the empirical measures via kernel smoothing. The resulting kernel conditional COT-GAN (KCCOT-GAN) algorithm is illustrated with an application to video prediction.
Tianlin Xu · Beatrice Acciaio
Author Information
Nathan Ng (Massachusetts Institute of Technology)
Haoran Zhang (Massachusetts Institute of Technology)
Vinith Suriyakumar (Massachusetts Institute of Technology)
Chantal Shaib
Kyunghyun Cho (Genentech / NYU)
Kyunghyun Cho is an associate professor of computer science and data science at New York University and a research scientist at Facebook AI Research. He was a postdoctoral fellow at the Université de Montréal until summer 2015 under the supervision of Prof. Yoshua Bengio, and received his PhD and MSc degrees from Aalto University in early 2014 under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.
Yixuan Li (University of Wisconsin-Madison)
Alice Oh (KAIST)
Marzyeh Ghassemi (MIT)