
Workshop
Robustness in Sequence Modeling
Nathan Ng · Haoran Zhang · Vinith Suriyakumar · Chantal Shaib · Kyunghyun Cho · Yixuan Li · Alice Oh · Marzyeh Ghassemi

Fri Dec 02 07:00 AM -- 03:00 PM (PST) @ Room 290

As machine learning models find increasing use in the real world, their safe and reliable deployment depends on their robustness to distribution shift. This is especially true for sequential data, which arise naturally in many domains such as natural language processing, healthcare, computational biology, and finance. However, building models for sequential data that are robust to distribution shifts presents a unique challenge. Sequential data are often discrete rather than continuous, exhibit difficult-to-characterize distributions, and can display a much greater range of distribution shifts. Although many methods for improving model robustness exist for imaging or tabular data, extending these methods to sequential data is a challenging research direction that often requires fundamentally different techniques.

This workshop aims to facilitate progress towards improving the distributional robustness of models trained on sequential data by bringing together researchers to tackle a wide variety of research questions including, but not limited to:
(1) How well do existing robustness methods work on sequential data, and why do they succeed or fail?
(2) How can we leverage the sequential nature of the data to develop novel and distributionally robust methods?
(3) How do we construct and utilize formalisms for distribution shifts in sequential data?

We hope that this workshop provides a first step towards improving the robustness, and ultimately safety and reliability, of models in sequential data domains.

Fri 7:00 a.m. - 7:15 a.m. Opening Remarks

Fri 7:15 a.m. - 8:00 a.m. Invited Talk: Behnam Neyshabur

Fri 8:00 a.m. - 8:10 a.m. An Invariant Learning Characterization of Controlled Text Generation (Spotlight)

Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to building a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In this paper, we show that the performance of controlled generation may be poor if the target distribution of text differs from the distribution the predictor was trained on. Instead, we take inspiration from causal representation learning and cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. Experiments demonstrate the promise and difficulty of adapting invariant learning methods, which have been primarily developed for vision, to text.

Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei

Fri 8:10 a.m. - 8:20 a.m. Exploiting Variable Correlation with Masked Modeling for Anomaly Detection in Time Series (Spotlight)

Online anomaly detection in multivariate time series is a challenging problem, particularly when no supervision is available. Autoregressive predictive models are often used for this task, but such detection methods often overlook correlations between variables observed in the most recent step and thus miss some anomalies that violate normal variable relations. In this work, we propose a masked modeling approach that captures variable relations and temporal relations in a single predictive model. Our method can be combined with a wide range of predictive models.
Our experiments show that our masked modeling method improves detection performance over pure autoregressive models when the time series itself is not very predictable.

Panagiotis Lymperopoulos · Yukun Li · Liping Liu

Fri 8:20 a.m. - 8:30 a.m. Online Policy Optimization for Robust MDP (Spotlight)

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework, in which the transition probabilities belong to an uncertainty set around a nominal model, provides one way to develop robust models. While previous analysis shows that RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient in a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider the online robust MDP setting, in which the learner interacts with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.

Jing Dong · Jingwei Li · Baoxiang Wang · Jingzhao Zhang

Fri 8:30 a.m. - 9:30 a.m. Poster Session #1

Fri 9:30 a.m. - 10:00 a.m. Invited Talk: Dan Hendrycks

Fri 10:00 a.m. - 10:30 a.m. Invited Talk: Suchi Saria

Fri 10:30 a.m. - 11:30 a.m. Lunch Break

Fri 11:30 a.m. - 12:15 p.m. Invited Talk: Yejin Choi

Fri 12:15 p.m. - 12:25 p.m.
Comparison of Uncertainty Quantification in Time Series Regression (Spotlight)

Increasingly high-stakes decisions are made using the predictions of neural networks. In particular, meteorologists and hedge funds apply these techniques to time series data. When it comes to prediction, machine learning models have certain limitations (such as lack of expressiveness, vulnerability to domain shift, and overconfidence) that can be addressed using uncertainty estimation. There is a set of expectations regarding how uncertainty should behave. For instance, a longer prediction horizon should lead to more uncertainty, and the model's confidence should be proportional to its accuracy. In this paper, different uncertainty estimation methods are compared on forecasting meteorological time series data, and these expectations are evaluated. The results show how each uncertainty estimation method performs on the forecasting task, which partially evaluates the robustness of predicted uncertainty.

Levente Foldesi · Matias Valdenegro-Toro

Fri 12:25 p.m. - 12:35 p.m. Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (Spotlight)

Large language models often err during deployment due to non-representative training data or distribution shift in the test set. Recently, model editors have been proposed to fix errors by adjusting a pre-trained model's weights. However, these approaches quickly degrade a model's performance on upstream data and forget how to fix previous errors. We propose and study a novel Lifelong Model Editing setting, where errors stream into a deployed model and we update the model to correct its predictions without affecting its predictions on unrelated inputs. We propose General Retrieval Adaptors for Continual Editing, or GRACE, which learns and caches a particular layer's activations in a codebook as edits stream in, while the original model weights remain frozen.
This ensures that similar edits are treated similarly without altering the model's performance on unrelated instances. Experimentally, we show that GRACE substantially improves over recent model editors.

Thomas Hartvigsen · Swami Sankaranarayanan · Hamid Palangi · Yoon Kim · Marzyeh Ghassemi

Fri 12:35 p.m. - 12:45 p.m. Out-of-Distribution Detection and Selective Generation for Conditional Language Models (Spotlight)

Much work has shown that high-performing ML classifiers can degrade significantly and provide overconfident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs because prediction is performed autoregressively over many steps. We present a highly accurate and lightweight OOD detection method for CLMs and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.

Jie Ren · Jiaming Luo · Yao Zhao · Kundan Krishna · Mohammad Saleh · Balaji Lakshminarayanan · Peter Liu

Fri 12:45 p.m. - 1:45 p.m. Poster Session #2

Fri 1:45 p.m. - 2:15 p.m. Invited Talk: Byron Wallace

Fri 2:15 p.m. - 2:45 p.m. Invited Talk: He He

Fri 2:45 p.m. - 3:00 p.m. Closing Remarks

Perturbation Augmentation for Fairer NLP (Poster)

Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models.
We collect a large dataset of human-annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i) language models (LMs) pre-trained on demographically perturbed corpora are typically more fair, (ii) LMs fine-tuned on perturbed GLUE datasets exhibit less demographic bias on downstream tasks, and (iii) fairness improvements do not come at the expense of performance on downstream tasks. Lastly, we discuss outstanding questions about how best to evaluate the (un)fairness of large language models. We hope that this exploration of neural demographic perturbation will help drive further progress toward fairer NLP.

Rebecca Qian · Candace Ross · Jude Fernandes · Eric Michael Smith · Douwe Kiela · Adina Williams

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (Poster)

Large language models often err during deployment due to non-representative training data or distribution shift in the test set. Recently, model editors have been proposed to fix errors by adjusting a pre-trained model's weights. However, these approaches quickly degrade a model's performance on upstream data and forget how to fix previous errors. We propose and study a novel Lifelong Model Editing setting, where errors stream into a deployed model and we update the model to correct its predictions without affecting its predictions on unrelated inputs. We propose General Retrieval Adaptors for Continual Editing, or GRACE, which learns and caches a particular layer's activations in a codebook as edits stream in, while the original model weights remain frozen. This ensures that similar edits are treated similarly without altering the model's performance on unrelated instances. Experimentally, we show that GRACE substantially improves over recent model editors.

Thomas Hartvigsen · Swami Sankaranarayanan · Hamid Palangi · Yoon Kim · Marzyeh Ghassemi

Feature Restricted Group Dropout for Robust Electronic Health Record Predictions (Poster)

Recurrent neural networks are commonly applied to electronic health records to capture complex relationships and model clinically relevant outcomes. However, the distributions of covariates in electronic health records commonly change. This work extends restricted feature interactions in recurrent neural networks to address foreseeable and unexpected covariate shifts. We extend the previous work by 1) introducing a deterministic feature rotation so that hyperparameter tuning can search through all combinations of features, 2) introducing a sub-network-specific dropout to ablate the influence of entire features at the output of the hidden network, and 3) extending the feature restrictions to the GRU-D network, which has been shown to be a stronger baseline for covariate shift recovery. We show that feature-restricted GRU-Ds may be more robust to certain perturbations, and manual intervention was not needed to confer this robustness. Despite this, the LSTM was still the best model in nearly 50% of the cases.

Bret Nestor · Anna Goldenberg · Marzyeh Ghassemi

CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain (Poster)

This paper introduces a new testbed, CLIFT (Clinical Shift), for the clinical-domain question answering task. The testbed includes 25k high-quality question-answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several deep learning models on the proposed testbed. Despite impressive results on the original test set with no adaptive overfitting, performance degrades when the models are applied to new test sets that introduce a distribution shift.
Our findings emphasise both the need for and the potential of increasing the robustness of clinical-domain models under distribution shift. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shift. The test sets and code to reproduce the experiments and evaluate new models against CLIFT are available at anonymous.github.io

Ankit Pal

Probabilistic thermal stability prediction through sparsity promoting transformer representation (Poster)

Pre-trained protein language models have demonstrated significant applicability in different protein engineering tasks. A common way to use the latent representations of these pre-trained transformer models is to mean-pool across residue positions, reducing the feature dimension for downstream tasks such as predicting biophysical properties or other functional behaviours. In this paper, we make a two-fold contribution to machine learning (ML) driven drug design. First, we demonstrate the power of sparsity-promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments, with a mean absolute error of 0.23 °C. Second, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate adopting probabilistic frameworks, especially in the context of ML-driven drug design.

Yevgen Zainchkovskyy · Jesper Ferkinghoff-Borg · Anja Bennett · Thomas Egebjerg · Nikolai Lorenzen · Per Greisen · Søren Hauberg · Carsten Stahlhut

On the Abilities of Sequence Extrapolation with Implicit Models (Poster)

Deep neural networks excel at a variety of tasks, often surpassing human performance. However, when presented with out-of-distribution data, these models tend to break down even on the simplest tasks.
In this paper, we compare the robustness of implicitly defined and classical deep learning models for sequence modeling on a series of extrapolation tasks in which the models are tested on out-of-distribution samples at inference time. Throughout our experiments, implicit models greatly outperform classical deep learning networks, which overfit the training distribution. We showcase the unique advantages of implicit models for sequence extrapolation, which stem from their flexible and selective framework. Implicit models, with potentially unlimited depth, not only adapt well to out-of-distribution inputs but also understand the underlying structure of inputs much better.

Juliette Decugis · Alicia Tsai · Ashwin Ganesh · Max Emerling · Laurent El Ghaoui

An Invariant Learning Characterization of Controlled Text Generation (Poster)

Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to building a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In this paper, we show that the performance of controlled generation may be poor if the target distribution of text differs from the distribution the predictor was trained on. Instead, we take inspiration from causal representation learning and cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. Experiments demonstrate the promise and difficulty of adapting invariant learning methods, which have been primarily developed for vision, to text.

Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei

Robustness of Neural Networks used in Electrical Motor Time-Series (Poster)

Electrical motors are widely used in industrial and emerging applications such as electric vehicles. Industry 4.0 has led to the use of neural networks for electrical motor tasks such as fault detection, monitoring, and control. The growing use of neural networks in safety-critical systems requires an in-depth analysis of their robustness and stability. This paper studies the robustness of neural networks used in time-series tasks such as system modeling, signal denoising, speed-torque estimation, temperature estimation, and fault detection. The datasets collected for these problems contain many types of noise from the operating environment, the sensors, and the system itself, which affects the performance of different network architectures during training and inference. We train several architectures, ranging from simple linear, convolutional, and sequential networks to complex networks such as 1D ResNets and Transformers, and analyze them under perturbations.

Sagar Verma · Kavya Gupta

Quantifying Uncertainty in Foundation Models via Ensembles (Poster)

As large-scale foundation models begin to have increasing impact in real-world applications, it is important, in order to guarantee reliability and trustworthiness, for these models to "know what they don't know": to be capable of quantifying uncertainty about their own outputs. In this work, we propose disagreement of model ensembles as an effective and compute-efficient method to quantify uncertainty. We also conduct a systematic study of uncertainty quantification spanning multiple tasks (a synthetic string task, and natural-language arithmetic and question-answering tasks) over a progression of increasingly out-of-distribution inputs.
We find that considering ensemble disagreement improves uncertainty prediction over considering only a single model's likelihood. We hope that our investigation and results encourage more research on uncertainty quantification in foundation models and on the use of model ensembles.

Meiqi Sun · Wilson Yan · Pieter Abbeel · Igor Mordatch

An Adaptive Temporal Attention Mechanism to Address Distribution Shifts (Poster)

With the goal of studying robust sequence modeling via time series, we propose a robust multi-horizon forecasting approach that adaptively reacts to distribution shifts on relevant time scales. In many forecasting domains, signals change slowly at some times and quickly at others. For example, wind and river forecasts change slowly during droughts but quickly during storms. Our approach is based on the transformer architecture, which has demonstrated significant improvements over other architectures across many domains. Several works benefit from integrating temporal context to enhance the attention mechanism's understanding of the underlying temporal behavior. In this work, we propose an adaptive temporal attention mechanism that can dynamically adapt the temporal observation window as needed. Our experiments on several real-world datasets demonstrate significant performance improvements over existing state-of-the-art methods.

Sepideh Koohfar · Laura Dietz

Defend Against Textual Backdoor Attacks By Token Substitution (Poster)

Backdoor attacks are a type of malicious threat to deep neural networks (DNNs). The attacker injects a trigger into the model during the training process. The victim model behaves normally on inputs without the trigger but outputs the attacker-specified target prediction on inputs that contain it. Backdoor attacks were first investigated in computer vision.
Backdoor attacks have recently also been studied in natural language processing (NLP). However, defense methods against textual backdoor attacks remain understudied. In particular, few methods are available to protect against backdoor attacks that use syntax as the trigger. In this paper, we propose a novel method that can effectively defend against syntactic backdoor attacks. Experiments on BERT show the effectiveness of our method against syntactic backdoor attacks using five different syntactic structures as triggers.

Xinglin Li · Yao Li · Minhao Cheng

Behavioral Classification of Sequential Neural Activity Using Time Varying Recurrent Neural Networks (Poster)

Shifts in data distribution across time can strongly affect early classification of time-series data. When decoding behavior from neural activity, early detection of behavior may help in devising corrective neural stimulation before the onset of behavior. Recurrent neural networks (RNNs) are commonly used to model sequence data. However, standard RNNs cannot handle data with temporal distribution shifts well enough to guarantee robust classification across time. To enable the network to use all temporal features of the neural input data, and to enhance the memory of an RNN, we propose a novel approach: RNNs with time-varying weights, here termed Time-Varying RNNs (TV-RNNs). These models not only predict the class of the time sequence correctly but also achieve accurate classification earlier in the sequence than standard RNNs. In this work, we focus on early robust sequential classification of brain-wide neural activity across time using TV-RNNs as subjects perform a motor task.

Yongxu Zhang · Shreya Saxena

Strategy-Aware Contextual Bandits (Poster)

Algorithmic tools are often used to make decisions about people in high-stakes domains.
In the presence of such automated decision making, strategic agents have an incentive to modify their input to the algorithm in order to receive a more desirable outcome. While previous work on strategic classification attempts to capture this phenomenon, these models fail to account for the multiple actions a decision maker usually has at their disposal, and for the fact that decision makers often have access only to bandit feedback. In contrast, we capture this setting as a contextual bandit problem, in which a decision maker must take actions based on a sequence of strategically modified contexts. We provide a low-strategic-regret algorithm for the two-action setting, and prove that sublinear strategic regret is generally not possible in settings with more than two actions. Along the way, we obtain impossibility results for multi-class strategic classification, which may be of independent interest.

Keegan Harris · Chara Podimata · Steven Wu

Revealing the Bias in Large Language Models via Reward Structured Questions (Poster)

The success of large language models has been amply demonstrated in recent years. Fine-tuning these models for the specific task at hand yields high-performing models. However, these models also learn biased representations from the data they have been trained on. In particular, several recent studies showed that language models can learn to be biased towards certain genders. Quite recently, several studies tried to eliminate this bias by incorporating human feedback into fine-tuning. In our study, we show that changing the question asked to the language model dramatically changes the bias measured via the log probabilities of its responses. Furthermore, in several cases the language model ends up providing a completely opposite response.
Recent language models fine-tuned on prior gender-bias datasets do not resolve the underlying problem; rather, they alleviate it only for the dataset on which the model is fine-tuned. We believe our results may lay the foundation for further work on alignment and safety problems in large language models.

Ezgi Korkmaz

Online Policy Optimization for Robust MDP (Poster)

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework, in which the transition probabilities belong to an uncertainty set around a nominal model, provides one way to develop robust models. While previous analysis shows that RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient in a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider the online robust MDP setting, in which the learner interacts with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.

Jing Dong · Jingwei Li · Baoxiang Wang · Jingzhao Zhang

Exploiting Variable Correlation with Masked Modeling for Anomaly Detection in Time Series (Poster)

Online anomaly detection in multivariate time series is a challenging problem, particularly when no supervision is available.
Autoregressive predictive models are often used for this task, but such detection methods often overlook correlations between variables observed in the most recent step and thus miss some anomalies that violate normal variable relations. In this work, we propose a masked modeling approach that captures variable relations and temporal relations in a single predictive model. Our method can be combined with a wide range of predictive models. Our experiments show that our masked modeling method improves detection performance over pure autoregressive models when the time series itself is not very predictable.

Panagiotis Lymperopoulos · Yukun Li · Liping Liu

Comparison of Uncertainty Quantification in Time Series Regression (Poster)

Increasingly high-stakes decisions are made using the predictions of neural networks. In particular, meteorologists and hedge funds apply these techniques to time series data. When it comes to prediction, machine learning models have certain limitations (such as lack of expressiveness, vulnerability to domain shift, and overconfidence) that can be addressed using uncertainty estimation. There is a set of expectations regarding how uncertainty should behave. For instance, a longer prediction horizon should lead to more uncertainty, and the model's confidence should be proportional to its accuracy. In this paper, different uncertainty estimation methods are compared on forecasting meteorological time series data, and these expectations are evaluated. The results show how each uncertainty estimation method performs on the forecasting task, which partially evaluates the robustness of predicted uncertainty.

Levente Foldesi · Matias Valdenegro-Toro

Out-of-Distribution Detection and Selective Generation for Conditional Language Models (Poster)

Much work has shown that high-performing ML classifiers can degrade significantly and provide overconfident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs because prediction is performed autoregressively over many steps. We present a highly accurate and lightweight OOD detection method for CLMs and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.

Jie Ren · Jiaming Luo · Yao Zhao · Kundan Krishna · Mohammad Saleh · Balaji Lakshminarayanan · Peter Liu

Are Deep Sequence Classifiers Good at Non-Trivial Generalization? (Poster)

Recent advances in deep learning models for sequence classification have greatly improved their classification accuracy, especially when large training sets are available. However, several works have suggested that under some settings the predictions made by these models are poorly calibrated. In this work, we study binary sequence classification problems and look at model calibration from a different perspective by asking: are deep learning models capable of learning the underlying target class distribution? We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep learning sequence classification models.
We develop an evaluation that measures how well a classifier learns the target class distribution. In addition, our evaluation disentangles good performance achieved by mere compression of the training sequences from performance achieved by proper model generalization. Our results suggest that in this binary setting the deep learning models are indeed able to learn the underlying class distribution in a non-trivial manner, i.e., by proper generalization beyond data compression.

Francesco Cazzaro · Ariadna Quattoni · Xavier Carreras

Conditional COT-GAN for Video Prediction with Kernel Smoothing (Poster)

Causal Optimal Transport (COT) results from imposing a temporal causality constraint on classic optimal transport problems. Building on recent work on COT-GAN optimized for sequential learning, the contribution of the present paper is twofold. First, we develop a conditional version of COT-GAN suitable for sequence prediction. This means that the dataset is now used to learn how a sequence will evolve given observations of its past evolution. Second, we improve on the convergence results by working with modifications of the empirical measures via kernel smoothing. The resulting kernel conditional COT-GAN (KCCOT-GAN) algorithm is illustrated with an application to video prediction.

Tianlin Xu · Beatrice Acciaio
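
The codebook idea in the GRACE abstract (*Aging with GRACE*, listed above as both a spotlight and a poster) can be illustrated with a minimal sketch: a frozen layer whose output is intercepted by a key-value cache, so that an edit changes the activation only for inputs whose activation lands near a cached key. This is not the authors' implementation. The Euclidean distance, the fixed radius `epsilon`, the toy `frozen_layer`, and the names `KeyValueAdaptor` and `edited_forward` are all hypothetical, and the corrected activation is supplied directly rather than learned, as it would be in a real model editor.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def frozen_layer(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for one pre-trained layer; its weights never change."""
    return [2.0 * v for v in x]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two activation vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KeyValueAdaptor:
    """Hypothetical codebook applied to the output of a frozen layer."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon          # assumed radius within which a key applies
        self.keys: List[Vector] = []    # cached activations that triggered edits
        self.values: List[Vector] = []  # replacement activations for those keys

    def nearest(self, h: Sequence[float]) -> Optional[int]:
        """Index of the closest key within epsilon of h, or None if no key applies."""
        best_index: Optional[int] = None
        best_distance = float("inf")
        for i, key in enumerate(self.keys):
            d = distance(h, key)
            if d < best_distance:
                best_index, best_distance = i, d
        return best_index if best_distance <= self.epsilon else None

    def add_edit(self, h: Sequence[float], corrected: Sequence[float]) -> None:
        """Cache an edit; an edit close to an existing key overwrites that key's value."""
        i = self.nearest(h)
        if i is None:
            self.keys.append(list(h))
            self.values.append(list(corrected))
        else:
            self.values[i] = list(corrected)

    def __call__(self, h: Sequence[float]) -> Vector:
        """Return the cached value for a matching key; otherwise pass h through unchanged."""
        i = self.nearest(h)
        return list(h) if i is None else list(self.values[i])


def edited_forward(adaptor: KeyValueAdaptor, x: Sequence[float]) -> Vector:
    """Frozen layer followed by the adaptor; the layer itself is never modified."""
    return adaptor(frozen_layer(x))


adaptor = KeyValueAdaptor(epsilon=0.5)
adaptor.add_edit(frozen_layer([1.0, 0.0]), corrected=[0.0, 9.0])
print(edited_forward(adaptor, [1.0, 0.1]))  # near the cached key: prints [0.0, 9.0]
print(edited_forward(adaptor, [0.0, 1.0]))  # far from every key: prints [0.0, 2.0]
```

Because the layer's weights are untouched, inputs whose activations fall outside every key's radius behave exactly as before any edit, which is the property the abstract emphasizes. How the real method sets, grows, or splits key radii may differ from this fixed-radius assumption.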

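The abstract of *Quantifying Uncertainty in Foundation Models via Ensembles* proposes ensemble disagreement as an uncertainty signal but does not state the exact measure. The sketch below assumes one standard choice, the mutual information between the prediction and the ensemble member (entropy of the averaged distribution minus the average member entropy), and contrasts it with a single model's top-answer probability. The function names and the example distributions are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def mean_distribution(members: Sequence[Sequence[float]]) -> List[float]:
    """Average the members' predictive distributions over the same answer set."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def ensemble_disagreement(members: Sequence[Sequence[float]]) -> float:
    """Entropy of the averaged prediction minus the average member entropy.

    This mutual-information score is zero when all members agree and grows
    as their predictions diverge (an assumed choice of disagreement measure).
    """
    average_member_entropy = sum(entropy(p) for p in members) / len(members)
    return entropy(mean_distribution(members)) - average_member_entropy


def single_model_confidence(p: Sequence[float]) -> float:
    """Baseline: the probability a single model assigns to its top answer."""
    return max(p)


# Hypothetical distributions over two candidate answers from three ensemble members.
in_distribution = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
shifted = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]]

print(single_model_confidence(shifted[0]))  # prints 0.99: one member alone looks confident
print(ensemble_disagreement(in_distribution))
print(ensemble_disagreement(shifted))
```

The design choice here is that disagreement deliberately ignores how confident each member is on its own: members that each put 0.99 on a different answer still produce a high score, which a single model's likelihood cannot reveal.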
#### Author Information

##### Kyunghyun Cho (Genentech / NYU)

Kyunghyun Cho is an associate professor of computer science and data science at New York University and a research scientist at Facebook AI Research. He was a postdoctoral fellow at the Université de Montréal until summer 2015 under the supervision of Prof. Yoshua Bengio, and received PhD and MSc degrees from Aalto University in early 2014 under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.