De-noising auto-encoders can be pre-trained at a very large scale by noising and then reconstructing any input text. Existing methods, based on variations of masked languages models, have transformed the field and now provide the de facto initialization to be tuned for nearly every task. In this talk, I will present our work on sequence-to-sequence pre-training that introduces and carefully measures the impact of two new types of noising strategies. I will fist describe an approach that allows arbitrary noising, by learning to translate any corrupted text back to the original with standard Transformer-based neural machine translation architectures. I will show that the resulting mono-lingual (BART) and multi-lingual (mBART) models provide effective initialization for learning a wide range of discrimination and generation tasks, including question answer, summarization, and machine translation. I will also present our recently introduced MARGE model, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance with no fine-tuning, as well as consistent performance gain when fine tuned for individual tasks. Together, these techniques provide the most comprehensive set of pre-training methods to date, as well as the first viable alternative to the dominant masked language modeling pre-training paradigm.