The Zero Resource Speech Challenge series, which has been running since 2015, aims to advance research in unsupervised training of speech and dialogue tools, with applications in speech technology for under-resourced languages. This year, we are running an "enhanced" version of the newest challenge task, language modelling from speech. This task asks participants to learn a sequential model that can assign probabilities to sequences, like a typical language model, but that must be trained, and must operate, without any text. Assessing and improving our ability to build such models is critical to extending applications such as speech recognition and machine translation to languages without textual resources. The "enhanced" version makes two modifications. First, it opens the call for submissions to a "high GPU budget" category, encouraging very large models in addition to the smaller, "lower-budget" ones explored so far. Second, it includes a new, experimental "multi-modal" track, which allows participants to assess the performance of models trained on images in addition to audio. Baseline models have already been prepared and evaluated for the high-budget and multi-modal settings.