Fri Dec 07 05:00 AM -- 03:30 PM (PST) @ Room 517 A
Razvan Pascanu · Yee Teh · Marc Pickett · Mark Ring
Continual learning (CL) is the ability of a model to learn continually from a stream of data, building on what was learnt previously, hence exhibiting positive transfer, as well as being able to remember previously seen tasks. CL is a fundamental step towards artificial intelligence, as it allows the agent to adapt to a continuously changing environment, a hallmark of natural intelligence. It also has implications for supervised or unsupervised learning. For example, when the dataset is not properly shuffled or there exists a drift in the input distribution, the model overfits the recently seen data, forgetting the rest -- phenomena referred to as catastrophic forgetting, which is part of CL and is something CL systems aim to address.
Continual learning is defined in practice through a series of desiderata. A non-complete lists includes:
* Online learning -- learning occurs at every moment, with no fixed tasks or data sets and no clear boundaries between tasks;
* Presence of transfer (forward/backward) -- the model should be able to transfer from previously seen data or tasks to new ones, as well as possibly new task should help improve performance on older ones;
* Resistance to catastrophic forgetting -- new learning does not destroy performance on previously seen data;
* Bounded system size -- the model capacity should be fixed, forcing the system use its capacity intelligently as well as gracefully forgetting information such to ensure maximising future reward;
* No direct access to previous experience -- while the model can remember a limited amount of experience, a continual learning algorithm should not have direct access to past tasks or be able to rewind the environment;
In the previous edition of the workshop the focus has been on defining a complete list of desiderata, of what a continual learning (CL) enabled system should be able to do. We believe that in this edition we should further constrain the discussion with a focus on how to evaluate CL and how it relates to other existing topics (e.g. life-long learning, transfer learning, meta-learning) and how ideas from these topics could be useful for continual learning.
Different aspects of continual learning are in opposition of each other (e.g. fixed model capacity and not-forgetting), which also raises the question of how to evaluate continual learning systems. One one hand, what are the right trade-offs between these different opposing forces? How do we compare existing algorithms given these different dimensions along which we should evaluate them (e.g. forgetting, positive transfer)? What are the right metrics we should report? On the other hand, optimal or meaningful trade-offs will be tightly defined by the data or at least type of tasks we use to test the algorithms. One prevalent task used by many recent papers is PermutedMNIST. But as MNIST is not a reliable dataset for classification, so PermutedMNIST might be extremely misleading for continual learning. What would be the right benchmarks, datasets or tasks for fruitfully exploiting this topic?
Finally, we will also encourage presentation of both novel approaches to CL and implemented systems, which will help concretize the discussion of what CL is and how to evaluate CL systems.