We propose an online replay-based Continual Learning policy, in which the learner stores data points in a local buffer and replays them during training. The core of our contribution is a new buffer content update policy that combines a Kullback–Leibler (KL) loss with an appropriate modification of the celebrated Reservoir Sampling algorithm. At each time step, the policy decides whether the newly arriving training data points will be inserted into the buffer and, if so, which existing data points will be evicted. We update the buffer content so that the proportion of stored data points from each class approximates a target distribution that depends on the empirical distribution of classes seen in the training data stream. We parameterize the target distribution with a single parameter that models a family of target class distributions, ranging from the class distribution of the training data stream, through the uniform class distribution, to a distribution whose class percentages are inversely proportional to those in the stream. We evaluate our method on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100, and show that it outperforms the state-of-the-art Reservoir Sampling algorithm. Our main finding is that the best value of this parameter, in terms of both accuracy and forgetting, depends on statistics of the training data stream and on the dataset itself. Our work paves the way for learning this parameter in the realistic scenario where it is unknown, thus contributing to the objective of an optimal replay-based continual learning approach that adapts to the specifics of each scenario.
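To illustrate the idea, the following is a minimal sketch of a distribution-aware buffer update in the spirit described above. It is not the paper's exact algorithm: the parameterization (raising empirical class frequencies to a power `alpha`, so that `alpha = 1` recovers the stream distribution, `alpha = 0` the uniform distribution, and `alpha = -1` the inverse-proportional one) and the eviction rule are illustrative assumptions, and the class names (`DistributionAwareBuffer`, `target_distribution`) are hypothetical.

```python
import random
from collections import Counter

def target_distribution(stream_counts, alpha):
    # Assumed parameterization: p_c proportional to f_c**alpha.
    # alpha = 1  -> class distribution of the stream
    # alpha = 0  -> uniform over classes seen so far
    # alpha = -1 -> inversely proportional to stream frequencies
    weights = {c: n ** alpha for c, n in stream_counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

class DistributionAwareBuffer:
    """Illustrative replay buffer steering class proportions
    toward a parameterized target distribution."""

    def __init__(self, capacity, alpha):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []                 # list of (x, y) pairs
        self.stream_counts = Counter()   # classes seen in the stream

    def update(self, x, y):
        self.stream_counts[y] += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
            return
        target = target_distribution(self.stream_counts, self.alpha)
        buf_counts = Counter(label for _, label in self.buffer)
        # Insert only if the incoming class is under-represented
        # relative to its target proportion.
        if buf_counts[y] / self.capacity < target.get(y, 0.0):
            # Evict a random point from the most over-represented class.
            over = max(
                buf_counts,
                key=lambda c: buf_counts[c] / self.capacity - target.get(c, 0.0),
            )
            victims = [i for i, (_, l) in enumerate(self.buffer) if l == over]
            self.buffer[random.choice(victims)] = (x, y)
```

For example, with `alpha = 0` an imbalanced stream (90 points of class 0, 10 of class 1) ends with the buffer split evenly between the two classes, whereas `alpha = 1` would leave the buffer dominated by class 0.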