Recent gains in model robustness to out-of-distribution images have predominantly been achieved by training on ever-larger datasets. While this approach is very effective at achieving human-level distortion robustness, it raises the question of whether human robustness, too, requires massive amounts of experience. We therefore investigated the developmental trajectory of human object recognition robustness by comparing children aged 4--6, 7--9, 10--12, and 13--15 against adults and against different neural networks. Assessing how recognition accuracy degrades when images are distorted by salt-and-pepper noise, we find that while overall performance improves with age, even the youngest children in the study show remarkable robustness and outperform standard CNNs on moderately distorted images. Using a back-of-the-envelope calculation, we then estimate the number of `images' that these young children have been exposed to during their lifetime. Even if we assume that a new image is seen every 2 seconds of wake time, children aged 4--6 have seen only approximately 50M images, which is already fewer than the roughly 117M image presentations (90 epochs $\times$ 1.3M images) of standard ImageNet training. This indicates that human out-of-distribution robustness develops very early and may not require seeing billions of different images over a lifetime, given the right choice of representation and information processing optimised during evolution.
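The back-of-the-envelope comparison can be sketched in a few lines of Python. Only the 2-second image interval, the 90 epochs and the 1.3M-image dataset size come from the text. The choice of age 6 (the upper end of the youngest group) and 12.5 waking hours per day are illustrative assumptions, picked so that the estimate lands near the stated ~50M. The study's exact parameters may differ, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate: lifetime "images" seen by a child vs.
# image presentations during standard ImageNet training.
# ASSUMPTIONS (illustrative, not the study's exact values): age 6 years,
# 12.5 waking hours per day, 365 days per year.

SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def lifetime_images(age_years: float, waking_hours_per_day: float,
                    seconds_per_image: float) -> float:
    """Hypothetical helper: number of 'images' seen if a new one arrives
    every `seconds_per_image` seconds of wake time."""
    waking_seconds = (age_years * DAYS_PER_YEAR
                      * waking_hours_per_day * SECONDS_PER_HOUR)
    return waking_seconds / seconds_per_image


def imagenet_presentations(epochs: int, dataset_size: float) -> float:
    """Hypothetical helper: total image presentations = epochs x dataset size."""
    return epochs * dataset_size


child = lifetime_images(age_years=6, waking_hours_per_day=12.5,
                        seconds_per_image=2)
imagenet = imagenet_presentations(epochs=90, dataset_size=1.3e6)

print(f"child (age 6):        {child / 1e6:.1f}M images")
print(f"ImageNet (90 epochs): {imagenet / 1e6:.1f}M images")
print(f"ratio:                {imagenet / child:.2f}x")
```

Under these assumptions the child estimate is about 49M images, less than half of the roughly 117M presentations in a standard 90-epoch ImageNet run. Changing the waking hours or the assumed age shifts the child figure proportionally, but it stays well below the ImageNet total for any plausible values.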