Given that neural networks generalize unreasonably well in the IID setting, out-of-distribution (OOD) data presents a useful failure case for studying their generalization. Recent studies have shown that a carefully trained ERM baseline performs well in Domain Generalization (DG), with training samples from all domains randomly shuffled in each batch. Moreover, methods like MIRO can boost the test performance of neural networks under distribution shift without the training data being explicitly annotated with domain information. We present a new setting beyond Traditional DG (TDG), the Class-wise DG (CWDG) benchmark: for each class, we randomly select one of the domains and hold it out for testing. Despite being exposed to all domains during training, our experiments show that neural networks perform worse in this framework than in TDG. We evaluate popular DG methods in this setting and show that performance on the two benchmarks is correlated for most methods, but not all. Finally, we propose a novel method, Iterative Domain Feature Masking (IDFM), achieving state-of-the-art results on the proposed benchmark.
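The CWDG split described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the `(x, class_label, domain)` sample layout, and the seed handling are all assumptions made for clarity.

```python
import random

def class_wise_dg_split(samples, seed=0):
    """Build a Class-wise DG (CWDG) split: for every class, one randomly
    chosen domain is held out for testing, while samples of that class from
    all other domains remain in the training set.

    `samples` is a list of (x, class_label, domain) tuples; this layout is
    illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    classes = sorted({c for _, c, _ in samples})
    domains = sorted({d for _, _, d in samples})
    # For each class, pick the single domain whose samples go to the test set.
    held_out = {c: rng.choice(domains) for c in classes}
    train = [s for s in samples if s[2] != held_out[s[1]]]
    test = [s for s in samples if s[2] == held_out[s[1]]]
    return train, test, held_out
```

Note that, unlike TDG (where an entire domain is unseen at training time), every domain still appears in training here, just not for every class; only the specific (class, domain) pairs chosen in `held_out` are withheld.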