The ability to generalize to unseen domains is crucial for machine learning systems, especially when we only have data from a limited set of training domains and must deploy the resulting models in the real world. In this paper, we study domain generalization via the classic empirical risk minimization (ERM) approach with a simple regularizer based on the nuclear norm of the learned features from the training set. Theoretically, we provide intuition for why nuclear norm regularization works better than ERM and ERM with L2 weight decay in linear settings. Empirically, we show that nuclear norm regularization achieves state-of-the-art average accuracy compared to existing methods on a wide range of domain generalization tasks (e.g., a 1.7\% test accuracy improvement over the second-best baseline on DomainNet).
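The regularizer described above can be sketched as follows: the nuclear norm of a feature matrix is the sum of its singular values, and it is added to the ERM loss with a weighting coefficient. This is an illustrative sketch, not the paper's implementation; the function names and the coefficient `lam` are placeholders.

```python
import numpy as np

def nuclear_norm(features):
    # Nuclear norm = sum of singular values of the
    # (batch_size x feature_dim) feature matrix.
    return np.linalg.svd(features, compute_uv=False).sum()

def regularized_loss(erm_loss, features, lam=0.01):
    # Total objective: empirical risk plus lam * ||Z||_* ,
    # where Z is the matrix of learned features.
    return erm_loss + lam * nuclear_norm(features)

# Toy example: the identity matrix has all singular values equal to 1,
# so its nuclear norm equals its dimension.
feats = np.eye(4)
print(nuclear_norm(feats))  # 4.0
```

In a deep learning setting, `features` would be the penultimate-layer activations of a minibatch, and the singular value decomposition would be computed with a differentiable SVD so the regularizer can be backpropagated.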