Empirical studies have recently established that training differentially private models (with DP-SGD) results in accuracy disparities across classes. These works follow the methodology developed for \emph{public} models: computing per-class accuracy and then comparing the accuracy of the worst-off class with that of other classes or with the overall accuracy. However, DP-SGD injects noise during model training and produces models whose predictions vary across epochs and runs. Thus, it is largely unclear how to measure disparities in private models in the presence of noise, particularly when classes are not independent. In this work, we conduct extensive experiments, training state-of-the-art private models at various privacy levels, and find that DP training tends to over- or under-predict specific classes, leading to large variations in disparities between classes.
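To make the evaluation procedure referenced above concrete, and to illustrate why run-to-run noise complicates it, the following minimal Python sketch computes per-class accuracy and the worst-class gap over several simulated runs; the random prediction flips here are only a stand-in for DP-SGD noise, and this is an illustrative assumption rather than the paper's actual evaluation code.
\begin{verbatim}
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    # Accuracy restricted to the examples of each class.
    return np.array([
        np.mean(y_pred[y_true == c] == c) for c in range(num_classes)
    ])

def worst_class_gap(y_true, y_pred, num_classes):
    # Gap between overall accuracy and the worst-off class accuracy,
    # plus the index of the worst-off class.
    acc = per_class_accuracy(y_true, y_pred, num_classes)
    overall = np.mean(y_pred == y_true)
    return overall - acc.min(), int(acc.argmin())

# Simulate predictions from several independent "runs" (random flips
# stand in for DP noise) and observe how the reported gap -- and even
# which class is worst off -- varies from run to run.
rng = np.random.default_rng(0)
num_classes, n = 10, 1000
y_true = rng.integers(0, num_classes, size=n)
for run in range(5):
    flip = rng.random(n) < 0.15  # noise flips some predictions
    y_pred = np.where(flip,
                      rng.integers(0, num_classes, size=n),
                      y_true)
    gap, worst = worst_class_gap(y_true, y_pred, num_classes)
    print(f"run {run}: worst class = {worst}, gap = {gap:.3f}")
\end{verbatim}
Even in this toy setting, both the size of the worst-class gap and the identity of the worst-off class can change across runs, which is precisely the measurement difficulty the abstract raises.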