Abstract: Empirical Risk Minimization (ERM) problems are central to machine learning, and their efficient optimization has been studied from different perspectives, often taking advantage of the finite sum structure present in typical problem formulations. In particular, tight oracle complexity bounds have been established under fairly general assumptions about the loss functions. In this talk, I will present a rather surprising and general result that takes advantage of the separability of nonsmooth convex loss functions with efficiently computable proximal operators -- such as, e.g., the hinge loss and the sum of absolute errors -- to obtain an algorithm that exhibits significantly lower complexity than what is predicted by the lower bounds for general nonsmooth convex losses. I will then talk about how this result can be further improved for problems that can be stated in a form close to a linear program and discuss how these results relate to robust empirical risk minimization. The talk is based on joint results with Chaobing Song, Eric Lin, and Stephen Wright.