

Poster

Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Gideon S Mann · Ryan McDonald · Mehryar Mohri · Nathan Silberman · Dan Walker


Abstract:

Training conditional maximum entropy models on massive data requires significant time and computational resources. In this paper, we investigate three common distributed training strategies: distributed gradient, majority voting ensembles, and parameter mixtures. We analyze the worst-case runtime and resource costs of each and present a theoretical foundation for the convergence of parameters under parameter mixtures, the most efficient strategy. We present large-scale experiments comparing the different strategies and demonstrate that parameter mixtures over independent models use fewer resources and achieve loss comparable to that of standard approaches.
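The sketch below illustrates the general idea behind the parameter-mixture strategy described in the abstract: train a conditional maximum entropy (multinomial logistic regression) model independently on each data shard, then average the resulting weight vectors. It is a minimal illustration, not the paper's implementation; the trainer, hyperparameters, uniform mixing weights, and synthetic data are all assumptions made for the example.

```python
# Minimal sketch of a parameter mixture: independently train a conditional
# maxent (multinomial logistic regression) model on each shard, then average
# the learned weights. Hyperparameters and data below are illustrative only.
import numpy as np

def train_maxent(X, y, n_classes, lr=0.1, epochs=200, reg=1e-3):
    """Batch gradient descent on the L2-regularized conditional log-likelihood."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    Y = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(epochs):
        logits = X @ W.T
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)         # p(y | x; W)
        grad = (P - Y).T @ X / n + reg * W        # gradient of the negative log-likelihood
        W -= lr * grad
    return W

def parameter_mixture(shards, n_classes):
    """Average the parameters of models trained independently on each shard."""
    weights = [train_maxent(X, y, n_classes) for X, y in shards]
    return np.mean(weights, axis=0)               # uniform mixing weights (an assumption)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes, d = 3, 10
    true_W = rng.normal(size=(n_classes, d))
    X = rng.normal(size=(6000, d))
    y = (X @ true_W.T).argmax(axis=1)
    shards = [(X[i::4], y[i::4]) for i in range(4)]   # 4 disjoint shards
    W_mix = parameter_mixture(shards, n_classes)
    acc = ((X @ W_mix.T).argmax(axis=1) == y).mean()
    print(f"parameter-mixture training accuracy: {acc:.3f}")
```

Because each shard is trained with no communication until the final averaging step, this strategy avoids the per-iteration synchronization cost of distributed gradient methods, which is the efficiency advantage the abstract refers to.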
