Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models
Gideon S Mann · Ryan McDonald · Mehryar Mohri · Nathan Silberman · Dan Walker

Wed Dec 09 03:25 PM -- 03:26 PM (PST) @ None

Training conditional maximum entropy models on massive data requires significant time and computational resources. In this paper, we investigate three common distributed training strategies: distributed gradient, majority-voting ensembles, and parameter mixtures. We analyze the worst-case runtime and resource costs of each and present a theoretical foundation for the convergence of parameters under parameter mixtures, the most efficient strategy. We present large-scale experiments comparing the different strategies and demonstrate that parameter mixtures over independently trained models use fewer resources and achieve loss comparable to standard approaches.
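The parameter-mixture strategy described in the abstract can be illustrated with a small sketch: train independent models on disjoint shards of the data, then average their learned parameters. The sketch below assumes binary logistic regression as the conditional maximum entropy model; the function names and hyperparameters are illustrative, not from the paper.

```python
import numpy as np

def train_maxent(X, y, epochs=200, lr=0.5):
    """Minimal batch-gradient trainer for a binary conditional
    maximum entropy (logistic regression) model. Hypothetical helper."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # model probabilities P(y=1|x)
        w -= lr * X.T @ (p - y) / len(y)     # gradient of the log-loss
    return w

def parameter_mixture(X, y, shards=4, seed=0):
    """Split the data into disjoint shards, train an independent model
    on each shard, and uniformly average (mix) the learned parameters."""
    rng = np.random.default_rng(seed)
    idx = np.array_split(rng.permutation(len(y)), shards)
    weights = [train_maxent(X[i], y[i]) for i in idx]
    return np.mean(weights, axis=0)

# Usage on synthetic, linearly separable data:
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w_mix = parameter_mixture(X, y)
accuracy = ((X @ w_mix > 0) == y).mean()
```

Because each shard is trained with no communication until the final averaging step, this strategy needs only one round of parameter exchange, which is the source of the resource savings the abstract reports.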

Author Information

Gideon S Mann (Google Inc.)
Ryan McDonald (Google)
Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences)
Nathan Silberman
Dan Walker
