Keynote Talk in Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

Deploying efficient translation at every level of the stack

Kenneth Heafield

Sat 16 Dec 6:20 a.m. PST — 6:45 a.m. PST

Abstract:

Practical efficient neural networks combine several optimizations ranging from assembly code to network structure. Yet most papers about optimization start with an unoptimized baseline and omit comparison even with simple methods like using a smaller network. Shared tasks force a different mentality, where each idea has to prove its worth against a highly optimized baseline. This informs our work on fast and small machine translation with latency under 20 ms for an average sentence. The models are now deployed in Firefox.
