Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference (Spotlight Paper)
Abstract
Spotlight: Ruokai Yin (Yale), "Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference"