Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems
Abstract
Traditional auto-parallelizing compilers, which depend on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a detailed evaluation of compiler auto-parallelization driven by small (1B-parameter) language models (LLMs). We assess three models (gemma3, llama3.2, and qwen2.5) employing six reasoning strategies on 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is compared against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 evaluations, our LLM-driven method achieves an average speedup of 6.81x and a peak speedup of 43.25x on convolution operations. We examine scalability, confirm correctness using multiple sanitizers, and validate robustness across compilers and hardware platforms. Our results show that small, efficient LLMs can serve as effective reasoning engines for complex compiler optimization tasks.