Poster
Training LLMs to Better Self-Debug and Explain Code
Nan Jiang · Xiaopeng Li · Shiqi Wang · Qiang Zhou · Soneya Hossain · Baishakhi Ray · Varun Kumar · Xiaofei Ma · Anoop Deoras
West Ballroom A-D #7104
In code generation, self-debugging, which allows LLMs to refine their generated code based on execution feedback, is important because generating correct solutions in one attempt is challenging for complex tasks. Prior work on self-debugging mostly focuses on prompting methods that provide LLMs with few-shot examples, which work poorly on small open-source LLMs. In this work, we propose a training framework that significantly improves LLMs' self-debugging capability. We first find that a chain of explanations of the wrong code, followed by code refinement, helps LLMs better analyze the wrong code and refine it. We therefore propose an automated pipeline that collects a high-quality dataset for code explanation and refinement by generating a number of explanation and refinement trajectories and filtering them via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories, with a novel reward design that considers code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code continuously. Lastly, a human evaluation shows that LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.
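As a rough illustration of the execution-verification step described in the abstract, the sketch below splits explanation-and-refinement trajectories into those whose refined code passes a set of unit tests and those whose code does not. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `Trajectory` class, the `passes_tests` and `partition_trajectories` helpers, and the representation of tests as assert statements are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Hypothetical container for one sampled model output."""
    explanation: str   # the model's explanation of why the wrong code fails
    refined_code: str  # the model's proposed fix


def passes_tests(code: str, tests: List[str]) -> bool:
    """Execute the code, then each assert-style test, in a fresh namespace.

    Assumption: tests are plain Python assert statements. exec() is used
    only to keep the sketch short; untrusted model output should run in a
    sandbox with a timeout instead.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count
        # as a failed verification.
        return False
    return True


def partition_trajectories(
    trajectories: List[Trajectory], tests: List[str]
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Split trajectories into (verified, failed) by executing the tests."""
    verified: List[Trajectory] = []
    failed: List[Trajectory] = []
    for trajectory in trajectories:
        if passes_tests(trajectory.refined_code, tests):
            verified.append(trajectory)
        else:
            failed.append(trajectory)
    return verified, failed
```

In this sketch the verified trajectories would play the role of the filtered dataset, while the failed ones are kept separately, since the abstract states that RL uses both success and failure trajectories. How the authors actually store, score, or sample these trajectories is not specified here.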