StylePitcher: Generating Style-Following, Expressive Pitch Curves for Versatile Singing Tasks
Abstract
Existing pitch curve generation methods either discard singer-specific expressiveness or require a separate model for each singing task, limiting their practical applicability. We propose StylePitcher, a general-purpose pitch curve generation model that captures singing style from reference audio while maintaining musical accuracy. StylePitcher is a rectified flow matching diffusion transformer trained on pitch curves, which can be conditioned on symbolic musical scores and surrounding pitch context. It can be plugged into diverse singing tasks to provide pitch curves without any task-specific retraining. Subjective evaluations on pitch correction, style transfer, and singing voice conversion show that StylePitcher achieves 15% higher style similarity and 8 MUSHRA points better audio quality than baselines while preserving pitch accuracy.
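To make the rectified flow matching setup concrete, the following is a minimal, illustrative sketch of how a training batch for such a model could be constructed. It is not the paper's implementation: the pitch curves are random stand-ins, and the diffusion transformer and its conditioning on musical scores and surrounding pitch context are omitted. In rectified flow, the model regresses the constant velocity between a noise sample and a data sample along a straight-line interpolant.

```python
import numpy as np

def make_rf_batch(rng, batch=4, curve_len=128):
    """Build one rectified-flow training batch for 1-D pitch curves.

    x1 stands in for real pitch (F0) curves; x0 is Gaussian noise.
    A model would be trained to predict the constant velocity (x1 - x0)
    given the interpolant x_t and the time t, plus any conditioning
    (score, pitch context), which is omitted in this sketch.
    """
    x1 = rng.standard_normal((batch, curve_len))   # stand-in pitch curves
    x0 = rng.standard_normal((batch, curve_len))   # noise endpoints
    t = rng.uniform(size=(batch, 1))               # sampled flow times in (0, 1)
    x_t = (1.0 - t) * x0 + t * x1                  # straight-line interpolant
    v_target = x1 - x0                             # velocity regression target
    return x0, x1, t, x_t, v_target
```

At inference, a trained velocity model would integrate from noise to a pitch curve with a few Euler steps along the same straight-line path, which is what allows one model to serve multiple downstream singing tasks without retraining.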