vHector and HeisenVec: Scalable Vector Graphics Generation Through Large Language Models
Abstract
We introduce HeisenVec, a large-scale dataset designed to advance research in vector graphics generation from natural language descriptions. Unlike conventional image generation datasets that focus on raster images, HeisenVec targets the structured and symbolic domain of Scalable Vector Graphics (SVG), where images are represented as sequences of drawing commands and style attributes. The dataset comprises 2.2 million SVGs collected from different online sources, each paired with four complementary textual descriptions generated by multi-modal models. To ensure structural consistency and efficiency for autoregressive modeling, all SVGs are standardized through a pre-processing pipeline that unifies geometric primitives as paths, applies affine transformations, and compresses syntax via custom tokens set. HeisenVec exhibits broad coverage among visual styles and sequence lengths, with a substantial portion of samples exceeding 8,000 tokens, making it particularly well-suited for benchmarking long-context language models. Our benchmark enables rigorous evaluation of text-conditioned SVG generation, encourages progress on sequence modeling with symbolic outputs, and bridges the gap between vision, graphics, and language. We release the dataset, tokenization tools, and evaluation pipeline to foster further research in this emerging domain.