Using Generative AI to Retrieve and Analyze CEO Compensation Consultant Information from Public Corpora
Abstract
We demonstrate how fine-tuning open-weight large language models can significantly improve the retrieval of information about compensation consultant engagements from company proxy statements, a task that previously required costly manual data collection in finance research and practice. We fine-tuned Gemma 3 12B-Instruct with parameter-efficient adapters (QLoRA) on 400 expert-labeled examples to identify consultant engagements and to distinguish retained advisers from survey-only providers. Our fine-tuned model achieves an F1 score of 67.7%, compared with 33.3% for the base model, a 103% relative improvement, while reducing false positives by 68%. The approach addresses key challenges of processing lengthy, non-standardized text, such as the need for domain knowledge and multi-hop reasoning. By trading throughput for fidelity through targeted chunking and conservative decoding, we provide a scalable, reproducible method for converting narrative-heavy filings into structured financial data, laying the foundation for a vertical agent capable of automating finance data-collection tasks.