From Similarity to Consequences: Decision-Oriented Evaluation of Market Digest Generation
Abstract
Natural language generation (NLG) is increasingly applied in finance, yet its evaluation still relies on reference-based metrics that capture surface similarity rather than practical utility. This gap is especially critical for retail investors, who depend on short market digests such as morning briefs and closing-bell reports, formats that have received little attention in prior research. To address it, we define market digest generation as a new NLG task and benchmark it with performance-conditioned and professional-insight baselines. Beyond intrinsic metrics, we introduce a consequence-driven evaluation that measures how digests influence trading decisions made by both human investors and LLM agents. Our results show that LLM-generated morning briefs can improve decision accuracy over human-written references, while expert-curated asset selection further enhances outcomes, reaffirming the value of human expertise. These findings underscore the limits of surface-level metrics and establish a decision-oriented framework for evaluating generated text by its real-world consequences.