eFinBERT: Efficient Financial Sentiment Classification
Abstract
Transformer-based models, such as FinBERT, have achieved state-of-the-art performance in financial sentiment analysis, a critical task for understanding market trends and investor sentiment. However, their high computational and memory requirements present significant challenges for deployment in resource-constrained edge environments. In this work, we investigate post-training model compression techniques, specifically layer-wise fixed-bit quantization (ranging from 8-bit down to 1-bit) and unstructured magnitude-based pruning, to reduce model size and inference latency while maintaining task performance. Using the Financial PhraseBank dataset, we perform a detailed layer-wise sensitivity analysis to identify quantization bottlenecks and pruning-tolerant layers. We introduce a sensitivity radar plot that visualizes the impact of bit-width reduction on per-layer accuracy, providing an interpretable basis for mixed-precision optimization. Furthermore, we demonstrate that selectively assigning lower bit-widths to robust layers (e.g., Layers 5 and 7) enables targeted compression with minimal accuracy loss. Our results show that up to 90\% parameter reduction is achievable with less than 2\% absolute accuracy degradation relative to the full-precision model, underscoring the potential for efficient deployment of financial NLP models in low-power environments. This work provides a scalable and effective approach to optimizing transformer models for real-world applications in financial analysis and beyond.
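To make the compression recipe named above concrete, the following is a minimal Python/PyTorch sketch, not the paper's actual implementation, of per-tensor fixed-bit quantization, unstructured magnitude-based pruning, and a per-layer bit-width map of the kind a sensitivity analysis would produce. The function names, the \texttt{bit\_widths} map, the 4-bit setting for layers 5 and 7, and the 0.9 sparsity level are illustrative assumptions; the model layout assumes a HuggingFace BERT-style FinBERT checkpoint.

\begin{verbatim}
import torch

def quantize_tensor(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Simulated symmetric fixed-bit quantization: snap weights onto a
    # uniform grid, then dequantize back to float for evaluation.
    qmax = 2 ** (bits - 1) - 1 if bits > 1 else 1   # e.g. 127 for 8-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax    # per-tensor scale
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def magnitude_prune(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Unstructured magnitude-based pruning: zero the smallest-|w| entries.
    k = int(w.numel() * sparsity)
    if k == 0:
        return w
    threshold = w.abs().flatten().kthvalue(k).values
    return torch.where(w.abs() <= threshold, torch.zeros_like(w), w)

# Hypothetical mixed-precision schedule: lower bit-widths only for layers
# the sensitivity analysis flags as robust (e.g. encoder layers 5 and 7);
# all other layers stay at 8-bit.
bit_widths = {5: 4, 7: 4}

def compress_encoder(model, default_bits: int = 8, sparsity: float = 0.9):
    # Apply pruning followed by quantization to every encoder layer.
    for idx, layer in enumerate(model.bert.encoder.layer):
        bits = bit_widths.get(idx, default_bits)
        for _, param in layer.named_parameters():
            with torch.no_grad():
                param.copy_(quantize_tensor(magnitude_prune(param, sparsity), bits))
\end{verbatim}

The sketch separates the two compression axes so that a sensitivity study can vary bit-width and sparsity per layer independently, mirroring the mixed-precision strategy described in the abstract.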