Novel Finetuning Strategies for Adapting Biomedical Vision Language Models to Organ-Centered Pathology Microscopy Tasks
Siddharth Venkatesh · Benjamin Liu · Ayman Sheikh · Anne Essien · Pratibh - · Rayhan Roswendi · Jeremiah Zhang · Kevin Zhu · Sunishchal Dev
Abstract
Biomedical vision-language models (VLMs) suffer performance deterioration on previously learned domains after fine-tuning and generalize poorly under domain diversity and dataset imbalance. We propose an adapter-level framework that combines Low-Rank Adaptation (LoRA) for efficient domain-specific tuning with model souping for cross-domain adaptability on microscopy images. Using BioMedCLIP and organ-specific domains from $\mu$-Bench, adapter soups mitigate poor generalization and improve robustness, achieving gains of up to 15\% on fine-grained and 38\% on coarse-grained tasks over baseline BioMedCLIP. The process is data- and resource-efficient, and hyperparameter analysis reveals sensitivities to domain similarity and dataset imbalance. Adapter merging offers a lightweight, scalable approach to organ-specific accuracy and cross-domain stability in biomedical VLMs.
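The core recipe described above, fine-tuning one LoRA adapter per organ domain and then "souping" (weight-averaging) the adapters, can be illustrated with a minimal sketch. The snippet below assumes uniform (or user-supplied) mixing weights over adapter state dicts with identical keys and shapes; the file names and function are hypothetical and the paper's exact merging procedure may differ.

```python
from typing import Dict, List, Optional
import torch


def soup_lora_adapters(adapter_state_dicts: List[Dict[str, torch.Tensor]],
                       weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Merge several LoRA adapter state dicts (same keys/shapes) by weighted averaging."""
    if weights is None:
        # Uniform soup: equal contribution from each per-domain adapter.
        weights = [1.0 / len(adapter_state_dicts)] * len(adapter_state_dicts)
    assert abs(sum(weights) - 1.0) < 1e-6, "mixing weights should sum to 1"

    merged: Dict[str, torch.Tensor] = {}
    for key in adapter_state_dicts[0]:
        # Average the corresponding LoRA matrices (e.g., A/B low-rank factors) across domains.
        merged[key] = sum(w * sd[key].float() for w, sd in zip(adapter_state_dicts, weights))
    return merged


# Hypothetical usage: load adapters fine-tuned on different mu-Bench organ subsets, then merge.
# liver_sd = torch.load("lora_liver.pt")
# kidney_sd = torch.load("lora_kidney.pt")
# souped_adapter = soup_lora_adapters([liver_sd, kidney_sd])
```

The resulting merged adapter can then be loaded back onto the frozen BioMedCLIP backbone in place of any single domain-specific adapter, which is what makes the approach lightweight relative to full-model fine-tuning.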