Novel Finetuning Strategies for Adapting Biomedical Vision Language Models to Organ-Centered Pathology Microscopy Tasks
Siddharth Venkatesh ⋅ Benjamin Liu ⋅ Ayman Sheikh ⋅ Anne Essien ⋅ Pratibh - ⋅ Rayhan Roswendi ⋅ Jeremiah Zhang ⋅ Kevin Zhu ⋅ Sunishchal Dev
Abstract
Biomedical vision-language models (VLMs) suffer performance deterioration on earlier domains after fine-tuning and generalize poorly under domain diversity and dataset imbalance. We propose an adapter-level framework that combines Low-Rank Adaptation (LoRA) for efficient domain-specific tuning with model souping for cross-domain adaptability on microscopy images. Using BioMedCLIP and organ-specific domains from $\mu$-Bench, adapter soups mitigate weak generalization and improve robustness, achieving gains of up to 15\% on fine-grained and 38\% on coarse-grained tasks over baseline BioMedCLIP. The process is data- and resource-efficient, and hyperparameter analysis reveals sensitivities to domain similarity and dataset imbalance. Adapter merging offers a lightweight, scalable approach to organ-specific accuracy and cross-domain stability in biomedical VLMs.
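The core idea of "adapter souping" is uniform weight averaging of independently trained adapters. A minimal sketch is below, assuming LoRA adapters are represented as flat state dicts with matching keys and shapes; the layer/key names (`layer0.lora_A`) and the `soup_adapters` helper are illustrative, not the paper's actual API.

```python
# Hypothetical sketch: "souping" LoRA adapters trained on different
# organ-specific domains by uniformly averaging their weights.

def soup_adapters(adapters):
    """Average a list of adapter state dicts (same keys, same shapes)."""
    n = len(adapters)
    return {
        k: [sum(vals) / n for vals in zip(*(a[k] for a in adapters))]
        for k in adapters[0]
    }

# Two toy single-tensor adapters from two hypothetical organ domains
kidney = {"layer0.lora_A": [1.0, 2.0]}
liver  = {"layer0.lora_A": [3.0, 4.0]}
souped = soup_adapters([kidney, liver])
print(souped["layer0.lora_A"])  # -> [2.0, 3.0], the element-wise mean
```

In practice each adapter would be a trained LoRA checkpoint (nested tensors rather than flat lists), and the averaged adapter is merged back into the frozen base model at inference time.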