Workshop
Medical Imaging meets NeurIPS
Daniel Moyer · DOU QI · Yuankai Huo · Konstantinos Kamnitsas · Andrea Lara · Xiaoxiao Li · Islem Rekik
Hall B1 (level 1)
“Medical Imaging meets NeurIPS” aims to bring together researchers from the medical imaging and machine learning communities to create a cutting-edge venue for discussing the major challenges in the field and opportunities for research and novel applications. The event continues a successful workshop series organized over the past six years. It will feature a series of invited speakers (all confirmed) from academia, medical science, and industry, presenting their latest work as well as reviews of recent technological advances and remaining major challenges. This year we aim to have all keynotes presented in person (to facilitate speaker interaction and discourse), an extended number of submitted talks (approximately double that of previous years), and an updated call that highlights changes occurring in our interdisciplinary field.
Schedule
Sat 7:00 a.m. - 7:15 a.m. | Keynote 1: Yu-Ping Wang (Keynote) | Actual start time: 8:45 a.m.
Sat 7:00 a.m. - 7:15 a.m. | Opening Remarks - Organizing Committee (Opening remarks)
Sat 7:15 a.m. - 7:30 a.m. | LC-SD: Realistic Endoscopic Image Generation with Stable Diffusion and ControlNet (Invited Talk) | Joanna Kaleta
Sat 7:30 a.m. - 7:45 a.m. | Deep Structural Causal Model for Investigating Causality between Genotype and Clinical Phenotype in Neurological Disorders (Invited Talk)
Sat 7:45 a.m. - 8:00 a.m. | On Mitigating Shortcut Learning for Fair Chest X-ray Classification (Invited Talk) | Yuzhe Yang
Sat 8:00 a.m. - 8:40 a.m. | Poster Session 1 (Poster Session)
Sat 8:30 a.m. - 8:45 a.m. | equalcare
Sat 8:45 a.m. - 9:15 a.m. | Keynote 2: Ehsan Adeli (Keynote)
Sat 9:15 a.m. - 9:30 a.m. | Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data (Invited Talk) | Hayden Gunraj
Sat 9:30 a.m. - 9:45 a.m. | Dual Heteroscedastic Uncertainty Estimation for Probabilistic Unsupervised Volumetric Registration of Noisy Medical Images (Invited Talk) | Xiaoran Zhang
Sat 9:45 a.m. - 10:00 a.m. | HEALNet – Improving Medical Image Analysis using Multi-Omic Context via Hybrid Early Fusion (Invited Talk) | Konstantin Hemker
Sat 11:30 a.m. - 12:00 p.m. | Keynote 3: Martin J McKeown (Keynote)
Sat 12:00 p.m. - 12:15 p.m. | Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience (Invited Talk)
Sat 12:15 p.m. - 12:30 p.m. | Convolve and Conquer: Data Comparison with Wiener Filter (Invited Talk)
Sat 12:30 p.m. - 12:45 p.m. | Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference (Invited Talk)
Sat 12:45 p.m. - 1:30 p.m. | Poster Session 2 (Poster Session)
Sat 1:30 p.m. - 1:45 p.m. | CellMixer: Annotation-free Semantic Cell Segmentation of Heterogeneous Cell Populations (Invited Talk)
Sat 1:45 p.m. - 2:00 p.m. | ProtoEEGNet: An Interpretable Approach for Detecting Interictal Epileptiform Discharges (Invited Talk)
Sat 2:00 p.m. - 2:15 p.m. | Universal Noise Annotation: Unveiling the Impact of Noisy Annotation on Object Detection (Invited Talk)
Sat 2:15 p.m. - 2:45 p.m. | Keynote 4: Linwei Wang (Keynote)
Sat 2:45 p.m. - 3:00 p.m. | Closing (Closing)
UTAR: Source-free Unsupervised Test-time Adaptation for MRI Super-Resolution (Poster)
Deep learning-based super-resolution (SR) usually does not transfer well across domains, e.g., between different magnetic resonance imaging (MRI) datasets, for two main reasons: 1) a mismatch between the training and test data distributions, and 2) overly simple image degradation models that fail to capture the underlying physical processes. As a result, SR currently requires extensive fine-tuning for every target domain, including access to samples from the source domain. We propose UTAR, a source-free unsupervised test-time domain adaptation framework for deep learning-based MRI super-resolution. UTAR improves the quality of high-resolution (HR) images predicted from low-resolution (LR) images of unseen target domains. We adapt a pre-trained SR model without re-accessing the source domain data. Our method is generic and can be used as a plug-in module in general SR networks. Experimental results verify the effectiveness of UTAR in reducing the performance gap without extensive adaptation. We expect our method to provide a key step towards the deployment of MRI SR algorithms in clinical applications where significant domain shifts are inevitable.
Weitong Zhang · Jonathan Stelter · Cheng Ouyang · Dimitrios Karampinos · Bernhard Kainz

Unveiling the Interplay Between Interpretability and Generative Performance in Medical Diffusion Models (Poster)
Generative diffusion models are showing promising utility in medical imaging, particularly in synthesizing high-quality images like MRI scans and 4D data. However, despite advancements in multi-modal models that leverage both textual and visual information, a significant gap exists in understanding the trade-off between image generation quality and model interpretability. In this paper, we investigate this issue by fine-tuning a Stable Diffusion v2 model with a focus on text-image embeddings. Specifically, we assess the impact of keeping the language encoder frozen during the fine-tuning process. We show that freezing the language encoder significantly improves the interpretability of the generated images without compromising on quality. Through extensive evaluation on MS-COCO for in-domain training and MIMIC-CXR for out-of-domain data, we demonstrate that our approach outperforms existing benchmarks specifically trained for localization in terms of localization capabilities and generative quality across multiple disease classes. This study serves as a foundational step towards the development of high-performing yet interpretable generative models in medical imaging, addressing a critical need for effective and responsible AI adoption in healthcare.
Mischa Dombrowski · Hadrien Reynaud · Johanna Paula Müller · Matthew Baugh · Bernhard Kainz

Robust semi-supervised segmentation with timestep ensembling diffusion models (Poster)
Our research focuses on Semi-Supervised medical image Segmentation (SSS) using Denoising Diffusion Probabilistic Model latent representations, and addresses domain generalisation. We find that optimal performance depends on the choice of diffusion steps and on ensembling, and we craft a method that outperforms competitors in domain-shifted settings.
Margherita Rosnati · Mélanie Roschewitz · Ben Glocker

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience (Poster)
Machine learning (ML) has shown great promise for revolutionizing a number of areas, including healthcare. However, it is also facing a reproducibility crisis, especially in medicine. ML models that are carefully constructed from and evaluated on a training set might not generalize well to data from different patient populations or acquisition instrument settings and protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group, e.g., subjects stratified by attributes such as sex, age group, race and clinical cohort, to make predictions on a target group, e.g., another sex or age group, using a small fraction (10%) of data from the target group. We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of AD and SZ, and estimation of brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: it obtains an area under the curve greater than 0.95 for AD classification, an area under the curve greater than 0.7 for SZ classification, and a mean absolute error of less than 5 years for brain age prediction on all target groups, achieving robustness to variations of scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests.
Rongguang Wang · Pratik Chaudhari · Christos Davatzikos

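As a rough illustration of the weighted empirical risk minimization idea in the abstract above, the sketch below reweights per-sample source losses so that each stratum (e.g., sex, age group, cohort) matches its frequency in the target group. The inverse-frequency weighting and the function name are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def weighted_erm_loss(losses, strata, target_strata_freq):
    """Average per-sample losses, reweighting each stratum so the
    weighted source distribution matches the target group's strata
    frequencies (hypothetical importance-weighting scheme)."""
    losses = np.asarray(losses, dtype=float)
    strata = np.asarray(strata)
    weights = np.zeros_like(losses)
    for s, target_freq in target_strata_freq.items():
        mask = strata == s
        source_freq = mask.mean()
        # importance weight: target frequency / source frequency
        weights[mask] = target_freq / source_freq
    return float(np.average(losses, weights=weights))

# Source set is 80% stratum "A", 20% stratum "B"; the target is 50/50,
# so the harder "B" samples are upweighted.
losses = [1.0, 1.0, 1.0, 1.0, 2.0]
strata = ["A", "A", "A", "A", "B"]
loss = weighted_erm_loss(losses, strata, {"A": 0.5, "B": 0.5})
```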
Generative AI for Medical Video De-Identification (Poster)
Sharing medical data for research purposes presents significant challenges due to patient privacy concerns. These concerns are particularly pronounced within the field of Mental and Behavioral Health (MBH). In this work we focus on video MBH data containing interviewed patients. Conventional techniques for de-identifying faces in such videos tend to completely eliminate facial information, rendering the development of algorithms reliant on facial analysis unfeasible. To address this issue, we propose using generative AI to anonymize MBH videos, eliminating identifying information while retaining the essential behavioral characteristics necessary for diagnoses. Our approach involves a two-stage process. We start by synthesizing an alternative first frame containing a face with a new identity, while preserving attributes from the original video such as pose, facial structure and mood. Subsequently, we animate this frame by generating macro- and micro-expressions aligned with the fine motion patterns observed in the source video. The first stage employs a conditional latent diffusion model, while the second stage leverages the First Order Motion Model algorithm. Two applications of our approach are presented: MBH dataset de-identification and synthetic dataset generation.
George Leifman · Idan Kligvasser · Itay Ravia · Michael Elad · Ehud Rivlin

Exploring General Intelligence via Gated Graph Transformer in Functional Connectivity Studies (Poster)
Functional connectivity (FC) as derived from fMRI has emerged as a pivotal tool in elucidating the intricacies of various psychiatric disorders and in delineating the neural pathways that underpin cognitive and behavioral dynamics inherent to the human brain. While Graph Neural Networks (GNNs) offer a structured approach to represent neuroimaging data, they are limited by their need for a predefined graph structure to depict associations between brain regions, a detail not solely provided by FCs. To bridge this gap, we introduce the Gated Graph Transformer (GGT) framework, designed to predict cognitive metrics based on FCs. Empirical validation on the Philadelphia Neurodevelopmental Cohort (PNC) underscores the superior predictive prowess of our model, further accentuating its potential in identifying pivotal neural connectivities that correlate with human cognitive processes.
Gang Qu · Anton Orlichenko · Junqi Wang · Gemeng Zhang · Li Xiao · Aiying Zhang · Zhengming Ding · Yu-Ping Wang

Low Rank Mixup Augmentations for Contrastive Learning of Phenotypes from Functional Connectivity (Poster)
Functional magnetic resonance imaging (fMRI) and fMRI-derived metrics such as functional connectivity (FC) allow for unmatched analysis of human cognition in vivo. At the same time, contrastive learning (CL) has shown state-of-the-art results in the computer vision domain as well as in the integration of images with genomics. However, many frameworks that utilize CL depend on image augmentations, a technique with no direct analogue for FC. In this work, we present a robust mixup-style augmentation for FC, motivated by the observation that the rank-1 approximation of patient FC derived via eigendecomposition is not effective for predicting phenotype. A mixup of this first component of FC allows for increasing the limited number of subjects found in most fMRI studies. CL using these augmentations yields a 2-10% accuracy improvement on seven phenotype prediction tasks across two datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and the Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP).
Anton Orlichenko · Gang Qu · Ziyu Zhou · Anqi Liu · Hui Shen · Hong-Wen Deng · Zhengming Ding · Yu-Ping Wang

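The rank-1 mixup described above might be sketched as follows: extract each subject's leading eigencomponent of the FC matrix and blend only that component across subjects. This is a simplified reading of the abstract; `rank1_mixup` and the mixing rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rank1_mixup(fc_a, fc_b, lam=0.5):
    """Replace the leading rank-1 component of fc_a with a lam-weighted
    mix of the leading components of fc_a and fc_b (hypothetical)."""
    def leading_component(fc):
        vals, vecs = np.linalg.eigh(fc)       # eigenvalues ascending
        v, w = vals[-1], vecs[:, -1]          # largest eigenpair
        return v * np.outer(w, w)             # rank-1 approximation
    top_a = leading_component(fc_a)
    top_b = leading_component(fc_b)
    # keep the residual spectrum of subject A, mix only the first component
    return fc_a - top_a + lam * top_a + (1 - lam) * top_b

# Toy symmetric "FC" matrices for two subjects
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4)); fc_a = x @ x.T
y = rng.normal(size=(4, 4)); fc_b = y @ y.T
aug = rank1_mixup(fc_a, fc_b, lam=0.7)
```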
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation (Poster)
This work presents an extensive hyperparameter search on image diffusion models for echocardiogram generation. The objective is to establish foundational benchmarks and provide guidelines within the realm of ultrasound image and video generation. This study builds on the latest advancements, including cutting-edge model architectures and training methodologies. We also examine the distribution shift between real and generated samples and consider potential solutions, crucial for training efficient models on generated data. We determine an Optimal FID score of 0.88 for our research problem and achieve an FID of 2.60. This work aims to contribute valuable insights and serve as a reference for further developments in the specialized field of ultrasound image and video generation.
Hadrien Reynaud · Bernhard Kainz

AUC-mixup: Deep AUC Maximization with Mixup (Poster)
While deep AUC maximization (DAM) has shown remarkable success on imbalanced medical tasks, e.g., chest X-ray classification and skin lesion classification, it can suffer from severe overfitting when applied to small datasets due to its aggressive nature of pushing the prediction scores of positive data away from those of negative data. This paper studies how to improve the generalization of DAM via mixup data augmentation, an approach widely used for improving the generalization of cross-entropy-loss-based deep learning methods. However, AUC is defined over positive and negative pairs, which makes it challenging to incorporate mixup data augmentation into DAM algorithms. To tackle this challenge, we employ the AUC margin loss and incorporate soft labels into the formulation to effectively learn from data generated by mixup augmentation, which is referred to as the AUC-mixup loss. Our experimental results demonstrate the effectiveness of the proposed AUC-mixup methods on imbalanced benchmark and medical image datasets compared to standard DAM training.
JIANZHI XU · Gang Li · Tianbao Yang

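A minimal sketch of how soft mixup labels could enter a pairwise AUC margin loss: each (i, j) pair is weighted by how "positive" sample i is and how "negative" sample j is, so mixup samples with label lam contribute fractionally to both roles. The pair-weighting scheme is an assumption for illustration and may differ from the paper's AUC-mixup loss.

```python
import numpy as np

def soft_auc_margin_loss(scores, soft_labels, margin=1.0):
    """Pairwise squared-margin AUC surrogate with soft (mixup) labels
    acting as pair weights (illustrative formulation)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(soft_labels, dtype=float)
    # pair weight: positiveness of i times negativeness of j
    w = y[:, None] * (1.0 - y)[None, :]
    np.fill_diagonal(w, 0.0)                  # exclude self-pairs
    # penalty when score_i fails to exceed score_j by the margin
    penalty = np.maximum(0.0, margin - (s[:, None] - s[None, :])) ** 2
    return float((w * penalty).sum() / w.sum())

# One positive, one negative, and a lam=0.7 mixup of the two
scores = [2.0, 0.1, 0.9]
soft_labels = [1.0, 0.0, 0.7]
loss = soft_auc_margin_loss(scores, soft_labels)
```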
Assessing Self-Supervised Pretraining for Multiple Lung Ultrasound Interpretation Tasks (Poster)
In this study, we investigated whether self-supervised pretraining could produce a neural network feature extractor applicable to multiple tasks in B-mode lung ultrasound analysis. When fine-tuning for three tasks, pretrained models improved the average across-task area under the receiver operating characteristic curve (AUC) by 0.032 and 0.061 on local and external test sets respectively. When training using 1% of the available labels, pretrained models consistently outperformed fully supervised models, with a maximum observed test AUC increase of 0.396 for the task of view classification. Overall, the results indicate that self-supervised pretraining is useful for producing initial weights for lung ultrasound classifiers.
Blake VanBerlo · Brian Li · Jesse Hoey · Alexander Wong

HEALNet – Improving Medical Image Analysis using Multi-Omic Context via Hybrid Early Fusion (Poster)
Technological advances in medical data collection, such as high-resolution histopathology and high-throughput genomic sequencing, have made it possible to contextualise computer vision models with genomic information in a multi-modal manner. Complementing imaging models with other modalities has shown promise in providing cell-, molecular-, and patient-level context to improve overall predictive performance. However, the context representations for other modalities are often learned with modality-specific encoders, which cannot capture the crucial cross-modal information that motivates the integration of different data sources. This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet) – a flexible multi-modal fusion architecture that learns both a shared and a modality-specific parameter space, and that a) preserves modality-specific structural information, b) captures cross-modal interactions and structural information in a shared latent space, c) effectively handles missing modalities during training and inference, and d) enables intuitive model inspection by learning on the raw data input instead of opaque embeddings. We conduct multi-modal survival analysis on Whole Slide Images and multi-omic data on four cancer cohorts of The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance, substantially improving over both image-only and recent multi-modal baselines, whilst being robust in scenarios with missing modalities.
Konstantin Hemker · Nikola Simidjievski · Mateja Jamnik

Double-Condensing Attention Condenser: Leveraging Attention in Deep Learning to Detect Skin Cancer from Skin Lesion Images (Poster)
Skin cancer is the most common type of cancer in the United States and is estimated to affect one in five Americans. Recent advances have demonstrated strong performance on skin cancer detection, as exemplified by state-of-the-art performance in the SIIM-ISIC Melanoma Classification Challenge; however, these solutions leverage ensembles of complex deep neural architectures with immense storage and compute costs, and therefore may not be tractable. A recent movement for TinyML applications is integrating Double-Condensing Attention Condensers (DC-AC) into a self-attention neural network backbone architecture to allow for faster and more efficient computation. This paper explores leveraging an efficient self-attention structure to detect skin cancer in skin lesion images and introduces a deep neural network design with DC-AC customized for skin cancer detection from skin lesion images. The final model is publicly available as part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
Chi-en Tai · Elizabeth Janes · Chris Czarnecki · Alexander Wong

Dual Heteroscedastic Uncertainty Estimation for Probabilistic Unsupervised Volumetric Registration of Noisy Medical Images (Poster)
Medical images are often subject to spatially non-uniform image noise due to various factors including the imaging technique, the underlying tissue properties, and the imaging conditions. Despite this intrinsic heterogeneity, previous learning-based unsupervised image registration techniques have primarily operated under a simplified homoscedastic assumption, such as additive Gaussian noise with constant variance across the image space. This leads to an equally weighted image fidelity loss term, which has the potential to overemphasize image noise and introduce unnatural deformations. To mitigate this, we propose a novel probabilistic unsupervised registration framework that explicitly estimates and leverages heteroscedastic image noise in the learning process. We present a collaborative learning strategy, where we jointly train a motion estimator and a variance estimator using separate objectives that include an improved signal-to-noise ratio (SNR)-based weighting strategy. We tested our method across diverse cardiac imaging datasets, including public 2D MRI, public 2D ultrasound, and a private 3D echocardiography dataset. Our method shows improved registration performance.
Xiaoran Zhang · Daniel Pak · Shawn Ahn · Chenyu You · · Alex Wong · Lawrence Staib · James Duncan

Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images (Poster)
This work reports the empirical performance of an automated medical landmark detection method for predicting clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample-efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.
Yehyun Suh · Peter Chan · Ryan Martin · Daniel Moyer

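The label-only curriculum above can be pictured with a toy sketch: a landmark point label is enlarged to a disk, and the disk radius is then eroded back toward the base point-labelling task over training phases. The disk shape and the radius schedule are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def landmark_to_region(shape, point, radius):
    """Binary label map: a disk of the given radius around a landmark
    point (radius 0 recovers the original point label)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    dist = np.sqrt((yy - point[0]) ** 2 + (xx - point[1]) ** 2)
    return (dist <= radius).astype(np.float32)

# Hypothetical curriculum: the label region shrinks from a 4-pixel disk
# down to the single landmark pixel (the base task).
labels = [landmark_to_region((9, 9), (4, 4), r) for r in (4, 2, 0)]
sizes = [int(m.sum()) for m in labels]
```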
Spectral Image-Based Diagnosis of Voice Disorders: Leveraging Spectrograms for Non-Invasive Assessment (Poster)
This paper proposes a novel speech-based diagnosis method for laryngeal diseases that avoids invasive procedures such as endoscopy. Existing methods have used sustained vowels together with MFCC or mel-spectrogram features to develop algorithms. However, these methods have limitations in terms of data availability and the complexity of the data. In this paper, we address these limitations by using connected speech and a multi-patched spectrogram model. Connected speech is more complex than sustained vowels, and the multi-patched spectrogram model can extract more information from the data. Experimental results show that the proposed method achieves an accuracy of over 91.5%, which is superior to existing methods. We also present Grad-CAM visualizations that highlight disease-related image regions, which could help clinicians better understand the disease and develop more effective treatments.
Sangjae LEE · Kwangsuk Lee · Hansu Cho · Seungmo Cho · Young Min Park · Seung Jin Lee · Hye Rim Chae

Synthetic Tumor Manipulation: With Radiomics Features (Poster)
We introduce RadiomicsFill, a synthetic tumor generator conditioned on radiomics features, enabling detailed control and individual manipulation of tumor subregions. This conditioning leverages conventional high-dimensional features of the tumor (i.e., radiomics features) and thus is biologically well-grounded. Our model combines generative adversarial networks, radiomics-feature conditioning, and multi-task learning. Through experiments with glioma patients, RadiomicsFill demonstrated its capability to generate diverse, realistic tumors and its fine-tuning ability for specific radiomics features like 'Pixel Surface' and 'Shape Sphericity'. The ability of RadiomicsFill to generate an unlimited number of realistic synthetic tumors offers notable prospects for both advancing medical imaging research and potential clinical applications.
Inye Na · Hyunjin Park

Dynamic Neural Fields for Learning Atlases of 4D Fetal MRI Time-series (Poster)
We present a method for fast biomedical image atlas construction using neural fields. Atlases are key to biomedical image analysis tasks, yet conventional and deep network estimation methods remain time-intensive. In this preliminary work, we frame subject-specific atlas building as learning a neural field of deformable spatiotemporal observations. We apply our method to learning subject-specific atlases and motion stabilization of dynamic BOLD MRI time-series of fetuses in utero. Our method yields high-quality atlases of fetal BOLD time-series with ∼5-7× faster convergence compared to existing work. While our method slightly underperforms well-tuned baselines in terms of anatomical overlap, it estimates templates significantly faster, thus enabling rapid processing and stabilization of large databases of 4D dynamic MRI acquisitions. Code is available at https://github.com/Kidrauh/neural-atlasing.
Zeen Chi · Zhongxiao Cong · Clinton Wang · Yingcheng Liu · Esra Abaci Turk · Ellen Grant · Mazdak Abulnaga · Polina Golland · Neel Dey

RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification (Poster)
In this paper we introduce RE-tune, a novel approach for fine-tuning pre-trained multimodal biomedical Vision-Language Models (VLMs) in incremental learning scenarios for multi-label chest disease diagnosis. RE-tune freezes the backbones and only trains simple adaptors on top of the image and text encoders of the VLM. By engineering positive and negative text prompts for diseases, we leverage the ability of Large Language Models to steer the training trajectory. We evaluate RE-tune in three realistic incremental learning scenarios: class-incremental, label-incremental, and data-incremental. Our results demonstrate that biomedical VLMs are natural continual learners and prevent catastrophic forgetting. RE-tune not only achieves accurate multi-label classification results, but also prioritizes patient privacy, and it distinguishes itself through exceptional computational efficiency, rendering it highly suitable for broad adoption in real-world healthcare settings.
Marco Mistretta · Andrew Bagdanov

Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data (Poster)
The recent introduction of synthetic correlated diffusion (CDI^s) imaging has demonstrated significant potential in the realm of clinical decision support for prostate cancer (PCa). CDI^s is a new form of magnetic resonance imaging (MRI) designed to characterize tissue properties through the joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Despite this performance improvement, CDI^s data for PCa has not previously been made publicly available. In our commitment to advancing research efforts for PCa, we introduce Cancer-Net PCa-Data, an open-source benchmark dataset of volumetric CDI^s imaging data of PCa patients. Cancer-Net PCa-Data consists of CDI^s volumetric images from a cohort of 200 patient cases, along with full annotations (gland masks, tumor masks, and a PCa diagnosis for each tumor). We also analyze the demographic and label region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net PCa-Data is the first-ever public dataset of CDI^s imaging data for PCa, and is part of the global open-source initiative dedicated to advancement in machine learning and imaging research to aid clinicians in the global fight against cancer.
Hayden Gunraj · Chi-en Tai · Alexander Wong

Self-Supervised Cross-Encoder for Diagnosis of Alzheimer's Disease (Poster)
Deep learning has been extensively applied to the diagnosis of Alzheimer's disease (AD) based on MRI data. However, these methods often require a substantial amount of labeled images, and the resulting feature representations are hard to interpret. To simultaneously address these two issues, we propose a self-supervised cross-encoder framework, which leverages the temporal information among longitudinal MRI scans as supervision and yields disentangled representations comprising two components. The first component, obtained by adhering to an additional constraint enforced through contrastive learning, captures static brain information; the second component, capturing the dynamic information, is a low-dimensional vector representation, which can be readily fine-tuned for downstream AD classification tasks. The proposed method demonstrates superior performance in both classification accuracy and interpretability on the ADNI dataset.
Fangqi Cheng · Xiaochen Yang

Convolve and Conquer: Data Comparison with Wiener Filter (Poster)
Quantitative evaluations of differences and/or similarities between data samples define and shape optimisation problems associated with learning data distributions. Current methods to compare data often suffer from limitations in capturing such distributions or lack desirable mathematical properties for optimisation (e.g. smoothness, differentiability, or convexity). In this paper, we introduce a new method to measure (dis)similarities between paired samples inspired by Wiener-filter theory. The convolutional nature of Wiener filters allows us to comprehensively compare data samples in a globally correlated way. We validate our approach in two machine learning applications focused on medical imaging problems: Magnetic Resonance Imaging (MRI) data imputation and non-parametric generative modelling. Our results demonstrate increased resolution in reconstructed images with better perceptual quality and higher data fidelity compared to analogous conventional mean-squared-error implementations.
Deborah Pelacani Cruz · George Strong · Oscar Bates · Carlos Cueto · Jiashun Yao · Lluis Guasch

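A toy sketch of a Wiener-filter-based (dis)similarity in the spirit of the abstract above: estimate, in the frequency domain, the filter that maps one signal onto another; identical signals yield a near-identity (delta) filter, and the distance measures the deviation from that identity. The regularisation term and the energy-weighted deviation measure are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def wiener_distance(a, b, eps=1e-6):
    """Energy-weighted deviation of the Wiener filter (mapping b to a)
    from the identity filter (illustrative dissimilarity measure)."""
    A, B = np.fft.fft(a), np.fft.fft(b)
    # Wiener filter in the frequency domain: W = A * conj(B) / (|B|^2 + eps)
    W = A * np.conj(B) / (np.abs(B) ** 2 + eps)
    power = np.abs(B) ** 2
    # identical signals give W ~ 1 everywhere energy exists, i.e. a
    # delta kernel in the signal domain; deviations signal dissimilarity
    return float(np.sum(power * np.abs(W - 1.0) ** 2) / np.sum(power))

x = np.sin(2 * np.pi * np.arange(64) / 64)
same = wiener_distance(x, x)              # near zero
shifted = wiener_distance(x, np.roll(x, 5))  # penalised: filter is a shift
```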
LKA: Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation (Poster)
Deep learning models like Convolutional Neural Networks (CNNs) and Transformers have revolutionized image segmentation. While CNNs are computationally efficient due to their parameter-sharing mechanisms, transformers excel at capturing long-range data relationships and global context, at the cost of computational resources because of their self-attention layers. In this study, we evaluate an alternative approach that harnesses the benefits of both: a transformer architecture built exclusively with convolutions. We show that our model outperforms leading methods like nnUNet and Swin-UNETR in glioblastoma segmentation, and provide evidence that the choice of architecture controls a model's texture bias.
Liam Chalcroft · Ruben Lourenço Pereira · Mikael Brudfors · Andrew Kayser · Mark D'Esposito · Cathy Price · Ioannis Pappas · John Ashburner

Ultra-Resolution Cascaded Diffusion Model for Gigapixel Image Synthesis in Histopathology (Poster)
Diagnoses from histopathology images rely on information from both high and low resolutions of whole-slide images (WSIs). Ultra-resolution cascaded diffusion models (URCDMs) allow for the synthesis of high-resolution images that are realistic at all magnifications, focusing not only on fidelity but also on long-distance spatial coherency. Our model beats existing methods, improving the pFID-50k score by 110.63, to 39.52. Additionally, a human expert evaluation study was performed, reaching a weighted Mean Absolute Error (MAE) of 0.11 for the LRDM and a weighted MAE of 0.22 for the URCDM.
Sarah Cechnicka · Hadrien Reynaud · James Ball · Naomi Simmonds · Catherine Horsfield · Andrew Smith · Candice Roufosse · Bernhard Kainz

-
|
Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference
(
Poster
)
>
Leveraging ML advancements to augment healthcare systems can improve patient outcomes. Yet, uninformed engineering decisions in early-stage research inadvertently hinder the feasibility of such solutions for high-throughput, on-device inference, particularly in settings involving legacy hardware and multi-modal gigapixel images. Through a preliminary case study concerning segmentation in cardiology, we highlight the excess operational complexity in a suboptimally configured ML model from prior work and demonstrate that it can be sculpted away using pruning to meet deployment criteria. Our results show a compression rate of 1148x with minimal loss in quality (~ 4%) and, at higher rates, achieve faster inference on a CPU than the GPU baseline, stressing the need to consider task complexity and architectural details when using off-the-shelf models. With this, we consider avenues for future research in streamlining workflows for clinical researchers to develop models quicker and better suited for real-world use. |
Sudarshan Sreeram · Bernhard Kainz 🔗 |
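The pruning step described above can be illustrated with simple unstructured magnitude pruning. This is a hedged sketch, not the paper's actual configuration (the authors' pruning method, rates, and framework are not specified here); `magnitude_prune` is a hypothetical helper operating on a single weight matrix:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))          # stand-in for one layer of a segmentation model
pruned = magnitude_prune(w, sparsity=0.9)
print(f"fraction zeroed: {np.mean(pruned == 0):.2f}")
```

In practice the zeroed weights are then exploited by sparse kernels or structured removal of whole channels to realize the CPU speedups the abstract reports.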
-
|
A Recall On Thin Structures
(
Poster
)
>
Thin structures like vessels and neurons are crucial for many biomedical processes. Preserving their topology in the context of semantic segmentation, especially ensuring connectedness, is essential. Current image segmentation objectives, like Dice or cross-entropy losses, do not emphasize the correct topology but rather focus on volumetric overlap. This can result in disconnected structures negatively influencing downstream tasks like flow calculation. In this paper, we tackle this shortcoming by proposing a new loss function, specifically tailored towards thin structures, which we call Skeleton Recall Loss. It performs better than, or on par with, the clDice loss, a similar state-of-the-art approach for topology preservation, on four public datasets, while requiring significantly less compute and memory. |
Yannick Kirchhoff · Maximilian R. Rokuss · Saikat Roy · Balint Kovacs · Constantin Ulrich · Tassilo Wald · Maximilian Zenk · Fabian Isensee · Klaus H. Maier-Hein 🔗 |
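The idea of penalizing missed skeleton pixels can be sketched as a soft recall on a precomputed ground-truth skeleton. This is a minimal illustration under stated assumptions (the paper's exact formulation, skeletonization, and weighting may differ):

```python
import numpy as np

def skeleton_recall_loss(pred_prob: np.ndarray, gt_skeleton: np.ndarray,
                         eps: float = 1e-6) -> float:
    """Soft recall of the predicted foreground on the ground-truth skeleton:
    loss = 1 - sum(pred * skel) / sum(skel). Missing any skeleton pixel
    (i.e., breaking connectivity) is penalized directly, unlike Dice."""
    intersection = float(np.sum(pred_prob * gt_skeleton))
    return 1.0 - (intersection + eps) / (float(np.sum(gt_skeleton)) + eps)

# toy example: a thin horizontal "vessel" whose middle pixel is missed
skel = np.zeros((5, 5)); skel[2, :] = 1.0
pred = skel.copy(); pred[2, 2] = 0.0
loss_broken = skeleton_recall_loss(pred, skel)
loss_perfect = skeleton_recall_loss(skel, skel)
print(round(loss_broken, 2), round(loss_perfect, 2))  # 0.2 0.0
```

Because only the one-pixel-wide skeleton enters the loss, the term is cheap to evaluate, which is consistent with the compute and memory savings the abstract claims over clDice.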
-
|
M3-X: Multimodal Generative Model for Screening Mammogram Reading and Explanation
(
Poster
)
>
The FDA (United States Food and Drug Administration) has approved multiple automated mammogram image reading models, but most of these models lack interpretability. Efforts have been made to interpret a model's decision through saliency maps or Grad-CAM (Selvaraju et al., 2017), which highlight the model's attention on specific areas within the image. While technically sound, these interpretability maps may not be well perceived by radiologists due to the ambiguity and uncertainty of the findings. As such, we hypothesize that in addition to deriving the diagnosis, a text-based semantic explanation of a model's attention (similar to findings documented in radiology reports) may be more readily understandable by humans and may therefore serve as a more trustworthy component of an AI model. The purpose of our study was therefore to develop a transformer-based multi-modal generative model for the automatic interpretation of screening mammogram studies and the generation of text-based reasoning. Experimental results from our tests using the X-Institution (name withheld for anonymity) mammogram screening dataset demonstrate that our model significantly outperforms the baselines in both accuracy and the quality of explanations. |
Man Luo · Amara Tariq · Bhavik Patel · Imon Banerjee 🔗 |
-
|
Towards Generalist Models for Multimodal Clinical Diagnostics
(
Poster
)
>
We introduce MMCaD, the first multimodal dataset for general clinical diagnostics, consisting of nearly 60k real-world cases and one thousand health problems. Alongside MMCaD, we present GeMini, a multimodal transformer designed for clinical diagnostics. GeMini decouples the decision-making process into modality-specific encoding and modality-agnostic decoding, optimizing both stages jointly. Experimental results demonstrate that GeMini outperforms existing counterparts in digital medicine and computer vision, in some cases by up to 6%. Moreover, GeMini does not need pre-trained weights for decoding, allowing a more flexible architecture design. |
Yunxiang Fu · Hong-Yu Zhou · Yizhou Yu 🔗 |
-
|
Rethinking Knee Osteoarthritis Severity Grading: A Few Shot Self-Supervised Contrastive Learning Approach
(
Poster
)
>
Knee Osteoarthritis (OA) is a debilitating disease affecting over 250 million people worldwide. Currently, radiologists grade the severity of OA on an ordinal scale from zero to four using the Kellgren-Lawrence (KL) system. Recent studies have raised concerns about the subjectivity of the KL grading system, highlighting the need for an automated system, while also indicating that five ordinal classes may not be the most appropriate approach for assessing OA severity. This work presents preliminary results of an automated system with a continuous grading scale. This system, namely SS-FewSOME, uses self-supervised pre-training to learn robust representations of the features of healthy knee X-rays. It then assesses OA severity by the X-rays' distance to the normal representation space. SS-FewSOME initially trains on only 'few' examples of healthy knee X-rays, thus reducing the barriers to clinical implementation by eliminating the need for the large training sets and costly expert annotations that existing automated systems require. The work reports promising initial results, obtaining a positive Spearman Rank Correlation Coefficient of 0.43, having had access to only 30 ground truth labels at training time. |
Niamh Belton · Misgina Tsighe Hagos · Aonghus Lawlor · Kathleen Curran 🔗 |
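The "distance to the normal representation space" idea can be sketched as a nearest-neighbour cosine distance to a small bank of healthy-scan embeddings. This is an assumed illustration: `severity_score`, the cosine metric, and the bank size are hypothetical choices, not SS-FewSOME's published formulation:

```python
import numpy as np

def severity_score(embedding: np.ndarray, normal_bank: np.ndarray) -> float:
    """Continuous severity proxy: cosine distance from a test embedding to its
    nearest neighbour in a small bank of healthy-knee embeddings."""
    e = embedding / np.linalg.norm(embedding)
    bank = normal_bank / np.linalg.norm(normal_bank, axis=1, keepdims=True)
    return float(1.0 - (bank @ e).max())

rng = np.random.default_rng(0)
normals = rng.normal(size=(30, 128))                  # 'few' healthy references
healthy_like = normals[0] + 0.01 * rng.normal(size=128)
abnormal = rng.normal(size=128)
s_healthy = severity_score(healthy_like, normals)
s_abnormal = severity_score(abnormal, normals)
print(s_healthy < s_abnormal)  # scans far from the normal space score higher
```

Because the score is a continuous distance rather than one of five ordinal bins, it supports the continuous grading scale the abstract argues for.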
-
|
Universal Noise Annotation: Unveiling the Impact of Noisy annotation on Object Detection
(
Poster
)
>
In medical image analysis, misclassifications such as false negatives can have grave implications, emphasizing the critical need to mitigate noise within datasets that contribute to these errors. While deep neural networks thrive on voluminous data, the prohibitive cost of curating noise-free datasets in medical image analysis poses challenges in assembling high-quality, large-scale data. Particularly in object detection, the landscape of potential noise is multifaceted, encompassing not only categorization noise but also issues like localization noise, missing annotations, and bogus bounding boxes. Despite this complexity, much of the existing literature has been limited in scope, often addressing only specific types of noise, such as localization or categorization. In response, this paper introduces the Universal-Noise Annotation (UNA), a holistic framework designed to capture the full spectrum of noise types inherent in object detection. We investigate the influence of UNA on detector performance, explore the evolution of past detection algorithms, and pinpoint factors that enhance the robustness of detection model training approaches. For the broader research community's benefit, we have open-sourced our code for integrating UNA into datasets, and we also provide complete access to our training logs and weights. |
Kwangrok Ryoo · Yeonsik Jo · Seungjun Lee · Mira Kim · Ji Ye Kim · Ahra Jo · Seung Hwan Kim · Seungryong Kim · Soonyoung Lee 🔗 |
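A unified annotation-noise injector of the kind UNA describes might look like the sketch below. All parameter names, probabilities, and the bogus-box placement are hypothetical; the released UNA code defines the actual noise model:

```python
import random

def inject_annotation_noise(boxes, labels, num_classes, p_drop=0.1, p_loc=0.2,
                            p_cls=0.1, p_bogus=0.05, jitter=0.1, seed=0):
    """Corrupt a clean annotation set with four noise types at once:
    missing annotations, localization jitter, label flips, and bogus boxes."""
    rng = random.Random(seed)
    out_boxes, out_labels = [], []
    for (x1, y1, x2, y2), lab in zip(boxes, labels):
        if rng.random() < p_drop:                 # missing annotation
            continue
        if rng.random() < p_loc:                  # localization noise
            w, h = x2 - x1, y2 - y1
            x1 += rng.uniform(-jitter, jitter) * w; x2 += rng.uniform(-jitter, jitter) * w
            y1 += rng.uniform(-jitter, jitter) * h; y2 += rng.uniform(-jitter, jitter) * h
        if rng.random() < p_cls:                  # categorization noise
            lab = rng.randrange(num_classes)
        out_boxes.append((x1, y1, x2, y2)); out_labels.append(lab)
    if rng.random() < p_bogus:                    # bogus bounding box
        out_boxes.append((0.1, 0.1, 0.3, 0.3)); out_labels.append(rng.randrange(num_classes))
    return out_boxes, out_labels

clean = [(10, 10, 50, 50), (60, 20, 90, 80)]
noisy_boxes, noisy_labels = inject_annotation_noise(clean, [0, 1], num_classes=3)
print(len(noisy_boxes) <= len(clean) + 1)
```

Applying all four corruptions jointly, rather than one at a time, is what distinguishes this style of benchmark from prior work that studies localization or categorization noise in isolation.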
-
|
On the notion of Hallucinations from the lens of Bias and Validity in Synthetic CXR Images
(
Poster
)
>
Medical imaging has revolutionized disease diagnosis, yet the potential is hampered by limited access to diverse and privacy-conscious datasets. Open-source medical datasets, while valuable, suffer from data quality and clinical information disparities. Generative models, such as diffusion models, aim to mitigate these challenges. At Stanford, researchers explored the utility of a fine-tuned Stable Diffusion model (RoentGen) for medical imaging data augmentation. Our work examines specific considerations to expand the Stanford research question, “Could Stable Diffusion Solve a Gap in Medical Imaging Data?” from the lens of bias and validity of the generated outcomes. We leveraged RoentGen to produce synthetic Chest-XRay (CXR) images and conducted assessments on bias, validity, and hallucinations. Diagnostic accuracy was evaluated by a disease classifier, while a COVID classifier uncovered latent hallucinations. The bias analysis unveiled disparities in classification performance among various subgroups, with a pronounced impact on the Female Hispanic subgroup. Furthermore, incorporating race and gender into input prompts exacerbated fairness issues in the generated images. The quality of synthetic images exhibited variability, particularly in certain disease classes, where there was more significant uncertainty compared to the original images. Additionally, we observed latent hallucinations, with approximately 42% of the images incorrectly indicating COVID, hinting at the presence of hallucinatory elements. These identifications provide new research directions towards interpretability of synthetic CXR images, for further understanding of associated risks and patient safety in medical applications. |
Gauri Bhardwaj · Yuvaraj Govindarajulu · Sundaraparipurnan Narayanan · Manojkumar Parmar 🔗 |
-
|
Temporal Fine-tuning of Medical Vision-Language Representation
(
Poster
)
>
Despite the abundant data sources available in biomedical applications, existing machine learning models fail to effectively harness these resources for patient diagnosis. This work focuses on visual and textual data formats, which are often used to pre-train multimodal representations; however, the final diagnosis is based solely on the fine-tuned image encoder. To address this constraint, we introduce a novel framework designed to leverage temporal information obtained from previous medical image examinations and their associated reports during fine-tuning. Our evaluation, conducted on the MIMIC dataset with a newly proposed temporal data generation process, demonstrates an average improvement of up to 3.89% compared to using only image data for diagnosis. |
Haoxu Huang · Kyunghyun Cho · Sumit Chopra · Divyam Madaan 🔗 |
-
|
ProsDectNet: Bridging the Gap in Prostate Cancer Detection via Transrectal B-mode Ultrasound Imaging
(
Poster
)
>
Interpreting traditional B-mode ultrasound images can be challenging due to image artifacts (e.g., shadowing, speckle), leading to low sensitivity and limited diagnostic accuracy. While Magnetic Resonance Imaging (MRI) has been proposed as a solution, it is expensive and not widely available. Furthermore, most biopsies are guided by Transrectal Ultrasound (TRUS) alone and can miss up to 52% of cancers, highlighting the need for improved targeting. To address this issue, we propose ProsDectNet, a multi-task deep learning approach that localizes prostate cancer on B-mode ultrasound. Our model is pre-trained using radiologist-labeled data and fine-tuned using biopsy-confirmed labels. ProsDectNet includes a lesion detection and patch classification head, with uncertainty minimization using entropy to improve model performance and reduce false positive predictions. We trained and validated ProsDectNet using a cohort of 289 patients who underwent MRI-TRUS fusion targeted biopsy. We then tested our approach on a group of 41 patients and found that ProsDectNet outperformed the average expert clinician in detecting prostate cancer on B-mode ultrasound images, achieving a patient-level ROC-AUC of 82%, a sensitivity of 74%, and a specificity of 67%. Our results demonstrate that ProsDectNet has the potential to be used as a computer-aided diagnosis system to improve targeted biopsy and treatment planning. |
Sulaiman Vesal · Indrani Bhattacharya · Hassan Jahanandish · Cynthia Li · Moonhyung Choi · Steve Ran Zhou · Zachary Kornberg · Elijah Richard Sommer · Richard Fan · Geoffrey Sonn · Mirabela Rusu
|
-
|
Mapping and Diagnosing Augmented Whole Slide Image Datasets with Training Dynamics
(
Poster
)
>
Pediatric heart transplantation represents the standard of care for children confronting end-stage heart failure. One of the most common postoperative complications, heart transplant rejection, has been monitored via surveillance endomyocardial biopsies and manual assessment by cardiac pathology experts. However, manual annotations with interobserver and intraobserver variability among cardiovascular pathology experts lead to significant disagreements about the severity of rejection. Artificial intelligence (AI)-enabled computational pathology usually requires large-scale manual annotations of gigapixel whole-slide images (WSIs) for effective model training. To address these challenges, we develop an AI-enabled rare disease detection framework for automating heart transplant rejection detection from WSIs of pediatric patients. Specifically, we conduct dataset cartography with data maps and training dynamics to map and diagnose the augmented samples, exploring the model behavior on individual instances during model training. Extensive experiments on internal and external patient cohorts have demonstrated the feasibility of both tile-level and biopsy-level detection with augmented samples. The proposed data-efficient learning framework may support seamless scalability to real-world rare disease detection without the burden of iterative expert annotations. |
Wenqi Shi · Benoit Marteau · Felipe Giuste · May Dongmei Wang 🔗 |
-
|
On Mitigating Shortcut Learning for Fair Chest X-ray Classification
(
Poster
)
>
As machine learning models reach human-level performance on many real-world medical imaging tasks, it is crucial to consider the mechanisms they may be using to make such predictions. Prior work has demonstrated the surprising ability of deep learning models to recover demographic information from chest X-rays. This suggests that disease classification models could potentially be utilizing these demographics as shortcuts, leading to previously observed performance gaps between demographic groups. In this work, we start by investigating whether chest X-ray models indeed use demographic information as shortcuts when classifying four different diseases. Next, we apply five existing methods for tackling spurious correlations, and examine performance and fairness both for the original dataset and five external hospitals. Our results indicate that shortcut learning can be corrected to remedy in-distribution fairness gaps, though this reduction often does not transfer under domain shift. We also find trade-offs between fairness and other important metrics, raising the question of whether it is beneficial to remove such shortcuts in the first place. |
Yuzhe Yang · Haoran Zhang · Dina Katabi · Marzyeh Ghassemi 🔗 |
-
|
Mitigating Spurious Correlations for Medical Image Classification via Natural Language Concepts
(
Poster
)
>
Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, neural models tend to learn spurious correlations instead of desired features, which can fall short when generalizing to new domains (e.g., patients of different ages). In this work, we propose a new paradigm to build robust medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a pre-trained vision-language model. We systematically evaluate our method on public medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations and thus substantially outperform standard visual encoders and other baselines. |
An Yan · Yu Wang · Petros Karypis · Zexue He · Amilcare Gentili · Chun-Nan Hsu · Julian McAuley 🔗 |
-
|
MRI Reconstruction with Fourier-Constrained Diffusion Bridges
(
Poster
)
>
Diffusion-based image priors have gained recent traction in MRI reconstruction. Common diffusion priors use a multi-step transformation to map Gaussian noise onto fully-sampled MRI data. However, this transformation diverges from the desired reconstruction transformation from undersampled to fully-sampled data, yielding suboptimal results. To overcome this limitation, we introduce Fourier-constrained diffusion bridges (FDB; https://github.com/icon-lab/FDB) for accelerated MRI reconstruction. FDB learns a multi-step transformation from undersampled to fully-sampled data guided by two degradation operators: random noise addition and random frequency removal. Unlike common diffusion priors that use an asymptotic endpoint (e.g., Gaussian noise), FDB performs a finite transformation with an endpoint based on moderately degraded data. Unlike common diffusion bridges that assume learnable forward and backward processes, FDB improves learning by injecting a task-relevant Fourier-domain constraint via its frequency removal operator. Demonstrations on brain MRI show that FDB outperforms state-of-the-art reconstruction methods including previous diffusion priors. |
Muhammad Usama Mirza · Tolga Cukur 🔗 |
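FDB's frequency-removal degradation operator can be sketched as randomly masking 2-D Fourier coefficients. This is a hedged illustration only: the function name, the uniform masking, and the omission of the companion noise-addition operator and step schedule are assumptions; the actual operators are defined in the paper and the linked repository:

```python
import numpy as np

def remove_random_frequencies(image: np.ndarray, keep_frac: float,
                              rng: np.random.Generator) -> np.ndarray:
    """One degradation step in the style of FDB's frequency-removal operator:
    zero out a random subset of 2-D Fourier coefficients, keeping ~keep_frac."""
    k = np.fft.fft2(image)
    mask = rng.random(k.shape) < keep_frac
    return np.real(np.fft.ifft2(k * mask))

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))                   # stand-in for an MR image slice
degraded = remove_random_frequencies(img, keep_frac=0.5, rng=rng)
print(degraded.shape, np.linalg.norm(degraded) < np.linalg.norm(img))
```

Operating in the Fourier domain is what makes the degradation task-relevant: undersampled MRI acquisition itself removes k-space frequencies, so the bridge's endpoint resembles the actual reconstruction input.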
-
|
Automated Neuroimaging Pipeline to Identify Structural Biomarkers using Deep Learning Segmentation Applied to Adolescent Mental Disorders
(
Poster
)
>
Mental disorders are a severe public health concern that still lacks clear biological underpinnings. Magnetic resonance imaging has emerged as a tool for interrogating biological differences in disorders, with structural changes serving as potential biomarkers for diagnostics and mechanistic understanding. To date, there are few reliable and consistently reported findings, owing to the need for studies with large sample sizes. Imaging analysis has previously been manually intensive, limiting the scope of such studies. Here we present an automated neuroimaging pipeline for the identification of structural volume differences between disordered and control populations. As a proof of concept, it is applied to various mental disorders screened for in the Adolescent Brain Cognitive Development study. |
Margot Wagner · Brandon Liu · Alessandra Camassa · Gert Cauwenberghs · Terrence Sejnowski 🔗 |
-
|
SITReg: Multi-resolution architecture for symmetric, inverse consistent, and topology preserving image registration
(
Poster
)
>
Deep learning has emerged as a strong alternative to classical iterative methods for deformable medical image registration, where the goal is to find a mapping between the coordinate systems of two images. Popular classical image registration methods enforce the useful inductive biases of symmetry, inverse consistency, and topology preservation by construction. However, while many deep learning registration methods encourage these properties via loss functions, none enforces all of them by construction. Here, we propose a novel registration architecture based on extracting multi-resolution feature representations which is symmetric, inverse consistent, and topology preserving by construction. We also develop an implicit layer for memory-efficient inversion of the deformation fields. Our method achieves state-of-the-art registration accuracy on two datasets. |
Joel Honkamaa · Pekka Marttinen 🔗 |
-
|
Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images
(
Poster
)
>
Vision Transformers (ViTs) have ushered in a new era in computer vision, showcasing unparalleled performance in many challenging tasks. However, their practical deployment in computational pathology has largely been constrained by the sheer size of whole slide images (WSIs), which result in lengthy input sequences. Transformers faced a similar limitation when applied to long documents, and Hierarchical Transformers were introduced to circumvent it. Given the analogous challenge with WSIs and their inherent hierarchical structure, Hierarchical Vision Transformers (H-ViTs) emerge as a promising solution in computational pathology. This work delves into the capabilities of H-ViTs, evaluating their efficiency for prostate cancer grading in WSIs. Our results show that they achieve competitive performance against existing state-of-the-art solutions. |
Clément Grisi · Geert Litjens · Jeroen van der Laak 🔗 |
-
|
A multi-modal image pipeline for automated generation of large, labeled H&E image data-sets.
(
Poster
)
>
We leverage paired multiplex immunofluorescence (mpIF) imaging to identify cell types in hematoxylin and eosin (H&E) stained images. By synergizing the strengths of these two imaging modalities, our pipeline enables accurate cell-type annotation in H&E images. This breakthrough allows for the creation of a large, annotated H&E dataset, significantly increasing the scalability of training data generation for machine learning models. This expansion of the dataset is especially crucial for training highly effective deep learning models, as it provides a wealth of more diverse and representative samples, leading to improved performance and generalization. The pipeline’s ability to generate such a large, annotated dataset offers a valuable resource for detailed analysis and characterization of cell populations, facilitating advanced machine learning applications in pathology and biomedical imaging.
|
Matthew Lee · Victoria Fang · Rami Vanguri · Abigail Zellmer · Amy Baxter · Dokyoon Kim · Derek Oldridge · John Wherry 🔗 |
-
|
Thinking Outside the Box: Orthogonal Approach to Equalizing Protected Attributes
(
Poster
)
>
There is growing concern that the potential of black box AI may exacerbate health-related disparities and biases such as gender and ethnicity in clinical decision-making. Biased decisions can arise from data availability and collection processes, as well as from the underlying confounding effects of the protected attributes themselves. This work proposes a machine learning-based orthogonal approach aiming to analyze and suppress the effect of the confounder through discriminant dimensionality reduction and orthogonalization of the protected attributes against the primary attribute information. By doing so, the impact of the protected attributes on disease diagnosis can be realized, undesirable feature correlations can be mitigated, and the model prediction performance can be enhanced. |
Jiahui Liu · Xiaohao Cai · Mahesan Niranjan 🔗 |
-
|
Assessment of Explainable AI Approaches in the Context of Digital Histopathology
(
Poster
)
>
Increases in the accuracy and accessibility of machine learning, and particularly deep learning models, have driven their recent extensive use in a variety of fields, including digital histopathology. However, the impressive performance of these models for clinical tasks is counterbalanced by a lack of transparency, a significant obstacle to their usefulness in clinical contexts. Explainable AI (XAI) methods try to address this issue by providing explanations for model decisions. In image analysis, saliency maps are among the most popular techniques used. To date, no comprehensive study of XAI methods has evaluated different aspects (e.g., model fidelity, localization ability, and stability) across different architectures. Understanding the relative efficacy of XAI methods via evaluation metrics, as well as the strengths and shortcomings of existing XAI approaches, will bolster translational work in this important field. |
Alexander Claman · Alicia Bilbao Martinez · Nicolas Echevarrieta-Catalan · Daniel Bilbao-Cortes · Vanessa Aguiar-Pulido 🔗 |
-
|
Performance-based Wisdom of the Crowd Algorithms for Medical Image Dataset Labeling
(
Poster
)
>
A crucial bottleneck in medical artificial intelligence is high-quality labeled medical datasets. In this paper, we test a large variety of wisdom of the crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. There was a large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We test 168 wisdom of the crowd algorithms of varying complexity from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and considering the task environment. These algorithms also far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI. |
Eeshan Hasan · Erik Duhaime · Jennifer Trueblood 🔗 |
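The best-performing aggregation strategies the abstract identifies (selecting top performers, then weighting by training accuracy) can be sketched as follows. The function name, the plurality rule, and the toy accuracies are illustrative assumptions, not any of the 168 algorithms as published:

```python
import numpy as np

def weighted_crowd_label(votes, accuracies, n_classes, top_k=None):
    """Aggregate one image's crowd votes: optionally keep only the top_k most
    accurate raters, then weight each remaining vote by its rater's accuracy."""
    votes, accuracies = np.asarray(votes), np.asarray(accuracies, dtype=float)
    if top_k is not None:
        keep = np.argsort(accuracies)[-top_k:]
        votes, accuracies = votes[keep], accuracies[keep]
    scores = np.zeros(n_classes)
    for v, w in zip(votes, accuracies):
        scores[v] += w
    return int(np.argmax(scores))

# three weak raters vote class 0; one strong rater votes class 1
label_all = weighted_crowd_label([0, 0, 0, 1], [0.4, 0.4, 0.4, 0.9], n_classes=7)
label_top = weighted_crowd_label([0, 0, 0, 1], [0.4, 0.4, 0.4, 0.9], n_classes=7, top_k=2)
print(label_all, label_top)  # 0 1
```

The toy example shows why performer selection matters: an unweighted or lightly weighted crowd is dominated by numerous weak raters, while restricting to top performers lets the accurate minority prevail.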
-
|
Enhancing Instance-Level Image Classification with Set-Level Labels
(
Poster
)
>
Instance-level image classification tasks have traditionally relied on single-instance labels to train models, e.g., few-shot learning and transfer learning. However, set-level coarse-grained labels that capture relationships among instances can provide richer information in real-world scenarios. In this paper, we present a novel approach to enhance instance-level image classification by leveraging set-level labels. We provide a theoretical analysis of the proposed method, including recognition conditions for fast excess risk rate, shedding light on the theoretical foundations of our approach. We conducted experiments on two distinct categories of datasets: natural image datasets and histopathology image datasets. Our experimental results demonstrate the effectiveness of our approach, showcasing improved classification performance compared to traditional single-instance label-based methods. Importantly, our experimental findings align with the theoretical analysis, reinforcing the robustness and reliability of our proposed method. This work bridges the gap between instance-level and set-level image classification, offering a promising avenue for advancing the capabilities of image classification models with set-level coarse-grained labels. |
Renyu Zhang · Aly Khan · Yuxin Chen · Robert Grossman 🔗 |
-
|
CellMixer: Annotation-free Semantic Cell Segmentation of Heterogeneous Cell Populations
(
Poster
)
>
In recent years, several unsupervised cell segmentation methods have been presented, trying to omit the requirement of laborious pixel-level annotations for the training of a cell segmentation model. Most, if not all, of these methods handle the instance segmentation task by focusing on detecting different cell instances while ignoring their type. While such models prove adequate for certain tasks, like cell counting, other applications require the identification of each cell's type. In this paper, we present CellMixer, an innovative annotation-free approach for the semantic segmentation of heterogeneous cell populations. Our augmentation-based method enables the training of a segmentation model from image-level labels of homogeneous cell populations. Our results show that CellMixer can achieve competitive segmentation performance across multiple cell types and imaging modalities, demonstrating the method's scalability and potential for broader applications in medical imaging, cellular biology, and diagnostics. |
Mehdi Naouar · Gabriel Kalweit · Anusha Klett · Yannick Vogt · paula silvestrini · Diana Laura Infante Ramirez · Roland Mertelsmann · Joschka Boedecker · Maria Kalweit 🔗 |
-
|
ProtoEEGNet: An Interpretable Approach for Detecting Interictal Epileptiform Discharges
(
Poster
)
>
In electroencephalogram (EEG) recordings, the presence of interictal epileptiform discharges (IEDs) serves as a critical biomarker for seizures or seizure-like events. Detecting IEDs can be difficult; even highly trained experts disagree on the same sample. As a result, specialists have turned to machine-learning models for assistance. However, many existing models are black boxes and do not provide any human-interpretable reasoning for their decisions. In high-stakes medical applications, it is critical to have interpretable models so that experts can validate the reasoning of the model before making important diagnoses. We introduce ProtoEEGNet, a model that achieves state-of-the-art accuracy for IED detection while additionally providing an interpretable justification for its classifications. Specifically, it can reason that one EEG looks similar to another "prototypical" EEG that is known to contain an IED. ProtoEEGNet can therefore help medical professionals effectively detect IEDs while maintaining a transparent decision-making process. |
Dennis Tang · Frank Willard · Ronan Tegerdine · Luke Triplett · Jon Donnelly · Luke Moffett · Lesia Semenova · Alina Barnett · Jin Jing · Cynthia Rudin · Brandon Westover
|
-
|
MoCo-Transfer: Investigating out-of-distribution contrastive learning for limited-data domains
(
Poster
)
>
Medical imaging data is often siloed within hospitals, limiting the amount of data available for specialized model development. With limited in-domain data, one might hope to leverage larger datasets from related domains. In this paper, we analyze the benefit of transferring self-supervised contrastive representations from momentum contrast (MoCo) pretraining on out-of-distribution data to settings with limited data. We consider two X-ray datasets which image different parts of the body, and compare transferring from each other to transferring from ImageNet. We find that depending on the quantity of labeled and unlabeled data, contrastive pretraining on larger out-of-distribution datasets can perform nearly as well or better than MoCo pretraining in-domain, and pretraining on related domains leads to higher performance than if one were to use the ImageNet pretrained weights. Finally, we provide a preliminary way of quantifying similarity between datasets. |
Yuwen Chen · Helen Zhou · Zachary Lipton 🔗 |
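The contrastive objective underlying MoCo pretraining is the InfoNCE loss, sketched below for a single query against a queue of negatives. This is a generic illustration of the technique, not the paper's training code; the momentum encoder and queue updates that define MoCo are omitted:

```python
import numpy as np

def info_nce(q, k_pos, k_neg, tau=0.07):
    """InfoNCE loss for one query: pull q toward its positive key k_pos,
    push it away from the queued negative keys k_neg."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_neg = k_neg / np.linalg.norm(k_neg, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / tau
    logits -= logits.max()                       # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

rng = np.random.default_rng(0)
q = rng.normal(size=64)
negatives = rng.normal(size=(128, 64))           # stand-in for the MoCo queue
loss_aligned = info_nce(q, q + 0.01 * rng.normal(size=64), negatives)
loss_random = info_nce(q, rng.normal(size=64), negatives)
print(loss_aligned < loss_random)
```

Since the loss depends only on unlabeled image pairs, the same objective can be pretrained on a large out-of-distribution X-ray dataset and then transferred to the limited-data domain, which is the comparison the paper studies.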
-
|
Improved Decoding of Audio-Evoked fMRI Sequences with Sequential Transfer Learning
(
Poster
)
>
We present a sequential transfer learning framework for transformers on functional Magnetic Resonance Imaging (fMRI) data and demonstrate its significant benefits for decoding instrumental timbre. In the first of two phases, we pretrain our stacked-encoder transformer architecture on Next Thought Prediction, a self-supervised task of predicting whether or not one sequence of fMRI data follows another. This phase imparts a general understanding of the temporal and spatial dynamics of neural activity, and can be applied to any fMRI dataset. In the second phase, we finetune the pretrained models and train additional randomly initialized models on the supervised task of predicting whether or not two sequences of fMRI data were obtained while listening to the same musical timbre. The finetuned models achieve significantly higher accuracy on heldout participants than the randomly initialized models, demonstrating the efficacy of our framework for facilitating transfer learning on fMRI data. This work contributes to the growing literature on transformer architectures for sequential transfer learning on fMRI data. |
Sean Paulsen · Mike Casey 🔗 |
-
|
Patient-adaptive and Learned MRI Data Undersampling Using Neighborhood Clustering
(
Poster
)
>
There has been much recent interest in adapting undersampling trajectories in MRI based on training data. In this work, we propose a novel patient-adaptive MRI sampling algorithm based on grouping scans within a training set. Scan-adaptive sampling patterns are optimized together with an image reconstruction network for the training scans. The training optimization alternates between determining the best sampling pattern for each scan (based on a greedy search or iterative coordinate descent (ICD)) and training a reconstructor across the dataset. The eventual scan-adaptive sampling patterns on the training set are used as labels to predict sampling design using nearest neighbor search at test time. The proposed algorithm is applied to the fastMRI knee multicoil dataset and demonstrates improved performance over several baselines. |
Siddhant Gautam · Angqi Li · Saiprasad Ravishankar 🔗 |
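The test-time step of the pipeline above (reusing the optimized mask of the nearest training scan) can be sketched as follows. The Euclidean distance on image intensities and the function name are assumptions for illustration; the paper's actual nearest-neighbor features may differ:

```python
import numpy as np

def select_sampling_pattern(test_scan, train_scans, train_masks):
    """Return the precomputed scan-adaptive k-space sampling mask of the
    training scan nearest to the test scan (Euclidean distance here)."""
    dists = np.linalg.norm(train_scans - test_scan[None], axis=(1, 2))
    return train_masks[int(np.argmin(dists))]

rng = np.random.default_rng(0)
train = rng.normal(size=(10, 16, 16))            # stand-ins for training scans
masks = rng.random(size=(10, 16, 16)) < 0.3      # their optimized sampling masks
test = train[4] + 0.01 * rng.normal(size=(16, 16))
chosen = select_sampling_pattern(test, train, masks)
print(np.array_equal(chosen, masks[4]))  # the nearest neighbour's mask is reused
```

The expensive greedy/ICD optimization thus happens only on the training set; at test time, sampling design reduces to a cheap lookup.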
-
|
SAM vs BET: A Comparative Study for Brain Extraction and Segmentation of Magnetic Resonance Images using Deep Learning
(
Poster
)
>
Brain extraction, a critical preprocessing step in neuroimaging studies, enables the automatic segmentation of brain vs non-brain compartments as well as relevant within-brain tissue compartments and structures. While FSL’s Brain Extraction Tool (BET) is recognized as the gold standard, it often struggles with inaccuracies, especially in brains with outer lesions or compromised image quality. We present an alternative based on Meta AI's Segment Anything Model (SAM), renowned for its zero-shot segmentation capabilities. Our comparative analysis across diverse magnetic resonance imaging (MRI) sequences reveals SAM's superiority over BET, particularly in challenging imaging scenarios. Our study not only underscores SAM's potential for general brain extraction, but also its versatility in segmenting specific intra-brain regions of interest. |
Sovesh Mohapatra · Advait Gosai · Gottfried Schlaug 🔗 |
-
|
Class-Incremental Continual Learning for General Purpose Healthcare Models
(
Poster
)
>
Healthcare clinics regularly encounter dynamic data that changes due to variations in patient populations, treatment policies, medical devices, and emerging disease patterns. Deep learning models can suffer from catastrophic forgetting when fine-tuned in such scenarios, causing poor performance on previously learned tasks. Continual learning allows learning new tasks without a performance drop on previous tasks. In this work, we investigate the performance of continual learning models in four different medical imaging scenarios involving ten classification datasets from diverse modalities, clinical specialties, and hospitals. We implement various continual learning approaches and evaluate their performance in these scenarios. Our results demonstrate that a single model can sequentially learn new tasks from different specialties and achieve performance comparable to naive methods. These findings indicate the feasibility of recycling or sharing models across the same or different medical specialties, offering another step towards the development of general-purpose medical imaging AI that can be shared across institutions. |
Amritpal Singh · Mustafa Gurbuz · Prahlad Jasti · Shiva Souhith Gantha 🔗 |
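One common continual-learning baseline in settings like the one above is experience replay. The sketch below is a generic illustration (not necessarily one of the methods evaluated in the paper): a bounded reservoir sample of past-task examples is kept and mixed into new-task minibatches:

```python
import random

class RehearsalBuffer:
    """Minimal experience-replay buffer using reservoir sampling, which
    keeps a bounded, approximately uniform sample of the stream."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # replace a stored example with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

buf = RehearsalBuffer(capacity=100)
for i in range(1000):    # stream of examples from successive tasks
    buf.add(i)
batch = buf.sample(8)    # replayed alongside each new-task minibatch
```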
-
|
Uncovering the latent dynamics of whole-brain fMRI tasks with a sequential variational autoencoder
(
Poster
)
>
The neural dynamics underlying brain activity are critical to understanding cognitive processes and mental disorders. However, current voxel-based whole-brain dimensionality reduction techniques fall short of capturing these dynamics, producing latent timeseries that inadequately relate to behavioral tasks. To address this issue, we introduce a novel approach to learning low-dimensional approximations of neural dynamics by using a sequential variational autoencoder (SVAE) that learns the latent dynamical system. Importantly, our method finds smooth dynamics that can predict cognitive processes with accuracy higher than classical methods, with improved spatial localization to task-relevant brain regions. |
Eloy Geenjaar · Donghyun Kim · Riyasat Ohib · Marlena Duda · Amrit Kashyap · Sergey Plis · Vince Calhoun 🔗 |
-
|
Decentralized Sparse Federated Learning for Efficient Training on Distributed NeuroImaging Data
(
Poster
)
>
Neuroimaging advancements have increased data sharing among researchers. Yet, institutions often retain data control due to research culture, privacy, and accountability. There is therefore a need for tools that analyze combined datasets without transmitting the actual data. We introduce a decentralized sparse federated learning (FL) approach that locally trains sparse models for efficient communication in such settings. By leveraging sparsity and transmitting only some parameters among client sites throughout the training, we reduce communication costs, especially with larger models and varied site-specific resources. We validate our method using the ABCD data. |
Bishal Thapaliya · Riyasat Ohib · Eloy Geenjaar · Jingyu Liu · Vince Calhoun · Sergey Plis 🔗 |
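The communication-saving idea above — transmitting only a sparse subset of parameters between client sites — can be sketched as follows. The top-k magnitude selection and simple masked averaging are illustrative assumptions, not the authors' exact scheme:

```python
import numpy as np

def sparsify_update(update, k):
    """Keep only the k largest-magnitude entries of a local update.
    The (indices, values) pair is all a client would transmit."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def aggregate(sparse_updates, dim):
    """Average the sparse updates received from all client sites,
    normalizing each coordinate by how many clients sent it."""
    total = np.zeros(dim)
    counts = np.zeros(dim)
    for idx, vals in sparse_updates:
        total[idx] += vals
        counts[idx] += 1
    return total / np.maximum(counts, 1)

# two clients each transmit only 2 of their 5 parameters
u1 = np.array([0.1, -3.0, 0.2, 2.5, 0.0])
u2 = np.array([4.0, -0.1, 0.3, 1.5, 0.0])
merged = aggregate([sparsify_update(u1, 2), sparsify_update(u2, 2)], dim=5)
```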
-
|
Multi-task Learning for Optical Coherence Tomography Angiography (OCTA) Vessel Segmentation
(
Poster
)
>
Optical Coherence Tomography Angiography (OCTA) is a non-invasive imaging technique that provides high-resolution cross-sectional images of the retina, which are useful for diagnosing and monitoring various retinal diseases. However, manual segmentation of OCTA images is a time-consuming and labor-intensive task, which motivates the development of automated segmentation methods. In this paper, we propose a novel multi-task learning method for OCTA segmentation, called OCTA-MTL, that leverages an image-to-DT (Distance Transform) branch and an adaptive loss combination strategy. The image-to-DT branch predicts the distance from each vessel voxel to the vessel surface, which provides a useful shape prior and boundary information for the segmentation task. The adaptive loss combination strategy dynamically adjusts the loss weights according to the inverse of the average loss values of each task, to balance the learning process and avoid the dominance of one task over the other. We evaluate our method on the ROSE-2 dataset and demonstrate its superiority in segmentation performance over two baseline methods: a single-task segmentation method and a multi-task segmentation method with a fixed loss combination. |
Can Koz · Onat Dalmaz · Mertay Dayanc 🔗 |
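The adaptive loss combination described above — weights proportional to the inverse of each task's average loss — might look like this minimal sketch; the normalization (weights summing to the number of tasks) is an assumption:

```python
def adaptive_weights(avg_losses):
    """Weight each task by the inverse of its running-average loss,
    normalized so the weights sum to the number of tasks."""
    inv = [1.0 / l for l in avg_losses]
    s = sum(inv)
    n = len(avg_losses)
    return [n * w / s for w in inv]

# the segmentation loss is currently 4x the distance-transform loss,
# so it receives a proportionally smaller weight
w_seg, w_dt = adaptive_weights([0.8, 0.2])
total_loss = w_seg * 0.8 + w_dt * 0.2
```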
-
|
Face-GPS: A Comprehensive Technique for Quantifying Facial Muscle Dynamics in Videos
(
Poster
)
>
We introduce a novel method that combines differential geometry, kernel smoothing, and spectral analysis to quantify facial muscle activity from widely accessible video recordings, such as those captured on personal smartphones. Our approach emphasizes practicality and accessibility. It has significant potential for applications in national security and plastic surgery. Additionally, it offers remote diagnosis and monitoring for medical conditions such as stroke, Bell's palsy, and acoustic neuroma. Moreover, it is adept at detecting and classifying emotions, from the overt to the subtle. The proposed face muscle analysis technique is an explainable alternative to deep learning methods and a non-invasive substitute for facial electromyography (fEMG). |
Juni Kim · Zhikang Dong · Pawel Polak 🔗 |
-
|
Calibrating Where It Matters: Constrained Temperature Scaling
(
Poster
)
>
We consider calibration of convolutional classifiers for diagnostic decision making. Clinical decision makers can use calibrated classifiers to minimise expected costs given their own cost function. Such functions are usually unknown at training time. If minimising expected costs is the primary aim, algorithms should focus on tuning calibration in the regions of the probability simplex likely to affect decisions. We give an example, modifying temperature scaling calibration, and demonstrate improved calibration where it matters using convnets trained to classify dermoscopy images. |
Stephen J. McKenna · Jacob Carse 🔗 |
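For context, standard temperature scaling — the base method the paper constrains — divides the logits by a scalar T fitted on validation data (T > 1 softens overconfident outputs). A minimal sketch, not the paper's constrained variant:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    """Rescale logits by a scalar temperature before the softmax.
    T is normally fitted by minimizing NLL on a validation set."""
    return softmax(logits / T)

logits = np.array([[4.0, 1.0, 0.0]])
p_raw = softmax(logits)                    # possibly overconfident
p_cal = temperature_scale(logits, T=2.0)   # softened probabilities
```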
-
|
Deep Structural Causal Model for Investigating Causality between Genotype and Clinical Phenotype in Neurological Disorders
(
Poster
)
>
Understanding the causal relationship between genotype and clinical phenotype is crucial for disease treatment and prognosis. Despite the existing literature exploring associations of genetics with clinical phenotypes such as imaging patterns and survival in various diseases, few if any works address the causal effects of these correlated genotypes. This paper leverages recent advances in causal deep learning to formulate the phenotypical outcome given a change in genotype as a causal inference problem. We build upon a structural causal model (SCM) with normalizing flows parameterized by deep networks to perform counterfactual queries investigating the causal relationship between genotype and clinical phenotype in two types of neurological disorders. Specifically, we focus on the causal effect of (1) the APOE4 allele on brain volumetric measures in Alzheimer's disease; (2) key driver gene mutations on overall survival (OS) in glioblastoma. Experimental results show that being an APOE4 noncarrier causally leads to greater gray matter atrophy in the frontal lobe, and that survival-correlated genes do not exhibit a causal effect on OS in glioblastoma. |
Fanyang Yu · Rongguang Wang · Pratik Chaudhari · Christos Davatzikos 🔗 |
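Counterfactual queries in an SCM follow the standard abduction-action-prediction recipe. A toy sketch for a linear additive-noise mechanism (the paper uses normalizing flows, not this linear form; `beta` is a hypothetical effect size):

```python
def counterfactual(x_obs, g_obs, g_cf, beta):
    """Counterfactual phenotype under the toy SCM x = beta * g + u:
      1. abduction:  infer the exogenous noise u from the observation
      2. action:     intervene, setting the genotype to g_cf
      3. prediction: recompute the phenotype under the intervention
    """
    u = x_obs - beta * g_obs   # abduction
    return beta * g_cf + u     # action + prediction

# observed phenotype 5.0 with genotype 1; ask: what if genotype were 0?
x_cf = counterfactual(5.0, 1.0, 0.0, beta=2.0)
```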
-
|
MIMIC-NLE-v2: Can Large Language Models Reason about Chest X-rays?
(
Poster
)
>
Diagnosing medical images requires reasoning: Radiologists usually identify different findings on a scan and then integrate them to form an overall diagnosis in light of the patient's condition. At the same time, large language models (LLMs) have demonstrated remarkable language reasoning skills, and these capabilities are currently being adapted to vision and vision-language problems. In this work, we investigate whether vision-enabled LLMs are capable of reasoning about patient context and radiographic observations to arrive at a diagnosis. In order to achieve this, we first propose MIMIC-NLE-v2, a new chest X-ray dataset for Natural Language Explanations. Next, we compare different methods for training models of up to 30 billion parameters to reason about chest X-rays. We then show how their reasoning capabilities can lead to improvements in other image analysis tasks. |
Maxime Kayser · Oana-Maria Camburu · Thomas Lukasiewicz 🔗 |
-
|
LC-SD: Realistic Endoscopic Image Generation with Stable Diffusion and ControlNet
(
Poster
)
>
Computer-assisted surgical systems provide support information to the surgeon, which can improve the execution and overall outcome of the procedure. These systems are based on deep learning models that are trained on complex and challenging-to-annotate data. Generating synthetic data can overcome these limitations, but it is necessary to reduce the domain gap between real and synthetic data. We propose a method for image-to-image translation based on a Stable Diffusion model, which generates realistic images from synthetic data. Compared to previous works, the proposed method is better suited for clinical application, requiring a much smaller amount of input data and allowing finer control over the generation of details by introducing different variants of supporting control networks. The proposed method is applied in the context of laparoscopic cholecystectomy, using synthetic and real public data. It achieves a mean Intersection over Union of 69.76%, significantly improving the baseline results (69.76 vs. 42.21%). The proposed method for translating synthetic images into images with realistic characteristics will enable the training of deep learning methods that can generalize optimally to real-world contexts, thereby improving computer-assisted intervention guidance systems. |
Joanna Kaleta · Diego Dall'Alba · Szymon Płotka · Przemyslaw Korzeniowski 🔗 |
-
|
Zero-Shot Image Registration through Feature Extraction (ZSIR - FE): Medical Image Registration using Pre-Trained Neural Networks
(
Poster
)
>
We introduce a novel image registration framework, termed Zero-Shot Image Registration through Feature Extraction (ZSIR-FE), employing a pre-trained deep neural network for feature extraction. The framework is termed a zero-shot learning approach because the training and testing datasets do not overlap, and because the network modules within the overall architecture are not trained for image registration. This approach eliminates the need for any training data specific to image registration, as it autonomously estimates the locations of significant features, which we herein term key points. Although the framework allows the key points to be tuned as a hyperparameter, they remain fixed in our implementation. The pipeline has been tested on the BraTS dataset, showing an enhancement in performance metrics, notably the Dice score, particularly for affine transformations. Moreover, the method yields instantaneous registration results, irrespective of the input image size. The ZSIR-FE framework fosters a unified registration model, adept at addressing diverse medical imaging tasks and scenarios across varying domains. |
Abjasree S · Avinash Kori · Ganapathy Krishnamurthi 🔗 |
-
|
Semi-Supervised Diffusion Model for Brain Age Prediction
(
Poster
)
>
Brain age prediction models have succeeded in predicting clinical outcomes in neurodegenerative diseases, but can struggle with tasks involving faster-progressing diseases and low-quality data. To enhance their performance, we employ a semi-supervised diffusion model, obtaining a 0.83 (p < 0.01) correlation between chronological and predicted age on low-quality T1w MR images. This was competitive with state-of-the-art non-generative methods. Furthermore, the predictions produced by our model were significantly associated with survival length in Amyotrophic Lateral Sclerosis. Thus, our approach demonstrates the value of diffusion-based architectures for the task of brain age prediction. |
Ayodeji Ijishakin · Sophie Martin · Florence Townend · Federica Agosta · James Cole · Andrea Malaspina 🔗 |
-
|
Overcoming Challenges of Small Data and Over-parameterized DNN in fMRI-based Diagnosis
(
Poster
)
>
Classification of High-Dimensional Low Sample Size (HDLSS) datasets is a serious challenge, especially with Deep Neural Networks (DNNs). We present an improved DNN method for the classification of HDLSS datasets, with application to psychiatric and neurological disorders based on resting-state fMRI brain signals. Comparing several state-of-the-art supervised and self-supervised metric loss functions, we find the best-performing method, self-supervised Mixup, and suggest some modifications. We propose Triplet Mixup, which locally samples neighboring triplets and augments new data within the triangular space they span, rather than interpolating linearly between pairs as in classic Mixup. The loss is likewise extended to a Triplet Mixup loss. Our modifications promote better exploration of the embedding space and thus more diverse augmented data. Experimental results show that our method does not overfit despite the very small datasets and achieves nearly the best classification accuracies in disease prediction. The results also support the theory that over-parameterized DNNs can generalize in the HDLSS setting on fMRI data. |
Kimia Alavi · Saeed Masoudnia · Ahmad Kalhor · Mohammadreza Nazemzadeh 🔗 |
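The triangular-space augmentation described above can be sketched with barycentric (Dirichlet) weights, a natural generalization of pairwise Mixup's Beta-distributed interpolation. A minimal illustration under the assumption of uniform Dirichlet weights, not the authors' exact sampling scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def triplet_mixup(x1, x2, x3):
    """Sample a synthetic point inside the triangle spanned by three
    neighboring samples, using Dirichlet (barycentric) weights."""
    w = rng.dirichlet(alpha=[1.0, 1.0, 1.0])  # w >= 0, sum(w) == 1
    return w[0] * x1 + w[1] * x2 + w[2] * x3

# a synthetic point inside the unit triangle (0,0)-(1,0)-(0,1)
x_new = triplet_mixup(np.array([0.0, 0.0]),
                      np.array([1.0, 0.0]),
                      np.array([0.0, 1.0]))
```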
-
|
Multi-Task Learning for Segmentation of Breast Arterial Calcifications in Mammograms
(
Poster
)
>
Screening mammography is a standard procedure for assessing breast cancer risk among women aged 45 and older. Quantifying breast arterial calcification (BAC) from screening mammograms is a non-invasive and cost-efficient approach to assessing women's future risk of cardiovascular diseases (CVD) such as heart attack and stroke. However, segmentation of breast arterial calcification is an involved task that poses several technical challenges: BAC findings are extremely small (a low ratio of breast arteries to breast area in the mammogram images), and tissue features such as breast folds and heterogeneous density look very similar to BAC. In this work, we aim to address the shortcomings of existing SOTA methods, e.g., SCUNet, and analyze the comparative performance. We propose a multi-task learning approach for BAC segmentation by adding an auxiliary task of patch position prediction based on prior knowledge about anatomy. The proposed method achieves state-of-the-art performance compared to the baselines. To demonstrate its utility, we also validate our method on external data and provide a survival analysis for CVD based on the BAC score. |
Aisha Urooj · William Charles O'Neill · Hari Trivedi · Imon Banerjee 🔗 |
-
|
Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification
(
Poster
)
>
This paper focuses on the classification of breast ultrasound images and on measuring the reliability of the classification results. We propose a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature attribution algorithm SP-RISA are applied. Uncertainty quantification via Test-Time Enhancement is used to evaluate the predictive reliability. The effectiveness of this reliability evaluation framework is verified on our in-house clinical breast ultrasound dataset, and its robustness is verified on the public BUSI dataset. The expected calibration errors on both datasets are significantly lower than those of traditional evaluation methods, demonstrating the effectiveness of our proposed reliability measurement. |
Shuge Lei 🔗 |
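Expected calibration error, the metric reported above, bins predictions by confidence and averages the gap between mean confidence and accuracy within each bin. A standard sketch (the bin count and toy data are illustrative):

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: weighted average, over confidence bins, of the absolute
    gap between mean confidence and accuracy in each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by bin occupancy
    return ece

# four toy predictions: confidence and whether each was correct
conf = np.array([0.95, 0.85, 0.58, 0.55])
correct = np.array([1.0, 1.0, 1.0, 0.0])
ece = expected_calibration_error(conf, correct)
```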