Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training

Kalap, Katharine

Files

Kalap_Katharine.pdf (5.68 MB)

Date

2025-04-14

Authors

Kalap, Katharine

Abstract

Accurate medical image classification using computer vision remains a challenge in clinical radiology, particularly in low-data settings where labelled examples are scarce or expensive to obtain. This thesis evaluates 20 separate model configurations across 3 binary chest X-ray classification tasks to determine the impact of pre-training, base architecture, and fine-tuning strategy on diagnostic accuracy. The configurations vary in base model, either convolutional (ResNet-18, ResNet-50) or transformer (ViT, DINOv2); in whether the model is pre-trained on a large corpus of healthy chest X-ray images; and in fine-tuning method (transfer learning, full fine-tuning, Low-Rank Adaptation, Chain of Low-Rank Adaptation, or Weight-Decomposed Low-Rank Adaptation).

The findings show that domain-specific pre-training significantly boosts the downstream performance of CNNs and of models trained on small datasets (< 2,500 diseased images), by an average of 2.55% and 3.9% respectively. Convolutional architectures consistently outperform the top transformer-based models, by an average of 9.5%. Almost all pre-trained ResNet models matched or exceeded benchmark standards for public datasets, achieving up to 96.1% accuracy on the largest dataset (8,716 labelled examples) and up to 79.4% average accuracy across the 3 tasks, which have a mean dataset size of 5,209 labelled images. This work therefore shows that both CNN-based models and pre-training on a large, related corpus of healthy images improve downstream classification accuracy in computer vision-based medical diagnostics. Both should be adopted into routine use in critical fields such as radiology, where high-quality data is scarce and accuracy is paramount.
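For readers unfamiliar with Low-Rank Adaptation, one of the fine-tuning methods named in the abstract, the following is a minimal sketch of the general idea and not the thesis's implementation. A frozen weight matrix W0 is left untouched and only two small factors, B and A, are trained, so the effective weight is W0 + (alpha / rank) * B A. The function names, the 768 x 768 layer size, the rank of 8, and the alpha value are illustrative assumptions; a practical version would use a tensor library rather than plain lists.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def make_lora_factors(out_dim, in_dim, rank, seed=0):
    """Create the trainable factors: A is small random, B starts at zero.

    Starting B at zero means the adapted layer initially behaves exactly
    like the frozen layer. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(out_dim)]
    return a, b


def lora_weight(w0, a, b, alpha, rank):
    """Return the effective weight W0 + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)  # (out x rank) @ (rank x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def trainable_counts(out_dim, in_dim, rank):
    """Compare parameter counts: full fine-tuning vs. the LoRA factors."""
    full = out_dim * in_dim
    lora = rank * (out_dim + in_dim)
    return full, lora


if __name__ == "__main__":
    full, lora = trainable_counts(768, 768, rank=8)
    print(f"full fine-tuning: {full} weights, LoRA factors: {lora} weights")
```

The parameter count is the point of the technique: for the assumed 768 x 768 layer at rank 8, the two factors hold 12,288 trainable values against 589,824 for the full matrix.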

URI

https://theses-dissertations.princeton.edu/handle/88435/dsp016395wb56b

Collections

Electrical and Computer Engineering, 1932-2025
