Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training

Kalap, Katharine

dc.contributor.advisor	Jha, Niraj Kumar
dc.contributor.author	Kalap, Katharine
dc.date.accessioned	2025-08-12T16:27:10Z
dc.date.available	2025-08-12T16:27:10Z
dc.date.issued	2025-04-14
dc.description.abstract	Accurate medical image classification using computer vision remains a challenge in clinical radiology, particularly in low-data settings where labelled examples are scarce or expensive to obtain. This thesis evaluates 20 model configurations across 3 binary chest X-ray classification tasks to determine the impact of pre-training, base architecture, and fine-tuning strategy on diagnostic accuracy. The configurations vary in base model (convolutional: ResNet-18, ResNet-50; transformer: ViT, DINOv2), in whether the model is pre-trained on a large corpus of healthy chest X-ray images, and in fine-tuning method (transfer learning, full fine-tuning, Low-Rank Adaptation, Chain of Low-Rank Adaptation, and Weight-Decomposed Low-Rank Adaptation). The findings indicate that domain-specific pre-training significantly boosts the downstream performance of CNNs and of models trained on small datasets (< 2,500 diseased images), by an average of 2.55% and 3.9% respectively. Convolutional architectures consistently outperform the top transformer-based models, by an average of 9.5%. Almost all pre-trained ResNet models matched or exceeded benchmark standards for public datasets, achieving up to 96.1% accuracy on the largest dataset (8,716 labelled examples) and up to 79.4% average accuracy across the 3 tasks, which have a mean dataset size of 5,209 labelled images. This work therefore shows that using CNN-based models for computer-vision-based medical diagnosis, and pre-training them on a large, related corpus of healthy images, improves downstream classification accuracy. Both practices should be adopted into routine use in critical fields such as radiology, where high-quality data is scarce and accuracy is paramount.
dc.identifier.uri	https://theses-dissertations.princeton.edu/handle/88435/dsp016395wb56b
dc.language.iso	en
dc.title	Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training
dc.type	Princeton University Senior Theses
dspace.entity.type	Publication
dspace.workflow.startDateTime	2025-04-15T00:27:07.979Z
dspace.workflow.startDateTime	2025-04-29T16:59:00.927Z
pu.contributor.authorid	920308576
pu.date.classyear	2025
pu.department	Electrical and Computer Engineering

Files

Original bundle

Name: Kalap_Katharine.pdf
Size: 5.68 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission

Collections

Electrical and Computer Engineering, 1932-2025