Publication:

Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training

datacite.rights: restricted
dc.contributor.advisor: Jha, Niraj Kumar
dc.contributor.author: Kalap, Katharine
dc.date.accessioned: 2025-08-12T16:27:10Z
dc.date.available: 2025-08-12T16:27:10Z
dc.date.issued: 2025-04-14
dc.description.abstract: Accurate medical image classification using computer vision remains a challenge in clinical radiology, particularly in low-data settings where labelled examples are scarce or expensive to obtain. This thesis evaluates 20 model configurations across 3 binary chest X-ray classification tasks to determine the impact of pre-training, base architecture, and fine-tuning strategy on diagnostic accuracy. The configurations vary in base model (convolutional: ResNet-18, ResNet-50; transformer: ViT, DINOv2), in whether the model is pre-trained on a large corpus of healthy chest X-ray images, and in fine-tuning method (transfer learning, full fine-tuning, Low-Rank Adaptation, Chain of Low-Rank Adaptation, and Weight-Decomposed Low-Rank Adaptation). The findings show that domain-specific pre-training significantly boosts downstream performance, by an average of 2.55% for CNNs and 3.9% for models trained on small datasets (< 2,500 diseased images). Convolutional architectures consistently outperform the top transformer-based models, by an average of 9.5%. Almost all pre-trained ResNet models meet or exceed benchmark results for the public datasets, achieving up to 96.1% accuracy on the largest dataset (8,716 labelled examples) and up to 79.4% average accuracy across the 3 tasks, with a mean dataset size of 5,209 labelled images. This work therefore shows that CNN-based models and pre-training on a large, related corpus of healthy images each improve downstream classification accuracy in computer vision-based medical diagnostics. Both should be adopted into routine use in critical fields such as radiology, where high-quality data is scarce and accuracy is paramount.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp016395wb56b
dc.language.iso: en
dc.title: Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-04-15T00:27:07.979Z
dspace.workflow.startDateTime: 2025-04-29T16:59:00.927Z
pu.contributor.authorid: 920308576
pu.date.classyear: 2025
pu.department: Electrical and Computer Engineering

Files

Original bundle

Name: Kalap_Katharine.pdf
Size: 5.68 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission
Description: