Publication: Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training
dc.contributor.advisor | Jha, Niraj Kumar | |
dc.contributor.author | Kalap, Katharine | |
dc.date.accessioned | 2025-08-12T16:27:10Z | |
dc.date.available | 2025-08-12T16:27:10Z | |
dc.date.issued | 2025-04-14 | |
dc.description.abstract | Accurate medical image classification using computer vision remains a challenge in clinical radiology, particularly in low-data settings where labelled examples are scarce or expensive to obtain. This thesis evaluates 20 separate model configurations across 3 binary chest X-ray classification tasks to determine the impact of pre-training, base architecture, and fine-tuning strategy on diagnostic accuracy. The configurations vary in base model (convolutional: ResNet-18, ResNet-50; transformer: ViT, DINOv2), in whether the model is pre-trained on a large corpus of healthy chest X-ray images, and in fine-tuning method (transfer learning, full fine-tuning, Low-Rank Adaptation, Chain of Low-Rank Adaptation, and Weight-Decomposed Low-Rank Adaptation). The findings show that domain-specific pre-training significantly boosts the downstream performance of CNNs and of models trained on small datasets (< 2,500 diseased images), by an average of 2.55% and 3.9% respectively. Convolutional architectures consistently outperform the top transformer-based models by an average of 9.5%. Almost all pre-trained ResNet models match or exceed benchmark standards for public datasets, achieving up to 96.1% accuracy on the largest dataset (8,716 labelled examples) and up to 79.4% average accuracy across the 3 tasks, which have a mean dataset size of 5,209 labelled images. This work therefore shows that using CNN-based models and pre-training on a large, related corpus of healthy images both improve downstream classification accuracy in computer vision-based medical diagnostics. Both practices should be adopted into routine use in critical fields such as radiology, where high-quality data is scarce and accuracy is paramount. | |
dc.identifier.uri | https://theses-dissertations.princeton.edu/handle/88435/dsp016395wb56b | |
dc.language.iso | en | |
dc.title | Improving Data-Scarce Medical Diagnosis by Healthy Image Pre-Training | |
dc.type | Princeton University Senior Theses | |
dspace.entity.type | Publication | |
dspace.workflow.startDateTime | 2025-04-15T00:27:07.979Z | |
dspace.workflow.startDateTime | 2025-04-29T16:59:00.927Z | |
pu.contributor.authorid | 920308576 | |
pu.date.classyear | 2025 | |
pu.department | Electrical and Computer Engineering |
Files
Original bundle
- Name: Kalap_Katharine.pdf
- Size: 5.68 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 100 B
- Format: Item-specific license agreed to upon submission
- Description: