
Publication:

Optical Document Recognition (ODR) with Large Vision-Language Models: Enhancing Metadata Creation and Digitization in Libraries

datacite.rights: restricted
dc.contributor.advisor: Kernighan, Brian W.
dc.contributor.author: Zhang, James
dc.date.accessioned: 2025-08-06T14:49:07Z
dc.date.available: 2025-08-06T14:49:07Z
dc.date.issued: 2025-04-10
dc.description.abstract: As libraries and archives digitize their collections, a familiar challenge persists: making them searchable and accessible. Manual metadata creation is infeasible at scale, and legacy OCR systems often falter on complex, symbol-rich pages. This thesis introduces MetaScribe, a flexible system that uses recent advances in large vision-language models (LVLMs) to support metadata generation at scale. Tested on materials from the Princeton Prosody Archive (PPA), MetaScribe improved character recognition accuracy by over 20 percentage points and produced field-level metadata with promising reliability (average F1 score of 0.72). Yet the aim is not automation for its own sake. MetaScribe is designed to work alongside archivists and librarians, not in place of them. Through this thesis, we offer a modular, transparent framework that preserves human judgment while extending institutional capacity. As AI capabilities grow, tools like MetaScribe are a practical path forward: adaptable, accountable, and grounded in the needs of cultural stewardship.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp016w924g28r
dc.language.iso: en_US
dc.title: Optical Document Recognition (ODR) with Large Vision-Language Models: Enhancing Metadata Creation and Digitization in Libraries
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-04-27T16:05:01.334Z
dspace.workflow.startDateTime: 2025-05-01T19:17:56.215Z
pu.contributor.authorid: 920291153
pu.date.classyear: 2025
pu.department: Computer Science
pu.minor: Statistics and Machine Learning
pu.minor: Linguistics

Files

Original bundle

Name: James_Zhang_COS_Thesis.pdf
Size: 11.89 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission