
Publication:

LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages

dc.contributor.advisor: Kshirsagar, Mihir
dc.contributor.author: Agarwal, Tara
dc.date.accessioned: 2025-08-06T15:54:47Z
dc.date.available: 2025-08-06T15:54:47Z
dc.date.issued: 2025-05-04
dc.description.abstract: The Constitution of India designates English as the language of jurisprudence in the Supreme and High Courts, thereby excluding those without English proficiency—the majority of the country’s population—from meaningful participation in legal discourse. To address the dual challenges of linguistic inaccessibility and legal complexity, this thesis investigates a range of large language model (LLM) configurations for summarizing English-language legal judgments into Hindi, India’s most widely spoken language. Recognizing the limitations of traditional rule-based evaluation metrics for cross-lingual tasks, we introduce a task-aligned, interpretable evaluation suite. Key components include pairwise BERTScores to assess output homogeneity, a question-answering framework (LLM-as-a-judge) to measure faithfulness, and named entity preservation metrics as proxies for legal precision. Consistent with prior work, we observe that the summarize-then-translate pipeline outperforms direct end-to-end generation. Surprisingly, across both paradigms, one-shot prompting degrades performance relative to its zero-shot counterpart for LLaMA 3.1 8B and Qwen 2.5 7B alike. Accompanied by increased homogeneity and extractiveness, this behavior indicates that providing a judgment-summary example encourages stylistic imitation at the cost of information coverage. We find that decoder-only models, despite generating less extractive summaries, achieve substantial gains over baseline ROUGE and BERTScores, contrary to the presumed trade-off between abstractiveness and faithfulness. Finally, our question-answering framework reveals that models are prone to error when reproducing the court’s reasoning. Despite favorable scores from embedding- and overlap-based metrics, these errors demonstrate that current LLMs fall short of the factuality required for high-stakes legal summarization tasks. Nonetheless, our task-aligned evaluation suite serves as an important institutional readiness check for public-facing deployment, mirrors how humans evaluate summaries, and yields deeper insight into model behavior.
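The named-entity-preservation metric described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the regex-based extractor below is a hypothetical stand-in for a real NER model, and the function names and example texts are invented for illustration. The idea is simply to measure what fraction of the source judgment's entities (courts, parties, statutes) survive into the summary.

```python
import re

def extract_entities(text):
    # Stand-in for a real NER model: treat runs of capitalized words
    # as candidate named entities (courts, parties, statutes).
    return set(re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text))

def entity_preservation(source, summary):
    # Fraction of source entities that also appear in the summary,
    # used as a rough proxy for legal precision.
    src, summ = extract_entities(source), extract_entities(summary)
    return len(src & summ) / len(src) if src else 1.0

judgment = "The Supreme Court of India heard the appeal filed by Ram Kumar."
summary = "The Supreme Court of India allowed Ram Kumar's appeal."
score = entity_preservation(judgment, summary)
```

A summary that drops every entity would score 0.0 under this metric; in practice, a production version would rely on a trained legal-domain NER model rather than capitalization heuristics, since the thesis's cross-lingual setting (Hindi summaries of English judgments) also requires matching transliterated entity mentions.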
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp0112579w71w
dc.language.iso: en_US
dc.title: LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-05-04T14:30:08.840Z
pu.contributor.authorid: 920282230
pu.date.classyear: 2025
pu.department: Computer Science

Files

Original bundle
Name: Thesis_vFINAL.pdf
Size: 3.79 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 100 B
Description: Item-specific license agreed to upon submission