Princeton University users: to view a senior thesis while away from campus, connect to the campus network via the GlobalProtect virtual private network (VPN). Unaffiliated researchers: please note that requests for copies are handled manually by staff and require time to process.
 

Publication:

LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages

datacite.rights: restricted
dc.contributor.advisor: Kshirsagar, Mihir
dc.contributor.author: Agarwal, Tara
dc.date.accessioned: 2025-08-06T15:54:47Z
dc.date.available: 2025-08-06T15:54:47Z
dc.date.issued: 2025-05-04
dc.description.abstract: The Constitution of India designates English as the language of jurisprudence in the Supreme and High Courts, thereby excluding those without English proficiency—the majority of the country’s population—from meaningful participation in legal discourse. To address the dual challenges of linguistic inaccessibility and legal complexity, this thesis investigates a range of large language model (LLM) configurations for summarizing English-language legal judgments into Hindi, India’s most widely spoken language. Recognizing the limitations of traditional rule-based evaluation metrics for cross-lingual tasks, we introduce a task-aligned, interpretable evaluation suite. Key components include pairwise BERTScores to assess output homogeneity, a question-answering framework (LLM-as-a-judge) to measure faithfulness, and named entity preservation metrics as proxies for legal precision. Consistent with prior work, we observe that the summarize-then-translate pipeline outperforms direct end-to-end generation. Surprisingly, across both paradigms, one-shot prompting results in performance declines relative to its zero-shot counterpart for LLaMA 3.1 8B and Qwen 2.5 7B alike. Accompanied by increased homogeneity and extractiveness, this behavior indicates that providing a judgment-summary example encourages stylistic imitation at the cost of information coverage. We find that decoder-only models, despite generating less extractive summaries, achieve substantial gains over baseline ROUGE and BERTScores, contrary to the presumed trade-off between abstractiveness and faithfulness. Finally, our question-answering framework reveals that models are prone to error when reproducing the court’s reasoning. Despite favorable scores from embedding- and overlap-based metrics, this demonstrates that current LLMs fall short of the factuality required for high-stakes legal summarization tasks. Nonetheless, our task-aligned evaluation suite serves as an important institutional readiness check for public-facing deployment, mirrors how humans evaluate summaries, and yields deeper insight into model behavior.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp0112579w71w
dc.language.iso: en_US
dc.title: LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-05-04T14:30:08.840Z
pu.contributor.authorid: 920282230
pu.date.classyear: 2025
pu.department: Computer Science
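
The pairwise-BERTScore homogeneity check described in the abstract can be illustrated with a short sketch. This is not the thesis's own code: it assumes the open-source bert-score package, and the function name, language setting, and example strings are illustrative. The idea is to score every unordered pair of generated summaries against each other and average the F1 values, so a higher score means more homogeneous (more mutually similar) outputs.

    # Minimal sketch of pairwise-BERTScore homogeneity, assuming the
    # `bert-score` package; names and example strings are hypothetical.
    from itertools import combinations

    from bert_score import score


    def pairwise_homogeneity(summaries, lang="hi"):
        """Mean BERTScore F1 over all unordered pairs of summaries."""
        pairs = list(combinations(summaries, 2))
        if not pairs:
            raise ValueError("need at least two summaries to compare")
        cands = [a for a, _ in pairs]
        refs = [b for _, b in pairs]
        # For non-English languages such as Hindi, bert-score falls back
        # to a multilingual BERT checkpoint; the returned F1 tensor has
        # one entry per candidate/reference pair.
        _, _, f1 = score(cands, refs, lang=lang, verbose=False)
        return f1.mean().item()


    # Usage: hypothetical Hindi summaries generated for the same judgment.
    outputs = ["पहला सारांश ...", "दूसरा सारांश ...", "तीसरा सारांश ..."]
    print(f"mean pairwise BERTScore F1: {pairwise_homogeneity(outputs):.3f}")

Run over the set of summaries a model produces for the same judgment, a rising mean pairwise F1 signals more uniform outputs; the abstract pairs this homogeneity signal with extractiveness to diagnose the stylistic-imitation effect observed under one-shot prompting.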

Files

Original bundle

Name: Thesis_vFINAL.pdf
Size: 3.79 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission