Publication: LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages
dc.contributor.advisor | Kshirsagar, Mihir | |
dc.contributor.author | Agarwal, Tara | |
dc.date.accessioned | 2025-08-06T15:54:47Z | |
dc.date.available | 2025-08-06T15:54:47Z | |
dc.date.issued | 2025-05-04 | |
dc.description.abstract | The Constitution of India designates English as the language of jurisprudence in the Supreme Court and High Courts, thereby excluding those without English proficiency—the majority of the country’s population—from meaningful participation in legal discourse. To address the dual challenges of linguistic inaccessibility and legal complexity, this thesis investigates a range of large language model (LLM) configurations for summarizing English-language legal judgments into Hindi, India’s most widely spoken language. Recognizing the limitations of traditional rule-based evaluation metrics for cross-lingual tasks, we introduce a task-aligned, interpretable evaluation suite. Key components include pairwise BERTScores to assess output homogeneity, a question-answering framework (LLM-as-a-judge) to measure faithfulness, and named-entity preservation metrics as proxies for legal precision. Consistent with prior work, we observe that the summarize-then-translate pipeline outperforms direct end-to-end generation. Surprisingly, across both paradigms, one-shot prompting degrades performance relative to zero-shot prompting for LLaMA 3.1 8B and Qwen 2.5 7B alike. Accompanied by increased homogeneity and extractiveness, this behavior indicates that providing a judgment-summary example encourages stylistic imitation at the cost of information coverage. We also find that decoder-only models, despite generating less extractive summaries, achieve substantial gains over baselines in ROUGE and BERTScore, contrary to the presumed trade-off between abstractiveness and faithfulness. Finally, our question-answering framework reveals that models are prone to error when reproducing the court’s reasoning, demonstrating that, despite favorable scores from embedding- and overlap-based metrics, current LLMs fall short of the factuality required for high-stakes legal summarization. Nonetheless, our task-aligned evaluation suite serves as an important institutional readiness check for public-facing deployment, mirrors how humans evaluate summaries, and yields deeper insight into model behavior. | |
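For context on the homogeneity metric named in the abstract, the following is a minimal illustrative sketch, not the thesis's actual code: it assumes the open-source `bert-score` package and treats the mean BERTScore F1 over all unordered pairs of a model's summaries as the homogeneity proxy (the function name and the averaging choice are assumptions for illustration).

```python
# Illustrative sketch (assumed, not from the thesis): mean pairwise
# BERTScore F1 across a set of candidate summaries as a homogeneity
# proxy -- higher values mean the outputs are more similar to one another.
from itertools import combinations

from bert_score import score  # pip install bert-score


def pairwise_bertscore_f1(summaries: list[str], lang: str = "hi") -> float:
    """Average BERTScore F1 over all unordered pairs of summaries."""
    pairs = list(combinations(summaries, 2))
    candidates = [a for a, _ in pairs]
    references = [b for _, b in pairs]
    # lang="hi" makes bert-score select a multilingual encoder for Hindi.
    _, _, f1 = score(candidates, references, lang=lang, verbose=False)
    return f1.mean().item()
```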
dc.identifier.uri | https://theses-dissertations.princeton.edu/handle/88435/dsp0112579w71w | |
dc.language.iso | en_US | |
dc.title | LLMs for Legal and Linguistic Accessibility: Toward Task-Aligned Interpretable Evaluation in Low-Resource Languages | |
dc.type | Princeton University Senior Theses | |
dspace.entity.type | Publication | |
dspace.workflow.startDateTime | 2025-05-04T14:30:08.840Z | |
pu.contributor.authorid | 920282230 | |
pu.date.classyear | 2025 | |
pu.department | Computer Science | |
Files
Original bundle
- Name: Thesis_vFINAL.pdf
- Size: 3.79 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 100 B
- Description: Item-specific license agreed to upon submission