Advisor: Jha, Niraj Kumar
Author: Vellore, Abhiram
Date accessioned: 2025-08-12
Date available: 2025-08-12
Date issued: 2025-04-14
URI: https://theses-dissertations.princeton.edu/handle/88435/dsp018623j2203
Abstract: This study extends the CONFINE algorithm, a framework for uncertainty quantification, to transformer-based language models. CONFIDE (CONformal prediction for FIne-tuned DEep language models) applies conformal prediction to the internal embeddings of BERT and RoBERTa architectures, introducing new hyperparameters such as the choice of distance metric and PCA dimensionality. CONFIDE uses either [CLS] token embeddings or flattened hidden states to construct class-conditional nonconformity scores, enabling statistically valid prediction sets with instance-level explanations. Empirically, CONFIDE improves test accuracy by up to 4.09% on BERT-tiny and achieves greater correct efficiency than prior methods, including NM2 and VanillaNN. We show that early and intermediate transformer layers often yield better-calibrated and more semantically meaningful representations for conformal prediction. In resource-constrained models and high-stakes tasks with ambiguous labels, CONFIDE offers robustness and interpretability where softmax-based uncertainty fails.
Language: en-US
Title: Uncertainty-Aware Transformers: Conformal Prediction for LLMs
Type: Princeton University Senior Theses
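The class-conditional conformal procedure the abstract describes can be sketched roughly as follows. This is a minimal illustration over fixed embedding vectors (e.g., [CLS] embeddings), using distance to a class centroid as the nonconformity score; CONFIDE's actual scoring function, distance metrics, and PCA preprocessing are not specified here, so those details are assumptions.

```python
# Hedged sketch: class-conditional split conformal prediction over embeddings.
# The nearest-centroid Euclidean score is an illustrative stand-in, not
# necessarily the score used by CONFINE/CONFIDE.
import numpy as np

def nonconformity(emb, centroid):
    # Euclidean distance of an embedding to a class centroid.
    return np.linalg.norm(emb - centroid)

def calibrate(cal_embs, cal_labels, centroids):
    # Class-conditional calibration: collect each calibration example's
    # distance to its *true* class centroid, grouped by class.
    scores = {c: [] for c in centroids}
    for e, y in zip(cal_embs, cal_labels):
        scores[y].append(nonconformity(e, centroids[y]))
    return {c: np.sort(np.asarray(s)) for c, s in scores.items()}

def prediction_set(emb, centroids, cal_scores, alpha=0.1):
    # Include class c iff its conformal p-value exceeds alpha; the
    # resulting set has class-conditional coverage >= 1 - alpha.
    pset = []
    for c, centroid in centroids.items():
        s = nonconformity(emb, centroid)
        n = len(cal_scores[c])
        p = (np.sum(cal_scores[c] >= s) + 1) / (n + 1)
        if p > alpha:
            pset.append(c)
    return pset
```

An ambiguous input far from every centroid yields a large (or empty) prediction set, which is the instance-level uncertainty signal that softmax confidence alone does not provide.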