Leveraging Natural Language Processing for Sentiment Analysis: Model Performance and Insights

Watson, Mark W.
Swartwout, William
2025-07-28
2025-07-28
2025-04-10
https://theses-dissertations.princeton.edu/handle/88435/dsp01vh53x017g

Sentiment analysis has emerged as a significant area of research across various disciplines, offering a powerful tool to quantify human emotion. While existing studies have demonstrated that sentiment metrics offer some predictive power, they often rely on indicators that are typically updated only monthly, which limits their usefulness for real-time applications. Prior research has largely overlooked the potential of high-frequency sentiment measures. This study addresses that gap by exploring methods for constructing a daily sentiment index using natural language processing (NLP) models, including BERT, RoBERTa, XLNet, and VADER. We examine how each of these models performs in extracting sentiment from five major news sources (The New York Times, The Wall Street Journal, The Washington Post, The Chicago Tribune, and The Los Angeles Times) from 2010 through 2018. Through this analysis, we seek to better understand the strengths and limitations of each model in producing high-frequency sentiment indicators with potential applications in economic forecasting and beyond. Additionally, we examine how the choice of news source influences the detected sentiment, investigate the presence of regional disparities, and explore underlying factors that may shape sentiment levels across publications.

en-US
Princeton University Senior Theses