Princeton University users: To view a senior thesis while you are away from campus, you will need to connect to the campus network remotely via the GlobalProtect virtual private network (VPN). If you are not affiliated with the University and are requesting a copy of a thesis, please note that all requests are processed manually by staff and will require additional time.
 

Publication:

Leveraging Natural Language Processing for Sentiment Analysis: Model Performance and Insights

Files

Swartwout_William_SeniorThesis_FINAL.pdf (3.67 MB)

Date

2025-04-10

Abstract

Sentiment analysis has emerged as a significant area of research across various disciplines, offering a powerful tool to quantify human emotion. While existing studies have demonstrated that sentiment metrics can offer some predictive power, they often rely on indicators that are typically updated only monthly, which limits their usefulness for real-time applications. Prior research has largely overlooked the potential of high-frequency sentiment measures. This study addresses that gap by exploring methods for constructing a daily sentiment index using natural language processing (NLP) models, including BERT, RoBERTa, XLNet, and VADER. We examine how each of these models performs in extracting sentiment from five major news sources (The New York Times, The Wall Street Journal, The Washington Post, The Chicago Tribune, and The Los Angeles Times) from 2010 through 2018. Through this analysis, we seek to better understand the strengths and limitations of each model in producing high-frequency sentiment indicators with potential applications in economic forecasting and beyond. Additionally, we examine how the choice of news source influences the sentiment detected, investigate the presence of regional disparities, and explore underlying factors that may shape sentiment levels across publications.
