Princeton University users: to view a senior thesis while away from campus, connect to the campus network via the GlobalProtect virtual private network (VPN). Unaffiliated researchers: please note that requests for copies are handled manually by staff and require time to process.

Publication:

InvestiHate: How Hate Speech Detection Models Identify Language Targeting Different Social Demographics

datacite.rights: restricted
dc.contributor.advisor: Fong, Ruth Catherine
dc.contributor.author: Wachspress, Benjamin
dc.date.accessioned: 2025-08-06T14:24:19Z
dc.date.available: 2025-08-06T14:24:19Z
dc.date.issued: 2025-04-07
dc.description.abstract: In recent years, hate speech has risen at an alarming rate, underscoring the urgent need for effective content moderation systems to ensure the safety of online spaces. However, mounting political pressure from the new Trump administration, coupled with widespread skepticism about the reliability of hate speech classification models, has led many social media platforms to significantly reduce their moderation efforts. This thesis investigates the weaknesses and vulnerabilities of three hate speech detection models (logistic regression, SVM, and BERT) on Twitter posts. It explores how these models distinguish hate speech from offensive or neutral language, with a particular focus on the impact of slurs and references to gender, race, and sexuality on classification outcomes. The findings reveal three key insights: (1) while BERT achieves the highest overall accuracy (82%), all models struggle to differentiate hate speech from offensive language; (2) all models exhibit a clear bias against classifying even blatant misogyny as hate speech; and (3) model performance deteriorates significantly on text that differs from the training data. As online hate speech continues to escalate, it is crucial to improve the ability of hate speech detection systems to identify and mitigate the most harmful online discourse.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp01cn69m758b
dc.language.iso: en_US
dc.title: InvestiHate: How Hate Speech Detection Models Identify Language Targeting Different Social Demographics
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-04-07T17:29:07.813Z
pu.contributor.authorid: 920245577
pu.date.classyear: 2025
pu.department: Computer Science
pu.minor: Statistics and Machine Learning

Files

Original bundle

Name: written_final_report.pdf
Size: 6.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 100 B
Format: Item-specific license agreed to upon submission