InvestiHate: How Hate Speech Detection Models Identify Language Targeting Different Social Demographics

Wachspress, Benjamin

datacite.rights	restricted
dc.contributor.advisor	Fong, Ruth Catherine
dc.contributor.author	Wachspress, Benjamin
dc.date.accessioned	2025-08-06T14:24:19Z
dc.date.available	2025-08-06T14:24:19Z
dc.date.issued	2025-04-07
dc.description.abstract	In recent years, hate speech has risen at an alarming rate, underscoring the urgent need for effective content moderation systems to ensure the safety of online spaces. However, mounting political pressure from the new Trump administration, coupled with widespread skepticism about the reliability of hate speech classification models, has led many social media platforms to significantly reduce their moderation efforts. This thesis investigates the weaknesses and vulnerabilities of three hate speech detection models applied to Twitter posts: logistic regression, a support vector machine (SVM), and BERT. It explores how these models distinguish hate speech from offensive or neutral language, with a particular focus on how slurs and references to gender, race, and sexuality affect classification outcomes. The findings reveal three key insights: (1) while BERT achieves the highest overall accuracy (82%), all models struggle to differentiate hate speech from offensive language; (2) all models exhibit a clear bias against classifying even blatant misogyny as hate speech; and (3) model performance deteriorates significantly on text that differs from the training data. As online hate speech continues to escalate, it is crucial to improve the ability of hate speech detection systems to identify and mitigate the most harmful online discourse.
dc.identifier.uri	https://theses-dissertations.princeton.edu/handle/88435/dsp01cn69m758b
dc.language.iso	en_US
dc.title	InvestiHate: How Hate Speech Detection Models Identify Language Targeting Different Social Demographics
dc.type	Princeton University Senior Theses
dspace.entity.type	Publication
dspace.workflow.startDateTime	2025-04-07T17:29:07.813Z
pu.contributor.authorid	920245577
pu.date.classyear	2025
pu.department	Computer Science
pu.minor	Statistics and Machine Learning

Files

Original bundle

Name:: written_final_report.pdf
Size:: 6.39 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 100 B
Format:: Item-specific license agreed to upon submission

Collections

Computer Science, 1987-2025