Princeton University users: to view a senior thesis while away from campus, connect to the campus network via the GlobalProtect virtual private network (VPN). Unaffiliated researchers: please note that requests for copies are handled manually by staff and take time to process.
 

Publication:

VocalSep: High-Resolution Target Speaker Extraction

datacite.rights: restricted
dc.contributor.advisor: Finkelstein, Adam
dc.contributor.author: Eggert, Sam
dc.date.accessioned: 2025-08-06T14:19:43Z
dc.date.available: 2025-08-06T14:19:43Z
dc.date.issued: 2025-04-10
dc.description.abstract: Target Speaker Extraction (TSE) is the task of isolating an individual speaker from an auditory scene composed of a mixture of multiple speakers and environmental noise. Recent models in the larger field of audio source separation have achieved impressive performance using convolutional neural networks. These models vary in their use cases, from isolating individual instruments in music to more general-use models capable of separating sources based on a language query (text description). Impressive performance has also been achieved by recent “voice encoder” models capable of creating useful representations of the characteristics of a speaker’s voice. This thesis seeks to combine the methods of recent audio source separation and voice encoder models to isolate individual voices from complex auditory scenes containing multiple speakers and environmental noise. While previous TSE models have succeeded in extracting individual voices from an auditory scene, they can only be used on low-sample-rate audio that captures less than half of the human-audible frequency range. In this work, I introduce VocalSep, a high-resolution TSE model that uses a short audio prompt of a target speaker to recognize and extract their voice from noisy audio mixtures containing multiple speakers.
dc.identifier.uri: https://theses-dissertations.princeton.edu/handle/88435/dsp015425kf140
dc.language.iso: en
dc.title: VocalSep: High-Resolution Target Speaker Extraction
dc.type: Princeton University Senior Theses
dspace.entity.type: Publication
dspace.workflow.startDateTime: 2025-05-07T21:03:49.781Z
pu.contributor.authorid: 920278178
pu.date.classyear: 2025
pu.department: Computer Science
pu.minor: Statistics and Machine Learning

Files

Original bundle

Name:
written_final_report.pdf
Size:
2.33 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
100 B
Format:
Item-specific license agreed to upon submission