Princeton University users: To view a senior thesis while you are away from campus, connect to the campus network remotely via the Global Protect virtual private network (VPN). If you are not affiliated with the University and are requesting a copy of a thesis, please note that all requests are processed manually by staff and will require additional time.
 

Publication:

A Supervised Learning Framework for Generating DJ Transitions

Files

Michael Hein Final Thesis.pdf (2.65 MB)

Date

2025-04-10

Abstract

A disc jockey (DJ) curates a seamless auditory experience by skillfully transitioning between tracks. Although these transitions sometimes involve complex loops and sound effects, their most fundamental components are usually volume changes and frequency-range adjustments that blend two songs. Prior work on automating DJ transitions has largely relied on heuristics or on unsupervised learning approaches such as generative adversarial networks (GANs). In this paper, we present a unique supervised learning framework for generating DJ transitions between two tracks, providing an interpretable, data-driven alternative to previous methods. Using a dataset from 1001Tracklists containing real DJ mixes and their source tracks, we extract mel-spectrograms of the audio and train a convolutional neural network (CNN) to predict control signals that specify how volume and equalizer (EQ) bands should change over time. These predicted control signals are then applied to the source tracks to produce a transition, which is compared to the original transition from the DJ mix. To generate labeled input-output training pairs, we developed a full preprocessing pipeline that includes track-to-mix alignment using dynamic time warping (DTW), supported by both theoretical and empirical analyses of feature selection. Although the approach is inspired by differentiable digital signal processing (DDSP), our learning phase operates entirely in the mel-spectrogram domain for simplicity and interpretability. We trained the model on a single example and found that it replicated the corresponding ground-truth transition with reasonable accuracy, offering early evidence that the task is learnable and that our framework has the capacity to produce non-trivial transitions. This work demonstrates the potential of supervised learning for generating realistic DJ transitions and lays the foundation for future research that trains on more data.
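
As an illustration of two steps named in the abstract, the following is a minimal Python sketch, not the thesis's implementation. It shows textbook dynamic time warping between two one-dimensional feature sequences, and one plausible way per-band, per-frame gain curves could be applied to two spectrogram-like arrays and summed. The function names (dtw_path, linear_fade, apply_controls), the absolute-difference distance, the linear fade, and the additive mixing rule are all assumptions made for illustration; the abstract does not specify the features, the control-signal parameterization, or the mixing rule actually used.

```python
from math import inf


def dtw_path(track, mix, dist=lambda a, b: abs(a - b)):
    """Textbook O(n*m) DTW between two 1-D feature sequences.

    Returns (total_cost, path), where path is a list of
    (track_index, mix_index) pairs. Hypothetical helper.
    """
    n, m = len(track), len(mix)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(track[i - 1], mix[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


def linear_fade(n_frames, n_bands):
    """Gain curve rising linearly from 0 to 1, identical in every band."""
    last = max(n_frames - 1, 1)
    return [[k / last] * n_bands for k in range(n_frames)]


def apply_controls(spec_a, spec_b, gains_a, gains_b):
    """Scale each frame/band of two spectrograms by its gain, then sum.

    All four arguments are lists of frames; each frame is a list of
    per-band values. The additive rule is an assumption of this sketch.
    """
    return [[ga * a + gb * b
             for a, b, ga, gb in zip(fa, fb, gfa, gfb)]
            for fa, fb, gfa, gfb in zip(spec_a, spec_b, gains_a, gains_b)]


# Toy usage: the "mix" repeats one frame of the "track".
total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])

# Toy usage: crossfade from track A to track B over three frames.
fade_in = linear_fade(3, 2)
fade_out = [[1.0 - g for g in frame] for frame in fade_in]
mixed = apply_controls([[1.0, 1.0]] * 3, [[2.0, 2.0]] * 3, fade_out, fade_in)
```

In a learned system, the fixed linear_fade curves would be replaced by the gain curves a model predicts, and the alignment path would determine which mix frames correspond to which source-track frames when building training pairs.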
