Audio Transposition

Published 5/30/2020


We explore using deep learning to automatically transpose audio: shifting the pitch of a given sound, together with its harmonics, to new frequencies. State-of-the-art systems that model raw audio are limited to generating audio one sample at a time, which is prohibitively slow. This motivates our use of deep recurrent neural networks to develop a system that can run in near-real time while attempting to minimize noise and interference. We find that our neural network learned encodings of pitch and timbre (tone color) from raw audio. This allowed the network to generate audio that has been shifted up in pitch while preserving the perceived timbre of the music. Our model also generalized from training on only synthetic monophonic sounds to remapping real polyphonic sounds with accurate timbre.
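To make the target frequency mapping concrete, the sketch below computes where a pitch and its harmonics should land after transposition. It assumes twelve-tone equal temperament, where shifting by n semitones multiplies every frequency by 2^(n/12). The function names and the choice of tuning system are illustrative assumptions, not taken from the project, and this is not the neural model itself.

```python
def transpose_frequency(freq_hz: float, semitones: float) -> float:
    """Shift a frequency by a number of equal-tempered semitones (assumed tuning)."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def transpose_harmonics(fundamental_hz: float, semitones: float, n_harmonics: int = 4) -> list[float]:
    """Return the first n_harmonics integer multiples of the transposed fundamental.

    Harmonics scale by the same ratio as the fundamental, which is why
    an ideal transposition keeps the harmonic structure intact.
    """
    new_fundamental = transpose_frequency(fundamental_hz, semitones)
    return [k * new_fundamental for k in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    # A4 (440 Hz) shifted up one octave (12 semitones).
    print(transpose_frequency(440.0, 12))
    # First three harmonics of A3 (220 Hz) after the same shift.
    print(transpose_harmonics(220.0, 12, n_harmonics=3))
```

A learned model has to approximate this mapping for every partial in the signal at once, which is harder for polyphonic audio where several fundamentals overlap.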

Team Members

  • Parker Carlson

    B.S. Computer Science

    Doctoral candidate, University of California, Santa Barbara