Bricolage v10e-6: Resynthesis from Multiple Sources with the EchoNest Remix API

Page history last edited by rif 14 years, 3 months ago

Team: rif

Tools: EchoNest Remix API, Python.


The EchoNest Remix API includes functionality to break down a song into short units of sound [segments], and also to measure features on these segments [pitch classes, timbre, and volume are currently provided].  Previous work by Ben Lacker [afromb.py in the echonest remix examples directory] resynthesized a track by mixing in closely matching segments from a different track.


In this work, we extend this idea to use multiple resynthesis tracks.  At this time, there are two main innovations:

  • In the initial afromb example, the resynthesis is performed by computing the distance between each segment in the original song and each segment in the song used for resynthesis.  This brute-force comparison is much too slow for a large corpus.  We instead use a kd-tree to compute the nearest neighbors more rapidly [http://docs.scipy.org/doc/scipy/reference/spatial.html].  This allows us to find closely matching segments in a large number of tracks much faster than the brute-force approach.
  • The original afromb code resynthesized by storing the uncompressed audio in memory.  This is not possible for a large song collection.  Instead, we store the uncompressed songs on disk, one file per segment, and retrieve them as needed.
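
The two points above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each segment has already been reduced to a fixed-length feature vector (for example 12 pitch-class values, 12 timbre values, and one loudness value, which is an assumption about the layout), and the names build_corpus_index, match_segments, SegmentStore, and resynthesize are hypothetical. The only library calls used are NumPy and scipy.spatial.cKDTree.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.spatial import cKDTree


def build_corpus_index(track_features):
    """Build one kd-tree over the segment features of every corpus track.

    track_features: list of (n_segments, n_dims) arrays, one per track.
    Returns the tree and an (N, 2) array mapping each tree row back to
    (track index, segment index within that track).
    """
    owners = np.array(
        [(t, s) for t, feats in enumerate(track_features) for s in range(len(feats))]
    )
    return cKDTree(np.vstack(track_features)), owners


def match_segments(tree, owners, target_features):
    """For each target segment, return (track, segment) of its nearest neighbor."""
    _, rows = tree.query(target_features, k=1)
    return owners[rows]


class SegmentStore:
    """Hypothetical on-disk store: one file of raw audio bytes per segment."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, track, segment):
        return self.root / f"track{track:04d}_seg{segment:05d}.raw"

    def put(self, track, segment, audio_bytes):
        self._path(track, segment).write_bytes(audio_bytes)

    def get(self, track, segment):
        return self._path(track, segment).read_bytes()


def resynthesize(store, matches):
    """Concatenate matched segments, reading only the files that are used."""
    return b"".join(store.get(int(t), int(s)) for t, s in matches)


if __name__ == "__main__":
    # Random features stand in for real analysis data (illustration only).
    rng = np.random.default_rng(0)
    corpus = [rng.random((40, 25)), rng.random((30, 25))]
    tree, owners = build_corpus_index(corpus)
    with tempfile.TemporaryDirectory() as tmp:
        store = SegmentStore(tmp)
        for t, feats in enumerate(corpus):
            for s in range(len(feats)):
                store.put(t, s, bytes([t, s]))  # placeholder audio bytes
        matches = match_segments(tree, owners, rng.random((8, 25)))
        print(len(resynthesize(store, matches)))
```

Building the tree once over the stacked features of all tracks means each target segment needs a single query, and the owners array is what lets a match be traced back to a file on disk without holding any audio in memory.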


As of now, we have a resynthesis corpus consisting of 92 tracks:


  • Moby's album "Play"
  • The first disc of the 2 CD "Best of Bowie" set
  • All of the Backstreet Boys' "The Hits - Chapter One" album except for "I Want It That Way"
  • All of the Jean-Yves Thibaudet CD "The Magic of Satie" except for the first Gymnopedie
  • Most of Jimmy Yancey's "Vol. 3, 1943-1950"


Using this corpus, we have resynthesized a number of examples, including Bob Dylan's "Don't Think Twice, It's All Right", Aretha Franklin's "Dr. Feelgood", Cake's "The Distance", and Satie's 1st Gymnopedie.


There are so many improvements possible that I've decided to call this version 10e-6.  Things to do in the short term:


  • The code is a gross mess.  I do not mean a sexy mess.
  • There are lots of obvious speed improvements, notably related to caching results of API calls.  Robert Ochshorn actually has some of this working, although I haven't integrated it into my code due to lack of time/fear of breaking something.
  • We really ought to be reading compressed audio off disk, and probably using mysqlite or similar rather than files to store the segments.
  • The scipy.spatial kd-tree is a pretty weak data structure, and occasionally fails to build properly.  We're in high enough dimensions that we should probably pull out the big guns, like locality-sensitive hashing or the recent work of Dasgupta and Freund on random projection trees.


Longer term, there are a bunch of things I'd love to do:


  • Scale to a really large corpus [many thousands of songs].  This may be possible using a combination of the above techniques.
  • Real-time control:  Have a collection of large corpora and let the user reweight them as the synthesis is occurring.
  • Offer choices for segments, make it into more of a composition tool.  Use it to find things to add from various tracks.
  • Learning more about what sounds good in this area.


Thanks to: the awesome people at EchoNest for providing the RemixAPI, especially Brian Whitman, Paul Lamere and Ben Lacker, who helped out with suggestions.

Comments (1)

Nick Peters said

at 8:12 pm on Nov 22, 2009

This sounds fascinating, would love to hear some of the resynthesized tracks.
