Joseph Wilk

Things with code, creativity and computation.

Audio Fingerprint Smudges

Procedurally generating music, and scoring each generation on whether an audio service (like SoundCloud) identifies it as an existing song/artist. The more famous the matched track/artist, the better.

Machines identifying audio tend to:

  • Reduce the audio features to their essence (trading off lookup speed against accuracy on a sliding scale).
  • Rely on computer vision techniques to create audio fingerprints.
  • Account for differing audio quality and ambient background noise.
  • Use a training set of sane music to teach the algorithm to recognise music.

Example audio fingerprint generated by Chromaprint for: Heaven by UNKLE
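Chromaprint's command-line tool, fpcalc, produces fingerprints like the one above. As a minimal sketch of computing one from Clojure (assuming fpcalc is installed and on the PATH; the function name is ours, not part of any library):

```clojure
(require '[clojure.java.shell :as shell]
         '[clojure.string :as str])

;; Shell out to Chromaprint's fpcalc and parse its KEY=VALUE output
;; (typically FILE, DURATION and FINGERPRINT) into a map.
;; Assumes fpcalc is installed and on the PATH.
(defn chromaprint-fingerprint [path]
  (let [{:keys [exit out]} (shell/sh "fpcalc" path)]
    (when (zero? exit)
      (into {}
            (for [line (str/split-lines out)
                  :let [[k v] (str/split line #"=" 2)]
                  :when v]
              [(keyword (str/lower-case k)) v])))))

;; (:fingerprint (chromaprint-fingerprint "track.wav"))
;; => "AQADtEqk..." (a compact, lossy summary of the audio)
```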

We use these properties to guide us in creating new music for machines, exploring the smudged edges around machine listening and highlighting how differently humans and machines identify music. And for fun.

To try and match our generative audio to existing songs we will use a number of music services and some open-source audio fingerprinting tools. Most commercial audio fingerprinting algorithms are secret and patented up to the eyeballs, so there is a lot of trial and error. Some services used to evaluate a song match:

  • SoundCloud (copyright detection)
  • YouTube (copyright detection)
  • Shazam (audio lookup)
  • Chromaprint (open-source audio fingerprinter)

Music for Machines

All generated tracks have comments marking exactly when a song was detected.

Warning: the audio clips are designed for machines, not your weak human ears. Keep your volume low before listening.

Audio generations by @Finger_smudger

Generation #1 – 1467636802259

Artists/Songs identified:

  • Sophonic Media – Make It Shine
  • Pachanga Boys – Time
  • Johan Vilborg – Second Wind (Illuminor Remix)
  • Oelki – Galileo
  • Lipps Inc. – Funkytown
  • Spaceanddisco – Nights
  • Matt Fax – Voyage (Original Mix) Bullistik
  • Katy Perry – Birthday (Cash Cash Remix)


Generation #1.1 – 1469483772764

Artists/Songs identified:

  • George Michael – A Different Corner
  • Dezt – Last Year
  • Axiuw – Be Yourself (Original Mix)
  • Duran Duran – Thank You


Generation #2.0 – 1470683054969

Artists/Songs identified:

  • Dimension – Mangata
  • Michael Jackson – You Are Not Alone (tempo 60 bpm / B major)


Generation #2.0 – 1470700413305

Artists/Songs identified:

  • T-Pain Vs Chuckie Feat. Pitbull – It's Not You (It's Me)
  • Pink Floyd – Cymbaline


Machines listening with ambient background noise

While I experimented with lots of different services, the above examples were most successful when using Shazam for identification. Shazam focuses on operating in noisy environments and on identifying a sound as quickly as possible from only partial song information. This tolerance makes it easy to get Shazam to mis-match audio to songs.

The other services also had a nasty habit of banning new accounts that uploaded what appeared to be copyright-infringing content (who would have thought!), which makes mass procedural generation somewhat challenging.

Shazam has a desktop app which will run detection on audio continuously for 8 hours. Over that time we generate a large set of audio and pick the winners from each generation.

Overtone synths & score generation

Using Overtone and Clojure, a single audio generation works by:

  1. Dynamically generating Overtone synths using the generative testing framework test.check. QuickCheck-style generators are a cheap way of exploring a permutation space given grammar rules, like those of a synth definition in Overtone (see the sketch after this list). Supports selection of:

    • Audio wave (potentially many combined)
    • Envelope type
    • Effects (reverb/echo/pitch-shift/delay)
    • Filters (low-pass/high-pass)
      The various properties of the audio units are selected randomly.

  2. Dynamically generating a score varying:

    • Clock tempo
    • Note lengths
    • Root note / scale
    • Octaves

  3. Dynamically generating synth control parameters:

    • Distortion
    • Changing audio wave (sin/saw/triangle/square/etc)

  4. Running for 3 minutes with a random chance of mutation to the score and parameters (a sketch follows the list).


  5. Storing the state of a generation for future evolution. We store the state and mutations as edn: Example state for one generation

  6. Scoring each generation based on the number of Shazam matches (scraped from notification alerts on OS X; a toy scoring sketch follows the list).

  7. Scoring each generation by the popularity of the artists matched (manually, grimace).
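
To make steps 1 and 2 concrete, here is a minimal sketch of what such test.check generators could look like. The grammar below (waves, envelopes, effects, filters, score parameters) is illustrative, not the actual grammar used:

```clojure
(require '[clojure.test.check.generators :as gen])

;; Step 1 (sketch): generators describing a synth "recipe".
(def wave-gen     (gen/elements [:sin :saw :tri :square]))
(def envelope-gen (gen/elements [:perc :adsr :asr]))
(def effect-gen   (gen/elements [:reverb :echo :pitch-shift :delay]))
(def filter-gen   (gen/elements [:low-pass :high-pass]))

(def synth-recipe-gen
  (gen/hash-map
   :waves    (gen/vector wave-gen 1 3)   ; potentially many waves combined
   :envelope envelope-gen
   :effects  (gen/vector effect-gen 0 2)
   :filter   filter-gen))

;; Step 2 (sketch): generators for the score.
(def score-gen
  (gen/hash-map
   :bpm         (gen/choose 60 180)
   :note-length (gen/elements [1/4 1/2 1 2])
   :root        (gen/elements [:C :D :E :F :G :A :B])
   :scale       (gen/elements [:major :minor :pentatonic])
   :octave      (gen/choose 2 6)))

;; Sample one individual for a generation.
(defn random-individual []
  {:synth (gen/generate synth-recipe-gen)
   :score (gen/generate score-gen)})
```

gen/generate samples a single value, which is all we need here; test.check's usual property-checking and shrinking machinery goes unused.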

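Steps 4 and 5 might then reduce to something like this sketch; maybe-mutate, save-generation! and the generations/ directory are illustrative names, not the actual code:

```clojure
(require '[clojure.edn :as edn])

;; Step 4 (sketch): with a random chance, mutate a score parameter.
(defn maybe-mutate [individual mutation-rate]
  (if (< (rand) mutation-rate)
    (assoc-in individual [:score :bpm] (+ 60 (rand-int 120)))
    individual))

;; Step 5 (sketch): persist a generation's state as edn so it can be
;; evolved later. Assumes a generations/ directory exists.
(defn save-generation! [gen-id individuals]
  (spit (str "generations/" gen-id ".edn") (pr-str individuals)))

(defn load-generation [gen-id]
  (edn/read-string (slurp (str "generations/" gen-id ".edn"))))
```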

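And once the notification text has been scraped, scoring a generation (step 6) is just counting matches. A toy version, assuming the scraper yields one string per notification:

```clojure
;; Step 6 (toy sketch): count how many scraped notification strings
;; look like Shazam matches. The scraping itself, and the exact text
;; format, are OS X / Shazam specific.
(defn score-generation [notifications]
  (count (filter #(re-find #"Shazam" %) notifications)))

;; (score-generation ["Shazam: Michael Jackson - You Are Not Alone"
;;                    "Calendar: Meeting at 3pm"])
;; => 1
```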
To avoid any background noise or messing with microphones, we use Soundflower with the following setup:

Overtone main audio → Soundflower input device
Soundflower output device → Shazam Desktop

Conclusion

There is a clear difference in accuracy depending on the goal of the fingerprinting. SoundCloud and YouTube match for copyright infringement while processing the entire track (even though they also check for partial matches), while Shazam focuses on as small a segment as possible, live. Open-source alternatives (like Chromaprint), while useful, provided little help in tricking the commercial services.

Coming back to Shazam, what actually made the tracks match remains something of a mystery. If we look at one example, “Michael Jackson – You Are Not Alone”, our generative score was not even in the same scale or tempo! We can identify things that made it hard to match; for example, adding drum patterns killed off all matches. More layers of audio mean more permutations to explore.

One thing is clear: the way machines learn, and their specialisation on a single application, rules out a whole subset of sound that is unlikely to enter the realm of music. Hence, for the creators of these algorithms, a mismatch of this type is of little relevance.

This lost ghost music is perhaps just for the machines.

Source code
