How to search video by its audio content

johnmeyer wrote on 7/17/2013, 4:16 PM
A client just asked if there is a way to find all video clips on a 2TB drive in which the song "Happy Birthday" is played.

This same person, several weeks ago, showed me an app on his smartphone that could identify virtually any song I played on my computer, including various "covers" of the original recording of the song. Its ability to unerringly identify each song, even though the music was coming from speakers on the other side of the room, and even though I specifically chose performances that were oddball variations of the original artists, was uncanny. It also identified some extremely rare and unusual songs that almost no one (I thought) has ever heard of. Amazing.

So, does anyone know of an application that would let me use that same technology to search through video files and flag those (preferably with a timecode where the song starts) where a given song is played?


willqen wrote on 7/17/2013, 4:37 PM
Wow !!! that would be amazing. And it was a mobile app? wow.

I hope you find it Mr. johnmeyer and share it with the rest of us, especially me! Hah.!

I would be very grateful. I already am, I learn a lot from your posts.

We are lucky you invest the time you do on the forum.

Just thought I would mention it .......

fldave wrote on 7/17/2013, 4:43 PM
My Android app, Shazam, listens to songs being played and tells you what the song is, then pops up the words, in time with the music, and gives you a link to buy mp3s.

The tech is there, like when you upload a YouTube video and it flags it as infringing on someone's rights.

But I haven't heard of a standalone PC app that would do that yet, probably because the licensing is expensive. I'm sure the forensics labs have it.

Is this client a licensing validator? Happy Birthday is a very expensive song to get the rights to, depending on what you wanted to do with it.
fldave wrote on 7/17/2013, 4:44 PM
By the way, Shazam was a great party trick when it first came out!
johnmeyer wrote on 7/17/2013, 5:29 PM
"Wow !!! that would be amazing. And it was a mobile app? wow. I hope you find it Mr. johnmeyer and share it with the rest of us, especially me! Hah.!"

I don't own a smartphone (not even a cell phone ...) so I didn't realize that there are several apps that can do this on a cell phone. The one I saw (heard) demo'd was Shazam, as already mentioned by the previous posters. I guess SoundHound is another.

What I want is to be able to access the Shazam technology within a video editing or searching program. I've contacted their developers, but I thought people here might have heard of something.

What amazed me was Shazam's ability to recognize the song no matter who was playing it (although it couldn't identify the song when I sang it ... go figure). The only time I stumped it was when I played Arthur P. Barnes' 1963 arrangement of the "Star Spangled Banner" which is extraordinarily strange in both arrangement and tempo (although -- diving way OT -- it is my all-time favorite arrangement, and has a wonderful story behind it -- one of the first performances was at the 1963 "Big Game" eight days after the Kennedy assassination).

But I digress ...

It was also able to recognize most songs in only a few notes, much like the old game show "Name That Tune."

Finally, thanks for the kind words.

willqen wrote on 7/17/2013, 5:34 PM
Thank you, johnmeyer.

ChristoC wrote on 7/17/2013, 6:24 PM
Yes, there have been ways to do this on ordinary 'dumb' cell phones since long before we heard of 'apps' and 'smart' phones... I had a cheap $10 phone years ago that did that, but never tried it ....

I wonder can they identify the 'tune' under a blanket of FX, Atmos & Dialog?

As Music Editor on many feature films, it's often my responsibility to fill out the Music Cue Sheets for Composer and Publisher royalties and rights calculations. These cover known 'source' music as well as music specifically written for the film. Although I have all the music in front of me on a DAW at the time of delivery to the film's final mix, and can see the timings easily, the cue sheets are still written by listening to the final mix with stopwatch in hand, because the parties require timings based only on what is audible and recognizable (not all music delivered to the mixing stage is necessarily used). It is hard to envisage an application that could apply those criteria, given my initial question (and the fact that much of the music has never been released!).
john_dennis wrote on 7/17/2013, 8:18 PM
If you want to go down that path, you might search for the Music Genome Project, which was the underlying technology behind Pandora.

I also don't own a "smart phone" because I don't need one. I bought my wife one for Christmas but took it back as neither one of us is smart enough.
Grazie wrote on 7/17/2013, 11:47 PM
I use SoundHound.

The apps I have make my iPhone a window through which one can gaze not only at what IS, but at what WILL continue to become the method of choice for communicating and sharing media. I've even bought a very simple NLE for my iPhone. It came with an extra Audio Track.

Oh, I'm typing this on my iPhone. And there are several VegForum members here who I've SKYPE-ed from my London garden from my iGrazie!



Tech Diver wrote on 7/18/2013, 9:25 AM
My field of research is machine vision, but most of the same principles apply to acoustics as well. There are two basic types of recognition: instance and category. The former is where a specific entity is sought (e.g. the Eiffel Tower, a blue Toyota Camry, etc.), while the latter is where a class of entities is sought (e.g. cars, people, zebras, etc.). The case of finding all clips containing "Happy Birthday" is category recognition, which is more difficult than instance recognition.

Basically, one has to first quantize the music into some form of representation such as histograms of frequencies, frequency progressions, histograms of volumes, etc. Then one has to train classifiers using techniques such as a Support Vector Machine (SVM), adaptive boosting (AdaBoost), randomized forests, etc. A novel sample can then be quantized in the same way and identified using the trained classifiers.

For machine vision, there are plenty of technical papers on these topics describing how to implement various approaches (just do a Google Scholar search), and many universities have free software that you can download. I strongly suspect that the same is true for acoustics. Chances are, you will find only the source code and will have to build the application yourself.
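
As a rough illustration of the quantize-then-classify pipeline described in this post, here is a minimal Python sketch. Everything in it is an assumption made for illustration: it starts from an already-extracted pitch track (a list of note frequencies in Hz) rather than raw audio, it uses a histogram of semitone steps as the "frequency progression" representation, and it swaps in a simple nearest-centroid classifier in place of the SVM, AdaBoost, or randomized forests named above. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def interval_histogram(freqs_hz, max_step=12):
    """Quantize a pitch track into a normalized histogram of semitone steps.

    Using steps between consecutive notes (not absolute pitch) makes the
    representation independent of the key the tune is played in.
    """
    steps = []
    for a, b in zip(freqs_hz, freqs_hz[1:]):
        semis = round(12 * math.log2(b / a))           # interval in semitones
        steps.append(max(-max_step, min(max_step, semis)))  # clamp large leaps
    counts = Counter(steps)
    total = len(steps) or 1
    return [counts.get(s, 0) / total for s in range(-max_step, max_step + 1)]

def train_centroids(labeled_tracks):
    """Average the histograms of each label: (label, freqs) -> label -> centroid."""
    grouped = {}
    for label, freqs in labeled_tracks:
        grouped.setdefault(label, []).append(interval_histogram(freqs))
    return {
        label: [sum(col) / len(hists) for col in zip(*hists)]
        for label, hists in grouped.items()
    }

def classify(freqs_hz, centroids):
    """Return the label whose centroid is closest to the sample's histogram."""
    h = interval_histogram(freqs_hz)
    return min(centroids, key=lambda label: math.dist(h, centroids[label]))

# Toy training data: the opening notes of "Happy Birthday" in C, and a C major scale.
HAPPY_BIRTHDAY_IN_C = [261.63, 261.63, 293.66, 261.63, 349.23, 329.63]
C_MAJOR_SCALE = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00]
# Novel sample: the same opening phrase transposed up to G.
HAPPY_BIRTHDAY_IN_G = [392.00, 392.00, 440.00, 392.00, 523.25, 493.88]

centroids = train_centroids([
    ("happy_birthday", HAPPY_BIRTHDAY_IN_C),
    ("other", C_MAJOR_SCALE),
])
result = classify(HAPPY_BIRTHDAY_IN_G, centroids)
print(result)
```

A real system would need a pitch-tracking front end, far more training data, and features robust to dialog and effects in the mix; the toy data here only shows the shape of the approach.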

Zelkien69 wrote on 7/18/2013, 9:35 AM
Something is wrong with the forum. This post is dated July of 2013 and not July of 2008. Weird.

I kid, I kid.
Laurence wrote on 7/18/2013, 11:11 AM
You guys know that Premiere Pro has done this for quite a while, right? It has a feature where it will transcribe all the words in your video, and you can search for any part by the text, then go immediately to that part of the video.

This was the main reason I bought PP version 5. My experience was that while this feature had some use in finding key words, it really didn't work that well overall. Maybe it has been improved since then. I don't know. I do know that what you're asking for is an advertised feature of this software.
johnmeyer wrote on 7/18/2013, 11:24 AM
"You guys know that Premiere Pro has done this for quite a while right?"

No, I didn't know that. That gives me a third product to try out. I've never installed Premiere because I have such a morbid aversion to Adobe products because they are so bloated and, to me, unintuitive. However, I'll take a look. Thanks!
rmack350 wrote on 7/18/2013, 12:21 PM
So Premiere Pro has a speech-to-text transcriber, and this suggests a way to search audio files for the phrases in the song. Ideally, you could use anything that would do that. As a kludge, you could put all your media on a timeline and play it for a speech-to-text app, but this would be slow, as it would happen in real time and probably without a timecode reference.

What you really need to do this efficiently is an application that can step through each of your files at much better than real time.

johnmeyer wrote on 7/18/2013, 1:07 PM
"What you really need to do this efficiently is an application that can step through each of your files at much better than real time."

Great point. With 2TB of mostly DV video (which consumes about 13 GB/hour), the drive contains over 150 hours of video. That would take over six days, if it can only be done in real time. Still, that might be better than trying to scrub through all that material.
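
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 2 TB means 2,000 GB and that the whole drive is DV at 13 GB/hour, and the speedup factor is a hypothetical parameter, not a measured figure for any product.

```python
DRIVE_GB = 2000        # assumption: 2 TB taken as 2,000 GB
DV_GB_PER_HOUR = 13    # DV data rate quoted in the post

hours_of_video = DRIVE_GB / DV_GB_PER_HOUR

def scan_days(speedup=1.0):
    """Days needed to play through the whole drive at `speedup` x real time."""
    return hours_of_video / speedup / 24

print(f"{hours_of_video:.0f} hours of video")
print(f"{scan_days():.1f} days at real time")
print(f"{scan_days(10):.1f} days at a hypothetical 10x real time")
```

Under these assumptions the drive holds roughly 154 hours of video, which is about 6.4 days of continuous playback.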
Laurence wrote on 7/18/2013, 1:15 PM
I have a friend who uses Premiere Pro in a Pentecostal Church setting. The pastor will ask him to find a bit where he talks about a certain subject and he will use the PP transcription feature to find it. Keep in mind that Pentecostal services can last several hours and this would be quite a task without this feature. I think that is the main reason he is on Premiere Pro.
larry-peter wrote on 7/18/2013, 2:25 PM
In the case of searching for the phrase "Happy Birthday", Premiere would probably do the job pretty well, and it's much faster than real time. I believe it allows you to set up a batch process in Media Encoder (that's what it processes the audio with) to do multiple audio clip transcriptions at once.

I attempted to use the transcription feature a few times for closed captioning text as well as a transcription of an industrial interview that a client requested. Its overall accuracy with typical conversation is impressive, but in the industrial interview, which included a lot of tech jargon particular to the industry, it just provided a great deal of comedy.
Tech Diver wrote on 7/18/2013, 3:27 PM
It is no surprise that it is faster than real-time, as audio data is only a minute fraction of the size of video and therefore much easier to analyze.
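
A quick calculation gives a sense of the size difference. This sketch assumes the audio is uncompressed 48 kHz, 16-bit stereo PCM (one common DV audio format; the actual clips may differ) and reuses the 13 GB/hour DV figure quoted earlier in the thread.

```python
SAMPLE_RATE_HZ = 48_000   # assumption: 48 kHz audio
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples
CHANNELS = 2              # assumption: stereo
DV_GB_PER_HOUR = 13       # DV data rate quoted earlier in the thread

audio_bytes_per_hour = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 3600
audio_gb_per_hour = audio_bytes_per_hour / 1e9
audio_share = audio_gb_per_hour / DV_GB_PER_HOUR

print(f"audio: {audio_gb_per_hour:.2f} GB/hour")
print(f"share of DV stream: {audio_share:.1%}")
```

Under these assumptions the audio comes to roughly 0.7 GB per hour, or about 5% of the data, and an analysis that downsamples to mono at a lower sample rate would handle even less.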

ushere wrote on 7/18/2013, 11:42 PM
my (admittedly brief) experience with the ppro audio functions mentioned above was pretty much an abysmal failure. it only seemed to work (sort of) when inputting both a script and matching video - otherwise a waste of time imho.... no hope of analysis in a video clip

btw - here's a link to my original question on adobe forum