Syncing separately recorded video and sound

ingvarai wrote on 5/3/2009, 3:35 PM
I have tried to synchronize sound and video that I have recorded separately. Both are pure digital recordings (not tape). Audio is recorded using a Roland Edirol, video using a Panasonic HMC 151.

I am not as successful as I had hoped. Both in preview in Vegas, and when rendering to a WMV file, the sound is out of sync. I have not made a DVD yet.

In the scene, I clap my hands once and then start to speak. When carefully aligning the audio peak with the frame where my hands meet, the "clap" looks fairly natural, but as soon as I start to speak, the sound and video are so far out of sync that it is noticeable to everyone. When sliding the audio track back and forth, it gets better, yes, but it is still so far away from natural that I am starting to wonder what is wrong with my approach to this.

Now, does anyone do anything similar?
I assume that digital recordings ought to be in sync, within a millisecond, like two digital watches, so that if the first few frames are in sync, so will the entire scene be. Is my assumption right? Ok, I could of course try with a "clap" at the end too..

The biggest mystery is why my carefully aligned audio peak and hand clap is so out of sync..

ingvarai

Comments

baysidebas wrote on 5/3/2009, 3:46 PM
I always record sound with the camera, even if I'm going to use an externally recorded audio track. Makes it so much easier to synch by just aligning the audio waveforms. Just don't use the camera's microphone, feed the camera's audio input from the same source you're using for the external recorder.
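
Aligning two waveforms by eye amounts to finding the shift at which they overlap best. A minimal sketch of that idea, assuming both tracks are available as plain lists of samples at the same sample rate (the function name `best_offset` and the toy data are hypothetical, and this is not how Vegas does it internally):

```python
def best_offset(reference, other, max_shift):
    """Return the shift (in samples) of `other` that best matches `reference`.

    A positive result means `other` has to be moved later by that many samples.
    """
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        # Sum of products of overlapping samples (plain cross-correlation).
        score = 0.0
        for i, sample in enumerate(other):
            j = i + shift
            if 0 <= j < len(reference):
                score += reference[j] * sample
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: the same "clap" starts at index 5 on the camera track
# and at index 2 on the external recorder track.
camera = [0, 0, 0, 0, 0, 9, -7, 4, -2, 0, 0, 0]
recorder = [0, 0, 9, -7, 4, -2, 0, 0, 0, 0, 0, 0]

offset = best_offset(camera, recorder, max_shift=6)
print(f"move the recorder track {offset} samples later")
```

On real recordings the brute-force loop would be slow; it is only meant to show why a camera guide track makes syncing easier than a visual cue.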
TGS wrote on 5/3/2009, 4:07 PM
I can't give you an exact answer.
For the most part, most of my video and audio recordings will stay in sync, but for some reason if I record using a firewire mixer and then try to match the video, the audio will start to drift and keep getting worse.
My options here are: line up the audio on each track as closely as possible at the beginning of the video (cut off any excess Edirol if it starts earlier), then go to the end of your Edirol track, hover the mouse over the end frame of that track and a square icon will appear. Keep hovering and hit Ctrl and a squiggly line will appear under the square. Now you can left click and drag that edge to match the audio of the video. (That will literally speed up or slow down the recording.)

If you zoom in on the timeline, you can actually start seeing patterns in the audio wave and you will eventually see how you can match them up, from sight alone. Sometimes you have to find that right level of "Zoom" to see them. When you get close to matching, Zoom in real close to fine tune it.

If your audio from both sources does not start in sync at the beginning at the same point, then you will have to juggle each end of the Edirol recording until it finally matches on both ends. (Sync each track at the beginning and cut off any excess of the longer track, so the new "beginning frame" of the video and Edirol now start at the same place; this way you only have to match the end.)

The worse option would be to stretch or shrink the video track to match the Edirol.

This is a good trick to know, because at any time, you may get a recording that isn't quite in sync with another source and this will fix it.
John_Cline wrote on 5/3/2009, 4:10 PM
No, digital sources are not guaranteed to be in sync. While crystal clocks are very stable, they don't necessarily operate at the exact same frequency from device to device. The only thing that's guaranteed is that the files will eventually drift unless the devices have been "genlocked" and are running off the same master clock.

You need to align the waveform at the start of the file and grab the end of the audio track holding down the CTRL key to align the audio with the video at the end of the file, too. This will stretch/shrink the audio file to match the length of the video.
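
The Ctrl-drag stretch described above boils down to one ratio between two sync points. A rough sketch of the arithmetic, assuming you have noted the same two events (one near the start, one near the end) in both files; the function names and the example timings are hypothetical:

```python
def stretch_ratio(video_start, video_end, audio_start, audio_end):
    """Length factor that makes the audio span equal the video span."""
    return (video_end - video_start) / (audio_end - audio_start)


def audio_to_video_time(t, video_start, audio_start, ratio):
    """Map a time in the external recording onto the video timeline."""
    return video_start + (t - audio_start) * ratio


# Hypothetical sync points, in seconds from the start of each file.
video_sync_1, video_sync_2 = 2.0, 482.0
audio_sync_1, audio_sync_2 = 5.0, 485.012

ratio = stretch_ratio(video_sync_1, video_sync_2, audio_sync_1, audio_sync_2)
end_on_video = audio_to_video_time(audio_sync_2, video_sync_1, audio_sync_1, ratio)

print(f"stretch ratio: {ratio:.6f}")
print(f"second sync point lands at {end_on_video:.3f} s on the video timeline")
```

A ratio below 1 means the external audio runs long and has to be shortened; above 1 it has to be lengthened.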
ingvarai wrote on 5/3/2009, 4:23 PM
I hadn't expected to get so many good answers so soon. Thanks to all of you! Baysidebas put me on the right track. Fortunately, the internal mike was on, so my video clip also has the audio that has to match the video. By amplifying the "original" audio in Sound Forge, I was able to see the waveform (it was very weak).

To my pleasure and joy, I can confirm that both the Edirol digital recording and the internal recording match each other exactly throughout an 8 minutes long scene. This is what I had hoped for, and now I have confirmed it.

In hindsight, my question now seems a bit silly, I did not go for the obvious and easy way, I tried to make a simple thing complicated. I will now record sound separately if I can. And thanks a lot for the advice on how to stretch video and sound! Vegas is great.

ingvarai
musicvid10 wrote on 5/3/2009, 6:13 PM
Your question is certainly not silly.
It is an issue that is faced by anyone looking to make a second recording with a reasonably priced portable device, none of which yet have support for timecode sync and genlock.

As it is, without a wire to synchronize the two device clocks, there is nothing to keep them together. Think of it this way: If you set two small boats adrift on calm seas at exactly the same place and time, would they arrive at the opposite shore at exactly the same place and time? Of course not -- there is nothing to keep them together. Now tie the two boats together with a 5m rope and repeat the experiment. Are their chances better? That's the logic behind synchronization. Remember that even the most precise clocks in the world, the atomic time clocks would drift apart slowly if not kept together with a signal to chase.

There are two approaches:
1) The first, as John previously described, is better for short segments, say 20 minutes or less. You can sync the waveforms at the beginning, then stretch the audio event at the end, repeating the process and shifting the audio ever so slightly until it is lined up throughout.

2) The second method, which I use on longer events, usually lasting two hours or more is to break the audio into 10-minute chunks, and align the waveforms of each chunk as precisely as possible. With my equipment, this leaves little gaps between chunks that are usually not noticeable.

NOTE that either method requires much more precise alignment if you are going to mix the external audio with the on-camera track, such as for rear-channel surround or stereo ambiance.

If you are just going to replace the on-camera audio with the external track, it is much less critical, and all you often need to do is align the waveforms somewhere near the middle of the video and mute the on-camera track. Even if the audio drifts by 1/4 frame or so, it is rarely noticeable.
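
To get a feel for how long a chunk can be before a single alignment point stops being good enough, here is a back-of-the-envelope sketch. It assumes the drift is constant over time and uses a purely hypothetical drift rate of 25 parts per million; the function names are made up for illustration:

```python
def worst_case_error_ms(chunk_seconds, drift_ppm):
    """Largest sync error inside a chunk that is aligned at its midpoint."""
    total_drift_ms = chunk_seconds * drift_ppm * 1e-6 * 1000
    return total_drift_ms / 2


def max_chunk_seconds(tolerance_ms, drift_ppm):
    """Longest chunk whose midpoint-aligned error stays within the tolerance."""
    return 2 * tolerance_ms / (drift_ppm * 1e-6 * 1000)


DRIFT_PPM = 25.0                          # assumption, measure your own devices
QUARTER_FRAME_MS = 1000 / 25 / 4          # a quarter of a frame at 25 fps

error_10_min = worst_case_error_ms(10 * 60, DRIFT_PPM)
longest_chunk = max_chunk_seconds(QUARTER_FRAME_MS, DRIFT_PPM)

print(f"10-minute chunk, aligned in the middle: up to {error_10_min:.1f} ms off")
print(f"chunk length that stays within 1/4 frame: {longest_chunk / 60:.1f} min")
```

Aligning in the middle halves the worst-case error compared with aligning at one end, which is why it is suggested above for simple track replacement.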

HTH
ingvarai wrote on 5/4/2009, 2:32 AM
>Your question is certainly not silly

Thanks, in any case it is an interesting question!
My concern about digital media not being accurate enough is in any case gone. At least with the kind of equipment I currently have, provided nothing drastic happens when the ambient temperature changes, battery power changes etc.
I made a test, I recorded 20 minutes of video and sound. After 20 minutes, the Edirol audio is about 30 milliseconds late, compared with the video audio. This is less than one frame (25 fps). It corresponds to two persons speaking, one close up, one 10 meters away (32 feet). For my purpose, this difference of 30 milliseconds after 20 minutes is not noticeable at all, since my scenes typically last 10-40 seconds. After 20 minutes, mixing the two channels, I do hear an echo, but only if I concentrate on finding an echo.
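
The numbers from this test can be turned into a drift rate. A small sketch of the arithmetic, assuming the drift accumulates linearly over the recording (the variable names are only for illustration):

```python
drift_s = 0.030            # measured: Edirol about 30 ms late
duration_s = 20 * 60       # after 20 minutes
fps = 25                   # PAL frame rate used in the test

drift_fraction = drift_s / duration_s
drift_ppm = drift_fraction * 1e6          # parts per million
drift_frames = drift_s * fps              # drift expressed in frames

# Expected drift within one typical scene of 40 seconds, if drift is linear.
scene_drift_ms = drift_fraction * 40 * 1000

print(f"drift rate: about {drift_ppm:.0f} ppm ({drift_fraction * 100:.4f} %)")
print(f"after 20 minutes: {drift_frames:.2f} frames")
print(f"within a 40 s scene: {scene_drift_ms:.1f} ms")
```

At this rate a 40-second scene drifts by roughly a millisecond, which supports the conclusion that a single sync point per scene is enough here.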

>If you are just going to replace the on-camera audio with the external track, it is much less critical
Exactly, this is what I intend to do.

To sum it up - using the video audio as a reference, I am able to record with my digital camera, which records to a memory card, combined with recording sound using a separate digital device. I then put it all on the Vegas timeline.

ingvarai
VideJoe wrote on 5/4/2009, 3:16 AM
That is exactly what I am going to do soon, recording classical concerts.
I will use my Sony Z7 to record video of course, but will feed it the audio coming from the Focusrite TwinTrack Pro XLR outputs. The TwinTrack Pro also has an S/PDIF interface that will be hooked up to a Marantz PMD661 audio recorder.

In post I will join and align the two audio tracks and mute the audio that was recorded with the Z7.
Of course that has been the plan so far, but I am glad to find out similar field trips worked out quite well.

F.y.i.
I considered using a laptop to record the sound separately, but decided on a handy but professional portable audio recorder.
Video/audio from my Z7 will be stored on a 32 GB CF card, the PMD661 audio on an 8 GB SDHC card.


farss wrote on 5/4/2009, 3:56 AM
Your original question is interesting.
You have a visual cue and a matching audio cue, you line them up and yet they still appear out of sync. The trick in part is to realise that sound slightly lags vision, always. Also it can be tricky to see exactly what made the sound, even with something as obvious as a clapper. You'll most likely find that the actual event (the two pieces of wood hitting) can be a frame later than when it appears to touch.

Some tips if you're trying to align audio from the waveforms:

1) Turn off all AGC in all recording devices. Fail to observe this and you are going to make everything else very, very difficult.

2) You can magnify the waveforms in Vegas by holding down the shift key and using the up/down arrow keys. If one waveform is from a much louder signal you can reduce it by pulling down the fade envelope, remember to put it back to 0dB before rendering. The aim is to get the two waveforms roughly the same height so that visually matching them is easier.

3) It's very easy to get one recording with the phase inverted. This can happen because the mics are some distance apart or because of how they've been wired. Again Vegas to the rescue, invert one track to get them in phase. I found that my little shotgun on my EX1 is out of phase with the inbuilt mics, yish.
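
For tip 3, a quick way to tell whether two already time-aligned tracks are out of phase is to look at the sign of their correlation. A minimal sketch, assuming both tracks are lists of samples covering the same stretch of sound (the function names and the toy samples are hypothetical):

```python
def polarity_matches(track_a, track_b):
    """True if the two aligned tracks mostly move in the same direction."""
    correlation = sum(a * b for a, b in zip(track_a, track_b))
    return correlation >= 0


def invert(track):
    """Flip the polarity of a track, like an invert switch on a mixer."""
    return [-sample for sample in track]


# Toy data: the recorder track is roughly the camera track upside down.
camera = [0.0, 0.5, -0.3, 0.8, -0.6]
recorder = [0.0, -0.4, 0.2, -0.9, 0.5]

if not polarity_matches(camera, recorder):
    recorder = invert(recorder)

print(polarity_matches(camera, recorder))
```

This only makes sense once the tracks are lined up in time; a time offset can also produce a negative correlation.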

That's about it, good luck. I've done heaps of concerts with double head recording. I'm VERY lucky at the moment as MY EX1 and MY Edirol R-4 hold sync within 1 frame per hour. Your mileage will almost certainly vary but it's pretty easy to fix in Vegas.

Bob.
ingvarai wrote on 5/4/2009, 4:32 AM
>The trick in part is to realise that sound slightly lags vision, always
In real life, it will, because it travels so much slower than light. I wonder if this is different when watching moving pictures, if the compensation built into our brains is fooled. I am sensitive to this, and often have the feeling that sound is slightly out of sync, on TV, and watching movies.
Clapping hands, I thought this was a precise way, it is not that precise after all. When watching the video clip alone with sound, the audio clapping spike occurs almost two frames later than the frame where the hands meet. A real film clapper is probably better.

>Turn off all AGC in all recording devices
No prob. I hate AGC, especially those noise pumps.

>I'm VERY lucky at the moment as MY EX1 and MY Edirol R-4 hold sync within 1 frame per hour
You have the top notch equipment here.. And the sync you achieve is amazing.
I use an Edirol R-09HR, and the sound is outstanding with the built in mics, but I use separate mics.

ingvarai

farss wrote on 5/4/2009, 4:49 AM
"You have the top notch equipment here.. And the sync you achieve is amazing."

Same audio recorder, different camera, different story. As John points out above, there's nothing to force the clocks to run in sync so it's purely luck of the draw.

Bob.
richard-amirault wrote on 5/4/2009, 6:19 AM
I don't have to add much to this discussion .. other than I normally shoot 50 to 60 min. continuous events with dual system sound. I leave the camcorder mic on and record the sound with an Edirol R-09. Usually with a pair of Crown Sound Grabber mics.

I don't use any sort of "clapper" to sync. I just expand the timeline and match waveforms ... then listen and watch to see if it's good enough.

I do check the sync thru the video. Occasionally I have lost sync by losing frames during the capture/transfer from the camcorder to the computer. A bit of cutting and overlap on the audio sets it right back to where it should be.
musicvid10 wrote on 5/4/2009, 8:30 AM
"After 20 minutes, the Edirol audio is about 30 milliseconds late,"

That's a fairly typical result, and really not a function of the quality of your equipment or the price paid.

As you rightly pointed out, that is a small drift (30 ms over 1,200 s is 2.5 x 10^-5, or 0.0025%) and lining it up at one point should work just fine since you are replacing the on-camera audio. I wouldn't even attempt to stretch the external audio in that case since that process introduces its own artifacts and quantization noise.

It is only in longer (30 min+) shoots or when I am mixing the two tracks that I like to employ one of the methods in my previous post, often the second, for the same reason as in the previous paragraph.
musicvid10 wrote on 5/4/2009, 9:10 AM
"I wonder if this is different when watching moving pictures, if the compensation built into our brains is fooled."

You mean when the actors are miked close with boom mics but the camera is back 20 ft. or so. No, they just delay the audio track to compensate, maybe 15 ms.
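
The size of such a delay follows from the speed of sound. A rough sketch, assuming about 343 m/s (dry air at around 20 °C) and a 25 fps frame rate; the function name is hypothetical:

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # assumption: dry air at about 20 degrees C
FEET_TO_METERS = 0.3048


def acoustic_delay_ms(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Time sound needs to travel the given distance, in milliseconds."""
    return distance_m / speed * 1000


camera_distance_m = 20 * FEET_TO_METERS          # camera about 20 ft back
delay_ms = acoustic_delay_ms(camera_distance_m)
delay_frames = delay_ms / (1000 / 25)            # in frames at 25 fps

print(f"{camera_distance_m:.1f} m -> {delay_ms:.1f} ms, {delay_frames:.2f} frames")
```

So a camera position of 20 ft or so corresponds to a delay in the same ballpark as the 15 ms mentioned above, a bit under half a frame at 25 fps.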

As Bob pointed out, putting the audio dead on with a visual frame is the worst thing to do. Always put it downstream just a bit.

"I am sensitive to this, and often have the feeling that sound is slightly out of sync, on TV, and watching movies."

So many variables here, the biggest being DVD players and ATSC tuners have no precise way of matching the audio exactly with the video. It's each manufacturer's "best guess" as to where the sweet spot is and all you have to do is listen to two digital TV tuners simultaneously to see how big the differences are. It is also a challenge to us as home DVD producers. Even in a big movie theater, the delays are not always set precisely enough to keep everyone from noticing a slight sync issue.
UlfLaursen wrote on 5/4/2009, 10:48 AM
So many variables here

I agree with you. I have worked quite some with syncing separate audio to video, mostly music, and it can be a challenge. Most of what I have done has been separate songs though, 3-5 min. each, and that has worked out pretty well most of the time.

One of the worst things about having done quite a lot of this stuff is that you almost immediately notice when there is something really wrong on TV, and the wife just says "what do you mean out of sync" :-)

/Ulf