ProRes audio/video sync offset by 2 or 3 frames

joost-berk wrote on 3/3/2022, 9:45 AM

Hi there,

Last week we made recordings with our live setup (Blackmagic HyperDeck) and I noticed something. All recordings made in ProRes at 1920x1080 50i show a sync offset of 2 or 3 frames. I tested this by making recordings directly to my PC over embedded SDI, and those are all in sync. Recordings made on the HyperDeck in DNxHD are also in sync in my VP19 timeline.

The same ProRes files appear in sync in Premiere Pro as well.

So I was wondering whether anybody else is experiencing this issue. I tested this with different ProRes variations and on several VP19 machines. All cases show a sync offset with ProRes on VP19. This feels like there is work to be done on the Intermediate Codec.

Vegas Pro user since version 1.2

OS: Windows 10 Pro (Latest version)

CPU: AMD Ryzen 7 3800X

RAM: 32GB DDR4 3200MHz

GPU: Nvidia GeForce RTX 2080 Super 8GB GDDR (Latest Studio Driver)

Monitoring: Black Magic Design DeckLink SDI 4K (or Nvidia HDMI for 4K HDR)

Audio: M-Audio M-Track Eight ASIO

Controller: Behringer X-Touch

Comments

VEGASHeman wrote on 3/3/2022, 10:48 AM

@joost-berk Could you upload a sample file for us to check out?

john_dennis wrote on 3/3/2022, 12:12 PM

@joost-berk @VEGASHeman

"This feels like there is work to be done on the Intermediate Codec."

I looked at a test file I use for audio sync issues. Rendering to ProRes 444 XQ at 1920X1080-29.970p in Vegas Pro 19-458 produced a file that is perfectly synced when placed back on the timeline. The output side of the equation doesn't appear to be broken. I'm curious to see the sample 1920x1080 50i file, also.

Render Properties

Sync Results

Results from the machine in my signature.

joost-berk wrote on 3/4/2022, 2:36 AM

Both files were recorded simultaneously. The AVI is slightly longer, but just jump to the hand clap parts of both clips.

 

The first link is a Sony YUV AVI file that was recorded in Vegas on a DeckLink SDI feed with embedded audio:

https://drive.google.com/file/d/1SW8GPvv--iWirIX0LYpHXi7so-yW_3g-/view?usp=sharing

 

This second link is the same recording, but made directly in ProRes on the HyperDeck:

https://drive.google.com/file/d/1RCzF4RhbxEnMe59oz-aaIEAqVjhtwR08/view?usp=sharing

 

If you put both in your timeline, and walk through the clips frame by frame, you will notice that the ProRes file is offset by 2 or 3 frames.

john_dennis wrote on 3/4/2022, 8:37 PM

@joost-berk

On the machine in my signature, I observe the AVI audio leads the video by one frame, while in the ProRes MOV file, the audio leads the video by three frames.

Vegas Pro 19-458.

Former user wrote on 3/4/2022, 9:54 PM
ProRes MOV file, the audio leads the video by three frames.

Vegas Pro 19-458.

I see the same audio 3 frames advanced in VP18(527).

In Premiere 2022 and Resolve the audio is 1 frame advanced, so the real sync issue looks to be Vegas being 2 frames advanced. The 3 frames show up because the audio/video sync of the original recording is already out by 1 frame.

Musicvid wrote on 3/5/2022, 12:02 AM

I am not able to duplicate the reported behavior.

  • Source: Uncompressed AVI, 60.000 fps, PCM Audio Interleaved at .250
  • Source audio was offset by +1 frame before rendering for best average alignment with LED, variance <±100 ms
  • Bottom track is rendered ProRes 422, 60.000 fps
  • Audio track alignment is visually perfect, no double-ticks heard on preview playback

If there is a particular flavor of ProRes I should be trying, I can do so; it's an easy test for me to run.

EricLNZ wrote on 3/5/2022, 2:06 AM

When looking at sync there's another aspect that needs considering. Light travels much faster than sound. Does it matter? Well yes it might. Your camera will instantly see the visual whereas the audio arrives later.

Sound according to one source travels at 343 metres per second. If you are shooting at 60 fps that's 6 metres per frame. So for every 6 metres between the camera and the audio object you can expect a frame difference.

At least that's my mathematics.
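The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the 343 m/s figure is the one quoted in the post (it varies with temperature), the function names are made up for illustration, and the 2 m distance in the loop is just an example value.

```python
# Rough check of the "metres of sound travel per video frame" estimate.
# 343 m/s is the figure quoted in the post; it changes with air temperature.
SPEED_OF_SOUND_M_PER_S = 343.0


def metres_per_frame(fps: float) -> float:
    """Distance sound travels during one video frame at the given frame rate."""
    return SPEED_OF_SOUND_M_PER_S / fps


def acoustic_delay_frames(distance_m: float, fps: float) -> float:
    """Audio delay, in frames, caused by the source being distance_m away."""
    return distance_m / metres_per_frame(fps)


if __name__ == "__main__":
    # 25 fps is the frame rate of 50i material; 60 fps is the case in the post.
    for fps in (25.0, 50.0, 60.0):
        print(
            f"{fps:5.1f} fps: {metres_per_frame(fps):6.2f} m per frame, "
            f"2 m away = {acoustic_delay_frames(2.0, fps):.3f} frames late"
        )
```

At 60 fps this comes out a little under 6 metres per frame, so the rounded figure in the post holds; at lower frame rates the distance per frame is proportionally larger.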

joost-berk wrote on 3/5/2022, 2:12 AM

I am not able to duplicate the reported behavior.

  • Source: Uncompressed AVI, 60.000 fps, PCM Audio Interleaved at .250
  • Source audio was offset by +1 frame before rendering for best average alignment with LED, variance <±100 ms
  • Bottom track is rendered ProRes 422, 60.000 fps
  • Audio track alignment is visually perfect, no double-ticks heard on preview playback

If there is a particular flavor of ProRes I should be trying, I can do so; it's an easy test for me to run.

So could it be that the ProRes file you are using is created in Vegas Pro?

I tested this with all flavours of ProRes that are available on the BMD HyperDeck Studio. Proxy/LT/normal/HQ.

Musicvid wrote on 3/5/2022, 2:16 AM

Eric, because of that, our brains are programmed to accept a slight audio delay, but are intolerant of very much audio lead, such as badly dubbed ninja movies. I had a lot of experience with this miking auditoriums.

joost-berk wrote on 3/5/2022, 2:19 AM

When looking at sync there's another aspect that needs considering. Light travels much faster than sound. Does it matter? Well yes it might. Your camera will instantly see the visual whereas the audio arrives later.

Sound according to one source travels at 343 metres per second. If you are shooting at 60 fps that's 6 metres per frame. So for every 6 metres between the camera and the audio object you can expect a frame difference.

At least that's my mathematics.

This is not the case here. I tested this in several stages in our studio, with the same source:

- Embedded SDI directly from the camera, recorded over SDI as YUV AVI in Vegas = sync

- Embedded SDI from the camera to ATEM switcher, recorded over SDI in Vegas = (almost) sync (the offset is due to processing in the ATEM, but is less than one frame)

- Embedded SDI from the camera to ATEM switcher, recorded on HyperDeck in ProRes = not in sync

- Embedded SDI from the camera to ATEM switcher, recorded on HyperDeck in DNxHD = (almost) sync

So you see, all options use the same source from our studio. No offset was created in the travel path from subject to camera. It was only 2 meters.

 

 

Musicvid wrote on 3/5/2022, 2:51 AM

So could it be that the ProRes file you are using is created in Vegas Pro?

I tested this with all flavours of ProRes that are available on the BMD HyperDeck Studio. Proxy/LT/normal/HQ.

Yes, as explained, the ProRes file on the second track in my test was created in Vegas from the AVI on the top track.

This was done purposely to either confirm or rule out the possibility that either Vegas' ProRes decoder or encoder is implicated in the audio offset that you report. I think the result of this test model is fairly compelling evidence that they are not, although there may be something in your import method that I'm not able to replicate here

EricLNZ wrote on 3/5/2022, 2:56 AM

This is not the case here.

@joost-berk I wasn't trying to suggest it was the case but it's something that needs to be considered if the distance is sufficient to cause an audio time lag.

 

Former user wrote on 3/5/2022, 5:32 AM

I tried the Shutter Encoder FFmpeg version of ProRes to re-encode a file, then aligned it with the original AVC in both Vegas and another leading NLE. It is the same problem: the ProRes audio is slightly out of sync with the original AVC audio, in Vegas only.

 

joost-berk wrote on 3/5/2022, 9:06 AM

So could it be that the ProRes file you are using is created in Vegas Pro?

I tested this with all flavours of ProRes that are available on the BMD HyperDeck Studio. Proxy/LT/normal/HQ.

Yes, as explained, the ProRes file on the second track in my test was created in Vegas from the AVI on the top track.

This was done purposely to either confirm or rule out the possibility that either Vegas' ProRes decoder or encoder is implicated in the audio offset that you report. I think the result of this test model is fairly compelling evidence that they are not, although there may be something in your import method that I'm not able to replicate here

Thanks for ruling that out with the VP encoding/decoding. The import process on my side was just dragging and dropping the recorded files directly to the VP19 timeline. The files were not altered in any way. So, coded in the recorder and straight to Vegas.

On Monday I will test what happens if I re-encode these files with Adobe Media Encoder to Sony MXF. Maybe then the files will appear in sync in VP.

VEGASHeman wrote on 3/5/2022, 8:25 PM

Thanks to everyone for their input. I analyzed the file in question using our internal parser and found that the file has differing audio and video durations. The video track (and also the movie header, 'mvhd') lists the duration at 42800 units, with the time scale being 2500 (so a length of about 17.12 seconds). The audio track is listed at a duration of 43000 units (or 17.2 seconds), and so VEGAS (so4compoundplug, to be precise) adjusts the start offset to make them match up (there is some rounding involved to get it to work on frame boundaries).

D:\code.alt\sony4\main\build\fileio\wbin-x64d>Test_libIPEParser.exe "m:\bugs\DVP-xxxx (ProRes audio out of sync)\4-Sync-ENG door ATEM in Recorder ProRes.mov" -PA
 --- official format: ProRes
 --- official codec: ProRes 422
 --- official bit rate: 0
 --- overall bitrate: 140863017
 --- start timecode: 1, drop-frame:0
Detected 1 audio tracks
    with 16 sub-audio channels
Total audio bitrate: 18432000
channel # 0
 --- start offset: 3840
 --- format / codec: PCM
 --- channel count: 1
 --- bits / sample: 24
 --- samples / pack: 1
 --- bytes / pack: 48
 --- samples / second: 48000
 --- 1st bit offset: 0
 --- bit pitch: 384
 --- sample count: 825597
 --- sample map count: 825597
 --- duration (sec): 17.1199
 --- max bitrate: 18432000
 --- avg bitrate: 18432000
 --- avg entry size: 2201592
 --- max entry samples: 48000
 --- fourcc: ----

If I disable this "start offset" adjustment, the audio will now lead the video by 1 frame, instead of 3 frames as observed by several of you, which is still not perfect, but in line with the AVI file.
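For anyone who wants to follow the numbers, here is a small worked calculation in Python using the durations and time scale quoted above. It is only a sketch of the arithmetic, not VEGAS code: the 25 frames per second figure is an assumption based on the 1080 50i format mentioned at the start of the thread, and reading the parser's "start offset" value as a count of audio samples is also an assumption.

```python
from fractions import Fraction

# Values quoted in the post above.
MOVIE_TIMESCALE = 2500          # time units per second
VIDEO_DURATION_UNITS = 42800    # video track / 'mvhd' duration
AUDIO_DURATION_UNITS = 43000    # audio track duration
AUDIO_SAMPLE_RATE = 48000       # "samples / second" in the parser output

# Assumption: 1080 50i material runs at 25 frames (50 fields) per second.
FRAMES_PER_SECOND = 25

video_seconds = Fraction(VIDEO_DURATION_UNITS, MOVIE_TIMESCALE)
audio_seconds = Fraction(AUDIO_DURATION_UNITS, MOVIE_TIMESCALE)

# How much longer the audio track claims to be than the video track.
mismatch_seconds = audio_seconds - video_seconds
mismatch_frames = mismatch_seconds * FRAMES_PER_SECOND
mismatch_samples = mismatch_seconds * AUDIO_SAMPLE_RATE

if __name__ == "__main__":
    print(f"video:    {float(video_seconds):.2f} s")
    print(f"audio:    {float(audio_seconds):.2f} s")
    print(f"mismatch: {float(mismatch_seconds):.2f} s")
    print(f"          = {mismatch_frames} frames at {FRAMES_PER_SECOND} fps")
    print(f"          = {mismatch_samples} samples at {AUDIO_SAMPLE_RATE} Hz")
```

Under those assumptions the duration mismatch works out to 2 frames, or 3840 audio samples, which is the same figure as the "start offset" line in the parser output and matches the drop from 3 frames of audio lead to 1 frame when the adjustment is disabled.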

The question we now face is about how to resolve this to satisfaction: a preference to disable the start offset adjustment would be one way forward, but I would prefer to "fix" this without any manual intervention, if possible, and also without affecting the current behavior which may be useful for other MP4/MOV files.

joost-berk wrote on 3/6/2022, 2:48 AM

@VEGASHeman That is actually good to hear! What is the best way to treat this? I am wondering, how does Adobe Premiere handle this? Because there is no adjustment of any kind involved there.

So what happens is that the audio is actually too short, and therefore Vegas lines up the audio from the beginning of the video stream? And to get this in sync, the audio needs to line up at the end of the video stream? Is this the case?

For this particular example it's an easy fix. But what happens if the audio is short by 2 seconds? Does this make the fix an issue?

Ideally, it would be best if Vegas handled these cases just like Premiere does. But I don't know how they do it. If we cannot figure out how those guys do this, I would rather see a checkbox in the clip properties window that manages the way Vegas handles the sync offset.

By the way, the 1 frame offset is due to video processing in the video switcher. So we get completely in sync with the 2 frame offset. :-)

Last changed by joost-berk on 3/6/2022, 2:50 AM, changed a total of 1 times.

Musicvid wrote on 3/6/2022, 12:10 PM

joost-berk, I dealt with this a lot in the earliest days of video tape hardware capture and production, both with interleaved and non-interleaved output formats.

The variables are Offset, Short and Long Term Drift.

  • All modern video deliverables are non-interleaved, unlike 20+ years ago (Interleaved means 250ms video followed by 250ms audio and repeat).
  • All consumer video equipment is synced with internal clocks, not slaved to line frequency as it was 50 years ago.

So the answer to your question is a resounding, "It depends."

We demand good sync, and don't always get good sync. And it is incumbent on the NLE to thresh out the variations of duration and drift and decide on the best guess without any stable timecode reference in most cases, just start frame, clock frequency, time offset, and audio "frames." Finding the sweet spot was actually easier when we shot interleaved tapes, and produced our multicam productions with SMPTE timecode and Genlock.

Assuming you'd like to play with something to help answer your own questions with your decks and different NLEs, I will upload the sample AVI I shot tonight and post the link in this thread.

The net short-term tolerance of my sample is ~ [+2/-3] frames, meaning the audio originally leads the video by an average of 1 frame, or ~100ms. Note that it is Uncompressed AVI 720p 60.000fps (not 59.940). I'm sure you'll have as many "aha" moments as I did playing with it. The file is 1.25GB, so I won't try to upload it today while I am using my computer.

Hats off to @VEGASHeman for narrowing down the issue with your deck-encoded source.

Musicvid wrote on 3/6/2022, 12:44 PM

Here's one such test I did back when libav was having problems with AVI source. As you can see, our brains have a big problem with >30ms audio lead, but the same audio lag goes unnoticed because of the acoustics pointed out by @EricLNZ

john_dennis wrote on 3/6/2022, 3:16 PM

@VEGASHeman

Warning! Shamelessly stolen from VideoReDo TV Suite!

At the limit, I'd prefer a slider that changes the input to the start offset value.

Full Disclosure: If I ever used the feature in VideoReDo, it was so long ago that I don't remember the outcome.

Musicvid wrote on 3/7/2022, 12:28 AM

@VEGASHeman

Warning! Shamelessly stolen from VideoReDo TV Suite!

At the limit, I'd prefer a slider that changes the input to the start offset value.

Full Disclosure: If I ever used the feature in VideoReDo, it was so long ago that I don't remember the outcome.

It's a godsend when it doesn't autocorrect properly.

 

VEGASHeman wrote on 3/14/2022, 9:16 AM

Thank you, @john_dennis and @Musicvid for your suggestions. I think that makes a lot of sense, and ideally, we should be able to do this on a per file basis, instead of something global. That kind of framework is in the pipeline, but I was thinking of something like the following for now, which would be global (just because it can be implemented much quicker):

A new setting in the File I/O tab for audio sync with video:

1. On (current behavior - if audio lags or leads video, start offset introduced for respective stream)

2. Off (no offset modification)

3. Threshold based (start offsets modified only if it exceeds duration specified by a threshold value, default being 100 ms, but can be specified by a separate preference).

Would this be acceptable as an interim solution?
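To make the three proposed modes concrete, here is a minimal Python sketch of the decision logic as described in the list above. It is not how VEGAS implements it: the names `AudioSyncMode` and `start_offset_seconds` are hypothetical, the sign convention of the returned value is an assumption, and the rounding to frame boundaries mentioned earlier in the thread is left out.

```python
from enum import Enum


class AudioSyncMode(Enum):
    """Hypothetical names for the three proposed File I/O settings."""
    ON = "on"                # current behavior: always adjust the start offset
    OFF = "off"              # never adjust the start offset
    THRESHOLD = "threshold"  # adjust only when the mismatch exceeds a threshold


def start_offset_seconds(
    video_duration: float,
    audio_duration: float,
    mode: AudioSyncMode,
    threshold: float = 0.100,  # proposed default of 100 ms
) -> float:
    """Return the start offset adjustment (in seconds) that would be applied.

    Assumption: the adjustment equals the audio/video duration mismatch,
    with a positive value meaning the audio track is the longer one.
    """
    mismatch = audio_duration - video_duration
    if mode is AudioSyncMode.OFF:
        return 0.0
    if mode is AudioSyncMode.THRESHOLD and abs(mismatch) <= threshold:
        return 0.0
    return mismatch


if __name__ == "__main__":
    # Durations of the HyperDeck ProRes file discussed in this thread.
    for mode in AudioSyncMode:
        offset = start_offset_seconds(17.12, 17.2, mode)
        print(f"{mode.name:9s} -> start offset adjustment {offset:.3f} s")
```

With the file from this thread, the 0.08 s mismatch sits below the 100 ms default, so the threshold mode would leave the start offset alone, while a file whose audio is off by a couple of seconds would still be adjusted.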

joost-berk wrote on 3/14/2022, 9:26 AM

@VEGASHeman That would be a good solution. Being able to offset it with a checkbox in file properties is the way to go! Thanks Herman!

Musicvid wrote on 3/14/2022, 9:40 AM

Yes, it sounds like you've got the bases covered.

Dirty transport streams are, of course, a different matter and need to be sanitized / resynced in VRD anyway before any NLE can touch them. It does a surprisingly good job as long as there aren't too many video frames needing to be dropped.

john_dennis wrote on 3/14/2022, 12:56 PM

@VEGASHeman