Audio event "Classic" stretch attributes

megabit wrote on 1/3/2009, 2:35 AM
When time-stretching my solo classical guitar recording recently, I discovered for the first time that the stretched audio event's properties offer several "Stretch Attributes" (A01 through A19). I'm not satisfied with the resulting stretched audio quality, because some tones flutter slightly; I suspect the problem is that I let Vegas decide which attribute to use when stretching my audio track. But I have no clue which one to choose for a solo guitar event so that it isn't distorted! Does anyone know the theory behind these attributes?

AMD TR 2990WX CPU | MSI X399 CARBON AC | 64GB RAM@XMP2933  | 2x RTX 2080Ti GPU | 4x 3TB WD Black RAID0 media drive | 3x 1TB NVMe RAID0 cache drive | SSD SATA system drive | AX1600i PSU | Decklink 12G Extreme | Samsung UHD reference monitor (calibrated)

Comments

PeterWright wrote on 1/3/2009, 3:29 AM
No, but a new "Elastique" feature in the latest version of Acid Pro is said to improve "stretched" quality considerably.
megabit wrote on 1/17/2009, 5:30 AM
Thanks Peter, but coming back to Vegas (or video editing in general)... I must say I'm at a loss. The stretched audio quality mentioned here is just part of a more general problem: how best to render a 25p music video as 24p?

Bob and I concluded a long time ago that, as far as picture quality is concerned, it's best to stretch the video track in a 24p project so that it occupies exactly the same absolute number of frames before rendering it out as 24p. The output will of course be slowed down (by some 4%), but the picture quality won't suffer from the recompression of every frame that would otherwise be inevitable to fit 24 frames (instead of the original 25) into each second of video. So far, so clear; to get 24p out of 25p, we agree on two points:

- the resultant render will last longer (be slowed down) by some 4%
- the picture quality will not suffer from transcoding
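For reference, the arithmetic behind that 4% is easy to check (a quick Python sketch; the frame rates are the only inputs):

    # 25 source frames must fill the running time of 24 output frames
    stretch = 25 / 24        # duration ratio: ~1.0417, i.e. ~4.17% longer
    speed = 24 / 25          # playback speed: 0.96, i.e. 4% slower
    minutes = 60 * stretch   # a 60-minute 25p concert runs ~62.5 minutes at 24p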

But how about the audio? Of course, to keep things in sync, it needs to be stretched by exactly the same amount as the video. However, the stretching introduces considerable distortions, which are particularly distracting in a music production. Vegas offers as many as 19 algorithms for stretching audio without changing its pitch, but nowhere have I found any theory on which one to use when! I experimented a bit, and so far find the A10 ("Solo instruments 1") method to work best - at least with my 24-bit, 48 kHz solo classical guitar sound...

However, I'd like to eliminate the experimenting in the future, hence the request for some background on each setting. And I guess I will be transcoding from 25p (acquisition in PAL territory) to 24p (progressive DVD/BD in an "area-independent" format) quite often!

Of course, the problem can be eliminated by simply rendering the finished 25p project as 24p (no time-stretching involved). Not only is the audio not distorted, but the event isn't slowed down either... However, I tried this route as well, and definitely find the picture quality to deteriorate considerably.

Any opinions and suggestions welcome!

PS One could ask "why not shoot 24p in the first place?", and I must say I'm tempted by this option as well. However, as most music events take place under artificial lighting, I'm afraid of the nasty interference artefacts that might arise from using my EX1 in 24p mode (with the shutter off or, more classically, with a 180-degree shutter) under 50 Hz lighting...

farss wrote on 1/17/2009, 2:11 PM
Not much direct help, but some background as to why there are so many choices in pitch shifting.
Pitch shifting is used to change a note, say C to C#. It sounds a trivial task - you just change the frequencies. However, instruments, including the voice, are not so simple. You have a fundamental frequency which defines the note, you have harmonics which have a mathematical relationship to the fundamental, and you have resonances which have no relationship to the fundamental. In the case of the guitar, the resonances come from the wooden body. When a guitarist plays a C and then a C#, he changes the length of the string; the nature of the resonances in the body remains unaltered by that change in string length.
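A toy example makes the point; the body-mode frequencies below are invented purely for illustration (Python):

    # Harmonics track the fundamental; body resonances stay put.
    body_modes = [100.0, 200.0]          # fixed guitar-body modes, Hz (made-up values)
    for f0 in (261.63, 277.18):          # C4 and C#4 fundamentals, Hz
        harmonics = [round(f0 * n, 1) for n in range(1, 5)]
        print(f0, harmonics, body_modes)

Naively scaling the whole spectrum by the C-to-C# ratio would drag the body modes along with it, and that's part of what makes a crude shift sound wrong.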

If you consider all that, then you should see why trying to make a C played on a guitar sound like a C# is no simple task at all. Not all frequencies in the spectrum that makes up the note should be handled the same way. Various algorithms attempt to solve this problem using different strategies. Of course, there are even more complications due to room resonance and reflections.
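To give a feel for what these algorithms have to do under the hood, here is my own minimal sketch of a phase-vocoder time stretch in Python/numpy. This is only the textbook technique - Vegas's A01-A19 attributes are undocumented and certainly more sophisticated - but the frame-by-frame phase bookkeeping it does is exactly the kind of processing where warble and flutter creep in:

    import numpy as np

    def pv_stretch(x, rate, n_fft=2048, hop=512):
        """Naive phase vocoder: x is a mono float array; rate < 1 slows
        playback down while preserving pitch."""
        win = np.hanning(n_fft)
        # expected phase advance of each FFT bin over one hop
        expected = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
        # analysis positions stepped by hop*rate; synthesis hop stays fixed
        positions = np.arange(0, len(x) - n_fft - hop, hop * rate)
        phase = np.angle(np.fft.rfft(win * x[:n_fft]))
        out = np.zeros(len(positions) * hop + n_fft)
        for i, pos in enumerate(positions):
            p = int(pos)
            a = np.fft.rfft(win * x[p:p + n_fft])
            b = np.fft.rfft(win * x[p + hop:p + hop + n_fft])
            # measured phase increment per hop, wrapped to [-pi, pi]
            d = np.angle(b) - np.angle(a) - expected
            d -= 2 * np.pi * np.round(d / (2 * np.pi))
            phase += expected + d          # accumulate phase for resynthesis
            frame = np.fft.irfft(np.abs(b) * np.exp(1j * phase))
            out[i * hop:i * hop + n_fft] += win * frame
        return out

    # slowing 25p material to 24p: stretched = pv_stretch(audio, 24 / 25)

For a solo instrument, every slight mismatch in that phase accumulation is exposed, which is presumably why the choice of attribute matters so much.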

That's about as much as I know about the problem.
Where I get lost is this: you're not trying to make a C sound like a C# at all. You're trying to stop a C from shifting pitch due to the 4% change in speed. Do you even need to do this, and if not, how do you stop it from happening?

I've always been under the impression that a 4% pitch shift goes mostly unnoticed by the average listener, as everything stays in tune relative to itself. What further confounds me is that changing frame rates between 24 and 25 fps is very, very common when film is telecined in PAL countries. On top of that, a lot of the vocals we hear today are pitch-shifted, thanks to Eventide and other magic, to make no-talent bums sound like they can sing.

I think you might get better answers in the audio forum, or even in forums outside of here that specialise in audio and its processing.

One random thought that's just entered my head: processing digital audio involves interpolation, and this is done using oversampling. I seem to recall the original recordings of this performance were made at a very high sample rate. You may get better results working with those - the less interpolation being done, the better, I suspect.
One other hope: I now know someone in broadcast who has not only worked as a telecine operator but is also a rabid audio geek of the highest order. He may not immediately know the precise answer to this problem, but I'm certain that if I unleash him on it, I'll get an answer at least 50,000 words long.

Bob.
farss wrote on 1/17/2009, 11:06 PM
Short version of a very long story.

The early telecines simply ran film at 25fps and the pitch went up 4%. Most people accepted this.

The Rank Cintel MkIII comes with a Lexicon unit to shift the pitch. It is not perfect. My mate has had exactly the same kind of problem with these units: mostly they're OK, but a bugler playing Taps was a mess - the ends of the notes had holes and warbles in them.
The solution was to bypass the Lexicon, lay off sound and vision to Digital Betacam, and pass it to an audio guru who captured the audio and processed it through some unknown software. The audio file was then laid back to the DigiBeta, and all was well.
Enquiries are being made as to what software was used. My friend indicated that it seemed to take quite some time to crunch the numbers.

So it would seem solo instruments are problematic; take heart that you're not alone in having these problems. Hopefully in a week I'll have the solution to your problem.

Bob.
Chienworks wrote on 1/18/2009, 5:29 AM
Two suggestions, neither of which is perfect, but both are extremely simple and may be more useful than trying the audio manipulations.

1) Keep the speed normal and turn off frame resampling. Advantages: the frames will retain all their quality perfectly, since one input frame is used for exactly one output frame, and the audio isn't affected at all. The disadvantage is that every 25th frame will be dropped: video sync will slide forward and backward by up to half a frame as the 24-frame cycle goes by, and every 24th output frame will skip over one of the original frames. However, a viewer who doesn't know this is happening probably won't notice it. Total effort involved: 3 mouse clicks.
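If you want to see exactly which frames get skipped and how far sync wanders, here's a little Python sanity check of that nearest-frame mapping (my own sketch of the idea, not literally what Vegas does internally):

    fps_in, fps_out = 25, 24
    for n in range(24):                         # one second of 24p output
        t = n / fps_out                         # output frame time, seconds
        src = round(t * fps_in)                 # nearest 25p source frame
        drift_ms = (t - src / fps_in) * 1000    # A/V offset at this frame
        print(n, src, round(drift_ms, 1))

Run it and you'll see exactly one source frame per second gets skipped, and the A/V offset never exceeds 20 ms (half a source frame). The same mapping with fps_in, fps_out = 24, 30 produces an NTSC-style duplication pattern like the one I describe below.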

2) Don't pitch-correct the audio at all. Before slowing the video down to 24fps, choose the Classic method and check the "Lock to stretch" option. This will slow the audio down in perfect sync by lowering the pitch, much the same as using the pitch control on a tape deck. Advantages: no resampling of the video, no lost frames, and no artifacts from correcting the audio pitch, so the original quality of the audio is preserved too. Disadvantage: the pitch will drop 4%. This is less than a semitone, so a viewer who doesn't know it's happening, doesn't have perfect pitch (or isn't carrying a pitch pipe) probably won't notice it. Total effort involved: 3 mouse clicks.
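To put a number on "less than a semitone" (quick arithmetic, nothing Vegas-specific):

    import math
    semitones = 12 * math.log2(24 / 25)   # ~ -0.71 semitones flat
    playback_hz = 48000 * 24 / 25         # 48 kHz audio effectively plays at 46,080 Hz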

I'll point out that film transfers to videotape here in NTSC-land use the first method. Watch a VHS copy of a film in step-frame and you'll see that every 4th frame is duplicated, so you get a sequence of 1 2 3 4 4 5 6 7 8 8 9 10 11 12 12 ... etc. Seems pretty drastic, but watched at normal speed it's completely unnoticeable. I suspect that 25-to-24 with dropped frames would be even smoother.
farss wrote on 1/19/2009, 12:32 AM
Option 1) would be very problematic. At least some of the content contains close-ups of fast rhythmic movement, and dropping a frame would introduce nasty jumps in the motion. While it's true that film in NTSC uses something similar (pulldown), it is being rendered to 60i, not 30p. Converting between 24p and 30p is generally considered very problematic.

Option 2) I would have thought would be acceptable, and it is the way things used to be done. However, I was told in no uncertain terms that musicians notice immediately, and they used to complain about the way the old telecines did it.

The solution is a tool called MPEX3 (Minimum Perceived Loss Time Compression/EXpansion), developed by Prosoniq and used in Pyramix. More details are in the manual here. The other option is zplane's Elastique, which now ships with ACID; however, the only comparisons I can find between Elastique and MPEX3 indicate that MPEX3 is better. Like all things one finds on the web, that could be drivel and/or out of date - you'd have to do your own research.

However, MPEX3 is available under Pro Tools. The simplest solution would be to do what Chienworks has suggested as option 2), then send the shifted audio file back to the audio engineers who did the mastering and who have the tools to pitch-shift it back for you.

Bob.
megabit wrote on 1/19/2009, 3:00 AM
Although I appreciate Chienworks' advice very much, I fully agree with Bob that:

- option 1) is not viable, with lots of close-ups of the guitarist's hands becoming jerky or not quite synchronized with the audio
- option 2) could probably be accepted for broadcasting a fragment of the concert, but certainly not for the final music DVD/BD; if I myself can hear the pitch change, the performer would certainly find it unacceptable, and never record with me again...

Chienworks wrote on 1/19/2009, 4:42 AM
I wouldn't discount option 1 without at least trying it. It's not as bad as you think it will be. Keep in mind that sync already drifts back and forth half a frame *every frame*! I suspect that the jerkiness will be nearly unnoticeable, and will be far less of an issue than speed changes, ghosting, or pitch adjustments.

For that matter, someone sitting 7 metres from the stage is already hearing the sound half a frame behind what they see. Even in the back of the "A" seats the delay is probably up to a couple of frames. People are used to such things and their brains adjust remarkably well.
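That half-a-frame figure checks out if you do the arithmetic (taking ~343 m/s as the speed of sound):

    delay_ms = 7 / 343 * 1000        # ~20.4 ms of acoustic delay at 7 m
    half_frame_ms = 1000 / 24 / 2    # ~20.8 ms: half a frame at 24 fps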
megabit wrote on 1/19/2009, 5:24 AM
"Keep in mind that already sync drifts back and forth half a frame *every frame*!

Yes, but this is a continuous process, and as such can be very easily adapted to.

"someone sitting 7 metres from the stage is already hearing the sound half a frame behind what they see. Even in the back of the "A" seats the delay is probably up to a couple of frames. People are used to such things and their brains adjust remarkably well"

But not when watching close-ups on a 50" screen from 2-3 m distance!

Oh, and of course I did try this option out, as well. But thanks, anyway...

farss wrote on 1/19/2009, 6:03 AM
I tested this too; I was more interested in the impact on motion.
With the footage I tried - an MCU of a cello - it was not noticeable. However, Vimeo uses the same frame-dropping conversion for 25p, and on pans the 'cogging' effect is very noticeable. I'll test again, time permitting, with full-frame motion to see how it looks.

Given the total cost of this production, including BD mastering and licensing, I'd think anything required to get the best possible result would be money well spent. I haven't asked anyone with a Pyramix or Pro Tools system what they'd charge, but I can't imagine it'd take more than 4 hours for them to do, and $500 would seem a reasonable charge. Compared to $10k for BD mastering, you might as well get it spot on.

Bob.