SF, pipeline, & interested - 32bit float audio and vegas 'sound'

stakeoutstudios wrote on 8/21/2001, 5:44 AM
I think I may have found the answer to your questions about the Vegas sound... but I also think Peter from SF has partially answered this as well. I still have questions about what Peter had to say on the subject, though. Here's a bit of an explanation of what happens to digital audio when it's processed.

If you've got time to read the whole thing, please do... if you just want my questions, skim to the bottom.

There are two ways of dramatically improving the quality of digital audio systems: increasing the sampling frequency, and increasing the number of bits used to sample the audio. I won't discuss the first here, as increasing the sampling frequency is a trend that is likely to continue for some time, and there are legitimate reasons to think that even the new higher-frequency standard of 96kHz is not enough.

What is of interest to us is what happens when you increase the number of bits. Adding more bits has a very predictable effect on sound quality. In very rough and ready terms, you could say that each extra bit effectively doubles the range you can get between the quietest and the loudest sounds possible when the playback level is set so that you cannot hear background noise. To double the volume of something, you simply increase its level by 6dB. This means that for 16-bit audio the best performance you can theoretically achieve is around 96dB (16 x 6dB). This is more than good enough for most purposes, but in practice it's hard to achieve. For 24-bit, however, we can get 144dB (24 x 6dB). This would seem like more than enough for any purpose, because the human ear can only handle a maximum SPL of about 120dB before it will start to rupture internally. You can examine such ear damage under a microscope, and it's not pleasant to see.
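To put rough numbers on that rule of thumb, here's a quick sketch (in Python, my own choice since the post has no code); the exact per-bit figure is 20·log10(2) ≈ 6.02dB rather than a flat 6dB:

```python
import math

# Rule of thumb: each bit adds 20*log10(2) ~= 6.02 dB of dynamic range.
def dynamic_range_db(bits: int) -> float:
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: ~{dynamic_range_db(bits):.1f} dB")

# 8-bit:  ~48.2 dB
# 16-bit: ~96.3 dB
# 24-bit: ~144.5 dB
```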

However, the above arguments are not practical when we're dealing with signal processing during record production, where we may be working with tracks that have been under-recorded, badly EQ'd or even subjected to extreme signal processing. We need a system that substantially exceeds these boundaries to give us the freedom of expression we need to create great-sounding records, and this is where 32-bit digital audio comes in.

8-bit, 16-bit, 24-bit, 32-bit... they all sound like variations on a theme, don't they? But in practice, modern 32-bit processing is normally used in a very different way from the other systems. There are two basic types of digital audio systems: integer-based and floating point. Although these two systems are very different in how they represent numbers, both require two sorts of operations to perform basic mixing functions. The first is adjusting the level of the sounds; the second, mixing the sounds together. Both of these operations are easy to do; there is no rocket science involved. To adjust the level of a sound, you simply multiply the samples by a number called the gain factor, and to mix the sounds together you just add up each individual sample value, much as you would do for a simple shopping list.
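To make that concrete, here's a minimal sketch of those two operations on raw sample lists (plain Python; the sample values are made up purely for illustration):

```python
# Two short "tracks" as lists of sample values (illustrative numbers only).
track_a = [0.10, 0.25, -0.30, 0.05]
track_b = [0.02, -0.15, 0.20, 0.40]

# Level adjustment: multiply every sample by a gain factor.
gain = 0.5
track_a_quieter = [s * gain for s in track_a]

# Mixing: add the tracks sample by sample, like totting up a shopping list.
mix = [a + b for a, b in zip(track_a_quieter, track_b)]
print(mix)  # roughly [0.07, -0.025, 0.05, 0.425]
```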

For basic operations, computer programs and processors use integer processing (whole numbers, in other words). This makes perfect sense because most computer operations are just simple counting of individual discrete items, and digital sound converters input and output integer values as well.

Integer-based processors can be made very cheaply, and designed to operate extremely fast - exactly what you need for digital audio processing - but there is a problem. When you use integer-based processing for arithmetic, you run into a problem called 'loss of precision', and even if you try to simulate decimal-point arithmetic, the problem doesn't go away. For example, if you use a simple four-function calculator to perform multiplication and division (exactly the kind of operations involved in digital sound processing), it isn't too long before you start getting numbers like 24.9999998 when you were expecting a whole-number answer.
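A toy demonstration of that loss of precision, using an integer sample and a fractional gain (the numbers are arbitrary):

```python
# A 16-bit style integer sample value, attenuated and then boosted back.
sample = 12345

# Integer arithmetic throws away the fractional part at every step.
attenuated = sample * 3 // 10    # gain of 0.3 -> 3703 (the .5 is lost)
restored = attenuated * 10 // 3  # inverse gain -> 12343, not 12345

print(restored)                  # 12343: detail gone for good

# Floating point keeps the fraction, so the round trip survives
# (give or take a tiny rounding error in the last place).
print(sample * 0.3 / 0.3)        # ~12345.0
```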

Floating point processing is not simply 'decimal arithmetic'. With floating point processing, the decimal point can move freely around, and a separate number - called the exponent - is used to track where the decimal point is meant to be.

If you use a scientific calculator instead of a cheap four-function one, you will be familiar with this number system already. For example, if you multiply two large numbers together (such as 14352345 x 7453425), a scientific calculator will give you an answer in the form "1.06974127032E+14". What this means is "take the number 1.06974127032 and move the decimal point 14 places to the right". This gives the calculator an enormous number range - for all practical calculations, it will never show an overflow error - and the same is true for digital audio systems as well. Under normal circumstances, internally, the signal can never be too loud to distort, and it will never be so quiet that there will be noise or distortion when the volume is boosted back up. So as you can see, the move from 24-bit to 32-bit audio on desktop PCs is not simply the addition of another eight bits to make the number somehow bigger. The whole way the numbers are treated is completely different, and you can't compare the two in a meaningful way!
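You can check the calculator example and see the same significand-plus-exponent split directly in Python (nothing here is Vegas-specific):

```python
# The scientific-calculator example from the post, checked in Python.
product = 14352345 * 7453425
print(product)             # 106974127031625
print(f"{product:.11E}")   # 1.06974127032E+14

# Floating point stores exactly this split: a significand (mantissa)
# plus an exponent saying where the point belongs.
significand, exponent = 1.06974127032, 14
print(significand * 10 ** exponent)   # back to ~1.06974127032e+14
```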

The most important point to remember is that the signal always maintains 24-bit precision no matter how quiet it gets (within reason; eventually it will hit absolute zero).

In other words, we have an enormous amount of dynamic range, far more than we need for even the most extreme signal processing.

So, for example, if there's a particularly quiet section in your mix, and you've put the mix through a digital compressor to boost the level, you won't get any significant added noise or distortion as the compressor goes about its work.
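A tiny experiment makes the point (a sketch in Python with NumPy, which is my own assumption rather than anything Vegas uses internally; a 32-bit float carries a 24-bit significand):

```python
import numpy as np  # assuming NumPy is available

x = np.float32(0.123456789)  # an arbitrary 32-bit float sample

# Scale down by 2**-60 and back up: powers of two change only the
# exponent, so the 24-bit significand (the detail) is untouched.
quiet = x * np.float32(2.0 ** -60)
restored = quiet * np.float32(2.0 ** 60)
print(restored == x)  # True

# A 24-bit *integer* sample given the same treatment would be rounded
# away to nothing on the way down, with no detail left to boost back up.
```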

Of course, the compressor only works this way if the signal has been kept in 32-bit floating point digital audio format throughout the entire chain. If it was previously converted to 24-bit integer by (for example) going through an optical cable in S/PDIF format, or perhaps by being stored as an integer-based .wav file (as most .wav files currently are), then the detail in the quiet sections is lost forever.

It's for this reason that the latest digital multitrack software packages allow you to store your files in 32-bit floating point format (does Vegas? I don't know!!!)


----------------------------------
QUESTIONS
----------------------------------

I can't find any option in Vegas for storing files as 32-bit float, so I figure this may be the reason people like 'Pipeline' and myself are hearing subtle changes in sonic quality. If anyone can enlighten me further, please do.

I also have a feeling that investing in some very good quality A/D converters would have a very significant impact on the overall sonic quality of what I do. At the moment I'm using a Gadgetlabs Wave 8*24, which is pretty good as it goes (same converters on the card as the Mackie D8B digital mixer!), but I plan to upgrade to the combination of the MOTU 828 FireWire interface coupled with an Apogee AD8000 hooked up to the digital part of the MOTU: 16 ins and outs, 8 of which use the best A/D conversion money can buy. Any comments as to whether or not this will be a good system with Vegas (3 hopefully by then!)? Will I have sync problems if recording all 16 tracks? Thanks for your help SF, and everyone on this forum...

Jason

Comments

Rednroll wrote on 8/21/2001, 9:13 PM
That's some good-to-know information, though some of it is kind of confusing. The guy who wrote it lost some credibility near the beginning, when he mentioned the 144dB of 24-bit and 120dB SPL in reference to each other. The dynamic range of a 24-bit system is 144dB; the threshold of pain is 120dB SPL. Although the units look the same on these two items, they have two entirely different meanings. SPL is Sound Pressure Level, an atmospheric force measurement. Dynamic range is a measure between two points: the quietest point and the loudest point in the audio system. The two things really don't relate on a one-to-one basis the way he's trying to show.

Oh well, that's just technical debate. I wrote a post a while back that explains bit depth and sampling frequency, when someone asked how they relate to each other. I wrote it in simple terms, so even someone like me can understand it.

Hope this helps:
Take a waveform and let's draw it on an X/Y graph... imagine your waveform being on a piece of graph paper. When we take a sample of something, what we actually do is assign it the numerical value on the Y axis that the waveform is closest to at a particular time (the X axis). In this example the X axis represents time (which corresponds with your sampling frequency), and the Y axis corresponds with amplitude/volume (which corresponds with the number of bits).

The more bits there are on the Y axis, the more accurately we can assign the waveform a number close to its value at the particular spot where a "sample" is taken. Thus the more bits there are, the more numbers there are to assign to the waveform, giving it better resolution on the Y axis: there are exactly 2^24 values on the Y axis if it's 24-bit, and 2^16 values if it's 16-bit. The sampling frequency is how many times we take a "sample", or snapshot, of that waveform in one second. So the higher the sampling frequency, the more numbers we assign to the X axis per second (i.e. higher resolution - more divisions the X axis gets divided into in one second).

I know this might be hard to picture from my explanation, but if I were able to draw on this forum, it would be pretty simple to understand. If anyone at SF wants to give a better explanation, please do so. Basically, in a nutshell, they have no direct relation to each other: the X axis is independent of the Y axis.
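Since the forum can't show the graph-paper drawing, here's the same picture sketched in code (Python with NumPy is my own choice, not anything from the original post; the sine wave is just a stand-in waveform):

```python
import numpy as np  # assumed available

sample_rate = 8   # X axis: 8 samples per second (absurdly low, for clarity)
bits = 4          # Y axis: 2**4 = 16 possible amplitude values

t = np.arange(sample_rate) / sample_rate   # the sample instants in 1 second
waveform = np.sin(2 * np.pi * t)           # a 1 Hz test waveform

# Quantize: snap each sample to the nearest of the 2**bits Y-axis levels.
scale = 2 ** (bits - 1) - 1                # 7 steps each side of zero
quantized = np.round(waveform * scale) / scale

for x, y in zip(t, quantized):
    print(f"t={x:.3f}s  sample={y:+.3f}")
```

Raising `bits` refines the Y axis and raising `sample_rate` refines the X axis, independently of each other, which is the whole point.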

Hope this helps,
Brian Franz
PipelineAudio wrote on 8/21/2001, 9:26 PM
Great post, but here is where I think the 'sound' maybe comes from.

you say

"floating point. Although these two systems are very different in how they represent numbers, both require two sorts of operations to perform basic mixing funtions. The first is adjusting the level of the sounds, the second, mixing the sounds together. Both of these operations are easy to do, there is no rocket science invloved. To adjust the level of a sound, you simply mulitply the samples by a number called the gain factor, and to mix the sounds together you just add up each individual sample value, much as you would do for a simple shopping list."

You speak of mixing by adding the sample values together. I don't think this is how sounds come together in the real world, or even on an analog console. It seems right in theory, but I don't think that's how it actually happens.
I am trying to devise a test to see how sounds are mixed together on a console. Tricky, but hopefully someone will think of something.
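For what it's worth, the usual digital version of that test is a null test: sum the tracks in software, phase-invert the result against a capture of the console's mix of the same tracks, and see what's left. Here's a rough sketch of the measuring half (Python/NumPy, with hypothetical stand-in arrays where the real recordings would go):

```python
import numpy as np  # assumed; the arrays below stand in for real captures

# Hypothetical data: four recorded tracks, plus a sample-aligned capture
# of the console mixing those same four tracks.
tracks = [np.random.randn(48000) * 0.1 for _ in range(4)]
console_mix = np.sum(tracks, axis=0)   # stand-in for the console capture

digital_sum = np.sum(tracks, axis=0)   # the mixing-as-addition hypothesis

# Phase-invert one against the other: if consoles really just sum,
# the residual nulls to (near) silence.
residual = console_mix - digital_sum
print("peak residual:", np.max(np.abs(residual)))
```

With real recordings, aligning the capture sample-for-sample and level-matching it are the tricky parts; whatever doesn't null is the console's own contribution.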