Analysis of Rendering to a Target Bit Rate vs Constant Rate Factor

john_dennis wrote on 11/22/2017, 6:38 PM

The practice of rendering to a constant bit rate target or an average variable bit rate target makes Vegas users feel warm and fuzzy, since at the end of the render one has some idea of the size of the file. Upload sites usually specify a target bit rate, not a quality measurement. Less intuitive is the idea of allowing the encoder to apply bits to achieve a target picture quality based on the characteristics of the source video.

I've wondered what the bit rates would be for x264 Constant Rate Factor renders of a "typical" type of video. I'm not naive enough to think there is a universal "typical" type of video, but I shoot sporting events where there is a lot of standing around, constant camera movement over water to follow the swimmers, as well as monochrome titles that require low bit rates.

I picked a "typical" high-action 6:30 (MM:SS) project, rendered it thirty-six times, and plotted the resulting bit rates against Constant Rate Factors 1-36. All of the underlying data is in this PDF.
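For anyone who wants to reproduce a sweep like this outside of Vegas, here is a minimal sketch using ffmpeg's libx264 from Python. The file names are placeholders, and it does not reproduce the exact render settings I used:

```python
# Sketch: encode one source at CRF 1-36 and record the resulting video
# bitrates in a CSV. Assumes ffmpeg and ffprobe are on the PATH; the source
# file name is a placeholder.
import csv
import subprocess

SOURCE = "typical_project.mp4"  # placeholder source clip

rows = [("crf", "mbps")]
for crf in range(1, 37):
    out = f"crf_{crf:02d}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", SOURCE,
         "-c:v", "libx264", "-crf", str(crf),
         "-an", out],                      # drop audio so the bitrate is video only
        check=True,
    )
    # ffprobe reports the container-level average bitrate in bits/s
    probe = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=bit_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", out],
        capture_output=True, text=True, check=True,
    )
    rows.append((crf, int(probe.stdout.strip()) / 1_000_000))

with open("crf_vs_bitrate.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```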

I hope you're half as entertained by this kind of nonsense as I am.

2017-11-23 10:18 PST

Per Nick's suggestion, added a difference / Video Scopes video.

Disclaimer: No ASIC hardware was used or harmed in the making of any of these videos.

Comments

Musicvid wrote on 11/22/2017, 9:04 PM

Now, if we plot this against quality metrics, we are told that optimal quality levels off around RF18, creating a huge repository for useless bits below that.

john_dennis wrote on 11/22/2017, 9:44 PM

As much as anything else, I wanted to see the curve.

NickHope wrote on 11/22/2017, 10:06 PM

Nice work John!

Now, if we plot this against quality metrics, we are told that optimal quality levels off around RF18, creating a huge repository for useless bits below that.

It would be very interesting to put each of those renders on a track above the original, do a "difference" composite against the original for each, then plot the maximum luminance that the histogram reaches, averaged across a few different representative points in the video.
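Outside Vegas, a rough equivalent of the same idea can be sketched in Python; the file names and sample timestamps below are placeholders, and it assumes ffmpeg, NumPy and Pillow are installed:

```python
# Sketch: sample the same few timestamps from the original and a render,
# take the per-pixel absolute difference of the luma, and average the
# per-frame maxima. File names and timestamps are placeholders.
import subprocess
import numpy as np
from PIL import Image

TIMES = ["00:00:30", "00:02:00", "00:04:00", "00:06:00"]  # representative points

def grab_luma(video, t, png):
    # Extract one frame at time t and load it as an 8-bit grayscale array
    subprocess.run(["ffmpeg", "-y", "-ss", t, "-i", video,
                    "-frames:v", "1", png], check=True)
    return np.asarray(Image.open(png).convert("L"), dtype=np.int16)

def mean_max_difference(original, render):
    maxima = []
    for i, t in enumerate(TIMES):
        a = grab_luma(original, t, f"orig_{i}.png")
        b = grab_luma(render, t, f"rend_{i}.png")
        maxima.append(int(np.abs(a - b).max()))  # 0 = identical, 255 = worst case
    return sum(maxima) / len(maxima)

print(mean_max_difference("original.mp4", "crf_18.mp4"))
```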

Former user wrote on 11/22/2017, 11:18 PM

Yes, a graphical or numerical representation of the loss of quality. It could also be used to show whether hardware encoding actually is inferior to software encoding. I can see that it is, but I don't think an individual's perception is very scientific.

phil-d wrote on 11/23/2017, 1:48 AM

Hi

Of course, the difference depends on the particular software encoder and hardware encoder, and on which related settings are accessible and how they are set.

I'm using a Kaby Lake CPU. In TMPGEnc Video Mastering software I encoded a 4K clip with the exact same settings at high and low bit rates, in both H.264 and H.265, first with the software encoders (x264/x265) and then with Intel QuickSync. I then exported a couple of identical frames from each as uncompressed images and compared them on Windows by ALT-TABbing between them for a quick A-B comparison. At decent bit rates I wasn't sure ALT-TAB was even switching between the images, as there were no visible differences; I had to write on each frame to identify it, and indeed they were switching. At lower bit rates, where both encoders were starting to show signs of struggling, the differences were noticeable, but a rough spot in the hardware encode looked fine in the software encode, and a rough spot in the software encode looked fine in the hardware encode, so this was just the encoders making different decisions about where to save bits. Watching the clips made these differences harder to spot. Unless I marked on the frame which encoder was responsible, I couldn't pick them apart or say one was better than the other.

The main difference was that the hardware encoder was super fast, while the software encoder was slow and made the PC sound like a jet engine as the fans ramped up to get rid of 70 watts or more of waste heat.

The main issue with hardware encoding is that variations between the hardware used cause quality differences, whereas a software encoder of the same type and version will produce pretty much identical results with the same clip and settings on any machine; the only difference will be speed. Even software encoders may vary slightly on different CPUs depending on how they are optimised, but they are much less likely to show large differences in outcome, except perhaps in speed.

So for me, Intel QuickSync gives identical results to one of the better software encoders you can get (X264/X265) in a fraction of the time without burning through electricity, plus I can carry on using my PC whilst encoding as though nothing was happening.

Regards

Phil

Former user wrote on 11/23/2017, 2:04 AM

It may depend on the scene. If you use a shot that taxes a camera sensor, such as a very detailed moving object that's underexposed, and encode that, you are more likely to see a difference between software and hardware. That's where it became obvious to me with both H.264 on Nvidia and QuickSync (I have no H.265 hardware).

But instead of a human trying to perceive the differences, surely there's already something that can see the differences between two frames. Something in Photoshop etc., showing green or red dots where things aren't the same.

NickHope wrote on 11/23/2017, 2:52 AM
...But instead of a human trying to perceive the differences, surely there's already something that can see the differences between two frames. Something in Photoshop etc., showing green or red dots where things aren't the same.

There's a good technique of using a "difference" layer blend illustrated in this post.

A method within Vegas is the "difference" composite mode described in PeterDuke's post here. You just look at the size of the dot in the vectorscope (or the length of the histogram, as I described above).

phil-d wrote on 11/23/2017, 4:13 AM

Hi

The problem with using a computer to compare the difference is that a computer doesn't replicate what our eyes/brain will see, and encoders work on the basis of hiding losses in areas the human eye/brain wouldn't notice, or would find least objectionable to see at lower detail, such as shadow areas. So the links show a good technique for catching codecs that are supposed to be lossless but aren't, or bugs in various colour space conversions, but for lossy encoding it doesn't show the whole picture.

The goal of lossy encoding, of course, is to remove detail to reduce the file size, either by spotting redundant information and only encoding what has changed, and/or by throwing away information and making a good judgement about where it can be lost without being obvious.

When comparing lossy compression, the proof is in the viewing by the human eye, as it is humans watching the video, not computers. The encoders are optimised to trick the human visual cortex into thinking all the original information and detail is still present, not for computer comparisons.

Intel QuickSync in the latest processors is pretty good now and will match the best software encoders; the only exception is when using really constrained bit rates, below what is sensible, where all encoders look pretty bad and it's just a case of picking the best of a bad bunch.

As for Nvidia, depending on the graphics card and the software, the encoding is often simply just software encoding, usually not written very well and tuned for speed, i.e. OpenCL, where this software runs on the GPU cores rather than the CPU. What we need to get the full benefit on Nvidia cards is an encoder API that uses the dedicated hardware encoder, which should give considerably better results. Intel QuickSync by definition uses the CPU's dedicated hardware encoder, but on graphics cards it often isn't the dedicated hardware being used at all but some form of OpenCL, which isn't the same thing as a true hardware encoder. https://developer.nvidia.com/nvidia-video-codec-sdk
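To illustrate the difference in practice: an ffmpeg build compiled with NVENC support exposes the dedicated hardware block as h264_nvenc/hevc_nvenc. A rough sketch follows; the file names are placeholders, and the exact option names vary by ffmpeg version and driver:

```python
# Sketch: asking ffmpeg to use the dedicated NVENC hardware block rather than
# any GPU-compute path. Requires an ffmpeg build with NVENC enabled and a
# supported Nvidia driver; file names are placeholders.
import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-i", "source.mp4",
     "-c:v", "h264_nvenc",            # fixed-function encoder on the GPU
     "-preset", "slow",               # NVENC preset, not an x264 preset
     "-rc", "vbr", "-cq", "23",       # quality-targeted rate control
     "-b:v", "0",                     # let -cq drive the bitrate
     "-c:a", "copy", "nvenc_out.mp4"],
    check=True,
)
```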

Regards

Phil


Musicvid wrote on 11/23/2017, 9:50 AM

Here's a plot of SSIM vs bitrate in x264/265.

If you visualize flipping and superimposing this over john_dennis' chart, it's easy to see why values below RF18 are considered wasteful.

SSIM is a visual quality metric designed to model human perception of structural similarity between the encoded frames and the source.
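If you want to generate the same kind of curve from your own renders, ffmpeg's ssim filter will compute it. A rough sketch in Python; the file names are placeholders following the hypothetical crf_NN naming from the sweep sketch earlier in the thread, and both clips must match in resolution and frame rate:

```python
# Sketch: SSIM of each render against the original via ffmpeg's ssim filter.
# The distorted clip is the first input, the reference is the second;
# file names are placeholders.
import re
import subprocess

def ssim(render, original):
    result = subprocess.run(
        ["ffmpeg", "-i", render, "-i", original,
         "-lavfi", "[0:v][1:v]ssim", "-an", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    # ffmpeg prints a summary line to stderr such as "SSIM ... All:0.987654 (19.1)"
    match = re.search(r"All:([\d.]+)", result.stderr)
    return float(match.group(1)) if match else None

for crf in range(1, 37):
    print(crf, ssim(f"crf_{crf:02d}.mp4", "original.mp4"))
```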

astar wrote on 11/23/2017, 10:22 AM

"encoding is often simply just software encoding, usually not written very well and tuned for speed, i.e. OpenCL, where this software runs on the GPU cores rather than the CPU. "

What?

Please do not confuse people with a lack of understanding of what OpenCL is, and of the difference between it and an ASIC encoder like QuickSync/NVENC/VCE et al. OpenCL is not a render codec that "runs on the GPU." Get it right. OpenCL is a hybrid of compute units on both CPU and GPU that excels at certain types of math, like floating point. If your codec / project settings / scaler needs that type of calculation, then OpenCL will be used. I think you are mistaking this for some crapware MainConcept encoder that tried to run on the GPU using CUDA cores. That was determined to be crap half a decade ago.

ASIC encoders work by dropping certain elements of the MPEG-4 encoding toolset in favor of speed. If your bit rate is high enough, this compensates for the missing features. That is why, when the bit rate gets low, the image falls apart compared to something like Handbrake. ASIC encoders are very similar to the ones used in cameras and phones; they are just now being soldered onto GPUs and included in CPUs. Hardware video codecs are nothing new; see Avid circa 1999 or others like it.

phil-d wrote on 11/23/2017, 11:10 AM

"That is why when the bit rate gets low the image falls apart, when compared to something like handbrake."

This isn't what I'm seeing any more using Intel QuickSync on Kaby Lake; it holds up very well even at low bit rates, lower than most would think sensible to try or use for good quality. QuickSync is hard to tell apart from x264/x265, and at sensible bit rates the only differences are the time to encode and the waste heat generated.

Of course hardware codecs are nothing new; what is new are the improvements in quality from the latest generation of Intel CPUs and from what is on-die in graphics cards, plus of course the software that enables them is making better use of the various settings exposed in the SDKs now. When I first compared Intel QuickSync several years ago it was absolutely clear: stick with x264 even if it took all day and all night, as the difference was fairly easy to spot. Now that simply isn't the case, at least from what I'm seeing. x264 wasn't without faults then either; I was in contact with one of the developers working on x264 on a few occasions to fix glitches I was seeing that they were able to replicate. There's nothing worse than encoding for hours and hours only to find glitches in the first minutes of the footage. 😟

Regards

Phil

Kinvermark wrote on 11/23/2017, 11:18 AM

I am not completely convinced by this move towards hardware encoding - at least for final rendering; timeline acceleration is another matter - as it seems too inflexible for a long-term hardware investment.

Also, it seems like QSV support is omitted from Intel's best CPUs, so now you have to compromise one kind of computing power to get another, more limited one - albeit at a lower price.

john_dennis wrote on 11/23/2017, 12:17 PM

Nick said,

"It would be very interesting to put each of those renders on a track above the original, do a "difference" composite against the original for each, then plot the maximum luminance that the histogram reaches, averaged across a few different representative points in the video."

After rendering and compiling all the data, I'm not up to the tedium of trying to find the peak luminance in thirty-six 6:30 (MM:SS) videos. I did capture the difference compared to the original project for six representative samples of those renders. I synced and added them to a video in the original post.

john_dennis wrote on 11/23/2017, 12:24 PM

Big Note!

I did not use any ASIC hardware in the encoding of any of these videos. It was CPU only. The intent of the thread is to analyze the results of rendering with the same methodology at different Constant Rate Factor settings.

john_dennis wrote on 11/23/2017, 12:42 PM

Musicvid said:

"Here's a plot of SSIM vs bitrate in x264/265."

Maybe it's just scaling, but it appears that the h.265 curves have a more distinct knee than the h.264 curves. Above the knee, one can pick the point of diminishing returns for higher bit rates.

john_dennis wrote on 11/23/2017, 12:46 PM

I'm just "sleeping with the enemy". I want my grandchildren to experience uncompressed video over their holographic visual stimulators.

Musicvid wrote on 11/23/2017, 1:42 PM

Here's the x265 SSIM hand-scaled over your amazing graphic. Lends some sense to the RF18-22 stock advice.

SSIM .98 (mislabeled in the graphic) is actually pretty lousy, while SSIM 1.0 is an uncompressed copy.

Musicvid wrote on 11/23/2017, 2:13 PM

Also, at about RF18, rendered bitrates begin to surpass source bitrates, sometimes dramatically.

john_dennis wrote on 11/23/2017, 2:43 PM

This particular source would intersect at ~ CRF 17.

NickHope wrote on 11/23/2017, 11:21 PM

Nick said,

"It would be very interesting to put each of those renders on a track above the original, do a "difference" composite against the original for each, then plot the maximum luminance that the histogram reaches, averaged across a few different representative points in the video."

After rendering and compiling all the data, I'm not up to the tedium of trying to find the peak luminance in thirty-six 6:30 (MM:SS) videos. I did capture the difference compared to the original project for six representative samples of those renders. I synced and added them to a video in the original post.

I didn't mean search for a peak through the whole video, which would be tiresome and not so useful. I just meant sampling the maximum luminance shown on the "difference" histogram at, say, 4 or 5 points in time, using the same points in each rendered version, then taking an average of those values. It's the only empirical metric of quality that I could think of within Vegas. However, I suppose it's also flawed, since a few rogue pixels could sway it, and ugly artifacts such as pixellation don't get revealed by it.
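If sampling by hand gets tiresome, ffmpeg can produce the same kind of "difference" statistic for every frame, which also shows how much a few rogue pixels are swaying the number. A rough sketch; the file names are placeholders, and both clips must share resolution and frame rate:

```python
# Sketch: per-frame "difference" statistics over the whole clip using
# ffmpeg's blend and signalstats filters. File names are placeholders.
import re
import subprocess

subprocess.run(
    ["ffmpeg", "-i", "crf_18.mp4", "-i", "original.mp4",
     "-lavfi",
     "[0:v][1:v]blend=all_mode=difference,"
     "signalstats,metadata=print:key=lavfi.signalstats.YMAX:file=ymax.log",
     "-an", "-f", "null", "-"],
    check=True,
)

# Each frame leaves a line like "lavfi.signalstats.YMAX=23" in ymax.log
ymax = [int(m.group(1))
        for m in re.finditer(r"YMAX=(\d+)", open("ymax.log").read())]
print("mean per-frame YMAX:", sum(ymax) / len(ymax), "worst frame:", max(ymax))
```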

Anyway, thanks for what you did, which was useful, and which indicates to me that "quality" based on "difference" probably changes pretty smoothly through the CRF numbers, without significant knees.

However, the first 20 seconds were a bit strange: the CRF 24 render behaved differently from the others and was the "best" at 0:17. I guess this was some titling? After 20 seconds they all seem to behave as one would expect.

Would be very interesting to see some comparison of this type against the best that the Vegas AVC encoders can do.

What happens at CRF 0?

Musicvid wrote on 11/24/2017, 8:53 AM

CRF 0 is lossless High10p profile, which doesn't open in Vegas AFAIK.

CRF1 is just as good.

Musicvid wrote on 11/24/2017, 8:57 AM

Would be very interesting to see some comparison of this type against the best that the Vegas AVC encoders can do.

They're all pretty much the same above 12 Mbps.

john_dennis wrote on 11/24/2017, 9:20 PM

Nick said: "What happens at CRF 0?"

The short answer is that I couldn't get it to render.

The longer answer is that I have since gotten it to work, but

1) Returning the file to Vegas shows artifacts that make it useless. I still haven't sorted that.

2) If I wanted lossless, I wouldn't use x264. I would use a format that also allows LPCM audio.