CPU render vs VCE

BruceUSA wrote on 12/8/2018, 10:00 AM

Quite often I hear that GPU rendering is bad, and at times it's true. But GPU rendering has improved since then. I made a short video and want to ask: can any of you eagle eyes out there tell the difference in quality?

https://vimeo.com/305223909

Intel i7 12700K @ 5.2 GHz all P-cores, 5.3 GHz @ 6 cores, Turbo Boost 3 cores @ 5.4 GHz, 4.1 GHz all E-cores

MSI Z690 MPG Edge DDR5 WiFi

TeamGroup T-Force Delta RGB 32 GB DDR5-6200

Samsung 980 Pro x4 NVMe M.2 1 TB PCIe Gen 4

ASRock RX 6900 XT Phantom 16 GB

PSU: EVGA SuperNOVA G2 1300 W

Black Ice GTX 480mm radiator, top mount, push/pull

MCP35X dual pump w/ dual pump housing

Corsair RGB water block, RGB fans throughout

Phanteks Enthoo full tower

Windows 11 Pro

Comments

Kinvermark wrote on 12/8/2018, 10:42 AM

I cannot see a quality difference, but my concern is more about the reliability of the render. In the past I have had problems with GPU renders not properly processing composited titles, etc. My rock-solid solution is to render to MagicYUV and then encode the final file using Handbrake.

Any thoughts about how reliable the GPU renders are in Vegas 16?

 

Kinvermark wrote on 12/8/2018, 10:44 AM

Also, zooms and pans of large still images sometimes had quality issues in the past. Now?

BruceUSA wrote on 12/8/2018, 12:10 PM

> Also, zooms and pans of large still images sometimes had quality issues in the past. Now?

No problem here with GPU rendering in VP15/16. I often do slideshows with full-size JPEGs from my 5D Mark III and have no issues with pan and zoom.


john_dennis wrote on 12/8/2018, 1:34 PM

I don't trust my eyes to see anything except the most blatant of differences between two videos. To give myself a sense of the difference between two files, I nest the source project on one track of a new project and add the rendered output on a track beneath the nested source project. Setting track compositing mode to "difference" allows me to detect subtle differences by watching the Videoscopes and the Preview.
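For anyone who wants to run the same kind of check outside Vegas, a minimal Python/OpenCV sketch of the idea is below. It subtracts the render from the source frame by frame and reports how far the difference strays from zero; the file names are placeholders, and it assumes both clips decode to the same resolution and frame count.

    # Rough equivalent of the "Difference" compositing test, outside Vegas.
    # Assumes source.mp4 and render.mp4 (placeholder names) have matching
    # resolution, frame rate and frame count; requires opencv-python.
    import cv2

    src = cv2.VideoCapture("source.mp4")
    ren = cv2.VideoCapture("render.mp4")

    frame_idx = 0
    worst = 0
    while True:
        ok_a, a = src.read()
        ok_b, b = ren.read()
        if not (ok_a and ok_b):
            break  # one of the files ran out of frames
        diff = cv2.absdiff(a, b)      # per-pixel absolute difference, 0-255
        peak = int(diff.max())        # how far the difference "histogram" extends
        mean = float(diff.mean())     # average error across the frame
        if peak > worst:
            worst = peak
            print(f"frame {frame_idx}: peak diff {peak}, mean {mean:.3f}")
        frame_idx += 1

    print(f"worst peak difference over {frame_idx} frames: {worst}")
    src.release()
    ren.release()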

Comparison of Source to Magic YUV

Comparison of Source to Happy Otter render at CRF 20

Notes:

  1. With Magic YUV there is nothing to see in the scopes or the preview.
  2. With Happy Otter render, one can see differences in the scope but not always in the preview.
  3. Happy Otter uses VCE if the hardware is available.

BruceUSA wrote on 12/8/2018, 3:26 PM

John, sure, you will see a slight difference when you put it on the scope, also known as pixel peeping :) but when watching the two sample videos with human eyes you won't see a noticeable difference. That, in my view, is good enough for me, and I am willing to accept that quality given the awesomeness of AMD VCE.


fifonik wrote on 12/8/2018, 5:04 PM

It is very hard to compare results with your eyes. You can try to use a diff as already suggested, however that is also hard, as you have to watch it all frame by frame to spot badly rendered frames.

While investigating Magix AVC quality issues, I discovered the MSU Video Quality Measurement Tool, and it is great for this kind of analysis. The tool builds you a quality metric graph (different metrics are available). With one look you can tell which video has better quality (closer to the original), and you can easily spot frames with issues (which you can then investigate closely with your eyes or a diff). You can go to the thread I mentioned and look at the very last image in it. It is an SSIM quality metric comparison for Magix AVC CPU vs x264 at very similar bitrates. The SSIM metric gives you the difference between two video frames (1 means the frames are identical). So one graph represents how the Magix AVC encoded video differs from the original video, and the other how the x264 encoded video differs from the original. From the graphs you can see that x264 has the better quality metric across all frames, and that the metric's fluctuation between frames is smaller for x264 (more consistent quality). You can use my project and scripts, with small modifications, to compare your preferred encoders/settings.
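For anyone without the MSU tool, the same per-frame metric idea can be sketched in Python. This is not fifonik's script or the MSU tool, just a rough illustration using OpenCV, scikit-image and matplotlib, with placeholder file names.

    # Per-frame SSIM graph, in the spirit of the MSU VQMT comparison above.
    # "original.mp4" / "encoded.mp4" are placeholder names.
    import cv2
    from skimage.metrics import structural_similarity as ssim
    import matplotlib.pyplot as plt

    def frame_ssim(path_a, path_b):
        cap_a, cap_b = cv2.VideoCapture(path_a), cv2.VideoCapture(path_b)
        scores = []
        while True:
            ok_a, fa = cap_a.read()
            ok_b, fb = cap_b.read()
            if not (ok_a and ok_b):
                break
            # SSIM on the grayscale planes; 1.0 means the frames match
            ga = cv2.cvtColor(fa, cv2.COLOR_BGR2GRAY)
            gb = cv2.cvtColor(fb, cv2.COLOR_BGR2GRAY)
            scores.append(ssim(ga, gb))
        cap_a.release()
        cap_b.release()
        return scores

    scores = frame_ssim("original.mp4", "encoded.mp4")
    plt.plot(scores)                  # dips mark the badly encoded frames
    plt.ylabel("SSIM vs original")
    plt.xlabel("frame")
    plt.show()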

Back to your original question. I did analyses of Magix AVC NVEnc/QSV/VCE and found that with the videos/settings I used (1080-50p, average bitrate about 20,000,000, i.e. 20 Mbps), they give even better quality than Magix AVC CPU (still not as good as x264). I have not published the results, as that requires quite a lot of work and I've realised that people on the forum prefer speed over quality anyway.


Camcorder: Panasonic X1500 + Panasonic X920 + GoPro Hero 11 Black

Desktop: MB: MSI B450M MORTAR TITANIUM, CPU: AMD Ryzen 5700X, RAM: G'Skill 16 GB DDR4@3200, Graphics card: MSI RX6600 8GB, SSD: Samsung 970 Evo+ 1TB (NVMe, OS), Samsung 870 Evo, HDD WD 4TB, HDD Toshiba 4TB, OS: Windows 10 Pro 22H2

NLE: Vegas Pro [Edit] 11, 12, 13, 15, 17, 18, 19

Kinvermark wrote on 12/8/2018, 5:35 PM

Like you, I prefer to sacrifice speed in order to get quality, so Handbrake (i.e. x264/x265) is my preferred finishing method. I figure a few hours spent rendering is nothing compared with 40 hours of editing work; I just go on to something else while it renders. However, when making proxies I like to speed things along, so sacrificing quality is OK.

Former user wrote on 12/8/2018, 5:36 PM

> It is very hard to compare results with your eyes. You can try to use a diff as already suggested, however that is also hard, as you have to watch it all frame by frame to spot badly rendered frames.

> While investigating Magix AVC quality issues, I discovered the MSU Video Quality Measurement Tool, and it is great for this kind of analysis. The tool builds you a quality metric graph (different metrics are available).

I downloaded a chart a long time ago from a streaming group/forum where the quality of encodes was rated at 3500 kbit/s and lower (streaming bitrates at that time), and VCE (H.264) does exceptionally poorly. You could take the optimistic view that, sure, it's crap at low bitrates but may be excellent if you give it enough bandwidth.

Not sure if that quality number comes out of MSU or something else

Kinvermark wrote on 12/8/2018, 5:43 PM

So x264 is still tops.

Practical question: what real-world time savings are there when rendering such low-bitrate files (CPU vs ASIC)? Surely any good modern CPU will render these quite quickly anyway?

Former user wrote on 12/8/2018, 7:07 PM

In the live streaming scenario it's purely to take the load off the CPU and reduce latency; if you don't have a really good CPU you may only be able to choose x264 superfast or ultrafast, and NVENC would give superior encoding. x264 faster is the usual compromise (quality vs latency), but with the 20-series cards Nvidia claims NVENC at 6000 kbit/s is equal to x264 fast, a setting that is generally too taxing on the CPU to use.

NickHope wrote on 12/9/2018, 12:48 AM

What version of VCE is "VCE1414"? The quality of VCE rendering from "a long time ago" does not have much relevance to current GPUs. I believe BruceUSA's results are with VCE 4.0 from his Vega GPU.

@BruceUSA Any chance of sharing the original clip and your VCE-rendered clip to eliminate the downscaling and Vimeo's recompression? Could make a nice little benchmark for others to try NVENC, QSV, x264 etc..

Former user wrote on 12/9/2018, 6:00 AM

Some interesting information here ... https://superuser.com/questions/338725/compare-two-video-files-to-find-out-which-has-best-quality

and here ... https://tools.ietf.org/id/draft-ietf-netvc-testing-06.html

BruceUSA wrote on 12/9/2018, 8:58 AM

> What version of VCE is "VCE1414"? The quality of VCE rendering from "a long time ago" does not have much relevance to current GPUs. I believe BruceUSA's results are with VCE 4.0 from his Vega GPU.

> @BruceUSA Any chance of sharing the original clip and your VCE-rendered clip to eliminate the downscaling and Vimeo's recompression? Could make a nice little benchmark for others to try NVENC, QSV, x264 etc.

Nick. Original clip uploaded. Link below.

https://www.dropbox.com/s/5tdlgyfk3tiul26/P1111109.MP4?dl=0

 


BruceUSA wrote on 12/9/2018, 10:10 AM

 

This is an AMD VCE render, straight up. No color correction or levels applied.

 

https://www.dropbox.com/s/wuit9fqx9yml8c2/AMD%20VCE%20%20test.mp4?dl=0

 


Kinvermark wrote on 12/9/2018, 2:08 PM

@Former user

Good post. I like this sentence from the Netflix study:

"Subjective testing is the preferable method of testing video codecs"

That gets at the idea that, in the case of presenting content to viewers, perception is reality.

This is a bit of a difficult idea to accept for us technical types who prefer hard facts & numbers.

john_dennis wrote on 12/9/2018, 2:11 PM

I took your "AMD VCE rendered" file and matched the bits/pixel frame within 10% as well as the IDR/P-frame cadence (there are no B-frames in your file) by rendering with Happy Otter Scripts on a machine that has VCE. It required that I use CRF24.5 zerolatency and force the GOP to a fixed number.

Since I can't see the differences by eye, I measured them (by pixel-peeping). Your results are slightly better when viewed on the Videoscope. Graphics to follow later, or I'll get a lump of coal for Christmas.

Disclaimer:

I don't prepare videos for streaming and have no reason to encode at these low bit/pixel-frame numbers.

Former user wrote on 12/9/2018, 3:17 PM

> @Former user

> Good post. I like this sentence from the Netflix study:

> "Subjective testing is the preferable method of testing video codecs"

> That gets at the idea that, in the case of presenting content to viewers, perception is reality.

> This is a bit of a difficult idea to accept for us technical types who prefer hard facts & numbers.

Yes, agreed. Unfortunately humans being inserted into the mix can require this. Such is life.

Former user wrote on 12/9/2018, 4:18 PM

Ok, this is fun. If I'm not doing this the right way, please advise. I put a copy of the 3 original pieces above the 3 newly rendered pieces. The rendered pieces are BruceUSA's AMD VCE, my NVENC piece and my CPU-only piece.

I then set the compositing mode to "Difference" on both tracks.

Nothing shows on the preview for no. 1 (AMD VCE) or no. 3 (CPU); the histogram for the CPU test is only about half the size, 6 vs 12 for AMD VCE.

The NVENC piece, no. 2 … the NVENC histogram is huge and the preview displays stuff, see the image upload.

All done on my laptop, I'll do some more tomorrow on my PC.

This 2nd image was about the worst example using NVENC. I used a high data rate but it made no difference, get it!

Ok, update. I checked whether disabling HW acceleration made any difference: none. PC results for NVENC are still different to VCE, CPU and QSV.

The extent of the histogram from 0..n is given below, on a scale of 0-255. These are the most extreme points I found; the averages were less.

VCE = 16 … no preview visible

Nvenc = 207 … preview is visible

QSV = 16 … no preview visible

CPU = 14 … no preview visible

zaheer-abbas wrote on 12/9/2018, 5:09 PM

> Quite often I hear that GPU rendering is bad, and at times it's true. But GPU rendering has improved since then. I made a short video and want to ask: can any of you eagle eyes out there tell the difference in quality?

> https://vimeo.com/305223909

Hi Bruce, what's AMD VCE? Is it related to Threadripper, and how much time are you saving with it compared to CPU-only rendering?

 

BruceUSA wrote on 12/9/2018, 5:38 PM

Zaheer abbas.

AMD VCE renders a 4K project to 4K MP4 5x+ faster than CPU only.

 

JN.

I did a composite test of the VCE-rendered and CPU-rendered files. There is nothing to see in the preview window, and on the Vectorscope there is only a slightly different white area showing up. Other than that, it looks pretty much the same to me.

One thing to note: with CPU-only rendering, all 16 cores hit 100%, full throttle, start to finish.

With AMD VCE, the CPU sits at 78%+ and the GPU at 100%, full throttle, start to finish.


fifonik wrote on 12/9/2018, 7:16 PM

> Ok, this is fun.  If I'm not doing this the right way, please advise.

Some encoders produce frame shifts, so when doing a diff you should check that you are comparing exactly the same frames.
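One quick way to sanity-check for such a shift (a rough sketch with placeholder file names, not something from fifonik's scripts) is to compare one source frame against a small window of frames in the encoded file and see which offset matches best:

    # Detect a possible frame offset before diffing two files.
    # Placeholder file names; requires opencv-python and numpy.
    import cv2
    import numpy as np

    def grab(path, index):
        cap = cv2.VideoCapture(path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, index)
        ok, frame = cap.read()
        cap.release()
        return cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if ok else None

    ref = grab("source.mp4", 100)          # an arbitrary probe frame
    best_offset, best_err = None, float("inf")
    for offset in range(-3, 4):            # test shifts of up to +/-3 frames
        cand = grab("render.mp4", 100 + offset)
        if cand is None or ref is None or cand.shape != ref.shape:
            continue
        err = float(np.mean(cv2.absdiff(ref, cand)))
        if err < best_err:
            best_offset, best_err = offset, err

    print(f"best match at offset {best_offset} (mean abs diff {best_err:.2f})")
    # A non-zero offset means the encode dropped or duplicated frames somewhere,
    # so a naive frame-by-frame diff would compare the wrong frames.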


wwaag wrote on 12/9/2018, 8:37 PM

If anyone is interested in getting "into the weeds" of the VCE encoder, Happy Otter Scripts at http://tools4vegas.com provides easy command line access to pretty much all settings. Here are the video settings:

-u,--quality <string>           set quality preset
                                 balanced(default), fast, slow
   --cqp <int> or               encode in Constant QP, default 22:24:27
         <int>:<int>:<int>      set qp value for i:p:b frame
   --cbr <int>                  set bitrate in CBR mode (kbps)
   --vbr <int>                  set bitrate in VBR mode (kbps)
   --qp-max <int>               set max qp
   --qp-min <int>               set min qp
-b,--bframes <int>              set consecutive b frames (default: 0)
   --(no-)b-pyramid             enable b-pyramid feature
   --b-deltaqp <int>            set qp offset for non-ref b frames
   --bref-deltaqp <int>         set qp offset for ref b frames
   --ref <int>                  set num of reference frames (default: 2)
   --ltr <int>                  set num of long term reference frames (default: 0)
   --max-bitrate <int>          set max bitrate (kbps) (default: 20000)
   --vbv-bufsize <int>          set vbv buffer size (kbps) (default: 20000)
   --slices <int>               set num of slices per frame (default: 1)
   --(no-)skip-frame            enable skip frame feature
   --motion-est                 set motion estimation precision
                                 - full-pel (fast)
                                 - half-pel
                                 - q-pel (best) = default
   --vbaq                       enable VBAQ
   --pre-analysis <string>      set pre-analysis mode
                      H.264: none (default), full (best), half, quarter (fast)
                      HEVC:  none (default), auto
   --gop-len <int>              set length of gop (default: auto)
   --level <string>             set codec level
                                - H.264: auto(default), 1, 1b, 1.1, 1.2, 1.3
                                         2, 2.1, 2.2, 3, 3.1, 3.2, 4, 4.1, 4.2
                                         5, 5.1, 5.2
                                - HEVC:  auto(default), 1, 2, 2.1, 3, 3.1, 4
                                         4.1, 5, 5.1, 5.2, 6, 6.1, 6.2
   --profile <string>           set codec profile
                                - H.264: Baseline(default), Main, High
                                - HEVC:  main(default)
   --tier <string>              set codec tier
                                - HEVC: main(default), high

   --sar <int>:<int>            set Sample Aspect Ratio
   --dar <int>:<int>            set Display Aspect Ratio
   --fullrange                  set yuv is fullrange (H.264 only)

   --crop <int>,<int>,<int>,<int>
                                set crop pixels of left, up, right, bottom.

   --enforce-hrd                enforce hrd compatibility of bitstream
   --filler                     use filler data
   --videoformat <string>       undef, ntsc, component, pal, secam, mac
                                 default: undef
   --colormatrix <string>       undef, auto, bt709, smpte170m, bt470bg
                                smpte240m, YCgCo, fcc, GBR
                                 default: undef
   --colorprim <string>         undef, auto, bt709, smpte170m, bt470m
                                bt470bg, smpte240m, film
                                 default: undef
   --transfer <string>          undef, auto, bt709, smpte170m, bt470m
                                bt470bg, smpte240m, linear, log100, log316
                                 default: undef

AKA the HappyOtter at https://tools4vegas.com/. System 1: Intel i7-8700k with HD 630 graphics plus an Nvidia RTX4070 graphics card. System 2: Intel i7-3770k with HD 4000 graphics plus an AMD RX550 graphics card. System 3: Laptop. Dell Inspiron Plus 16. Intel i7-11800H, Intel Graphics. Current cameras include Panasonic FZ2500, GoPro Hero11 and Hero8 Black plus a myriad of smartPhone, pocket cameras, video cameras and film cameras going back to the original Nikon S.

NickHope wrote on 12/9/2018, 11:53 PM

Here are some results from my rig, which can only do CPU rendering. I concentrated on high-quality settings rather than speed, since that's what interests me more.

I used a similar method to that described in JN_'s comment above. I sampled the length of the histogram at 4 points in the video and took an average. So lower number = less difference from the original.

BruceUSA's VCE render is up there in quality with other methods that are slower and produce larger files.

I thought the x264 renders would be superior to the Magix AVC renders of similar size, but they weren't in this test. I'm surprised by how good the 135-100 Mbps Magix AVC render is compared to the x264 CRF 18 render.

The 2 legacy MainConcept single-pass renders at the bottom were basically corrupted with nasty visible artifacts.

It's important to note that this is only 1 test, and the clip is not very challenging because there is not much movement or noise. A "busy" clip with lots of movement (e.g. water surface) might produce very different results.

Former user wrote on 12/10/2018, 1:07 AM

I took your "AMD VCE rendered" file and matched the bits/pixel frame within 10% as well as the IDR/P-frame cadence (there are no B-frames in your file) by rendering with Happy Otter Scripts on a machine that has VCE. It required that I use CRF24.5 zerolatency and force the GOP to a fixed number.

There are no B-frames (H.264) in VCE 3.4, only in VCE 2.0, 3.0 and 3.1 (from memory). It's another reason people say VCE took a step backwards with VCE 3.4 for H.264 encoding, although, as people have been discussing here, if you forget the technicalities and just look at the finished product, maybe B-frames don't matter that much.