CPU-only AI-FX performance

Howard-Vigorita wrote on 3/1/2023, 1:43 PM

Continuing a discussion from b326 bug reports...

@Deathspawner  wrote on 2/27/2023, 4:12 PM

Your computer hardware/OS/Driver specs:

Multiple machines with Windows 11 Pro (22621.1265)
GPU doesn't matter, as this is CPU-only encoding

A description of the problem:

I'm a benchmarker rather than a regular user, but I noticed that CPU-only AI encode speeds changed behavior in build 326. While Intel saw a dramatic improvement in its encode times (specific to chips with P and E cores), AMD's performance worsened.

Colorize

(b214) AMD Ryzen 9 7950X: 2m 1s (https://i.imgur.com/nQnlJqX.png)
(b326) AMD Ryzen 9 7950X: 3m 5s (https://i.imgur.com/jcwzPz5.png)

(b214) Intel Core i9-13900K: 9m 40s (https://i.imgur.com/gJuXeSC.png)
(b326) Intel Core i9-13900K: 3m 28s (https://i.imgur.com/4MGXHyY.png)

Style Transfer

(b214) AMD Ryzen 9 7950X: 1m 7s (https://i.imgur.com/Z9RxhuN.png)
(b326) AMD Ryzen 9 7950X: 2m 6s (https://i.imgur.com/zzziBlp.png)

(b214) Intel Core i9-13900K: 6m 36s (https://i.imgur.com/b94Lwum.png)
(b326) Intel Core i9-13900K: 1m 26s (https://i.imgur.com/KdxAx1w.png)

The Colorize output is the same between builds, but Style Transfer changed the resulting output (possibly expected?). 

A description of what effects, if any, are being used in your project:

Colorize and Style Transfer, with GPU disabled in settings.

Your project settings (screenshots work great here):

3840x2160
Field order: None (progressive scan)
Pixel Aspect: 1.0 (Square)
Output Rotation: 0º
Frame Rate: 59.940 (Double NTSC)
Pixel Format: Legacy 8-bit (video levels)
Compositing gamma: 2.222 (Video)
ACES version: 1.2
ACES color space: Default (ACES2065-1)
View transform: Off
Look modification transform: None
Full-res rendering quality: Good
Motion blur type: Gaussian
Deinterlace method: Blend fields
Resample mode: Frame Blend

What source your media came from...

4K/60 AVC captured from a OnePlus 9 Pro smartphone:

Format/Info                              : Advanced Video Codec
Format profile                           : High@L5.2
Format settings                          : CABAC / 1 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 1 frame
Format settings, GOP                     : M=1, N=30
Codec ID                                 : avc1
Bit rate                                 : 120 Mb/s
Width                                    : 3 840 pixels
Height                                   : 2 160 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Variable
Frame rate                               : 59.560 FPS
Minimum frame rate                       : 29.821 FPS
Maximum frame rate                       : 59.682 FPS
Real frame rate                          : 60.000 FPS
Standard                                 : NTSC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.244

Side question...

With b326 adding GPU support for some AI Fx, is it recommended to just use that instead of CPU-only? It does seem to make a considerable difference, but I've only tested one vendor so far.

Comments

Howard-Vigorita wrote on 3/1/2023, 2:21 PM

@fr0sty wrote on 2/27/2023, 8:09 PM

I can't speak for why AMD CPUs got slower, so I'll let that count as a bug report... but Intel getting faster is likely due to its built-in GPU accelerating the AI.

It's not clear how, or if, he achieved cpu-only AI operation. Also, as near as I can tell, AI load normally only goes to the igpu if it's selected in Video Prefs. The Intel igpus don't seem to be great performers when I force processing to them, but they do seem more reliable with this build.

@Deathspawner AI in build 326 now uses the gpu set in Video Prefs. Setting the gpu to off in Video Prefs should force AI processing onto the cpu, but doing that also turns off the gpu for timeline and other fx processing, which might overshadow the AI processing. The only way I see to force AI onto the cpu without pushing that other load there too is to force the internal setting for the dnn device to a value of -1 without altering the normal Video Prefs gpu setting. Easier said than done. Btw, build 214 was different: AI did not follow the Video Prefs gpu but allowed the internal dnn mode to be set manually to a value of 'CPU'. That manual dnn setting doesn't work in build 326 because the dnn mode gets reset on startup based on the Video Prefs setting.
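
To make that concrete, here's a minimal Python sketch of the device-selection behavior as I read it; this is a toy model, not Vegas code, and the use of None for gpu-off and -1 for cpu just mirrors the internal setting:

```python
# Toy model of the dnn device selection described above; -1 stands in for
# "cpu" and None for a Video Prefs gpu that is set to off.

def dnn_device_b326(video_prefs_gpu):
    """b326: the dnn device simply follows the Video Prefs gpu selection.
    It only lands on the cpu (-1) when the gpu is off, which also turns
    the gpu off for timeline and other fx processing."""
    return -1 if video_prefs_gpu is None else video_prefs_gpu

def dnn_device_b214(manual_dnn_mode, video_prefs_gpu):
    """b214: the internal dnn mode could be set to 'CPU' by hand,
    independent of the Video Prefs gpu selection."""
    if manual_dnn_mode == "CPU":
        return -1
    return video_prefs_gpu if video_prefs_gpu is not None else -1

print(dnn_device_b326(None))      # -1: ai on cpu, but gpu off for everything
print(dnn_device_b214("CPU", 0))  # -1: ai on cpu while gpu 0 still handles fx
```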

btw, you might be interested in contributing to the ai-fx torture test data I'm collecting.

bitman wrote on 3/2/2023, 3:15 AM

@Howard-Vigorita

I will give it a try. Just a question, and maybe it is me or because it is morning, but "dnn" does not ring a bell with me. What is it?

APPS: VIDEO: VP 365 suite (VP 22 build 194) VP 21 build 315, VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 17 HDpro XXL, Boris Continuum 2025, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 18, Spectral Layers Pro 10, Audacity, FOTO: Zoner studio X, DXO photolab (8), Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 24H2 (since October 2024)
  • CPU: i9-13900K (upgraded my former CPU i9-12900K),
  • Air Cooler: Noctua NH-D15 G2 HBC (September 2024 upgrade from Noctua NH-D15s)
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

RogerS wrote on 3/2/2023, 3:48 AM

It's in the internal prefs and controls how the AI works with GPUs, but I'd stay out of there as it's not intuitive and not meant for ordinary users to alter.

Howard-Vigorita wrote on 3/2/2023, 1:04 PM

Dnn stands for Deep Neural Network. The term comes up a lot with ai libs, which are commonly developed in Python. The Vegas dnn mode and device should be automatically controlled by the gpu selection in Video Prefs, but they can also be thrown off by Windows settings, so it pays to at least inspect them to verify what's happening.

RogerS wrote on 3/3/2023, 1:34 AM

The TechGage (Deathspawner) article with test results is here: https://techgage.com/article/cpus-for-creators-amds-zen-4-vs-intels-raptor-lake/2/#BM02

The AI data got cut as "It’s worth noting that with this latest 326 build, MAGIX developers have shifted some AI focus to the GPU, so it could very well be that the GPU would be the recommended device for AI encodes going forward – we just haven’t been able to explore that too much yet."

I don't think that's right; the AI FX have always been GPU-enabled, just only on Intel platforms.

Howard-Vigorita wrote on 3/3/2023, 10:27 AM

All large-scale AI processing everywhere has always been done on gpus. Most commonly on truckloads of Nvidia 40gb A100s that go for about $7.5K apiece... used on ebay.

AI is generally done with a combination of matrix-array arithmetic and math transforms. Video data from an image sensor is natively in array format already, so a gpu whose vram is optimized to process it is a natural for ai. Algorithms that employ math transforms, like convolution, benefit even more since math co-processors are now on-board in high-end gpus. They used to be present in some cpus, like xeons, but that's becoming less common, and Vegas has never taken advantage of the one in mine for any purpose.
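
To make the array point concrete, here's a tiny Python/NumPy sketch; the 3x3 kernel is an arbitrary edge-detection example, not anything Vegas actually uses:

```python
import numpy as np
from scipy.signal import convolve2d

# The luma plane of a single 4K frame is already a 2160x3840 matrix of samples.
frame = np.random.randint(0, 256, size=(2160, 3840)).astype(np.float32)

# One small convolution kernel; neural-network layers apply thousands of
# these, which is why hardware built for matrix math runs them so much faster.
kernel = np.array([[ 0., -1.,  0.],
                   [-1.,  4., -1.],
                   [ 0., -1.,  0.]], dtype=np.float32)

edges = convolve2d(frame, kernel, mode="same")
print(edges.shape)  # (2160, 3840): same array shape, just transformed
```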

CPUs and system ram can also do ai processing, just not as quickly. But they're increasing in speed and efficiency too. And generally there's more system ram available than vram. There are differences between the Intel and Ryzen cpu-to-memory designs which would likely impact their relative ai performance. Also, Intel's P/E core-thing might give it an advantage doing floating point when cpu processing is the only way to do it. Investigating those possibilities seems to be the thrust of the original post.

If all things were equal, I wouldn't care anything about cpu ai performance. Would just use super gpus like the big boys. But my gpu resources are limited and Vegas appears to fall over to cpu/ram ai processing when mine get exhausted... happens pretty quickly with Upscale FX, which seems to be the most memory-hungry. I think relative cpu performance is relevant and I'm glad it's being looked at. But relative gpu performance precedes it. And it's looking like Cuda does better than Navi.

Btw, I'm not at all convinced that an Intel igpu does squat for ai performance, as it seems to be the slowest of all the gpus doing ai. Maybe because it uses regular ram for its operations. I think the biggest fault with the Vegas b214 implementation was that it latched onto an Intel igpu if one was seen, and the OpenVino libs were not only poor performers but crashed a lot. The Onnx libs are much more stable and perform better with Intel igpus than Intel's own software. Go figure.

I've always favored Intel cpus with igpus but am not as enthusiastic since disabling them in favor of discrete Arc boards... blind loyalty is not my strong suit. Have not tried a 13900k yet. So maybe that'll bring me back into the fold.

Musicvid wrote on 3/3/2023, 10:35 AM

I tested Upscale with GPU vs. CPU. With CPU only, the output was the same with or without the Upscale plugin on the media, meaning AI is not active without GPU.
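
For anyone who wants to repeat the check, here's a minimal Python sketch of the comparison; the file names are placeholders for two frames exported from the same timeline position at the same project resolution, with and without the plugin:

```python
import numpy as np
from PIL import Image

# Placeholder file names: two frames exported at the same resolution,
# one rendered with the Upscale plugin on the event and one without.
with_fx = np.asarray(Image.open("frame_with_upscale.png"), dtype=np.int16)
without_fx = np.asarray(Image.open("frame_without_upscale.png"), dtype=np.int16)

diff = np.abs(with_fx - without_fx)
print("max pixel difference:", diff.max())
print("plugin had no visible effect:", diff.max() == 0)
```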

Former user wrote on 3/3/2023, 8:20 PM

If all things were equal, I wouldn't care anything about cpu ai performance. Would just use super gpus like the big boys. But my gpu resources are limited and Vegas appears to fall over to cpu/ram ai processing when mine get exhausted... happens pretty quickly with Upscale FX which seems to be the most memory-hungry.

@Howard-Vigorita Can you give more detail about this? What causes this switchover, is it that your VRAM is full?

Howard-Vigorita wrote on 3/4/2023, 8:14 AM

@Former user Looks like when I use Upscale, vram fills up to around 5.8 of 6 gb on my laptop's 3060 and cpu and ram utilization start going up. Might just be a reflection of the graphics driver switching over to shared memory, which is system ram... don't know how Task Manager accounts for that. By comparison, vram rendering the Sample Project never hits 2gb. Haven't had a chance to observe it on my other machines yet.
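
If anyone wants to watch it happen, this little Python loop (assuming an Nvidia board with nvidia-smi on the path) prints dedicated vram usage every few seconds while a render runs; Task Manager's dedicated/shared GPU memory graphs show the same thing:

```python
import subprocess
import time

# Poll dedicated vram usage while an Upscale render runs; when "used" pins
# near "total", the driver starts spilling into shared memory (system ram).
while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True)
    used, total = (int(v) for v in out.strip().splitlines()[0].split(", "))
    print(f"vram: {used} / {total} MiB")
    time.sleep(5)
```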

Former user wrote on 3/4/2023, 8:35 PM

@Howard-Vigorita This is upscaling a 10-minute 720p video to 1440p using the sharp preset. Everything is stable, but I see it's using 8gb of vram, so maybe it is a vram thing.

bitman wrote on 3/5/2023, 9:44 AM

I have added my test results (brand new i9-13900K, upgraded from my i9-12900K on the same MB). My QSV render (i9-13900K and rtx3090, memory DDR5 64GB 5600) is currently the fastest overall render on Howard's list.

The test result is on par with Lolasassy's result, the only other 13900K (albeit Lolasassy has a newer-gen rtx4080).

To my surprise, my own NVIDIA render was not faster than my own QSV render, as I would have expected, and it is not due to a lack of GPU memory (an rtx3090 has 24GB).

I have also added mainconcept rendering, which is surprisingly fast.

Howard-Vigorita wrote on 3/6/2023, 3:53 PM

Thanks, @bitman ... just re-sorted the chart. Those results are impressive. As is a 4090 time that got posted after yours. The render load is relatively low but sending it somewhere else probably lets the main gpu run a little bit faster before the thermal load throttles it.

Howard-Vigorita wrote on 3/8/2023, 8:58 PM

I think the biggest fault with the Vegas b214 implementation was that it latched onto an Intel igpu if one was seen and the OpenVino libs were not only poor performers but crashed a lot. The Onnx libs are much more stable and perform better with Intel igpus than Intel's own software. Go figure.

Was just looking through the vp20 Program folder and I see both Onnx and OpenVino dlls in there! Looks like they're still using OpenVino for Intel cpus and gpus. And probably using Onnx for everything else.
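
Easy enough to check on any install with a quick Python sketch; the folder below is the default vp20 location, so adjust it if Vegas lives somewhere else on your system:

```python
from pathlib import Path

# Default VP20 install folder; change this if yours is installed elsewhere.
program_dir = Path(r"C:\Program Files\VEGAS\VEGAS Pro 20.0")

# List which inference runtimes ship in the Program folder.
for dll in sorted(program_dir.glob("*.dll")):
    if any(tag in dll.name.lower() for tag in ("onnx", "openvino")):
        print(dll.name)
```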

Former user wrote on 5/4/2023, 6:48 AM

@Howard-Vigorita Can you confirm you are still getting GPU AI processing with your Nvidia GPU and latest VP20 402/403?

With my 40-series GPU it is only doing CPU processing, so most probably Vegas is somehow not compatible with 40-series GPUs, 7 months after launch.

Howard-Vigorita wrote on 5/4/2023, 4:14 PM

Sure seems that way, based on comparative performance on my laptop with Vegas vp20 build 403, which I just installed. This is what I get with the Hiker-video version of the torture test with dnn set to use the cpu... looks like it was heading for over a 10-minute run time before I killed it.

Quite a bit better with Intel IrisXe doing the ai looking like it was totally maxed out:

But best with the Nvidia 3060 doing the ai and not even breaking a sweat:

Seems to be using dedicated vram more than anything, which the IrisXe does not have. The ramp-up in vram usage as it nears the end is when the Upscale scaling goes above 1.1x and everything slows down. Btw, I set the raw processing to Cuda, which might make a difference. I also have the Nvidia Control Panel set to use compatible OpenGL. Don't have a newer Nvidia but will give it a go on my 11900k/6900xt/Arc770 in a few minutes.

Howard-Vigorita wrote on 5/4/2023, 5:31 PM

Here's with build 403 on a water cooled 11900k cpu rendering the Hiker:

And with Intel a770 doing the ai... quite a bit faster than the laptop uhd770 but not making the most of the 16gb vram:

The Amd 6900xt is not quite as quick as the Nvidia 3060 at it and isn't using much vram... it seems a little slower than the older build was with generated media:

edit: the 6900xt is definitely faster with generated media:

Former user wrote on 5/4/2023, 6:51 PM

@Howard-Vigorita Thanks. I was very interested in the new colorization GPU models that have apparently been improved, but I can't even use them. It is interesting to see your GPU activity with the 3060: it doesn't look to be doing much, and yet its performance shows it's working well. I don't think GPU activity would show in another engine; I'm pretty certain it's all been consolidated into the 3D engine. 70 degrees C seems like it's doing a lot of work, but in a laptop that temperature may be fairly meaningless.

The 6900XT's GPU activity looks more normal. 36°C is rather amazing.

Howard-Vigorita wrote on 5/5/2023, 9:46 AM

The 6900xt has a liquid 360-aio for cooling... probably overkill since it's a pretty power efficient gpu to start with.