1950x Threadripper CPU to Nvidia NVENC Render Benchmark

wilri001 wrote on 10/19/2017, 7:58 PM

Input is single track AVCHD HD1080 60 fps progressive, 28 Mb/sec, 1 minute, unedited file (from Canon G40). (Resample turned off, of course.) The new Magix AVC CODEC was used.

The Fury x graphics card didn't matter, since it's AMD and there's no VCE support. Memory and SSD probably don't matter either, but for the record: 16GB (2x8GB) of RAM and a 500GB M.2 SSD.

Render time is 47 seconds (0.78 realtime). In the worst case (1 processing thread and 1 slice), it was 50 seconds. The processing threads setting seems to be ignored, but the number of slices made a little difference. The CPU was about 80% busy.

With, or without, GPU selected in Preferences, the time is the same.

Comparison to Nvidia NVENC

Same file, but PC is my old one: AMD 8350 CPU (4 core, 8 threads) with Nvidia 750ti (2GB).

WITHOUT GPU selected in Preferences: Quality: 41 seconds (0.68 realtime); Performance: 38 seconds (0.63 realtime).

WITH GPU: 56 seconds (0.93 realtime).

Comparison to AMD 8350 CPU only

124 seconds (2.07 realtime)

So you can spend $2,500 on a new Threadripper PC, or $125 on an old Nvidia card on Ebay, and upgrade to Vegas Pro 15, and render 20% faster than the Threadripper.
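For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the realtime ratios and the time saved relative to the Threadripper from the render times posted above. The function names and labels are just illustrative, and the 60-second clip length is taken from the "1 minute" description of the input file.

```python
# Render times in seconds as posted above; the test clip is 1 minute long.
CLIP_SECONDS = 60.0

render_seconds = {
    "Threadripper 1950x, CPU": 47,
    "8350 + 750ti NVENC, Quality": 41,
    "8350 + 750ti NVENC, Performance": 38,
    "8350 + 750ti NVENC, GPU on in Preferences": 56,
    "8350, CPU only": 124,
}


def realtime_ratio(seconds, clip_seconds=CLIP_SECONDS):
    """Render time divided by clip length; below 1.0 is faster than realtime."""
    return seconds / clip_seconds


def time_saved_percent(seconds, baseline_seconds):
    """Percentage of render time saved compared with the baseline."""
    return 100.0 * (1.0 - seconds / baseline_seconds)


baseline = render_seconds["Threadripper 1950x, CPU"]
for name, seconds in render_seconds.items():
    print(f"{name}: {realtime_ratio(seconds):.2f} realtime, "
          f"{time_saved_percent(seconds, baseline):+.0f}% time saved vs Threadripper")
```

By this measure the Performance render takes about 19% less time than the Threadripper render, and the Quality render about 13% less.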

Waiting for AMD VCE support!

 

 

Comments

john_dennis wrote on 10/19/2017, 8:47 PM

I'm waiting for some mention of improved RX480 performance before I start a Vegas Pro 15 trial. Minus a new camera, or a need to output different formats, I have no need to sharpen my axe. I need to cut more wood with the axe that I have.

NickHope wrote on 10/19/2017, 9:23 PM

@wilri001 How do the bitrate and quality of the rendered video compare?

wilri001 wrote on 10/19/2017, 10:55 PM

Nick: The software and Nvidia default/quality files are just a few bytes different in size. The Nvidia "performance" file is 2% smaller. My sample video isn't a really good test for quality, but from what I can see, there's no difference. This is at the default 12Mb/sec VBR. Note that the input file is HD1080 and not a professional 4:2:2 codec. If you want to send me a message with a Dropbox-type link to 4:2:2 4k footage (or whatever you want), I'd be glad to render it both ways and upload the results for you.
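As a rough sanity check on file sizes, this Python sketch estimates the video stream size for a 1 minute render at 12Mb/sec. It assumes the 12Mb/sec setting is the average rate actually achieved (VBR output can land above or below it) and it ignores audio and container overhead; the function name is only illustrative.

```python
def estimated_video_megabytes(bitrate_mbps, duration_seconds):
    """Approximate stream size in megabytes (10^6 bytes) at an average bitrate."""
    bits = bitrate_mbps * 1_000_000 * duration_seconds
    return bits / 8 / 1_000_000


# Assumption: the 12 Mb/sec VBR setting is treated as the average bitrate.
size_mb = estimated_video_megabytes(12, 60)
print(f"About {size_mb:.0f} MB of video for 1 minute at 12 Mb/sec")
print(f"A file 2% smaller would be about {size_mb * 0.98:.1f} MB")
```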

bravof wrote on 10/20/2017, 2:23 AM

Am I missing something: you render faster without GPU (41s) than with GPU (56s)?

wilri001 wrote on 10/20/2017, 6:21 AM

Yes. Perhaps it's using the GPU for things other than render, like decoding the input, and that takes away from the resources it has to do the render? I hope Magix fixes this because it's a pain to turn GPU off/on in Preferences when switching between editing and rendering.

wilri001 wrote on 10/20/2017, 6:25 AM

Does anyone have an Intel CPU with QSV who would care to post a render time?

OldSmoke wrote on 10/20/2017, 7:04 AM

Input is single track AVCHD HD1080 60 fps progressive, 28 Mb/sec, 1 minute, unedited file (from Canon G40). (Resample turned off, of course.) The new Magix AVC CODEC was used.

In my opinion this is all meaningless. As long as you are not testing the exact same project that everyone else can do tests on, your test will relate to your system and your system only.

All previous attempts at finding a good system (combination of CPU & GPU, RAM and so on) were done with the SCS Benchmark project, sometimes also referred to as the Red Car project. It might be worthwhile to dig out the old test results Excel sheet and update it with the new Magix AVC encoder and new hardware. Otherwise we end up with endless tests spread over numerous threads in the forum, making them almost impossible to find and draw proper conclusions from.

In my own testing with the SCS Benchmark project, comparing NVENC to CUDA and CPU only, I found that the GTX1080Ti with NVENC was on par with the GTX580 with CUDA, and in some cases even slower. Needless to say, I sent the GTX1080Ti back. I also have a Fury X in my system, and the combination of the two makes for very fast MCAVC CUDA renders. Quality is similar between NVENC and CUDA. I do admit that NVENC has better quality below 10Mbps, but it is still not as good as frame serving to Handbrake.

Last changed by OldSmoke on 10/20/2017, 7:06 AM, changed a total of 1 times.

Proud owner of Sony Vegas Pro 7, 8, 9, 10, 11, 12 & 13 and now Magix VP15&16.

System Spec.:
Motherboard: ASUS X299 Prime-A

Ram: G.Skill 4x8GB DDR4 2666 XMP

CPU: i7-9800x @ 4.6GHz (custom water cooling system)
GPU: 1x AMD Vega Pro Frontier Edition (water cooled)
Hard drives: System Samsung 970Pro NVME, AV-Projects 1TB (4x Intel P7600 512GB VROC), 4x 2.5" Hotswap bays, 1x 3.5" Hotswap Bay, 1x LG BluRay Burner

PSU: Corsair 1200W
Monitor: 2x Dell Ultrasharp U2713HM (2560x1440)

wilri001 wrote on 10/20/2017, 7:34 AM

OldSmoke: I completely agree that we should all be using the same benchmark. It is surprising that Magix isn't interested enough in performance input from their customers to provide one.

So I've just spent the last 10 minutes or so trying to find the SCS/Red Car project you mention, but all links I can find point to non-existent URLs. If you can point me to what I need to run this benchmark, I'd be happy to run it and post my results. And it should be easier to find standard benchmarks in a forum search, yes?

OldSmoke wrote on 10/20/2017, 7:42 AM

It seems that even the link on Sony's website doesn't work anymore. I'll see on which cloud drive I have sufficient space to upload it, and I will share the link here. Keep in mind that the project and media are rather large, 2GB, and that is the HD version. We also made our own 4K version of it, and maybe we can update it with some of the plug-ins found in VP14 and VP15. Note that I don't have VP15 and my trial has already expired.

Wolfgang S. wrote on 10/20/2017, 8:15 AM

So you can spend $2,500 on a new Threadripper PC, or $125 on an old Nvidia card on Ebay, and upgrade to Vegas Pro 15, and render 20% faster than the Threadripper.

Seems to be very true at the moment.

But be aware that the Threadripper is not yet properly supported in Vegas. The core utilization is poor, as shown in that test during the preview - why should it be better during rendering?

We do not only need great render performance - we need great preview performance too.

 

Desktop: PC AMD 3960X, 24x3,8 GHz * RTX 3080 Ti (12 GB)* Blackmagic Extreme 4K 12G * QNAP Max8 10 Gb Lan * Resolve Studio 18 * Edius X* Blackmagic Pocket 6K/6K Pro, EVA1, FS7

Laptop: ProArt Studiobook 16 OLED * internal HDR preview * i9 12900H with i-GPU Iris XE * 32 GB Ram) * Geforce RTX 3070 TI 8GB * internal HDR preview on the laptop monitor * Blackmagic Ultrastudio 4K mini

HDR monitor: ProArt Monitor PA32 UCG-K 1600 nits, Atomos Sumo

Others: Edius NX (Canopus NX)-card in an old XP-System. Edius 4.6 and other systems

wilri001 wrote on 10/20/2017, 8:28 AM

Sorry, Wolfgang, I'm a little confused by your comment about Threadripper. I didn't post any stats for editing. My main editing concern was that editing slows down over time, especially noticeable when opening the pan/zoom window. That's still there, but much, much better. I don't use a lot of compositing, but coloration and transitions are smoother. Of course, a lot of that depends on the graphics card, but it is still better than when I had the same Fury x in the AMD 8350 box.

Kinvermark wrote on 10/20/2017, 8:52 AM

I don't have the hardware to test this and now my v15 trial has expired so hopefully someone can help out...

Question: which formats are supported by each of the three hardware encoding systems (as implemented in Vegas)?

CUDA

QSV

NVENC

NickHope wrote on 10/20/2017, 9:00 AM
In my opinion this is all meaningless. As long as you are not testing the exact same project that everyone else can do tests on, your test will relate to your system and your system only.

All previous attempts at finding a good system (combination of CPU & GPU, RAM and so on) were done with the SCS Benchmark project, sometimes also referred to as the Red Car project. It might be worthwhile to dig out the old test results Excel sheet and update it with the new Magix AVC encoder and new hardware. Otherwise we end up with endless tests spread over numerous threads in the forum, making them almost impossible to find and draw proper conclusions from.

Agree, and this was discussed here. I would like to set this up properly but I am waiting for one or two more VP15 updates, which are likely to bring changes. Support and performance are in a state of flux at the moment, so if we all went off and tested VP15 build 216, the results might soon be irrelevant.

Links to the benchmark download can always be found at the end of the GPU FAQ post. Here is the Sony Vegas Pro 11 (Red Car) Benchmark download.

OldSmoke wrote on 10/20/2017, 10:11 AM

@Nick. Thank you for the link, saves me from uploading it.

Wolfgang S. wrote on 10/20/2017, 12:59 PM

 

Sorry, Wolfgang, I'm a little confused by your comment about Threadripper. I didn't post any stats for editing. My main editing concern was that editing slows down over time, especially noticeable when opening the pan/zoom window. That's still there, but much, much better. I don't use a lot of compositing, but coloration and transitions are smoother. Of course, a lot of that depends on the graphics card, but it is still better than when I had the same Fury x in the AMD 8350 box.

I know that this may confuse you, but the point is that preview capabilities will become more and more important in the future. While we have solutions for rendering, even the best processors we have today, in combination with the best GPUs we have today, show some limitations. Of course, that depends on the footage you edit.

So I think preview capabilities are more important than rendering capabilities, or will become more important for a lot of users in the future.

wilri001 wrote on 10/20/2017, 7:41 PM

 

Wolfgang: Yes, I 100% agree! We can walk away while a render happens, but we have to wait on a sluggish edit or a low frame rate in a multicam edit. For example, I was hoping to not have to build proxies for 4k when there are several 4k tracks, but even with the Fury x and Threadripper, I still have to. I don't know what is possible, but you are absolutely correct that editing is more important than rendering.

wilri001 wrote on 10/21/2017, 12:01 AM

Regarding edit preview frame rate, a project with 4 HD1080 files (1 is GoPro which is usually slower) displays at 30fps WITHOUT GPU support selected in Preferences. WITH GPU selected the frame rate is 20fps. The graphics card is AMD R9 Fury x, with 1950x Threadripper CPU.

Can anyone explain this?

NickHope wrote on 10/21/2017, 8:40 AM

What format media and what codec is Vegas decoding it with? https://www.vegascreativesoftware.info/us/forum/faq-how-to-post-mediainfo-and-vegas-pro-file-properties--104561/

wilri001 wrote on 10/21/2017, 10:56 AM
General
ID                                       : 0 (0x0)
Complete name                            : E:\CSL\2017\Misc\2017-10-08 Paul Scheele Workshop\aMAIN\00323.MTS
Format                                   : BDAV
Format/Info                              : Blu-ray Video
File size                                : 3.89 GiB
Duration                                 : 20 min 3 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 27.7 Mb/s
Maximum Overall bit rate                 : 28.0 Mb/s

Video
ID                                       : 4113 (0x1011)
Menu ID                                  : 1 (0x1)
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.2
Format settings                          : CABAC / 2 Ref Frames
Format settings, CABAC                   : Yes
Format settings, RefFrames               : 2 frames
Format settings, GOP                     : M=3, N=30
Codec ID                                 : 27
Duration                                 : 20 min 3 s
Bit rate mode                            : Variable
Bit rate                                 : 26.4 Mb/s
Maximum bit rate                         : 26.7 Mb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 59.940 (60000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.212
Stream size                              : 3.70 GiB (95%)

Audio
ID                                       : 4352 (0x1100)
Menu ID                                  : 1 (0x1)
Format                                   : AC-3
Format/Info                              : Audio Coding 3
Format settings, Endianness              : Big
Codec ID                                 : 129
Duration                                 : 20 min 3 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 31.250 FPS (1536 SPF)
Bit depth                                : 16 bits
Compression mode                         : Lossy
Delay relative to video                  : -34 ms
Stream size                              : 36.7 MiB (1%)
Service kind                             : Complete Main
General
ID                                       : 0 (0x0)
Complete name                            : E:\CSL\2017\Misc\2017-10-08 Paul Scheele Workshop\bWIDE\00021.MTS
Format                                   : BDAV
Format/Info                              : Blu-ray Video
File size                                : 1.91 GiB
Duration                                 : 11 min 21 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 24.0 Mb/s
Maximum Overall bit rate                 : 24.0 Mb/s

Video
ID                                       : 4113 (0x1011)
Menu ID                                  : 1 (0x1)
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4
Format settings                          : CABAC / 2 Ref Frames
Format settings, CABAC                   : Yes
Format settings, RefFrames               : 2 frames
Format settings, GOP                     : M=3, N=15
Codec ID                                 : 27
Duration                                 : 11 min 21 s
Bit rate mode                            : Variable
Bit rate                                 : 22.7 Mb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 29.970 (30000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Interlaced
Scan type, store method                  : Separated fields
Scan order                               : Top Field First
Bits/(Pixel*Frame)                       : 0.365
Stream size                              : 1.81 GiB (95%)

Audio
ID                                       : 4352 (0x1100)
Menu ID                                  : 1 (0x1)
Format                                   : AC-3
Format/Info                              : Audio Coding 3
Format settings, Endianness              : Big
Codec ID                                 : 129
Duration                                 : 11 min 21 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 31.250 FPS (1536 SPF)
Bit depth                                : 16 bits
Compression mode                         : Lossy
Delay relative to video                  : -67 ms
Stream size                              : 20.8 MiB (1%)
Service kind                             : Complete Main
General
ID                                       : 0 (0x0)
Complete name                            : E:\CSL\2017\Misc\2017-10-08 Paul Scheele Workshop\cLEFT\00120.MTS
Format                                   : BDAV
Format/Info                              : Blu-ray Video
File size                                : 1.91 GiB
Duration                                 : 11 min 24 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 23.9 Mb/s
Maximum Overall bit rate                 : 24.0 Mb/s

Video
ID                                       : 4113 (0x1011)
Menu ID                                  : 1 (0x1)
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4
Format settings                          : CABAC / 2 Ref Frames
Format settings, CABAC                   : Yes
Format settings, RefFrames               : 2 frames
Format settings, GOP                     : M=3, N=15
Codec ID                                 : 27
Duration                                 : 11 min 24 s
Bit rate mode                            : Variable
Bit rate                                 : 22.7 Mb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 29.970 (30000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Interlaced
Scan type, store method                  : Separated fields
Scan order                               : Top Field First
Bits/(Pixel*Frame)                       : 0.365
Stream size                              : 1.81 GiB (95%)

Audio
ID                                       : 4352 (0x1100)
Menu ID                                  : 1 (0x1)
Format                                   : AC-3
Format/Info                              : Audio Coding 3
Format settings, Endianness              : Big
Codec ID                                 : 129
Duration                                 : 11 min 24 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 31.250 FPS (1536 SPF)
Bit depth                                : 16 bits
Compression mode                         : Lossy
Delay relative to video                  : -67 ms
Stream size                              : 20.9 MiB (1%)
Service kind                             : Complete Main

General
Complete name                            : E:\CSL\2017\Misc\2017-10-08 Paul Scheele Workshop\dGoPro\GOPR0311.MP4
Format                                   : MPEG-4
Format profile                           : Base Media / Version 1
Codec ID                                 : mp41 (mp41)
File size                                : 874 MiB
Duration                                 : 4 min 3 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 30.2 Mb/s
Encoded date                             : UTC 2017-10-08 13:31:23
Tagged date                              : UTC 2017-10-08 13:31:23
AMBA                                     : x

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.1
Format settings                          : CABAC / 1 Ref Frames
Format settings, CABAC                   : Yes
Format settings, RefFrames               : 1 frame
Format settings, GOP                     : M=1, N=15
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 4 min 3 s
Bit rate mode                            : Variable
Bit rate                                 : 30.0 Mb/s
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 29.970 (30000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.483
Stream size                              : 869 MiB (99%)
Title                                    : GoPro AVC
Language                                 : English
Encoded date                             : UTC 2017-10-08 13:31:23
Tagged date                              : UTC 2017-10-08 13:31:23
Color range                              : Full
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709

Audio
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : mp4a-40-2
Duration                                 : 4 min 3 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 kHz
Frame rate                               : 46.875 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 3.71 MiB (0%)
Title                                    : GoPro AAC
Language                                 : English
Encoded date                             : UTC 2017-10-08 13:31:23
Tagged date                              : UTC 2017-10-08 13:31:23

Other #1
ID                                       : 3
Type                                     : Time code
Format                                   : QuickTime TC
Duration                                 : 4 min 3 s
Time code of first frame                 : 13:58:32:17
Time code, striped                       : Yes
Title                                    : GoPro TCD
Language                                 : English
Encoded date                             : UTC 2017-10-08 13:31:23
Tagged date                              : UTC 2017-10-08 13:31:23
Bit rate mode                            : CBR

Other #2
Type                                     : meta
Duration                                 : 4 min 2 s
Bit rate mode                            : VBR

Other #3
Type                                     : meta
mdhd_Duration                            : 243110
Bit rate mode                            : VBR

File properties shows the plugin so4compoundplug.dll being used for all of them, which you would expect.

So a mix of AVC progressive and interlaced and GoPro.
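For comparing how heavily each clip is compressed, the "Bits/(Pixel*Frame)" figure in the reports above can be reproduced as video bit rate divided by (width x height x frame rate). This is a small Python sketch using the values from the four reports; the function name and the short clip labels are only illustrative.

```python
def bits_per_pixel_frame(bitrate_mbps, width, height, fps):
    """Video bit rate divided by the number of pixels displayed per second."""
    return bitrate_mbps * 1_000_000 / (width * height * fps)


# (label, video bit rate in Mb/s, width, height, frame rate) from the reports above
clips = [
    ("aMAIN 00323.MTS", 26.4, 1920, 1080, 60000 / 1001),
    ("bWIDE 00021.MTS", 22.7, 1920, 1080, 30000 / 1001),
    ("cLEFT 00120.MTS", 22.7, 1920, 1080, 30000 / 1001),
    ("dGoPro GOPR0311.MP4", 30.0, 1920, 1080, 30000 / 1001),
]

for label, mbps, width, height, fps in clips:
    value = bits_per_pixel_frame(mbps, width, height, fps)
    print(f"{label}: {value:.3f} bits/(pixel*frame)")
```

The results match the 0.212, 0.365, 0.365 and 0.483 that MediaInfo reports, so the GoPro clip carries more than twice as many bits per pixel as the 60 fps Canon clip.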

NickHope wrote on 10/21/2017, 11:25 AM

It may be just teething troubles with so4compoundplug.dll, which is still particularly bad with GoPro footage. Have you tried this? If you go ahead and disable so4compoundplug.dll, so that those files get decoded by compoundplug.dll, I'd be interested in whether the R9 Fury x still gives great results in VP15, as it did in earlier versions.

wilri001 wrote on 10/21/2017, 11:41 AM

Nick, excellent call! It's back to 30fps with GPU selected. I hadn't suspected this since the CPU loop was fixed in build 216, but it seems to still need some work.

Also, now the GPU helps a little bit: 46 seconds with it, and 48 seconds without it. Still, about the same as the 47 seconds with the new dll. This is with 8 slices.