RAID experiment - For the Experts

Pachanga wrote on 05.02.2008 at 23:07
It is a long one, but stay with me.

After a lot of number crunching I concluded that I did not need the performance of RAID 0. BUT, after a few test runs on my new system I am not so sure.
HP XW8600, 1 x Xeon 3.0 Quad (with open socket), 2 GB RAM, 7200.11 SATA drives: 250 GB O/S, 750 GB Capture/Projects, 750 GB Renders, 1 TB backup.

Since I only work with compressed video (DV or m2t), tape-to-PC transfers are no-brainers (my HP-41CV calculator could handle them). For renders, I need to read and write as fast as the 4 cores (8 later) can chew through the data.

The more complex the render, the more work the processors have to do and the lower the demand on the drives (the CPUs become the choke point).

I ran John Cline's rendertest in 1:51 with all 4 cores at 98%. Both the read and write disks barely registered above 4 MB/s. So I made the right decision in not striping. Did I?

I ran a validation test on the drives using large file transfers, and all the 7200.11 drives showed they can sustain 100 MB/s, spiking to 108 MB/s (their spec is 105 MB/s).
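For anyone who wants to reproduce this kind of validation, a large-file transfer test can be scripted. Here is a minimal Python sketch under assumed parameters (the drive path and the 2 GB test size are mine, not the poster's); it illustrates the method, nothing more:

```python
# Minimal sequential-throughput sketch: write a large file, then read it
# back, timing both. TEST_FILE and TOTAL are assumptions -- point the
# path at the drive under test, and make TOTAL larger than system RAM
# so the read-back isn't served from the OS cache.
import os
import time

TEST_FILE = r"E:\throughput_test.bin"   # assumed path on the drive under test
CHUNK = 8 * 1024 * 1024                 # 8 MB chunks keep the access sequential
TOTAL = 2 * 1024 * 1024 * 1024          # 2 GB total

def timed_write() -> float:
    buf = os.urandom(CHUNK)
    start = time.time()
    with open(TEST_FILE, "wb", buffering=0) as f:
        for _ in range(TOTAL // CHUNK):
            f.write(buf)
        os.fsync(f.fileno())            # make sure the data actually hit the disk
    return (TOTAL / 1024 / 1024) / (time.time() - start)

def timed_read() -> float:
    start = time.time()
    with open(TEST_FILE, "rb", buffering=0) as f:
        while f.read(CHUNK):
            pass
    return (TOTAL / 1024 / 1024) / (time.time() - start)

print(f"write: {timed_write():.1f} MB/s")
print(f"read:  {timed_read():.1f} MB/s")
os.remove(TEST_FILE)
```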

BUT ... when I ran simpler projects in DV, rendering to MainConcept MPEG-2 using the standard DVD template, something interesting showed up.

The 4 cores only ran at 70%, meaning they could do more, so I suspected the drives were not reading or writing fast enough. Oh no, I needed the RAID 0 after all. But did I?

Further testing showed that the "read" drive would only work at 11 MB/s while the "write" drive only worked at 4 MB/s. That is a lot less than the 100 MB/s capability I had just measured.

I set up a series of tests: 1) separate drives for read and write; 2) "read" and "write" on a single striped volume; 3) "read" on the striped disk; 4) "write" on the striped disk. You still with me?

INTERESTING RESULTS: Reading and writing from a single RAID 0 volume was slower than separate non-striped disks (R = 7 MB/s, W = 4 MB/s); all other tests had identical results whether or not a striped drive was used (R = 11 MB/s, W = 4 MB/s). By the way, Vegas only used 300 MB of memory in all tests.

THE MYSTERY: If the drives can work 10x faster than they are being used, and the 4 cores are only working at 75%, WHAT IS THE HOLD-UP?
Where is the bottleneck?

Comments

farss wrote on 05.02.2008 at 23:18
I don't have the answer, but I've seen exactly this with Vegas going back a couple of versions on my now rather dated dual-Xeon system. I'm using a Highpoint hardware RAID controller, going from the RAID 0 array to a non-RAID drive, and the 4 cores don't go over 70%. Strange.

Doesn't really bother me that much as I batch encode overnight and it's still very fast.

Bob.
Pachanga wrote on 05.02.2008 at 23:24
Don't get me wrong; my previous system was a single-core Xeon 3.06, so I am very happy with the great performance, with absolutely no deployment problems or crashes, etc.

BUT, I would like to know, in the relentless pursuit of TRUTH :-).
Kennymusicman wrote on 05.02.2008 at 23:59
Different codecs have different optimisations and will run at different CPU utilisation.

As for RAID and performance: when rendering HD, I find my system renders to RAM as a buffer, then does a block write to the HDD, repeating as necessary. So the rendertest, for example, ran entirely in RAM, followed by a quick HDD write on completion.
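For illustration only, the buffering pattern described above might look like the following minimal Python sketch; render_frame(), the buffer limit, and the overall structure are assumptions for the sake of the example, not Vegas internals:

```python
# Sketch of a render-to-RAM-then-block-write pipeline: the CPU-bound
# encode step fills a RAM buffer while the disk sits idle, then the
# buffer is flushed in one large sequential write.
import io

FRAME_BYTES = 120_000                    # one NTSC DV frame is 120,000 bytes
BUFFER_LIMIT = 256 * 1024 * 1024         # assumed: flush every 256 MB

def render_frame(i: int) -> bytes:
    """Stand-in for the CPU-heavy encode of frame i."""
    return bytes(FRAME_BYTES)

def render(n_frames: int, out_path: str) -> None:
    buf = io.BytesIO()
    with open(out_path, "wb") as f:
        for i in range(n_frames):
            buf.write(render_frame(i))   # CPU-bound phase: disk idle
            if buf.tell() >= BUFFER_LIMIT:
                f.write(buf.getvalue())  # disk-bound burst: one block write
                buf = io.BytesIO()
        f.write(buf.getvalue())          # flush whatever remains
```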

marks27 wrote on 06.02.2008 at 00:03
I do not speak with authority, but my hunch would be something along the lines of the following:

1. The RAID does not have a major impact here because, once you separate the source and destination drives, the process is not primarily disk-I/O bound.

2. The 4 cores are running slower in DV->MPEG2 because (as I understand it), MPEG2 being an interframe-compressed format, any given frame depends on the frames before it, or at least back to the last full (I-)frame. Of course, I don't know the actual algorithm being used, so I am just guessing. I would not be surprised to find that Windows is context switching and page faulting all over the place, but again that is only a guess.

The HD test would drive the CPU harder because of the much higher volume of data, and dealing with the associated compression.

It would be interesting to see your figures if you were to render a DV project out to DV (little re-encoding work), which should mean lower CPU processing but higher disk I/O volumes. Also, is Vegas the only process taking cycles, or is something else getting in there too (anti-virus, etc.)? It could also be memory contention.

Interesting work. Good on you.

marks
Pachanga wrote on 06.02.2008 at 00:44
Interesting notes. Memory is suspect because no matter what test I ran, Vegas only used 300 MB (plus page file), which is telling.
If Vegas renders to RAM, would performance be tied to the page file? If the page file is on a slow disk (the system disk, as in my setup), maybe that is the bottleneck.
I will try rendering DV to DV and see what happens.
farss wrote on 06.02.2008 at 01:50
Keep in mind that even with a hardware RAID controller, the driver will still be using CPU cycles. I suspect that might not be counted in the idle time. There are some rather expensive RAID controllers that run their own OS, etc., which could change that.

Bob.
4eyes wrote on 06.02.2008 at 04:26
THE MYSTERY: If the drives can work 10x faster than they are being used, and the 4 cores are only working at 75%, WHAT IS THE HOLD-UP?

I'm confused by your post. If you're rendering from DV to SD and the data write rate is 4 MB/s, that's 240 megabytes per minute. Since the average SD file encoded at 8,000 kbps (1 MB/s) is approx 60 MB per minute, it tells me you're rendering at approx 4x realtime with the Xeon quad.
So the 4 MB/s would be correct; if you could render at 8x realtime it would be 8 MB/s, or 480 MB per minute.
Sounds correct to me, maybe I misread the post.
Pachanga wrote on 06.02.2008 at 15:09
You are correct. The rendering speed was about 4x realtime, which is why ALL writes (with and without RAID) were only 4 MB/s (8 Mbps ÷ 8 bits/byte = 1 MB/s; 1 MB/s × 4x realtime = 4 MB/s write).
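Spelled out as a quick sanity check, using the figures from the posts above:

```python
# A standard-DVD MPEG-2 render writes ~1 MB of output per second of
# video, so a 4x-realtime render writes 4 MB/s -- matching the measurement.
bitrate_mbps = 8                    # DVD template video bitrate, from the thread
stream_MBps = bitrate_mbps / 8      # 8 bits per byte -> 1 MB/s of output
render_speed = 4                    # rendering at 4x realtime
print(stream_MBps * render_speed)   # 4.0 MB/s write rate
```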
So we can conclude that RAID 0 is hardly needed for the "write" drive.

I ran another test and rendered the same project to SD DV (very little encoding work), and the "read" drive rate went up to about 40 MB/s, but the cores were still well below 70%.

Now to the bottleneck: if the drives are barely breaking a sweat, what is holding the 4 cores at only 70%?
rmack350 wrote on 06.02.2008 at 17:07
You can conclude that RAID 0 is hardly needed if all you ever plan to do is read/write highly compressed media. You would see a difference if you want to use lightly compressed intermediates, or uncompressed media, or lots of media all at once.

As to the bottleneck...good luck getting to the bottom of that. I've seen other programs completely gum up a system even while they were using very little of the CPU. I don't know why.

If I were to guess, I'd probably blame it on the overhead of trying to divide this amongst 4 cores. Also, it seems like audio renders as a separate thread and it's a comparatively easy job for the CPU. Maybe you're getting a lower average CPU load because the audio thread is also getting divided up amongst the cores.

Or maybe (and all this is wild speculation) the render / wait-your-turn-to-write / verify / repeat cycle is slowing things down.

If it's significantly faster than two cores then you're still doing well.

Rob Mack
4eyes wrote on 06.02.2008 at 17:54
On my Q6600 it's also approx 70% when going DV -> SD.

If I add an FX effect, like Sharpen at 0.25, usage jumps to approx 98%.
Under Vista's monitoring, with a customized report screen, all 4 CPUs are right up there.

I think you'll find you can still damn near render at 4X even with effects.
Pretty damn fast machine.
Pachanga wrote on 06.02.2008 at 19:28
Rob,
I would agree with you, except that when you have an intensive render you get the 4 cores up to 98%, so the overhead of splitting the work does not show.

As to working with uncompressed material, JUST DURING RENDERING, wouldn't the CPUs max out and keep the drives waiting?
farss wrote on 06.02.2008 at 19:51
Uncompressed is less load on the CPUs and more on the HDDs.

The answer probably lies in the MC encoder; it's not multithreaded.
I saw a similar result in the test results for an 8-core system: the DivX encoding test ran no faster than on a 4-core system at the same CPU clock speed. A vector-rasterising test, though, did run almost exactly 2x faster with 8 cores than with 4.

Bob.
MH_Stevens wrote on 06.02.2008 at 19:55
Also, there is a trade-off in that a RAID system draws processing capacity from the CPU. To alleviate this, if you go RAID, do not use software RAID; have a hardware RAID card control the process.
rmack350 wrote on 06.02.2008 at 22:19
As Bob said, Uncompressed puts less load on the CPU and more on the drives.

I honestly don't know why a render would just have the CPU running about 70%. You're probably right that it isn't related to disk I/O in any way.

You know, you might have a render going at 4x real time yet only using 70% of the CPU. A more intense render might be running the CPU flat out and yet only be rendering at 3x real time. If that were the case then I'd think the CPU is rendering as fast as it can ask for frames to render.

I'm not sure I'd obsess about this too much.

Bob, this review of Intel's Skulltrail motherboard might be what you were thinking about. It's an 8-core board.

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3216

Rob Mack

Pachanga wrote on 06.02.2008 at 23:25
I agree on the software RAID. I tested both and there was a substantial difference. My motherboard, like most new ones, offers built-in RAID chips, which relieves the CPU.

Rob, mine is an 8-core board (with 4 now) and I'll be checking your link, thanks. I guess we have to wait for Vegas 64 and Vista 64 to use all 8 cores.

What about RAM though? Are you guys seeing Vegas only using 300 MB? John Cline wrote that 1.5 GB for rendertest was about right.
rmack350 wrote on 07.02.2008 at 00:54
Depends on the project, the Preview RAM settings, and probably the render format.

I recently dropped about 2200 random stills on the timeline to see if I could break Vegas. Even with crossfades, and after allowing it to play a bit, I was only seeing about 300 MB used and another 300 MB paged. Preview RAM was set to about 64 MB.

I wouldn't say that an onboard RAID chip necessarily takes the load off a CPU, but if you're using a 2-socket board it's a little more likely to have a reasonably decent third party controller chip.

The results of the tests of the Skulltrail board said that many applications, especially things like codecs, aren't built to use more than 4 cores. So it wouldn't be just Vegas that was a bottleneck with 8 cores.

Rob Mack
RBartlett wrote on 07.02.2008 at 22:17
Taking the rendering pipeline considerations out of the picture for a minute.

There are (IMO) 3 main types of RAID-0 striping controller in today's PC.

1. Firmware RAID
2. Windows OS-based software RAID
3. CPU-offload RAID.

While Highpoint, RaidCore, AMD, and Intel Matrix Storage technologies are sympathetic to the resource isolation required to hold up your RAID (even at the POST/BIOS level), the performance of types 1 and 2 comes down to the software elements.

RAID-0 isn't quite as sympathetic to seek requirements as some implementations of RAID-10, RAID-5, or RAID-6 might be. Many RAID solutions are built for the enterprise: small files and database-silo seeks. Those applications demand something quite different from our video workstation requirements.

Hopefully SSD and HDD+SSD solutions will continue to be developed and become more economical. Until then, don't assume too much. One vendor's RAID can be quite different from the next. Read reviews by all means, but look for constant-duty-cycle and random-access performance along with low CPU overhead.

Ironically the 2nd type I mentioned, Windows OS striping, is actually one of the best performers across both enterprise and creative content workstation workloads. Nothing knows how to resequence the reads and writes of a 'session' quite as well as the OS does. The engine is under the hood/bonnet rather than banged straight onto the wheels - if you see my point? RAID-0 and (where OS version/model permits) RAID-5 perform in quite a scalable fashion.

True CPU-offload RAID (i.e. going beyond just the parity generation for truly redundant RAID levels) costs hundreds of dollars more and usually has a hot CPU or DSP sitting alongside the controller chipset. RAIDCore and Highpoint have proven that you don't need these CPU-offload devices, but when, as you've mentioned, you need the best overall render and edit performance, you might opt to indulge in such a rig after some research. Typically the most scalable forms come as SAS controllers with CPU offload, matched with a backplane that supports SATA concurrently.

Another RAID sticking point is whether you need to stay with the same brand if you ever have a controller break on you. An OS-derived RAID will usually re-import your storage irrespective of which controller you attach the drives to.

Hopefully that helps.... ?
Pachanga wrote on 08.02.2008 at 00:49
RB and Rob,
I guess that explains the dramatic price difference between controllers that seemingly have the same functionality: their on-board chips and cache make them more self-sufficient.
It makes sense, then, that entry-level controllers may load the CPU just as much as software-based RAID.

To me it is still counter-intuitive that I could have the hard disks working at 1/10th their capacity and the 4 cores at only 70%. It is almost like the machine is taking a nap somewhere in there.
Terje wrote on 08.02.2008 at 02:53
To me it is still counter-intuitive that I could have the hard disks working at 1/10th their capacity and the 4 cores at only 70%.

It's not all that strange; it is a known phenomenon in the computer industry. As you add CPUs you will see diminishing returns, for two reasons. The first is that the task may not be all that easy to parallelize; the second is that once it is parallelized, the OS and the computer spend more and more time switching and managing the tasks at the expense of actually running them. That is not the main thing you are seeing, though.

As you said, getting the cores to run at high speed with the rendertest is straightforward, but the rendertest is not a good indication of what will happen when you actually render; it's only an indication of how fast your CPU is. The reason is simple: the rendertest is a relatively short piece of "movie" with a lot of things added in to tax the CPU. Its memory usage is very, very small relative to a real project.

How is a real render different? Well, let's say you are working with an hour of SD; that is about 20 GB of data, right? So the data has, as you measured with your RAID, to be moved from drive to memory. This taxes the CPU a little bit on a PC with a SCSI drive; it taxes it a bit more on an ATA/SATA configuration. Not enough to bring the CPU down to 70%, though. Getting the data to memory isn't enough: once it is in memory, each chunk of it is accessed, digested, brought onto the CPU and off again, a significant number of times. That is a lot of data to be throwing around, and it won't fit in the CPU cache (most of the rendertest data most likely does).

Voilà, you have your explanation. Your computer is simply doing a lot of things other than running code on the CPU, and some of them are (relatively) slow, like repeatedly fetching 20 GB of data from memory and playing around with it.
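Terje's diminishing-returns point is often formalized as Amdahl's law: if only a fraction p of the job parallelizes, n cores give a speedup of 1 / ((1 - p) + p/n). A small illustrative sketch follows; the 90% parallel fraction is an assumed figure, not a measurement of Vegas:

```python
# Amdahl's law: the serial fraction (I/O waits, memory traffic, task
# management) caps the benefit of adding cores.
def amdahl_speedup(p: float, n_cores: int) -> float:
    return 1.0 / ((1.0 - p) + p / n_cores)

for n in (1, 2, 4, 8):
    print(f"{n} cores: {amdahl_speedup(0.9, n):.2f}x")
# 1 cores: 1.00x / 2 cores: 1.82x / 4 cores: 3.08x / 8 cores: 4.71x
# -- 8 cores is nowhere near 8x when 10% of the work stays serial.
```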
Pachanga wrote on 08.02.2008 at 03:16
Terje,

That is a great explanation. I was under the impression that everything that took place in the computer had to be handled by the CPUs, and thus, if they were not being used, nothing was happening.

If I understood you correctly, these time-consuming tasks are taking place without much CPU intervention, hence the CPUs' relatively low usage and the hard disks' relative inactivity.

I guess I was correct in my initial number-crunching conclusion that I don't need RAID-0 for performance in my limited "compressed" environment of SD DV and HDV m2t files.

Next I will try having multiple instances of Vegas rendering sections of the project. I would like to see what effect, if any, this has on memory, hard disk, and CPU usage. ANY PREDICTIONS?
Terje wrote on 08.02.2008 at 04:56
these time-consuming tasks are taking place without much CPU intervention

This is indeed correct. These time-consuming tasks are requested by the CPU but handled by other components in the computer.

I guess I was correct in my initial number-crunching conclusion that I don't need RAID-0 for performance in my limited "compressed" environment of SD DV and HDV m2t files.

Probably. Most speedy hard drives today should be able to feed all the data needed for these types of operations. The main concern is when using huge uncompressed video files.

I still use RAID, though; no need to make things slower than they are, and if and when the hard drive does come into play for some reason, it is as little of a bottleneck as possible. Also, I find it easier to deal with only one (logical) drive. The disadvantage, obviously, is that if one of my three RAID'ed drives goes out, I lose the data on all of them and will have to restore.
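As a side note on that trade-off, the exposure grows with every drive in the stripe, since RAID 0 loses everything if any one member fails. A quick sketch; the 3% annual failure rate is purely an assumed illustrative figure:

```python
# Probability that a RAID 0 array fails within a year, given n drives
# each with independent annual failure probability p (assumed value).
def raid0_failure(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

print(f"{raid0_failure(0.03, 1):.4f}")  # 0.0300 -- single drive
print(f"{raid0_failure(0.03, 3):.4f}")  # 0.0873 -- three-drive stripe, ~3x the risk
```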
Pachanga wrote on 08.02.2008 at 14:48
Using your example of the law of diminishing returns when parallelizing across more than 4 CPUs, wouldn't working with uncompressed video and its large transfer rates overwhelm the CPUs during rendering?
If so, I would think that would limit the throughput regardless of the RAID capacity.

Thanks for your comments. It has been educational.
rmack350 wrote on 08.02.2008 at 16:36
I think you're fishing for an answer that suits you.

Generally, uncompressed is too much data for most hard drives to support in real time. You need a fair bit of throughput from the disks; you can generally get reliable SD output from a 10k Raptor disk or a pair of 7.2k disks in a striped array. If you want more streams, or HD streams, you may find yourself wanting more disks in an array. However, that same phenomenon of diminishing returns also applies to multi-disk arrays: an 8-disk array is not 8 times faster than a single disk.
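Some rough numbers behind that claim (standard raster sizes; a back-of-the-envelope sketch, not measured figures):

```python
# Sustained data rates: uncompressed 8-bit 4:2:2 video vs. DV.
def MBps(width: int, height: int, bytes_per_pixel: float, fps: float) -> float:
    return width * height * bytes_per_pixel * fps / 1e6

print(f"{MBps(720, 486, 2, 29.97):.0f} MB/s")    # ~21 MB/s  uncompressed SD: one fast disk, barely
print(f"{MBps(1920, 1080, 2, 29.97):.0f} MB/s")  # ~124 MB/s uncompressed HD: needs an array
print(f"{25 / 8:.1f} MB/s")                      # ~3.1 MB/s DV's fixed 25 Mbps: trivial for any disk
```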

Given the size of uncompressed video it's not really practical to use throughout a project, but it's a very good choice for graphics that need to remain sharp and crisp, and also maintain an alpha channel.

Traditionally, edit systems have settled for 4:2:2 color sampling and some level of intraframe compression (the same sort of compression used in DV but with more color sampling).

A CPU doesn't have much trouble passing uncompressed or lightly compressed data. It's not much work, and a decent single-core processor should be able to do it. It's a LOT more work to decompress MPEG streams, and also a LOT more work to process an effect on an MPEG stream. So even though there's more overhead involved in dividing up a processing load amongst 4 cores, there's just a lot less work for the cores to do after that.

It's common practice for an edit system to render upwards to some codec that is efficient to use. If it needs to render transitions, it usually does so to some 8 or 10-bit internal 4:2:2 codec. Vegas is an exception because, for better or worse, it doesn't force, guide, or even encourage you to do prerenders. In some ways Vegas gives you more rope to hang yourself with by not guiding the prerender process.

Generally, the practice of using less-compressed media and requiring higher-throughput storage goes back to the days when CPUs were far less powerful and could never have hoped to work with today's long-GOP formats. Less-compressed media is much less CPU-intensive.

Rob Mack