Introduce 16-bit Floating Point Pixel Format in VEGAS Pro 23

Abubakar wrote on 7/5/2025, 10:05 PM

Dear VEGAS Pro Development Team and Community,

I'd like to propose a feature request for the inclusion of a 16-bit floating-point pixel format option within VEGAS Pro 23 Project Properties.

Currently, VEGAS Pro offers 8-bit integer and 32-bit floating-point options for internal pixel processing. While 32-bit float provides the highest precision and is essential for top-tier HDR and extremely demanding color work, it can be quite resource-intensive, leading to slower playback and rendering, especially on mid-range machines or with complex projects. On the other hand, 8-bit, while performant, can introduce banding or clipping artifacts during heavy color grading or compositing.

A 16-bit floating-point pixel format would offer a crucial and much-needed "sweet spot":

Optimal Balance for Modern Footage: For the vast majority of users working with contemporary camera footage, such as 10-bit Log or even 12-bit video, 16-bit floating point offers more than sufficient precision to preserve dynamic range and color fidelity. This format is a very effective solution for preventing banding and maintaining detail without the significant performance overhead of 32-bit float (a quick sketch after these points illustrates the precision headroom).

Improved Workflow & Resource Efficiency: It would allow users with less powerful hardware to achieve significantly higher quality output than 8-bit, without grinding their system to a halt. This could empower a broader user base to work with Log and higher bit-depth footage more efficiently directly within VEGAS Pro.
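
As a rough illustration of the precision argument for 10-bit sources (a quick NumPy sketch of the storage side only, not how VEGAS handles pixels internally): every 10-bit code value stays distinct in a 16-bit float working space, while an 8-bit working space collapses them four to one, which is where banding comes from.

```python
# Illustrative sketch only: how many distinct levels of a 10-bit source
# survive in 16-bit float vs. 8-bit integer working spaces.
import numpy as np

codes = np.arange(1024)                    # all 10-bit code values
signal = codes / 1023.0                    # normalized to 0..1 (float64)

as_fp16 = signal.astype(np.float16)        # 16-bit float working space
as_8bit = np.round(signal * 255) / 255     # 8-bit integer working space

print(len(np.unique(as_fp16)))             # 1024 -> every source level preserved
print(len(np.unique(as_8bit)))             # 256  -> 4:1 collapse, i.e. banding
```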

This intermediate bit depth is a common and highly effective standard in many professional video pipelines and it is already available in dedicated compositing applications like VEGAS Effects, showcasing its utility.

Implementing a 16-bit floating-point option would empower VEGAS Pro users with greater flexibility in balancing performance and quality, enhancing the overall professional capability and user experience of the software.

Thank you

Intel Core i7 12700K
Nvidia RTX 3060 / Intel Arc A380
Corsair Vengeance DDR5 5200MHz 32 GB
ASUS ProArt Z690 Creative Wifi Motherboard
2 Samsung NVME SSD
Corsair iCUE H100i ELITE CAPELLIX Liquid CPU Cooler
Corsair RM850X Power Supply

Intel Core i5 13600K
Nvidia RTX 3090
Corsair Vengeance DDR5 5200MHz 32 GB
ASUS ProArt Z690 Creative Wifi Motherboard
2 Samsung NVME SSD
Corsair iCUE H100i ELITE CAPELLIX Liquid CPU Cooler
Corsair AX1600i Digital ATX Power Supply 

Comments

Alex-Pitel wrote on 7/8/2025, 2:50 PM

Yes! I fully agree!

My portfolio:

My PC:

Windows 10

CPU AMD 3900x

RAM 64gb

GPU1: RTX 3090 (24GB)

GPU2: Intel Arc A380

MB: Gigabyte X570 Aorus Pro

Processor: AMD 3900x, RAM 64GB (2x 16gb+ 32gb)

BIOS: reBAR, Above 4G disabled! ((.

Camera Canon r6, R10, Sony A7iii, A77ii, A99, A6300

Preferred footage: H.265 (HEVC) 4:2:2 10-bit, C-Log3

 

john_dennis wrote on 7/8/2025, 5:16 PM

I asked the Shade-Tree Software Engineer what he thought of this proposal.

Here is his response: "It looks like a good idea except for the number of engineer years of coding it would take to rewrite the Vegas code."

RogerS wrote on 7/8/2025, 9:34 PM

I think it's a good idea, and maybe it could be part of the current code rewrite? Upgrade 8-bit to 16-bit and keep 32-bit for high-precision renders?

Howard-Vigorita wrote on 7/9/2025, 10:31 AM

I personally would go the other way. 64-bit CPUs have taken over the world, making 32-bit processing more medium than high precision, whether float or int. Quality analysis I've done with Vegas renders indicates that 8-bit int projects render only slightly lower in quality than 32-bit float, a difference easily made up for by a small increase in bitrate. For example, 28 Mbps 32-bit float quality roughly equals 32 Mbps 8-bit int for projects rendered with MainConcept 10-bit HEVC.

Since int is native and so much faster to process than float, 64-bit int would be a better way to go than retro 16-bit float, which might yield worse quality and performance than 8-bit int. In multi-track audio mastering, 64-bit int processing makes a world of difference in both performance and audible mixdown quality.

Abubakar wrote on 7/10/2025, 8:09 PM

These discussions are exactly why I wanted to share my thoughts: to gather diverse opinions and expertise. Thank you all for your valuable input and perspectives on my suggestions.

Intel Core i7 12700K
Nvidia RTX 3060 / Intel Arc A380
Corsair Vengeance DDR5 5200MHz 32 GB
ASUS ProArt Z690 Creative Wifi Motherboard
2 Samsung NVME SSD
Corsair iCUE H100i ELITE CAPELLIX Liquid CPU Cooler
Corsair RM850X Power Supply

Intel Core i5 13600K
Nvidia RTX 3090
Corsair Vengeance DDR5 5200MHz 32 GB
ASUS ProArt Z690 Creative Wifi Motherboard
2 Samsung NVME SSD
Corsair iCUE H100i ELITE CAPELLIX Liquid CPU Cooler
Corsair AX1600i Digital ATX Power Supply 

Alan-Smithee wrote on 7/11/2025, 1:21 AM

I asked the Shade-Tree Software Engineer what he thought of this proposal.

Here is his response: "It looks like a good idea except for the number of engineer years of coding it would take to rewrite the Vegas code."

I asked my good friend Grok; this is his response:

1. Problem Identification

The performance degradation observed in 32-bit floating-point (FP32) mode relative to 8-bit integer mode is not a result of insufficient FP32 compute performance on modern GPUs. Your RTX 3090, for example, is architected for high-throughput FP32 operations. The primary bottleneck is data throughput, which is amplified by the software's legacy architecture.


2. Architectural Analysis of the Bottleneck

  • Data Volume: An uncompressed 4K (3840x2160) frame in 32-bit float RGBA format is approximately 128 MB, roughly four times the ~32 MB size of an 8-bit frame. This quadruples the data load on the system bus and GPU memory bandwidth (the arithmetic is sketched just after this list).
  • Legacy Data Flow: Software architected around the CPU as the primary processor often relies on a high-latency, sequential workflow for GPU acceleration. This process involves multiple data round trips:
  1. The CPU sends the full frame from system RAM to the GPU's VRAM via the PCIe bus.
  2. The GPU executes a single effect or a small group of effects.
  3. The GPU returns the entire processed frame to system RAM for CPU-based management.
  4. The process repeats for the next effect in the chain.
  • Latency vs. Compute: Each round trip introduces significant latency. The GPU spends a majority of its time idle, waiting for the CPU to manage and transfer the large data packets. The performance limitation is therefore not the speed of the GPU's computation but the efficiency of the data flow.
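
For reference, the back-of-the-envelope arithmetic behind those per-frame figures (a sketch only; VEGAS's actual internal buffer layout isn't public, so treat the numbers as estimates):

```python
# Rough per-frame payload for an uncompressed 3840x2160 RGBA frame at each
# working bit depth (estimates; not VEGAS's actual internal buffer layout).
WIDTH, HEIGHT, CHANNELS = 3840, 2160, 4

for name, bytes_per_channel in [("8-bit int", 1), ("16-bit float", 2), ("32-bit float", 4)]:
    size_mib = WIDTH * HEIGHT * CHANNELS * bytes_per_channel / 2**20
    print(f"{name:>13}: {size_mib:6.1f} MiB per frame")

# ->     8-bit int:   31.6 MiB per frame
# ->  16-bit float:   63.3 MiB per frame
# ->  32-bit float:  126.6 MiB per frame  (~4x the 8-bit payload on every CPU<->GPU copy)
```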


3. Evaluation of the FP16 Proposal

Implementing a 16-bit float (FP16) pipeline is a workaround, not a fundamental solution.

  • Mechanism: This approach reduces the data packet size from ~128 MB to ~64 MB, which lessens the load on the inefficient round-trip process, resulting in a performance increase.

  • Architectural Flaw: It does not address the core issue, which is the existence of the data round-trips themselves. The high-latency workflow remains.
  • Quality Compromise: FP16 lacks the precision of FP32. While sufficient for representing 10-bit source data, it is susceptible to cumulative rounding errors during complex, multi-stage image processing. Each mathematical operation in an effects chain introduces a small error, which can compound to create visible artifacts like banding or color shifts in the final output (a toy demonstration follows this list).
  • Invalidity as a "Quick Fix": The argument that an FP16 pipeline is a faster or simpler implementation than a full engine rewrite is incorrect. It is not a patch; it is a major architectural project with immense hidden costs:
    • Code Path Duplication: It requires creating a third, parallel processing path alongside the existing 8-bit and 32-bit pipelines. Every function handling pixel data would require a new, dedicated FP16 version to be written, tested, and maintained.
    • Third-Party Plugin Inefficiency: The OFX plugin ecosystem expects data in 8-bit or 32-bit. To support these in an FP16 pipeline, the host must perform constant, on-the-fly data conversions (FP16 -> FP32 -> Plugin -> FP32 -> FP16), which introduces new performance bottlenecks.
    • Massive Validation Overhead: The engineering team would have to engage in a massive quality assurance effort, comparing thousands of renders to the FP32 pipeline to identify and fix every visual artifact caused by the lower precision. This validation can be as resource-intensive as the development itself.
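
To make the cumulative-error point concrete, here is a toy NumPy sketch (an arbitrary chain of gain/offset operations chosen for illustration, not a real VEGAS effects stack) comparing how far FP16 and FP32 drift from a double-precision reference over 50 passes:

```python
# Toy demonstration of error accumulation across a long effects chain.
# The 50 gain/offset steps are arbitrary stand-ins for grading operations.
import numpy as np

reference = np.linspace(0.0, 1.0, 4096, dtype=np.float64)

def run_chain(x, dtype, steps=50):
    x = x.astype(dtype)
    for _ in range(steps):
        x = (x * dtype(1.07) - dtype(0.003)) / dtype(1.05)
    return x.astype(np.float64)

truth = run_chain(reference, np.float64)
for dtype in (np.float32, np.float16):
    max_drift = np.abs(run_chain(reference, dtype) - truth).max()
    print(dtype.__name__, "max drift:", max_drift)

# float32 drift stays orders of magnitude below one 10-bit step (1/1023 ~ 9.8e-4);
# float16 drift lands around or above a full 10-bit step, i.e. it can show up
# as banding or color shifts once enough operations are stacked.
```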

4. Optimal Engineering Solution

The technically superior solution is the modernization of the existing FP32 pipeline to a GPU-native model.

  • Mechanism: This architecture sends the FP32 data to VRAM once. The CPU then dispatches a single, compiled program (a shader or compute kernel) that contains the entire chain of effects.
  • Efficiency: The GPU executes the full chain without returning data to the CPU, eliminating the round-trip latency. All intermediate data is stored and moved at high speed between the GPU's own memory and cores (a rough comparison of the implied bus traffic follows this list).
  • Conclusion: This approach leverages the native FP32 performance of the hardware, addresses the root cause of the performance bottleneck (data flow latency), and maintains the mathematical accuracy required for professional, artifact-free image processing.
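
As a rough comparison of the two data-flow models, here is a small Python sketch that just tallies the PCIe traffic each one implies per frame (a hypothetical model with illustrative numbers, not any real VEGAS or GPU API):

```python
# Hypothetical model: count CPU<->GPU copies per frame under each architecture.
FRAME_MIB = 3840 * 2160 * 4 * 4 / 2**20      # ~127 MiB uncompressed FP32 RGBA frame

def legacy_round_trip(num_effects):
    """Section 2 model: upload + download wrapped around every single effect."""
    return num_effects * 2 * FRAME_MIB       # GPU sits idle during each copy

def gpu_native(num_effects):
    """Section 4 model: upload once, run the fused chain in VRAM, download once."""
    return 2 * FRAME_MIB                     # chain length no longer matters

for model in (legacy_round_trip, gpu_native):
    print(f"{model.__name__:>17}: {model(10):7.0f} MiB over PCIe per frame (10 effects)")

# legacy_round_trip:    2531 MiB over PCIe per frame (10 effects)
#        gpu_native:     253 MiB over PCIe per frame (10 effects)
```

The point is simply that the legacy pattern's bus traffic scales with the length of the effects chain, while the fused-chain pattern pays one upload and one download regardless of how many effects run.
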
Reyfox wrote on 7/11/2025, 3:54 AM

So, I guess the "shade tree" Software Engineer was right.

Newbie😁

Vegas Pro 22 (VP18-21 also installed)

Win 11 Pro always updated

AMD Ryzen 9 5950X 16 cores / 32 threads

32GB DDR4 3200

Sapphire RX6700XT 12GB Driver: 25.5.1

Gigabyte X570 Elite Motherboard

Panasonic G9, G7, FZ300

RogerS wrote on 7/11/2025, 7:29 AM

Very interesting analysis.

I'm pretty sure the updates to VEGAS are around the core architecture including keeping frames in the GPU VRAM to minimize roundtrips over the bus. I see more caching and ram usage these days.

We've seen some good throughput increases with NVIDIA for decoding and more FX GPU-accelerated. Hopefully the render pipeline is next; I'm tired of seeing the mountain-range spikes for NVENC.

Fundamental changes to the core would lend themselves to high-bitrate, high-data-rate, and high-dynamic-range workflows, which VEGAS currently struggles with.

Howard-Vigorita wrote on 7/11/2025, 11:59 AM

I'm pretty sure the updates to VEGAS are around the core architecture including keeping frames in the GPU VRAM to minimize roundtrips over the bus. I see more caching and ram usage these days.

We've seen some good throughput increases with NVIDIA for decoding and more FX GPU-accelerated. Hopefully the render pipeline is next; I'm tired of seeing the mountain-range spikes for NVENC.

That's exactly what I've observed in my testing. Up until VP21 b108, rendering performance to disk was optimal with load-splitting across 2 GPUs, one assigned as the video GPU for timeline processing and the other for decoding. Since then, I've observed enhanced preview performance when the timeline and decoding GPUs are the same, particularly when the preview monitor is plugged into that same GPU.

But in VP22, performance falls apart when rendering to disk, perhaps because disk access is strictly via the PCIe bus and the data needs to be transferred from VRAM to RAM first, which is a PCIe/GPU double whammy. This is evidenced by a massive increase in GPU copy utilization during disk rendering. I don't use multiple monitors plugged into different GPUs myself, but I imagine things also fall apart previewing that way.

Meanwhile, steady increases in PCIe bus, CPU, and RAM transfer rates make GPU-centric processing less advantageous, which might also explain the continued lack of multi-GPU QSV rendering in Vegas, something Resolve and ffmpeg have supported for a few years now. Given the direction Vegas has taken, I think the best approach for the future may be a dual-path strategy using different optimal schemes for previewing and rendering... but that opens the door to possibly different results.

rgr wrote on 7/17/2025, 2:30 AM

I asked the Shade-Tree Software Engineer what he thought of this proposal.

Here is his response: "It looks like a good idea except for the number of engineer years of coding it would take to rewrite the Vegas code."

Probably by the time this is finished, 32-bit accuracy won't be an issue anymore :)