Daz Studio Iray - Rendering Hardware Benchmarking

LenioTG · November 2020

RTX 2070 Super + RTX 2060

(with the new stable build of Daz)

System Configuration
System/Motherboard: Asus TUF X570
CPU: Ryzen 5 3600
GPU: RTX 2070 Super + RTX 2060
System Memory: 64GB 3600MHz C16
OS Drive: Sabrent Rocket 250GB
Asset Drive: Sabrent Rocket Q 1TB
Operating System: Windows 10 Pro 10.0.19041
Nvidia Drivers Version: Studio Driver 456.71
Daz Studio Version: 4.14.0.8

Benchmark Results
DAZ_STATS
2020-11-11 10:57:19.078 Finished Rendering
2020-11-11 10:57:19.104 Total Rendering Time: 2 minutes 50.87 seconds
IRAY_STATS
2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 2060): 697 iterations, 2.551s init, 165.563s render
2020-11-11 10:57:25.654 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 2070 SUPER): 1103 iterations, 1.930s init, 165.939s render
Iteration Rate: 10.53 iterations/s
Loading Time: 4.93 seconds

chrislb · November 2020

outrider42 said:

chrislb said:
Are you going to pick up a Nvlink for these?

Maybe. Nvidia and EVGA don't have 3 slot spacing NVLink bridges and have no ETA for availability. Yet most motherboards use 3 slot spacing for PCIEx16 slots. Also, one of the 3090s is on loan to me for a short time to do some testing.

chrislb · November 2020

I did one more test. I didn't have enough room inside my case to fit three graphics cards, so I used a PCIEx16 riser cable from a vertical GPU mount kit to add the 2080 Super graphics card (one of the ones I used in previous benchmarks posted here) alongside both 3090s. I also used two power supplies (1300 watt and 760 watt) to power all three cards.

I was able to get the benchmark render time under 1 minute.

System Configuration:

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.12.2.60 Public Beta

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and one EVGA RTX 2080 Super card only no CPU rendering

2020-11-11 14:08:20.869 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-11 14:08:21.434 Finished Rendering

2020-11-11 14:08:21.477 Total Rendering Time: 58.96 seconds

2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 318 iterations, 2.349s init, 53.674s render

2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 752 iterations, 1.891s init, 53.981s render

2020-11-11 14:08:24.652 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 730 iterations, 1.995s init, 53.197s render

Iteration Rate: (1800/53.981) = 33.3450 iterations per second

Loading Time: ((58.96 seconds) - 53.981) = 4.979 seconds
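
The iteration rate and loading time figures in these posts can be reproduced from the log lines themselves. Below is a minimal Python sketch, assuming the convention used above: total iterations divided by the longest device render time, and loading time taken as total rendering time minus that same value. The regular expression and the `summarize` helper are illustrative names, not part of Daz Studio or Iray.

```python
import re

# Matches the per-device lines Iray writes under "Device statistics:".
DEVICE_LINE = re.compile(
    r"(?P<device>CUDA device \d+ \([^)]*\)|CPU): "
    r"(?P<iters>\d+) iterations, (?P<init>[\d.]+)s init, "
    r"(?P<render>[\d.]+)s render"
)

def summarize(log_text, total_seconds):
    """Return (iteration_rate, loading_time) using the thread's convention."""
    devices = [m.groupdict() for m in DEVICE_LINE.finditer(log_text)]
    if not devices:
        raise ValueError("no device statistics lines found")
    total_iters = sum(int(d["iters"]) for d in devices)
    longest_render = max(float(d["render"]) for d in devices)
    return total_iters / longest_render, total_seconds - longest_render

# Device lines copied from the three-GPU run above; 58.96 is its total time.
log = """
IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 318 iterations, 2.349s init, 53.674s render
IRAY rend info : CUDA device 0 (GeForce RTX 3090): 752 iterations, 1.891s init, 53.981s render
IRAY rend info : CUDA device 1 (GeForce RTX 3090): 730 iterations, 1.995s init, 53.197s render
"""
rate, loading = summarize(log, 58.96)
print(f"{rate:.3f} iterations/s, {loading:.3f} s loading")
```

Note that some posts in this thread divide by the total rendering time instead, so rates from different posters are not always directly comparable.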

TheMysteryIsThePoint · November 2020

@chrislb That's crazy! You broke the equivalent of the 4 minute mile! Congratulations!

RayDAnt · November 2020

outrider42 said:

chrislb said:

It looks like using the CPU with a pair of 3090s actually increases render time. The 3090 cards I used were the EVGA RTX 3090 FTW3 Ultra cards. https://www.evga.com/products/product.aspx?pn=24G-P5-3987-KR

They have a higher base and boost clock than the Nvidia Founders Edition cards and a power limit of 420 watts per card with the stock BIOS. Both cards hit 2000+ MHz during the render.

If you use PX1 or MSI Afterburner, you can increase the stock power limit to 450 watts per card. EVGA also has a BIOS for the cards which can increase the power limit to 500 watts per card. I may be able to get the render time for this benchmark under 1 minute if I use the 500 watt BIOS and the software to raise the power limit to 500 watts. I actually hit the card's 420 watt power limit during the render.

System Configuration

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.12.2.60 Public Beta

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

2020-11-10 20:45:43.570 Finished Rendering

2020-11-10 20:45:43.619 Total Rendering Time: 1 minutes 8.35 seconds

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 896 iterations, 1.863s init, 63.361s render

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 904 iterations, 1.812s init, 63.570s render

Iteration Rate: (1800/63.570) = 28.315 iterations per second

Loading Time: ((68.35 seconds) - 63.570) = 4.78 seconds

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra and 3950X CPU

2020-11-10 20:56:09.682 Finished Rendering

2020-11-10 20:56:09.728 Total Rendering Time: 1 minutes 11.95 seconds

2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 868 iterations, 2.582s init, 66.549s render

2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 851 iterations, 2.511s init, 65.870s render

2020-11-10 20:56:12.284 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 81 iterations, 2.107s init, 66.197s render

Iteration Rate: (1800/ 66.549 ) = 27.047 iterations per second

Loading Time: ((71.95 seconds) - 66.549) = 5.401 seconds

Are you going to pick up a Nvlink for these?

Yeah! Imo you might as well (not that I would expect you to find it very useful - NVLink VRAM pooled rendering in Iray is always going to be slower than just using the cards independently. Therefore the only use case in which it makes sense is if you've got >24GB scenes to render...)

chrislb · November 2020

@chrislb That's crazy! You broke the equivalent of the 4 minute mile! Congratulations!

Thanks!

It looks like I hit a point of diminishing returns. Adding another 2080 Super didn't help much. I'm not sure what the bottleneck is at this point. Maybe RAM speed or SSD read speed?

System Configuration:

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.12.2.60 Public Beta

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and two RTX 2080 Super cards no CPU rendering

2020-11-11 15:22:00.949 Finished Rendering

2020-11-11 15:22:00.997 Total Rendering Time: 53.45 seconds

2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 269 iterations, 2.792s init, 47.407s render

2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 3 (GeForce RTX 2080 SUPER): 270 iterations, 2.394s init, 47.577s render

2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 643 iterations, 1.926s init, 47.737s render

2020-11-11 15:22:04.634 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 618 iterations, 2.108s init, 47.181s render

Iteration Rate: (1800/47.737) = 37.706 iterations per second

Loading Time: ((53.45 seconds) - 47.737) = 5.713 seconds

RayDAnt · November 2020

@chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

outrider42 · November 2020

It is still scaling pretty well; part of it is just that the 3090s are so much faster that it doesn't feel like the 2080 Supers add much in comparison. One 3090 is running 14.1 iterations per second. On the bench list, a 2080 Super hit 6.3. The combined total for two of each is 40.8 in theory. That you hit 37.7 is not so bad in my book. Iray does not scale perfectly, and you will find scaling more difficult with more cards. I can tell you it is not the SSD, as Iray doesn't touch the SSD. This is purely in the communication between the cards, and maybe the CPU as it connects them, but you have a high-end 3950X. So we are talking PCIe and bandwidth. You might only be able to go faster with a workstation class CPU as it provides more lanes, but even that is not likely to gain much performance.
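
To put a number on "not so bad", here is a small Python sketch of the comparison made above. It assumes the solo rates quoted in the post (14.1 it/s per 3090, 6.3 it/s per 2080 Super) and simply treats the sum of solo rates as the ideal; `scaling_efficiency` is a made-up helper name.

```python
def scaling_efficiency(measured_rate, solo_rates):
    """Return (ideal_rate, measured / ideal) for a multi-GPU run."""
    ideal = sum(solo_rates)
    return ideal, measured_rate / ideal

# Two 3090s and two 2080 Supers, measured at 37.706 it/s in the four-GPU run.
ideal, efficiency = scaling_efficiency(37.706, [14.1, 14.1, 6.3, 6.3])
print(f"ideal {ideal:.1f} it/s, efficiency {efficiency:.1%}")
```

By this rough measure the four-card run reaches a little over 92% of the theoretical sum.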

chrislb · November 2020

RayDAnt said:

@chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

I already took everything apart. It was just a temporary test setup to see what would happen with all that GPU power until I get more parts that I need to fit everything inside the case. I had to use PCIEx16 extension cables because the thickness of the 3090 heatsinks won't allow them to fit inside the case with the 2080 Supers. When I can get water blocks for them, the 3090s will only take up one PCIE slot width and I can put in at least one 2080 Super or both 2080 supers and a 3090. Also two 3090's and two 2080 Supers are near the limit of a 15 amp circuit breaker in the U.S. If the cards hit full power that's 450-500 watts per 3090 and 330 watts per 2080 super (1560 to 1660 watts) plus the power draw of the rest of the system.

Here is the test setup and why I didn't leave it up for long.

I'm wondering if the issue of diminishing returns is not enough bandwidth on the PCI Express bus for 4 GPUs on an x570 chipset board.
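
The power figures earlier in this post can be checked with a few lines of Python. This is only a rough sketch: the 120 V value is an assumed nominal U.S. household voltage, the function name is invented for illustration, and power supply losses and the rest of the system are not modeled.

```python
CIRCUIT_VOLTS = 120  # assumption: nominal U.S. household voltage
BREAKER_AMPS = 15    # breaker rating mentioned in the post

def gpu_power_watts(watts_per_3090, watts_per_2080s, n_3090=2, n_2080s=2):
    """Combined GPU draw in watts for the given per-card power levels."""
    return n_3090 * watts_per_3090 + n_2080s * watts_per_2080s

circuit_limit = CIRCUIT_VOLTS * BREAKER_AMPS
low = gpu_power_watts(450, 330)
high = gpu_power_watts(500, 330)
print(f"GPUs alone: {low} to {high} W of a nominal {circuit_limit} W circuit")
print(f"headroom at the high end: {circuit_limit - high} W")
```

With the CPU, drives, and conversion losses added on top, the remaining headroom would be smaller still, which is consistent with the "near the limit" remark.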

outrider42 · November 2020

That would be my guess, there just isn't enough bandwidth to share with 4 powerful cards. The big NVLink switch (not to be confused with our consumer NVLink) that powers Nvidia's DGX boxes works by bypassing most of it. The switch has its own processor to control the traffic from the 16 GPUs.

Very cool experiment.

Jason Galterio · November 2020

That is a terrifying setup.

RayDAnt · November 2020

chrislb said:

RayDAnt said:

@chrislb mind doing another benchmark run with both 2080 Super's but just a single 3090? That way we have a complete spread of how all the card combos work out for you.

I already took everything apart. It was just a temporary test setup to see what would happen with all that GPU power until I get more parts that I need to fit everything inside the case. I had to use PCIEx16 extension cables because the thickness of the 3090 heatsinks won't allow them to fit inside the case with the 2080 Supers. When I can get water blocks for them, the 3090s will only take up one PCIE slot width and I can put in at least one 2080 Super or both 2080 supers and a 3090. Also two 3090's and two 2080 Supers are near the limit of a 15 amp circuit breaker in the U.S. If the cards hit full power that's 450-500 watts per 3090 and 330 watts per 2080 super (1560 to 1660 watts) plus the power draw of the rest of the system.

Here is the test setup and why I didn't leave it up for long.

I'm wondering if the issue of diminishing returns is not enough bandwidth on the PCI Express bus for 4 GPUs on an x570 chipset board.

In non-memory-pooled setups, once Iray is finished with the initial loading of assets onto each participating GPU, the only traffic that goes on between each of those GPUs and the rest of the system is occasional low-bandwidth command/control messages (going from Iray's master rendering thread, located on the CPU, to the native Cuda Iray kernel running on each GPU die) and periodic pixel value updates (going from each card to the final rendered image's master framebuffer, also located on CPU.) Meaning that you are most likely looking at - at most - dozens of megabits of PCI-E traffic (ie. basically nothing) going on once the render truly starts. So it is extremely unlikely that lack of PCI-E bandwidth - even on such an overloaded setup such as this - is the weakest link.

Imo it's much more likely that the reason why the 2080 Supers don't seem to scale too well while in company of 3090s is because of the same Iray scheduler behavior previously brought up here that causes CPUs to actually detract from overall rendering performance. Notice that in the last test you ran, each 2080 Super was credited with completing less than 300 iterations, whereas each 3090 was credited with greater than 600. That's a ballpark greater than 1/2 performance gap between the two cards. Which is right where Iray's scheduler is designed to start taking things into its own hands.
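
The iteration counts referred to above can be compared directly. A minimal Python sketch, using the per-device counts from the four-GPU log; `share_vs_fastest` is a hypothetical helper, and the output says nothing about how Iray's scheduler actually makes its decisions.

```python
def share_vs_fastest(iterations):
    """Map each device to its iteration count relative to the busiest device."""
    fastest = max(iterations.values())
    return {name: count / fastest for name, count in iterations.items()}

# Iteration counts from the two-3090 plus two-2080-Super run.
four_gpu_run = {
    "RTX 2080 SUPER (device 2)": 269,
    "RTX 2080 SUPER (device 3)": 270,
    "RTX 3090 (device 0)": 643,
    "RTX 3090 (device 1)": 618,
}
ratios = share_vs_fastest(four_gpu_run)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the busiest card")
```

Both 2080 Supers land below half of the busiest 3090's count, which is the gap the post is pointing at.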

outrider42 · November 2020

It is possible, but that doc is from 2017. Iray RTX changes things around a bit. While CUDA almost always scales the same between different GPUs, that is, if GPU A is twice as fast as GPU B in one scene it is almost always twice as fast in every scene, this doesn't hold true for Iray RTX. We have seen that the RT cores can wildly alter the performance depending on the geometry present. That would create a situation where a GPU might be twice as fast in one scene, but in another scene it becomes 3 times faster. Or maybe in another scene it is only 1.5 times faster. For users with performance gaps between GPUs, this could lead to erratic performance scaling as one card might work sometimes, but then not others. Additionally, we have plenty of GPU tests where two GPUs are far more mismatched than a 3090 versus a 2080 Super. Like the 2080ti and 980ti test.

Looking at the data here, we can examine how the performance scaled. We have a test with 3 GPUs and then the 4 GPUs. I think these two tests show something else is at play.

So the 3 GPU test, with two 3090s and one 2080 Super.

Iteration Rate: (1800/53.981) = 33.3450 iterations per second

We established that the 3090 in this rig hits 14.1 iterations per second. So, just do some math. 33.345 - 14.1 - 14.1 = 5.145

If we were to assume the 3090s were hitting that number, that leaves the 2080 Super with 5.145 iterations per second. This 5.145 is indeed lower than what a 2080 Super should be doing, by over a whole iteration.

The 4 GPU test.

Iteration Rate: (1800/47.737) = 37.706 iterations per second

Again, if we do the math, 37.706 - 14.1 - 14.1 = 9.506.

Hold up! If you divide that by two, you now only get 4.753 iterations per second. The iteration rate dropped more.

If what you said about Iray scheduling was correct, then I would assume we should see similar performance drops in the 3 GPU test as well. The 2080 Super, at just over 6 iterations per second by itself, is well below the 50% threshold, and thus should have been hit harder in its pairing.

I believe if we tested one 3090 with one 2080 Super, we would see both cards run near their solo speeds. I believe what we are seeing here is scaling decreasing with 3+ GPUs in play. I don't think I have seen a situation where 3 or 4 GPUs ran at the same speeds they do when solo or paired in twos. Just look at the chart. We have a test with four 2080tis together. In that test, they ran at a combined 26.699 iterations per second. If you split this by 4, that would give a performance of 6.67 iterations per second for each card. However, a single 2080ti gets 7.4 iterations per second with Iray RTX. Even the pre-RTX Iray was slightly faster. So we have clear evidence that scaling is indeed an issue, as with 100% scaling they should be getting over 29.6 iterations per second with four 2080tis, and they are not. In fact, notice the difference is about 3 iterations per second off the theoretical peak, which is actually right about the same amount chrislb's setup lost compared to its theoretical peak of 40.8. That does not seem like a coincidence to me. Given all this information, I have to conclude we are not dealing with Iray scheduling. We are hitting some kind of performance limit from using 4 GPUs in one system.
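
The subtraction used in this post can be written out as a short Python sketch. It carries over the post's own assumption that each 3090 keeps running at its solo rate of 14.1 it/s, which is a simplification rather than a measurement; `implied_2080s_rate` is an illustrative name.

```python
SOLO_3090 = 14.1  # it/s, the single-3090 rate quoted in the post

def implied_2080s_rate(combined_rate, n_3090, n_2080s, solo_3090=SOLO_3090):
    """Per-card rate left for the 2080 Supers if every 3090 ran at solo speed."""
    return (combined_rate - n_3090 * solo_3090) / n_2080s

three_gpu = implied_2080s_rate(33.345, n_3090=2, n_2080s=1)
four_gpu = implied_2080s_rate(37.706, n_3090=2, n_2080s=2)
print(f"3 GPUs: {three_gpu:.3f} it/s per 2080 Super")
print(f"4 GPUs: {four_gpu:.3f} it/s per 2080 Super")
```

In reality the shortfall is probably spread across all cards rather than falling only on the 2080 Supers, so these values are best read as a lower bound on what the slower cards contributed.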

WestKraven · November 2020

System Configuration (outdated PC with "decentish" GPUs)
System/Motherboard: Gigabyte Z87X-D3H Intel
CPU: Intel i7-4770 3.5 GHz
GPU: EVGA RTX 2080 Ti @ SPEED/stock , GPU2 EVGA RTX 2070 @ SPEED/stock
System Memory: Corsair Vengeance 32 GB DDR3
OS Drive: Samsung 870 QVO 4TB SSD
Asset Drive: Crucial MX500 2TB SSD
Operating System: Windows 10 Pro v1909 Build 18363.1139
Nvidia Drivers Version: 451.48
Daz Studio Version: 4.14
Optix Prime Acceleration: Not Applicable

Benchmark Results

2020-11-11 20:16:54.165 Finished Rendering

2020-11-11 20:16:54.207 Total Rendering Time: 2 minutes 26.49 seconds

2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 2080 Ti): 1094 iterations, 3.321s init, 139.617s render

2020-11-11 20:16:59.827 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 2070): 706 iterations, 3.516s init, 139.654s render

Rendering Performance: 1800/143.058 = 12.58 Iterations/second
Loading Time: 146.49 - 143.058 = 3.432 seconds

chrislb · November 2020

I decided to give the last combination a try this afternoon.

System Configuration:

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.12.2.60 Public Beta

Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card and two RTX 2080 Super cards, no CPU rendering

2020-11-12 17:05:55.553 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-12 17:05:56.123 Finished Rendering

2020-11-12 17:05:56.169 Total Rendering Time: 1 minutes 13.2 seconds

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 2080 SUPER): 410 iterations, 2.203s init, 67.856s render

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 976 iterations, 1.939s init, 67.594s render

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 414 iterations, 1.865s init, 67.755s render

Iteration Rate: (1800/67.856) = 26.5267 iterations per second

Loading Time: ((73.2 seconds) - 67.856) = 5.344 seconds

chrislb · November 2020

RayDAnt said:

chrislb said:

RayDAnt said:

In non-memory-pooled setups, once Iray is finished with the initial loading of assets onto each participating GPU, the only traffic that goes on between each of those GPUs and the rest of the system is occasional low-bandwidth command/control messages (going from Iray's master rendering thread, located on the CPU, to the native Cuda Iray kernel running on each GPU die) and periodic pixel value updates (going from each card to the final rendered image's master framebuffer, also located on CPU.) Meaning that you are most likely looking at - at most - dozens of megabits of PCI-E traffic (ie. basically nothing) going on once the render truly starts. So it is extremely unlikely that lack of PCI-E bandwidth - even on such an overloaded setup such as this - is the weakest link.

Imo it's much more likely that the reason why the 2080 Supers don't seem to scale too well while in company of 3090s is because of the same Iray scheduler behavior previously brought up here that causes CPUs to actually detract from overall rendering performance. Notice that in the last test you ran, each 2080 Super was credited with completing less than 300 iterations, whereas each 3090 was credited with greater than 600. That's a ballpark greater than 1/2 performance gap between the two cards. Which is right where Iray's scheduler is designed to start taking things into its own hands.

This afternoon, I did some other testing. I tried the 3090 along with the 2080 Supers on high quality PCIEx16 extension cables (from Fractal Design's vertical GPU mounts for its computer cases) and also on cheaper PCIEx16 extensions that use USB cables and offer PCIE 3.0 X1 to PCIE 3.0 X4 bandwidth. The difference in render times was less than 2%. That's probably within the margin of error.

Device render times were 67.856 seconds with the higher quality cables and 68.870 seconds with the cheap extensions. The 2080 Supers did 414 iterations and 412 iterations with the low bandwidth cables vs 410 iterations and 414 iterations with the high bandwidth cables.
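
The "less than 2%" figure follows directly from the two render times quoted above. A tiny Python sketch, with `percent_change` as an illustrative helper name:

```python
def percent_change(baseline, other):
    """Relative change of `other` versus `baseline`, in percent."""
    return (other - baseline) / baseline * 100

high_bandwidth = 67.856  # seconds, higher quality PCIEx16 extension cables
low_bandwidth = 68.870   # seconds, cheap USB-cable extensions

slowdown = percent_change(high_bandwidth, low_bandwidth)
print(f"low bandwidth run is {slowdown:.2f}% slower")
```

A single pair of runs cannot establish the margin of error by itself; repeating each configuration a few times would show how much run-to-run variation to expect.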

WestKraven · November 2020

chrislb said:

I decided to give the last combination a try this afternoon.

System Configuration:

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.12.2.60 Public Beta

Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card and two RTX 2080 Super cards, no CPU rendering

2020-11-12 17:05:55.553 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-12 17:05:56.123 Finished Rendering

2020-11-12 17:05:56.169 Total Rendering Time: 1 minutes 13.2 seconds

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 2080 SUPER): 410 iterations, 2.203s init, 67.856s render

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 976 iterations, 1.939s init, 67.594s render

2020-11-12 17:05:59.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 414 iterations, 1.865s init, 67.755s render

Iteration Rate: (1800/67.856) = 26.5267 iterations per second

Loading Time: ((73.2 seconds) - 67.856) = 5.344 seconds

That's insane!!!

outrider42 · November 2020

Whoa, my 1080ti's speed is back! With a vengeance! I downloaded the beta 4.14 and installed the latest Nvidia game drivers. If you have followed me, I have been seeing a reduction in render speed since somewhere around driver 441.66+. Daz 4.14 has a minor Iray update, too, so it is very hard to say with certainty which was the issue without testing previous versions of Daz with this new driver. I still have the general 4.12 release, so I will try that version. That should prove whether this was Iray or Nvidia drivers. But not only did I get my speed back, I rendered the bench faster than I ever did before. Faster than before driver 441, before OptiX 6.0. And the difference is beyond any margin of error.

Daz 4.14.0.8

Windows 10 2004

CPU: i5 4690K

GPU #1: EVGA 1080ti SC2

GPU #2: MSI 1080ti Gaming <--this is my display, yes, I use GPU 2 for display.

RAM 32GB HyperX

OS Drive Samsung 860 EVO 1TB

Asset Drive: Samsung 860 EVO 1TB and WD 4TB Black HDD

2020-11-13 00:56:54.938 Total Rendering Time: 3 minutes 36.73 seconds

2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 1080 Ti): 906 iterations, 2.587s init, 211.195s render

2020-11-13 00:57:01.544 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce GTX 1080 Ti): 894 iterations, 3.025s init, 211.058s render

For reference, ever since driver 441.66 this same benchmark has been getting right at 4 minutes 30 seconds with every test and every version of Daz that supports that driver. Before this driver, my times were right at 4 minutes, and sometimes just under 4 minutes. But here I blow that time away. My device render times also show this big change. Before 441.66, my device times were around the 230 second range, or a few seconds less. But here I scored 211 seconds, shaving a pretty solid 15-20 seconds off those times. Device render times with 441.66+ were in the 267 second range. So just yesterday I was rendering this benchmark nearly a full minute slower than today. By any standard that is a huge swing. I am stoked, LOL.

One more thing, back to the monster rig that chrislb has tested: bandwidth may not be quite the word I am looking for. I think the real issue is simply down to synchronizing 4 big GPUs together. That and the physical distance they are from each other. There is probably an element of latency in their communication that bandwidth alone cannot overcome. That latency is the performance bottleneck, and that is something that will get worse as more GPUs get added to a given system. You could perhaps place some blame on scheduling, but not Iray's scheduling (at least I don't think so). Perhaps there is a way to improve multi-GPU performance with better scheduling in the hardware, but I don't think it is related to the performance difference between the 3090 and the 2080 Super. We have seen combinations of GPUs with far greater performance gaps than that work fine with each other in pairs. We do not have many benchmarks with 4 GPUs, as this is certainly rare, so getting this kind of data is not easy. But in the systems that do have 4 GPUs, we can see from the numbers that they do not scale as well. I think this is a lesson that you will get diminishing returns trying to build super rigs. The DGX-2 NVLink switch connecting 16 GPUs is not just a cable, it is powered by its own processor guiding the system so that all the GPUs can talk to each other as fast as possible. Having tech like that is probably the only way to overcome this.

outrider42 · November 2020

So I fired up my trusty 4.12.0.86 and the result is interesting indeed.

2020-11-13 01:51:22.134 Total Rendering Time: 3 minutes 58.76 seconds

2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 1080 Ti): 903 iterations, 6.132s init, 229.027s render

2020-11-13 01:51:26.445 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce GTX 1080 Ti): 897 iterations, 6.437s init, 228.384s render

This time here is basically what I used to get before driver 441.66. I ran this time super consistently. So it looks like new drivers have restored my speed. BUT it also looks like the updated Iray is ALSO a little bit faster. Now it is possible this could be just a change in what Iray considers to be an "iteration". But I do not believe so. The resulting images are identical. So this is quite an interesting result. If you lost some performance, then try updating your drivers. And maybe try the latest Daz Iray. I would love to know if others can get results. FYI, if anybody is concerned about updating, the beta can be installed side by side with the general release, so do not worry about it overwriting your current version. Currently the beta is exactly the same as the general release, too, so do not worry about it being a "beta". Each time the general release is updated, the beta is updated to that same version as well.

This makes me wonder...might chrislb's monster rig score even faster with Daz 4.14? Is it possible???
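
To compare runs like these without converting minutes by hand, the "Total Rendering Time" lines can be parsed into seconds. A small Python sketch, assuming only the two formats that appear in this thread ("N minutes S seconds" and a bare "S seconds"); other formats Daz Studio might emit are not handled, and `total_seconds` is an illustrative name.

```python
import re

# Minutes are optional: short renders are logged as e.g. "58.96 seconds".
DURATION = re.compile(
    r"Total Rendering Time: (?:(?P<m>\d+) minutes? )?(?P<s>[\d.]+) seconds"
)

def total_seconds(line):
    """Convert a 'Total Rendering Time' log line to seconds."""
    match = DURATION.search(line)
    if match is None:
        raise ValueError(f"unrecognized duration line: {line!r}")
    return int(match["m"] or 0) * 60 + float(match["s"])

new = total_seconds("2020-11-13 00:56:54.938 Total Rendering Time: 3 minutes 36.73 seconds")
old = total_seconds("2020-11-13 01:51:22.134 Total Rendering Time: 3 minutes 58.76 seconds")
print(f"Daz 4.14.0.8: {new:.2f} s, Daz 4.12.0.86: {old:.2f} s, saved {old - new:.2f} s")
```

Both runs here use the same driver, so under that assumption the difference reflects the Daz/Iray version rather than the driver change.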

nonesuch00 · November 2020

outrider42 said:

So I fired up my trusty 4.12.0.86 and the result is interesting indeed.

2020-11-13 01:51:22.134 Total Rendering Time: 3 minutes 58.76 seconds

2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-13 01:51:26.440 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 1080 Ti): 903 iterations, 6.132s init, 229.027s render

2020-11-13 01:51:26.445 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce GTX 1080 Ti): 897 iterations, 6.437s init, 228.384s render

This time here is basically what I used to get before driver 441.66. I ran this time super consistently. So it looks like new drivers have restored my speed. BUT it also looks like the updated Iray is ALSO a little bit faster. Now it is possible this could be just a change in what Iray considers to be an "iteration". But I do not believe so. The resulting images are identical. So this is quite an interesting result. If you lost some performance, then try updating your drivers. And maybe try the latest Daz Iray. I would love to know if others can get results. FYI, if anybody is concerned about updating, the beta can be installed side by side with the general release, so do not worry about it overwriting your current version. Currently the beta is exactly the same as the general release, too, so do not worry about it being a "beta". Each time the general release is updated, the beta is updated to that same version as well.

This makes me wonder...might chrislb's monster rig score even faster with Daz 4.14? Is it possible???

Well, my mouse rig sped up by 3 minutes:

OLD:

Specifications

System/Motherboard: Gigabyte B450M DS3H WiFi
CPU: AMD Ryzen 7 2700 32GB
GPU: PNY GeForce GTX 1650 Super 4GB
System Memory: 32 GB (2x16GB 2666 MHz) Patriot
OS Drive: Crucial 2TB Sata III SSD
Operating System: Windows 10 build 2004 64bit
Nvidia Drivers Version: Gaming 456.71
Daz Studio Version: DAZ Studio Pro Public Beta 4.12.2.51
Optix Prime Acceleration: N/A

+++++ Benchmark +++++

2020-10-14 18:27:08.309 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-10-14 18:27:08.309 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 1650 SUPER): 1800 iterations, 0.286s init, 908.227s render

+++++ Benchmark +++++

So the render took about 15 minutes & 8.227 seconds.

NEW:

Specifications

System/Motherboard: Gigabyte B450M DS3H WiFi
CPU: AMD Ryzen 7 2700 32GB
GPU: PNY GeForce GTX 1650 Super 4GB
System Memory: 32 GB (2x16GB 2666 MHz) Patriot
OS Drive: Crucial 2TB Sata III SSD
Operating System: Windows 10 build 20H2 64bit
Nvidia Drivers Version: Gaming 457.30
Daz Studio Version: DAZ Studio Pro Public Beta 4.14.0.8
Optix Prime Acceleration: N/A

+++++ Benchmark +++++

2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 1650 SUPER): 1485 iterations, 3.399s init, 719.417s render

2020-11-13 09:31:31.505 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 315 iterations, 2.117s init, 721.770s render

+++++ Benchmark +++++

So the render took about 12 minutes & 1.770 seconds. (since the CPU rendering ran almost 2 seconds longer after the GPU rendering had finished).

There has been about a 20% performance gain with DAZ Studio 4.14.0.8 compared to DAZ Studio 4.12.2.51.
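
The "about 20%" figure matches the reduction in render time; expressed as throughput the gain looks larger. A short Python sketch of both views, using the two render times above. `compare_runs` is a made-up helper, and note that the newer run also had the CPU contributing 315 iterations, so the two runs are not strictly like for like.

```python
def compare_runs(old_seconds, new_seconds):
    """Return (time_reduction, throughput_gain) as fractions."""
    time_reduction = 1 - new_seconds / old_seconds
    throughput_gain = old_seconds / new_seconds - 1
    return time_reduction, throughput_gain

# 1800 iterations in both runs: 908.227 s on 4.12.2.51, 721.770 s on 4.14.0.8.
time_reduction, throughput_gain = compare_runs(908.227, 721.770)
print(f"render time down {time_reduction:.1%}, iterations/s up {throughput_gain:.1%}")
```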

TheMysteryIsThePoint · November 2020

@chrislb My electrician finally made it around to my house, and I've got two additional 20A circuits, one for my existing rig, and one for the new rig. Because of my poor experience in the past with parts compatibility, I think I'm going to go with something as close to your exact 2 x 3090 setup (assuming I can get the parts) as I can get.

But with the wisdom that you've gained from actually doing it, is there anything you would have done differently, had you known then what you know now? Any general system building pointers?

Anything you can suggest would be appreciated.

Danny · November 2020

I have a GTX 1050 Ti (which is terrible, I'm just waiting for the 30 series to be available) and I cannot even get it to work. If I choose GPU only for rendering, not even the Iray preview loads. What can I do about that?

A different question: With which CPU and mainboard alongside the graphics card were the benchmark tests made? Is that important, or do CPU and mainboard not matter at all?

outrider42 · November 2020

danielpjonas said:

I have a GTX 1050 Ti (which is terrible, I'm just waiting for the 30 series to be available) and I cannot even get it to work. If I choose GPU only for rendering, not even the Iray preview loads. What can I do about that?

A different question: With which CPU and mainboard alongside the graphics card were the benchmark tests made? Is that important, or do CPU and mainboard not matter at all?

Are your drivers up to date? The new Iray needs new drivers; get them from Nvidia, not Windows. The 1050 Ti is supported, so it should work. Besides being slow by modern standards, its biggest issue is VRAM. I am assuming it is 4GB, and that is hard to work with. It is possible you may not be able to run the Iray viewport simply because of the VRAM.

RayDAnt has tested his scene on a MS Surface that has a 1050 Ti. The bench scene is designed to use only a small amount of VRAM so that lower capacity cards can still run it. So it should work with that spec. But I don't know if he has used the Iray viewport on it. I would say it is possible, but honestly not advisable to use the Iray viewport with a 4GB GPU today.

With Iray you really don't need to worry about CPU spec much at all. The only concern is that your CPU can handle the Daz application itself, which is independent from Iray. So as long as Daz is running ok for you when you build your scenes, there is no need to upgrade CPU/motherboard for Iray rendering. Just get the best GPU you can. The only other spec to be concerned with is possibly system RAM; it needs to be enough to handle your creations. You probably need around twice as much RAM as you do VRAM if you are going to utilize that VRAM, possibly a bit more. It just depends on what you want to do.

chrislb · November 2020

I ran a single 3090 with the new release version of Daz and saw an improvement over the beta version.

System Configuration

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.14.0.8

Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card only no CPU rendering

2020-11-14 16:15:40.509 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-14 16:15:41.079 Finished Rendering

2020-11-14 16:15:41.117 Total Rendering Time: 1 minutes 37.16 seconds

2020-11-14 16:15:48.002 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-14 16:15:48.003 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 1800 iterations, 1.372s init, 92.952s render

Iteration Rate: (1800/92.952) = 19.365 iterations per second

Loading Time: ((97.16 seconds) - 92.952) = 4.208 seconds
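The iteration rate and loading time figures in these posts are worked out by hand from the log. Here is a rough sketch of how the same numbers could be pulled out of a pasted log excerpt. The regular expressions are my own guess at the line format based only on the excerpts in this thread, and `summarize` is a made-up helper name, not anything provided by Daz Studio or Iray.

```python
import re

# Patterns inferred from the log excerpts in this thread (an assumption,
# not an official description of the Iray log format).
DEVICE_RE = re.compile(
    r"(?P<device>CUDA device \d+ \([^)]+\)|CPU):\s+"
    r"(?P<iters>\d+) iterations, (?P<init>[\d.]+)s init, (?P<render>[\d.]+)s render"
)
TOTAL_RE = re.compile(
    r"Total Rendering Time:\s+(?:(?P<minutes>\d+) minutes? )?(?P<seconds>[\d.]+) seconds"
)


def summarize(log_text: str) -> dict:
    """Compute iteration rate and loading time the way the posts do by hand."""
    devices = list(DEVICE_RE.finditer(log_text))
    total_iterations = sum(int(m["iters"]) for m in devices)
    # The slowest device determines when the render actually finishes.
    longest_render = max(float(m["render"]) for m in devices)

    total = TOTAL_RE.search(log_text)
    total_seconds = int(total["minutes"] or 0) * 60 + float(total["seconds"])

    return {
        "iteration_rate": total_iterations / longest_render,
        "loading_time": total_seconds - longest_render,
    }


sample_log = """
2020-11-14 16:15:41.117 Total Rendering Time: 1 minutes 37.16 seconds
2020-11-14 16:15:48.003 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 1800 iterations, 1.372s init, 92.952s render
"""

result = summarize(sample_log)
print(f"Iteration Rate: {result['iteration_rate']:.3f} iterations per second")
print(f"Loading Time: {result['loading_time']:.3f} seconds")
```

With multiple devices the sketch divides the summed iterations by the longest device render time, which matches how the multi-GPU results in this thread are calculated.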

chrislb · November 2020

TheMysteryIsThePoint said:

@chrislb My electrician finally made it around to my house, and I've got two additional 20A circuits, one for my existing rig, and one for the new rig. Because of my poor experience in the past with parts compatibility, I think I'm going to go with something as close to your exact 2 x 3090 setup (assuming I can get the parts) as I can get.

But with the wisdom that you've gained from actually doing it, is there anything you would have done differently, had you known then what you know now? Any general system building pointers?

Anything you can suggest would be appreciated.

I'll send you a PM with more info. It's probably too far off topic for here.

chrislb · November 2020

chrislb said:

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

2020-11-10 20:45:43.570 Finished Rendering

2020-11-10 20:45:43.619 Total Rendering Time: 1 minutes 8.35 seconds

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 896 iterations, 1.863s init, 63.361s render

2020-11-10 20:45:47.129 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 904 iterations, 1.812s init, 63.570s render

Iteration Rate: (1800/63.570) = 28.315 iterations per second

Loading Time: ((68.35 seconds) - 63.570) = 4.78 seconds

The new version is faster than the beta version.

System Configuration

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.14.0.8

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards only no CPU rendering

2020-11-14 16:56:18.336 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-14 16:56:18.985 Total Rendering Time: 56.81 seconds

2020-11-14 16:56:23.050 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-14 16:56:23.050 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 920 iterations, 1.621s init, 52.099s render

2020-11-14 16:56:23.051 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 880 iterations, 1.307s init, 52.028s render

Iteration Rate: (1800/52.099) = 34.549 iterations per second

Loading Time: ((56.81 seconds) - 52.099) = 4.711 seconds

outrider42 · November 2020

Whoa, the hype is real! That is a big jump...going from 28 to 35 iterations per second with two 3090s. That is FASTER than what you did with 3 GPUs when you added the 2080 Super to the mix, and that is only a bit over 2 iterations per second slower than when you had four GPUs together. You are getting performance right now that you needed a fairly large hardware upgrade to do just a few days ago, and all from just a software update. Not to mention the crazy amount of power draw that 4 GPUs took.

This is really incredible and unexpected. We have not seen this kind of boost in performance from software alone aside from the addition of RTX support. But unlike that change this new speed is coming to everybody, not just RTX owners. I am truly curious as to what changes were made to Iray in this update. Perhaps this is the result of better optimization with OptiX, as it is still rather new for Iray. Or perhaps OptiX itself has improved, and this in turn trickled into Iray.

Regardless, I think the evidence is clear, it is time for people to update to 4.14. This performance gain is not trivial and I would expect it to scale up across the board. As we can see from chrislb, this can be a performance gain on par with upgrading the hardware itself, but unlike buying new hardware, this is a free upgrade!

I suppose one other thing to look at might be VRAM use, if 4.14 is using a different amount of VRAM compared to 4.12, that might be tied to the increase in performance.

chrislb · November 2020

I tried two more combinations

System Configuration:

System/Motherboard: MSI MEG X570 ACE

CPU: AMD R9 3950X @ Stock with PBO +200

GPU: EVGA GeForce RTX 3090 FTW3 ULTRA (24G-P5-3987-KR) and EVGA RTX 2080 Super @ Stock speed and stock power limits

System Memory: Corsair Vengeance RGB Pro 64 GB @ 3600 MHz CAS18

OS Drive: 1TB Sabrent Rocket NVMe 4.0 SB-ROCKET-NVMe4-1TB

Asset Drive: XPG SX 8100 NVMe SSD

Operating System: Windows 10 Pro version 2004 Build 19041.450

Nvidia Drivers Version: Version 457.30

Daz Studio Version: 4.14.0.8

Benchmark Results - Two EVGA RTX 3090 FTW3 Ultra cards and two 2080 Super cards only no CPU rendering

2020-11-14 17:28:14.542 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-14 17:28:15.161 Total Rendering Time: 42.8 seconds

2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 219 iterations, 2.958s init, 35.883s render

2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 3 (GeForce RTX 2080 SUPER): 219 iterations, 2.797s init, 36.047s render

2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 677 iterations, 1.627s init, 37.237s render

2020-11-14 17:28:16.737 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 3090): 685 iterations, 1.748s init, 36.907s render

Iteration Rate: (1800/37.237) = 48.3390 iterations per second

Loading Time: ((42.8 seconds) - 37.237) = 5.563 seconds

Benchmark Results - One EVGA RTX 3090 FTW3 Ultra card and two 2080 Super cards only no CPU rendering

2020-11-14 17:41:34.711 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.

2020-11-14 17:41:35.338 Total Rendering Time: 1 minutes 2.35 seconds

2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:

2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (GeForce RTX 2080 SUPER): 356 iterations, 3.158s init, 56.002s render

2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (GeForce RTX 2080 SUPER): 353 iterations, 2.771s init, 55.775s render

2020-11-14 17:41:36.943 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce RTX 3090): 1091 iterations, 2.051s init, 56.615s render

Iteration Rate: (1800/56.615) = 31.793 iterations per second

Loading Time: ((62.35 seconds) - 56.615) = 5.735 seconds

outrider42 · November 2020

Holy crap, it went from 37 to over 48 iterations per second, that is a monstrous increase. The amount of performance you gained is more than a lot of people's total performance. Just think about that. To put that in perspective, a 3080 ran the bench at 12 iterations per second. This performance gain is like adding a 5th GPU...a 3080 class GPU...to your rig. This is seriously impressive, and the 4 GPU setup was already impressive!

Since you have access, would you mind running a single 2080 Super so we can compare it to its previous time? This would also be helpful for looking at scaling. Since we have a single 3090 on 4.14, we would be able to calculate a theoretical peak, and compare that number to what you actually scored to see how many iterations may be lost. For all we know, the scaling might just have improved as well.
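As a rough illustration of the scaling check described above, here is a sketch using the two 4.14.0.8 results already posted in this thread: the single 3090 run and the two 3090 run. The names are made up for illustration, and "theoretical peak" here simply means the single-card rate multiplied by the number of cards. The same calculation for the four-GPU setup would need a single 2080 Super result on 4.14.

```python
TOTAL_ITERATIONS = 1800

# Render times taken from the 4.14.0.8 log excerpts posted in this thread.
single_3090_rate = TOTAL_ITERATIONS / 92.952   # one RTX 3090
dual_3090_rate = TOTAL_ITERATIONS / 52.099     # two RTX 3090s, slower device's time


def scaling_efficiency(single_rate: float, card_count: int, combined_rate: float) -> float:
    """Fraction of the theoretical peak (single rate x card count) actually achieved."""
    theoretical_peak = single_rate * card_count
    return combined_rate / theoretical_peak


theoretical_peak = single_3090_rate * 2
efficiency = scaling_efficiency(single_3090_rate, 2, dual_3090_rate)
lost_rate = theoretical_peak - dual_3090_rate

print(f"theoretical peak: {theoretical_peak:.3f} it/s")
print(f"actual:           {dual_3090_rate:.3f} it/s")
print(f"efficiency:       {efficiency:.1%}")
print(f"lost to scaling:  {lost_rate:.3f} it/s")
```

On these numbers the pair of 3090s reaches roughly 89% of the theoretical peak, so about 4 iterations per second are lost to multi-GPU overhead.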

We are going to have to draw a hard line on the benchmark chart now. I think we should carefully note all benchmarks that come after 4.14, because the gap is so big that it may confuse people who compare a 4.14 bench to a 4.12 bench. They are simply not comparable anymore. Obviously benches from one version to the next are generally not comparable, but in practice they really have been consistent over time, with only minor differences aside from when Iray switched to OptiX and RTX support.

I am surprised that Daz has not advertised this speed upgrade. Advertising a speed increase in the update would get a lot of people on board the choo choo train.
