Plugged it into my office system to test it out (I had to remove the HDD cage to accommodate the GPU - hence the messy cable management).
Until my new power supply arrives, the card is rock steady on a lower-tier 450 W PSU!
Nice upgrade! Looks like you are rendering about 5 times faster. The "RTX" 1660 Super you mention was, I believe, actually a GTX card though. The A5000 has so many perks: low power, 2-slot height, exhaust-style cooling, cleaner wiring, binned RAM modules and no flashy lights. Very good cards IMO.
Thank you. Ah yes, my bad (it's a typo) - the 1660 Super is GTX. It was a Zotac AMP card (2-slot) and it had a backplate!
The VRAM on my old card was very low (6 GB vs. 24 GB on the A5000).
I'm happy with the upgrade and enjoying the render performance.
I do wish the A5000 had a backplate of some sort, as it has some crazy SMT components back there!
I got a good deal on a used Threadripper 3970X and motherboard recently and decided to see what the CPU can do. I've been beta testing CTR and Hydra for Ryzen CPUs. I used CTR for this run. Average clock speed was 4.1 GHz at 1.1 V. I'll try some other tests later at stock settings and with PBO.
System Configuration
System/Motherboard: MSI TRX40 Pro Wifi
CPU: AMD Threadripper 3970X (running CTR)
GPU: Not used for render
System Memory: 64GB G.Skill TridentZ DDR4 3600 MHz CAS 16
OS Drive: Crucial P5 NVMe 1 TB
Asset Drive: Same
Power Supply: EVGA G2 1300 watt
Operating System: Windows 10 Pro 21H2 Build 19044.1469
Nvidia Drivers Version: N/A
Daz Studio Version: 4.16.0.3
Benchmark Results
2022-01-22 09:28:45.484 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Received update to 01800 iterations after 638.686s.
2022-01-22 09:28:45.490 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.
2022-01-22 09:28:46.088 Total Rendering Time: 10 minutes 40.31 seconds
2022-01-22 09:34:42.135 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2022-01-22 09:34:42.135 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 1800 iterations, 1.372s init, 637.313s render
Iteration Rate: (1800 / 637.313) 2.824 iterations per second
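For anyone reproducing these numbers, the arithmetic used throughout this thread is just the fixed 1800 benchmark iterations divided by the device render time from the log, with loading time being the reported Total Rendering Time minus that render time. A minimal sketch (the values below are taken from the log excerpt above):

```python
# Minimal sketch of the benchmark arithmetic used in this thread:
# iteration rate = 1800 / device render time, loading time = total - render time.

ITERATIONS = 1800  # the benchmark scene's fixed iteration count

def iteration_rate(render_seconds):
    """Iterations per second for a single device."""
    return ITERATIONS / render_seconds

def loading_time(hours, minutes, seconds, render_seconds):
    """'Total Rendering Time' converted to seconds, minus the device render time."""
    return (hours * 3600 + minutes * 60 + seconds) - render_seconds

# Values from the Threadripper 3970X log excerpt above:
print(round(iteration_rate(637.313), 3))              # ~2.824 it/s
print(round(loading_time(0, 10, 40.31, 637.313), 2))  # ~3.0 s
```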
As promised, here be watercooled RTX A5000 benchmarks - as well as some hard-earned tidbits on various things (like poorly documented waterblock compatibilities and NVLink intricacies.)
System Configuration
System/Motherboard: Gigabyte Z370 Aorus Gaming 7
CPU: Intel 8700K @ stock (MCE enabled) - onboard graphics used for all display functionality
GPU: Nvidia RTX A5000 @ stock (custom watercooled)
GPU: Nvidia RTX A5000 @ stock (custom watercooled)
GPU: Nvidia Titan RTX @ stock (custom watercooled)
System Memory: Corsair Vengeance LPX 32GB DDR4 @ 3000MHz
OS Drive: Samsung Pro 980 512GB NVMe SSD
Asset Drive: Sandisk Extreme Portable SSD 1TB
Power Supply: Corsair AX1500i 1500 watts
Operating System: Windows 10 Pro version 21H2 build 19044.1466
Nvidia Drivers Version: 472.12 SRD
Daz Studio Version: 4.16.0.3 64-bit
Long story short, in terms of Iray rendering performance specifically, this upgrade amounts to an approximate 12% gain when using the two A5000s alone (no Titan RTX active) versus before the watercooling upgrade (see here), and an approximately 3% gain when using all three GPUs versus prior. Other than that, everything else is pretty much flat (e.g. individual GPUs, when used for rendering solo, are within margin of error of how they performed pre-upgrade.)
However, where this upgrade has had a huge impact across the board is regarding (unsurprisingly) thermals. Here's a set of before/after graphs depicting the running temperatures of various key elements in this machine's GPU pipeline during a timed stress test (this thread's benchmarking scene, modified to run for 60 minutes rather than 1800 iterations):
Note that the temperatures given in these graphs are relative to ambient room temperature (as determined by a temperature probe set up in the case's lower front air intake and sampled once per second.) This means the approximately +75 degree C VRAM temps seen on the left were in reality approximately 100 degrees C as read directly from the GPU's sensors. As a quick reminder of the unusual case design of the Tower 900 in play here, here are before/after upgrade pics:
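In case it helps anyone reproduce this kind of graph, the conversion is simply the sensor reading minus the ambient probe reading taken at the same second. A minimal sketch, assuming you have already exported two equal-length lists of per-second samples (variable names here are illustrative):

```python
# Minimal sketch: convert absolute GPU sensor temps to ambient-relative values
# by subtracting the intake probe reading logged for the same second.
def relative_to_ambient(gpu_temps_c, ambient_temps_c):
    return [gpu - amb for gpu, amb in zip(gpu_temps_c, ambient_temps_c)]

# e.g. a 100 C VRAM reading with a 25 C intake reading plots as +75 C:
print(relative_to_ambient([100.0], [25.0]))  # [75.0]
```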
Why are these temperature reductions important given how little effect they demonstrably have on actual operational performance? Two reasons:
This is eventually going to be a 4-GPU system (motherboard upgrade permitting.)
Although GPU/CPU/GDDR6 processing dies may be rated for 100+ degrees C continuous use, the chips surrounding them may or may not be.
Having this headroom means that I basically never have to worry about operating temperatures ever again (assuming no component failures), especially for weeks-long render/AI inferencing/protein folding sessions.
Some notes on NVLink memory pooling functionality:
All three GPUs in this system (two A5000s, one Titan RTX) have a single functional physical NVLink connector onboard (which, fwiw, are not inter-generationally physically compatible), meaning that a maximum of two cards from the same generation may be linked together for VRAM sharing and the like. So if you have a system like mine with an odd number of GPUs, there is no way to use all of them in a functional NVLink configuration. In fact, the only way to get NVLink to boot successfully is by disabling any non-NVLinked GPUs at the system level (disabling them via Device Manager is sufficient.) Otherwise you get error messages from Iray like this:
2022-01-22 15:53:57.940 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA RTX A5000): compute capability 8.6, 23.988 GiB total, 22.833 GiB available
2022-01-22 15:53:57.944 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA RTX A5000): compute capability 8.6, 23.988 GiB total, 22.835 GiB available
2022-01-22 15:53:57.948 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (NVIDIA TITAN RTX): compute capability 7.5, 24.000 GiB total, 22.877 GiB available
2022-01-22 15:53:57.953 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [ERROR] - IRAY:RENDER :: 1.0 IRAY rend error: Invalid NVLINK peer group size (2) for 3 devices
2022-01-22 15:53:57.953 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Failed to establish NVLINK peer group size 2, retrying old size (0)
And an NVLink peer group size of 0 is the same as turning it off. So...
Also, as has been reported by others such as PugetSystems, getting NVLink to function properly - at least on A-series GPUs (consumer 3000 series GPUs are said to be different...) - requires the Nvidia driver mode on those cards to be set to TCC. Interestingly, though, this does not prevent the problem of an odd number of NVLink-capable GPUs in the system causing NVLink setup to fail. I.e. even with the two A5000s physically linked together with an NVLink bridge and set to TCC driver mode, and the Titan RTX neither physically linked nor set to TCC mode (driver mode is changeable on a per-card basis under Windows), the errors about failing to establish a link between all 3 GPUs persist. Makes me think there really is some arbitrary feature siloing going on at the Nvidia driver level. Shocking, I know...
ETA: One other thing to keep in mind about NVLink on A-series GPUs: since getting VRAM pooling to work means having TCC mode active on those GPUs, there is no way to use them for other things like running dForce simulations without switching driver modes and rebooting your system in between.
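For reference, the per-card driver mode switch mentioned above is done through nvidia-smi from an elevated prompt, with a reboot needed for it to take effect. A hedged sketch wrapping the commands as I understand them on current Windows drivers - double-check `nvidia-smi -h` on your own driver version before relying on it:

```python
# Hedged sketch: query and switch the Nvidia driver model per GPU on Windows
# via nvidia-smi. Flag names (-dm, --query-gpu=driver_model.current) are as I
# understand them on current drivers - verify with `nvidia-smi -h` first.
# Must be run from an elevated prompt; a reboot is required after switching.
import subprocess

def current_driver_models() -> str:
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,driver_model.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True).stdout

def set_driver_model(gpu_index: int, tcc: bool) -> None:
    # -dm takes 0 (WDDM) or 1 (TCC) on the builds I have seen
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-dm", "1" if tcc else "0"],
                   check=True)

print(current_driver_models())
# set_driver_model(0, True)  # e.g. put the first A5000 into TCC mode
```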
Great data and very helpful. I found NVLink to be a pain point. With four cards and 2x pairs in NVLink mode, this was especially difficult to work with. Perhaps I will revisit it one day; for now I am just using bigger cards. I would add that your 32 GB of system RAM might create a bottleneck here, even without the pooling. If you are getting ready for a (4) card mainboard though, it makes perfect sense to wait until you switch that out before upgrading the modules.
Very impressive temps by the way. I'm interested to see how you eventually run the 4 card loop.
5th A6000 installed. Running it off the NVMe PCIe 4x bridge. The CPU keeps throttling though when I crank up the cards - just a 120mm AIO for the CPU cooler and no room left inside the chassis. I can see the render progress bar go sluggish when the CPU throttles. If I clock down the CPU my scores drop considerably, even without the CPU involved in the render. I think with a better cooler on the CPU, this config could get close to 100 iterations/second.
Great data and very helpful. I found NVLink to be a pain point. With four cards and 2x pairs in NVLink mode, this was especially difficult to work with. Perhaps I will revisit it one day; for now I am just using bigger cards. I would add that your 32 GB of system RAM might create a bottleneck here, even without the pooling. If you are getting ready for a (4) card mainboard though, it makes perfect sense to wait until you switch that out before upgrading the modules.
Yeah, currently this system isn't actually capable of loading a scene into Daz Studio big enough to need 24+ GB of VRAM for rendering. IMO it's a very bad time right now to be breaking into the 3+ PCI-E x16 slot motherboard market. Between PCI-E 4.0 x16 4-slot boards with iGPU processor compatibility - much less DDR5 support - not yet being a thing, and the general price craziness surrounding all the new platforms right now, I'm pretty much set on picking up a 2nd 32GB set of DDR4 sticks (the max this motherboard will allow) on the used market to tide things over for the time being.
Very impressive temps by the way. I'm interested to see how you eventually run the 4 card loop.
My long-term upgrade plan is to eventually replace the Titan RTX with two more A5000-class or better pro-level GPUs (possibly of the RTX 4000 series - we'll see how things land with that later this year) with matching additional Bitspower waterblocks, switch to a 4-slot evenly spaced PCI-E x16 motherboard/CPU/RAM combo, and simply resort to some variation of these to get the GPU cooling loop under control in as elegant a way as possible.
Simple, really. Just a matter of waiting for the appropriate amount of money/products to fit the task to accumulate...
5th A6000 installed. Running it off the NVMe PCIe 4x bridge. CPU keeps throttling though when I crank up the cards. Just a 120mm AIO for the CPU cooler and no room left inside the chasis. I can see the render bar slug when the CPU throttles.. If I clock down the CPU my scores drop considerably, even without the CPU involved in the render. I think with a better cooler on the CPU, this config could get close to 100 iterations/second.
Wouldn't be surprised. While rendering with all 5 A6000s active and the CPU inactive, if you pop open Task Manager and look at per-core CPU activity you should notice 5 cores exhibiting very similar usage patterns. This is because Iray's built-in scheduler, during rendering, dedicates an entire CPU thread per active GPU in the system to actively managing that GPU's contribution to the final render. This means that poor overall CPU performance is very much capable of adversely affecting GPU rendering performance even when the CPU itself is not officially being used as a rendering device.
Ok, this is very helpful. I ran the test scene at a higher resolution w/5000 iterations to see the CPU cores in action (byproduct 4k render attached). The i9-10980xe has 18 cores/36 threads with 48 PCIE lanes total. Looking at the idle cores, if I chill the CPU a bit, I should be able to do 5 more cards @ 4 lanes each.
See this link for the official nitty-gritty of how Iray plays out here, particularly the following:
Generally, Iray Photoreal uses all available CPU and GPU resources by default. Iray Photoreal employs a dynamic load balancing scheme that aims at making optimal use of a heterogeneous set of compute units. This includes balancing work to GPUs with different performance characteristics and control over the use of a display GPU.
Iray Photoreal uses one CPU core per GPU to manage rendering on the GPU. When the CPU has been enabled for rendering, all remaining unused CPU cores are used for rendering on the CPU. Note that Iray Photoreal can only manage as many GPUs as there are CPU cores in the system. If the number of enabled GPU resources exceeds the number of available CPU cores in a system, Iray Photoreal will ignore the excess GPUs.
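If you want to watch this behaviour yourself while a render is running, something like the following (purely illustrative, uses the third-party psutil package) prints per-core utilisation once a second - with N GPUs active and the CPU disabled as a render device, you should see roughly N logical cores staying busy:

```python
# Hedged sketch: watch per-core CPU usage during a render to spot the manager
# thread Iray dedicates to each active GPU. Requires `pip install psutil`.
import psutil

def watch_cores(samples: int = 30) -> None:
    for _ in range(samples):
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)
        busy = [i for i, pct in enumerate(per_core) if pct > 50]
        print(f"busy logical cores (>50%): {busy}")

watch_cores()
```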
The 1660Ti (Mobile) managed about 3 iterations per second, compared to about 0.2 for the CPU; allowing for rounding, the GPU was about 14 times faster. Loading time was about the same on both, at around 5 seconds.
Notes:
I've been using it to learn Daz3D for about a year. I bought it when I was only a programmer & sysadmin (still am), so it wasn't intended for this anyway, but I figured I'd give it one final torture test - one run on the GPU, then one on the CPU - just to add to the database, even if I wouldn't recommend anyone use these for Daz3D. Obviously this is my laptop, but it's about to be replaced by my workstation, which I'll post results for soon.
Combining the 1660Ti and the 4600H on a laptop did technically increase iterations per second compared to the 1660Ti alone, but the combined rate is lower than the two individual rates added together, so it's diminishing returns. Also, being a laptop, I'd never do it. Why? Simple: most laptops will thermal throttle and/or get way too hot to be worth the small extra gains. On this specific laptop, even though the 1660Ti has a TDP about double this CPU's, it cools much better, even while half sharing its cooling solution. The only CPU I'd consider doing this on is maybe a Threadripper, where it has GPU-like performance and you have the system RAM to spare.
Testing ECC on/off and TCC/WDDM mode, all 4 combinations showed quite a large performance difference between them. Both turning off ECC and switching to TCC mode had a positive impact on iterations per second and loading times. Mostly, having ECC off was a huge increase in performance, so it's a good mental note to only have it on as and when needed. Loading times were within margin of error - TCC might be better, but not by much. As you can see, in theory, if you wanted to reduce loading times, go with TCC mode; it definitely is better. But you are looking at 2.8 seconds compared to 3.7 seconds, so you'll only notice whether this matters at scale. TCC mode also increases iterations and frees up some memory, though for this GPU that's not a concern. I was impressed by how the 1080 Ti overclocks and performs in general, but it is in pristine condition and has had almost no use since purchase.
In case you're wondering, the CPU is watercooled with an Arctic Liquid Freezer II 360, so along with the 1080 Ti and the extra fan on its radiator, it doesn't go over 45 C. The A6000, with its adjusted fan curve, stays under 65 C; I use a temp probe to kick in the case fans when the A6000, as well as other components on the motherboard, gets warm. It handles all-night renders really well. I'm still getting my settings right, so this is far from the best setup.
See this link for the official nitty-gritty of how Iray plays out here, particularly the following:
Generally, Iray Photoreal uses all available CPU and GPU resources by default. Iray Photoreal employs a dynamic load balancing scheme that aims at making optimal use of a heterogeneous set of compute units. This includes balancing work to GPUs with different performance characteristics and control over the use of a display GPU.
Iray Photoreal uses one CPU core per GPU to manage rendering on the GPU. When the CPU has been enabled for rendering, all remaining unused CPU cores are used for rendering on the CPU. Note that Iray Photoreal can only manage as many GPUs as there are CPU cores in the system. If the number of enabled GPU resources exceeds the number of available CPU cores in a system, Iray Photoreal will ignore the excess GPUs.
Thank you. This was very helpful. This info encouraged me to combine (2) systems and clear up some clutter.
9 GPUs? How do you connect them, with all that PCIe lane drama?
Most systems have somewhere between 16 and 24 lanes. The I9-10980XE has up to 48. On this board you can use all 48 lanes through four PCIe slots (x16/x8/x16/x8). This board also allows for splitting up the lanes into groups of four via active bifurcation. This lets you run each card at 4 GB/s, which is essentially the same bandwidth provided by a Gen 3 M.2 interface.
To extract the 4x speeds for each card, I used two 4x4 MaxCloudOn daughterboards. For now, I have the 9th card attached to an 8x port. I did a test run with 11 cards, adding two 3090s to the mix, but I ran into stability issues. I may have another go with 11 or 12 cards sometime. I think 10 A-series cards should be very manageable, skipping the bifurcation on cards 9 & 10. This hinges on me tuning the overclock on the CPU properly, or just moving the cards to a different platform.
The active bifurcation used here is more common on server and HEDT boards; I think it is a bit rare on consumer boards since you don’t have that many lanes to work with. On a consumer motherboard, you could achieve something similar using 1x4 PCIe splitters and 1x risers, but you would be capped at 1 lane per GPU (1 GB/s transfer speeds).
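For anyone doing the lane math at home, per-lane bandwidth is roughly 1 GB/s for Gen 3 (8 GT/s with 128b/130b encoding) and doubles each generation, which is where the 4 GB/s-per-card and 1 GB/s-per-lane figures above come from. A quick back-of-the-envelope calculator:

```python
# Hedged back-of-the-envelope: approximate one-direction PCIe bandwidth.
# Gen 3 is ~0.985 GB/s per lane (8 GT/s, 128b/130b encoding); each later
# generation roughly doubles that.
def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    per_lane_gen3 = 8e9 * (128 / 130) / 8 / 1e9  # ~0.985 GB/s
    return per_lane_gen3 * (2 ** (gen - 3)) * lanes

print(round(pcie_bandwidth_gbps(3, 4), 2))   # ~3.94 GB/s - the x4 split above
print(round(pcie_bandwidth_gbps(3, 1), 2))   # ~0.98 GB/s - a 1x riser
print(round(pcie_bandwidth_gbps(3, 16), 2))  # ~15.75 GB/s - a full x16 slot
```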
I remember some were curious about the A100 at one point. While those were shot down by Octane benchmarks, I just thought I would chime in with a Linus Tech Tips video on the A100. At one point he tests it with Blender: in CUDA mode it is close to a 3090 (but still slower). However, in OptiX mode, which Iray uses, the A100 was way off the pace of the 3090. So while not an Iray bench, it is something we can compare performance to.
Additionally there were lots of other nagging issues with the A100 that would make it very impractical for GPU rendering. Without ray tracing cores it just makes no sense. Like I said previously it is a card built for more specific machine learning applications. Still, it is a cool video for people who like seeing this kind of hardware. He takes it apart, too.
Linus also has a mining GPU based on the A100 chip in his possession, if anyone was curious. But that card only has 8gb.
Also (I may be late to comment since I don't get notifications anymore), it is pretty cool that the Threadripper 3970X can hit nearly 3 iterations per second. So it does match up with a GTX 1080, and is maybe even a touch faster. That is pretty neat, until you consider that every RTX in existence is significantly faster than that. Ray tracing cores have totally changed the game, and the A100 only makes this even more clear.
My Threadripper results at 4.1 GHz were without any sort of extreme overclock or significant power limit increase. In other benchmarks, I had most of the cores stable at 4.60 or 4.65 GHz and a few cores at 4.5 GHz with just ambient water cooling. I forgot to run the Daz benchmark in that configuration. However, in Cinebench R23 I managed to hit almost 52,000 points in that configuration.
I don't think that would change the results too much. Even in the best case it would still be less than half the speed of a 3060 or an A4000. The only way it changes is if the CPU renderer itself changes. I read somewhere that the newest Iray in 4.20 may actually do that, so there may be some hope. I have no plans to update, though, so I can't test this.
You know what's insane right now? The street price of the RTX 3080 is so high you can buy an A5000 for very close to the same money here in the good old USA. Completely nuts.
Yes. But this may be changing. Prices are starting to come down a bit, at least on the used market. At retail, we are starting to see cards actually in stock more often, but they are sitting on shelves longer because people are not as interested in paying those sky-high prices like they were last year... when crypto was at its peak and trucking along. Now crypto is on a downward trend, and has been for about 2 or 3 months. There are those who are trying to wait it out, but if things do not bounce back they will be forced to reduce inventory as well.
If this happens (and I stress I am not claiming it will, just that it could), we may start to see a reset on pricing. At some point the retail units have to compete with the used market. Up until now the used market has been insane. Then everybody in the supply chain wanted a piece of this swelling pie, like AIBs and distributors. These guys are charging crazy prices to the retailers, so in some situations the retailers are not in a great position to just drop prices. They could end up losing money on some cards, unless the AIBs and/or distributors start to give them some form of rebate to work with. So this is why prices may not drop as fast as some people hope when crypto crashes: the companies involved have become used to this inflation and are not so willing to dial it back.
But this does not impact the second hand market the same way, as there is not the same pressure to keep a price high...with the exception of scalpers. However, scalpers are limited in scope. While they certainly have hurt the market, they also have not gobbled that many cards.
I have a video that some may find interesting about the second-hand market. This fellow pulled a lot of data together to try and see how much scalping has been going on. Now, the thing I want you guys to remember is that while the numbers in the video are indeed high, keep in mind that Nvidia has shipped and sold tens of millions of new GPUs over the period this data was taken. For example, Nvidia stated they sold 9.1 million GPUs in January of 2021, when Ampere was just 3 months old! Since that time they have sold a good 8 to 9 million every 3 months, and this is being conservative. All of his numbers are based on US eBay, but even so this is the largest second-hand market by far, dwarfing all the rest, possibly even all of them combined. It is like comparing California's economy to the rest of the US, where Cali alone would be the 5th largest economy in the world.
Another thing to put the numbers he gives into perspective, one single mining firm in China was running 485,000 GPUs. They were older RX 470s, yes, but this just shows the massive and mind boggling scale that these mining operations can be at. This was just ONE of them. There are many more. Link: https://www.tomshardware.com/news/china-supreme-court-sides-genesis-mining-gpu-battle
So that one single firm had more GPUs than ALL of the scalped GPUs sold on US Ebay.
Anyhow, I don't want to go too far off topic in the benchmark thread. But maybe there is at least a ray of hope that things will improve and we will be able to buy these things again at more normal pricing. If anybody reading this has waited this long, unless you absolutely need it, I would suggest waiting. This market may finally be starting to stabilize, and prices might come down more. Besides that, the next generation of GPUs is just around the corner. I know it seems like this is said all the time, but the next generation of cards is expected to be on its way late this year. That's only a handful of months now. If you have held out this long, why not wait a bit more? And hopefully when the 4000 series arrives (if that is the name), it will at the least cause much-needed price drops for the 3000 series. The 4000 series may double performance, so if that is true, things could get wild because a lot of people will be upgrading... which means a lot of used cards will be hitting the market, too. As long as crypto doesn't explode again - that is the wild card that we are going to have to deal with now.
Testing ECC on/off and TCC/WDDM mode, all 4 combinations showed quite a large performance difference between them. Both turning off ECC and switching to TCC mode had a positive impact on iterations per second and loading times. Mostly, having ECC off was a huge increase in performance, so it's a good mental note to only have it on as and when needed. Loading times were within margin of error - TCC might be better, but not by much. As you can see, in theory, if you wanted to reduce loading times, go with TCC mode; it definitely is better. But you are looking at 2.8 seconds compared to 3.7 seconds, so you'll only notice whether this matters at scale. TCC mode also increases iterations and frees up some memory, though for this GPU that's not a concern. I was impressed by how the 1080 Ti overclocks and performs in general, but it is in pristine condition and has had almost no use since purchase.
In case you're wondering, the CPU is watercooled with an Arctic Liquid Freezer II 360, so along with the 1080 Ti and the extra fan on its radiator, it doesn't go over 45 C. The A6000, with its adjusted fan curve, stays under 65 C; I use a temp probe to kick in the case fans when the A6000, as well as other components on the motherboard, gets warm. It handles all-night renders really well. I'm still getting my settings right, so this is far from the best setup.
Hope you are enjoying that A6000. Re: that CPU cooler, I have an Arctic Liquid Freezer II 420 in push-pull (with 2K fans) on my I9-10980XE. I would say it's probably the best AIO cooler you can get.
It looks like the 2x Xeon Gold 6348 and the Threadripper 3970X have similar performance.
I still haven't had time to test the 3970X in its different modes - stock, PBO, different power limits, etc. - to see if there is any significant difference.
The Dual Ice Lake is a tough build. For the Intel fanboys, Puget advocates sticking with a single workstation-class Xeon. Having just walked the Dual Xeon tightrope, I completely agree with them. Tons of setup and anxiety.
Anyone want to guess the RTX 3090 Ti's Daz rendering performance?
The 3090 Ti's gaming performance is between 7 and 11% faster than the 3090's. Iray tends to see larger gains than gaming does, so I would wager the 3090 Ti will render around 15 to 20% faster than a 3090. Exactly how much faster will depend a lot on the model. The 3090 Ti also has faster VRAM, which does well in this bench scene.
Cooling and power will be absolutely critical to any 3090 Ti build. The card is just insane. I personally do not recommend it, certainly not at its current price. The original 3090 was more logical because of the 24 GB of VRAM that makes it stand out; the 3090 Ti does not offer any additional VRAM. The price and especially the power are real concerns here. You have to have a very good power supply to even consider a 3090 Ti. Maybe 450 watts doesn't sound so bad by itself, but power spikes could be an issue that lesser power supplies can't handle - these cards can go beyond 500 watts. We joked about small space heaters with the 3090, but the 3090 Ti at 500 watts is exactly what small space heaters in the US can be. So people need to understand that even if they water cool, that heat is getting displaced into the room. If you go to Walmart and buy a small space heater, it will create the same amount of heat that a 3090 Ti will eject into your room. The 3090 Ti is very capable of heating your room.
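To put the space-heater comparison in concrete terms: essentially all board power ends up as heat in the room whether the card is air- or water-cooled, and 1 watt is about 3.412 BTU/hr. A quick conversion, for the curious:

```python
# Hedged sketch: convert GPU power draw to room heat output.
# 1 watt of dissipated power is approximately 3.412 BTU/hr.
def watts_to_btu_per_hour(watts: float) -> float:
    return watts * 3.412

for draw in (350, 450, 500):  # typical 3090 / rated 3090 Ti / spike territory
    print(f"{draw} W is roughly {watts_to_btu_per_hour(draw):.0f} BTU/hr of heat")
```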
I am running a 3090 with a 3060, so I can attest to the heat that can be generated by such a setup. I have managed it so far. And actually, a 3090 plus a 3060 is going to be much faster than a single 3090 Ti, and also cheaper. The caveat, of course, is that the 3060 only has 12 GB of VRAM and so can only help up to a point.
The 3090 Ti is more like an experiment right now. The next generation of GPUs is expected to be very power hungry, as Nvidia and AMD will be fighting hard for the performance crown. We do not know what they will be called yet, but the fastest card is expected to have a similar power draw to the 3090 Ti, though it will be much faster (maybe even... twice as fast... maybe). This is coming in either Q3 or Q4 this year. After all, Ampere released in 2020 and will be 2 years old; the next gen is due, you don't need a crystal ball for that. I know that creates a tricky situation for those who are dying for a new GPU now, because GPUs are finally starting to come down in price from this horrible market condition and are also in stock. So I can totally understand the difficulty in making a decision, plus for all we know the market could blow up again when the new stuff releases like it did last time. There is so much uncertainty. I do think GPUs will get cheaper in the short term.
Microcenter is advertising 3060s for $490, and 3070 Tis for $700. These are prices we have not seen since... well, actually ever. They were available for less the day they launched, but sold out in seconds. Now they are in stock for a good while before getting sold. They are sold out now, but it took a bit. My MC currently has 98 different SKUs of GPUs in stock as I type this, with many showing "25+" in stock. They have 3050s for $380 - though I still think that is pretty high - and they have 25+ of just one SKU of 3050.
About half of a typical data center's costs are related to cooling. I personally feel that both AMD and Nvidia should be focusing on reducing GPU power draw and increasing efficiency, much like Intel is doing with its new efficiency cores on CPUs, especially at a time when energy costs are skyrocketing. Hopper is aiming for 600 watts, which is a step in the wrong direction. In other words, if we could buy a next-generation GPU that only pulled 65% of the power from the wall at the same or modestly better performance, that would be a more holistic improvement in product design.
I must also agree that different versions of the cards will create variances in terms of relative performance increases. Regarding the 3090 Ti, to help answer this question objectively, I will sample the EVGA 3090 FTW3 vs EVGA GeForce RTX 3090 Ti FTW3 with stock BIOS in the same PC. These two cards appear similar enough that we can get a reasonable measure of the differences. I should have data points next weekend.
The FE editions may offer an even clearer picture if anyone happens to have both cards that they can test in the same system (3090 FE and 3090 Ti FE).
Comments
Nice upgrade! Looks like you are rendering about 5 times faster. The "RTX" 1660 Super you mention was, I believe, actually a GTX card though. The A5000 has so many perks: low power, 2-slot height, exhaust-style cooling, cleaner wiring, binned RAM modules and no flashy lights. Very good cards IMO.
Thank you. Ah yes, my bad (it's a typo) - the 1660 Super is GTX. It was a Zotac AMP card (2-slot) and it had a backplate!
The VRAM on my old card was very low (6 GB vs. 24 GB on the A5000).
I'm happy with the upgrade and enjoying the render performance.
I do wish the A5000 had a backplate of some sort, as it has some crazy SMT components back there!
I got a good deal on a used Threadripper 3970X and motherboard recently and decided to see what the CPU can do. I've been beta testing CTR and Hydra for Ryzen CPUs. I used CTR for this run. Average clock speed was 4.1 GHz at 1.1 V. I'll try some other tests later at stock settings and with PBO.
System Configuration
System/Motherboard: MSI TRX40 Pro Wifi
CPU: AMD Threadripper 3970X (running CTR)
GPU: Not used for render
System Memory: 64GB G.Skill TridentZ DDR4 3600 MHz CAS 16
OS Drive: Crucial P5 NVMe 1 TB
Asset Drive: Same
Power Supply: EVGA G2 1300 watt
Operating System: Windows 10 Pro 21H2 Build 19044.1469
Nvidia Drivers Version: N/A
Daz Studio Version: 4.16.0.3
Benchmark Results
2022-01-22 09:28:45.484 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Received update to 01800 iterations after 638.686s.
2022-01-22 09:28:45.490 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.
2022-01-22 09:28:46.088 Total Rendering Time: 10 minutes 40.31 seconds
2022-01-22 09:34:42.135 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2022-01-22 09:34:42.135 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 1800 iterations, 1.372s init, 637.313s render
Iteration Rate: (1800 / 637.313) 2.824 iterations per second
Loading Time: ((640.31) - 637.31) = 3.00 seconds
As promised, here be watercooled RTX A5000 benchmarks - as well as some hard-earned tidbits on various things (like poorly documented waterblock compatibilities and NVLink intricacies.)
System Configuration
System/Motherboard: Gigabyte Z370 Aorus Gaming 7
CPU: Intel 8700K @ stock (MCE enabled) - onboard graphics used for all display functionality
GPU: Nvidia RTX A5000 @ stock (custom watercooled)
GPU: Nvidia RTX A5000 @ stock (custom watercooled)
GPU: Nvidia Titan RTX @ stock (custom watercooled)
System Memory: Corsair Vengeance LPX 32GB DDR4 @ 3000MHz
OS Drive: Samsung Pro 980 512GB NVMe SSD
Asset Drive: Sandisk Extreme Portable SSD 1TB
Power Supply: Corsair AX1500i 1500 watts
Operating System: Windows 10 Pro version 21H2 build 19044.1466
Nvidia Drivers Version: 472.12 SRD
Daz Studio Version: 4.16.0.3 64-bit
Benchmark Results - RTX A5000 #1 (PCI-E slot 2):
WDDM Driver Mode
Total Rendering Time: 1 minutes 52.39 seconds
CUDA device 0 (NVIDIA RTX A5000): 1800 iterations, 2.065s init, 108.096s render
Iteration Rate: 16.652 iterations per second
Loading Time: 4.294 seconds
TCC Driver Mode
Total Rendering Time: 1 minutes 47.62 seconds
CUDA device 0 (NVIDIA RTX A5000): 1800 iterations, 1.976s init, 103.527s render
Iteration Rate: 17.387 iterations per second
Loading Time: 4.093 seconds
Benchmark Results - RTX A5000 #2 (PCI-E slot 5):
WDDM Driver Mode
Total Rendering Time: 1 minutes 49.89 seconds
CUDA device 1 (NVIDIA RTX A5000): 1800 iterations, 1.982s init, 105.712s render
Iteration Rate: 17.027 iterations per second
Loading Time: 4.178 seconds
TCC Driver Mode
Total Rendering Time: 1 minutes 47.17 seconds
CUDA device 1 (NVIDIA RTX A5000): 1800 iterations, 2.003s init, 103.067s render
Iteration Rate: 17.464 iterations per second
Loading Time: 4.103 seconds
Benchmark Results - Titan RTX (PCI-E slot 7):
WDDM Driver Mode
Total Rendering Time: 3 minutes 18.82 seconds
CUDA device 2 (NVIDIA TITAN RTX): 1800 iterations, 2.105s init, 194.512s render
Iteration Rate: 9.254 iterations per second
Loading Time: 4.308 seconds
TCC Driver Mode
Total Rendering Time: 3 minutes 14.82 seconds
CUDA device 2 (NVIDIA TITAN RTX): 1800 iterations, 2.127s init, 190.587s render
Iteration Rate: 9.445 iterations per second
Loading Time: 4.233 seconds
Benchmark Results - RTX A5000 #1 (PCI-E slot 2) + RTX A5000 #2 (PCI-E slot 5):
WDDM Driver Mode
Total Rendering Time: 58.98 seconds
CUDA device 0 (NVIDIA RTX A5000): 906 iterations, 2.066s init, 54.630s render
CUDA device 1 (NVIDIA RTX A5000): 894 iterations, 2.031s init, 53.895s render
Iteration Rate: 33.398 iterations per second
Loading Time: 4.350 seconds
TCC Driver Mode
Total Rendering Time: 56.63 seconds
CUDA device 0 (NVIDIA RTX A5000): 897 iterations, 2.047s init, 52.334s render
CUDA device 1 (NVIDIA RTX A5000): 903 iterations, 1.926s init, 52.389s render
Iteration Rate: 34.358 iterations per second
Loading Time: 4.241 seconds
TCC Driver Mode - VRAM pooling active via NVLink bridge
Total Rendering Time: 57.0 seconds
CUDA device 0 (NVIDIA RTX A5000): 895 iterations, 2.039s init, 52.585s render
CUDA device 1 (NVIDIA RTX A5000): 905 iterations, 1.915s init, 52.770s render
Iteration Rate: 34.110 iterations per second
Loading Time: 4.230 seconds
Benchmark Results - RTX A5000 #1 (PCI-E slot 2) + RTX A5000 #2 (PCI-E slot 5) + Titan RTX (PCI-E slot 7):
WDDM Driver Mode
Total Rendering Time: 48.25 seconds
CUDA device 0 (NVIDIA RTX A5000): 707 iterations, 2.123s init, 43.074s render
CUDA device 1 (NVIDIA RTX A5000): 716 iterations, 2.177s init, 43.598s render
CUDA device 2 (NVIDIA TITAN RTX): 377 iterations, 2.172s init, 43.303s render
Iteration Rate: 41.286 iterations per second
Loading Time: 4.652 seconds
TCC Driver Mode
Total Rendering Time: 46.19 seconds
CUDA device 0 (NVIDIA RTX A5000): 712 iterations, 2.092s init, 41.822s render
CUDA device 1 (NVIDIA RTX A5000): 710 iterations, 1.972s init, 41.467s render
CUDA device 2 (NVIDIA TITAN RTX): 378 iterations, 2.014s init, 41.757s render
Iteration Rate: 43.034 iterations per second
Loading Time: 4.368 seconds
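For the multi-GPU rows above, the combined iteration rate is, as I read the numbers in this thread, the summed per-device iterations divided by the slowest device's render time, which you can pull straight out of the "Device statistics" lines in the log. A small parsing sketch (the regex is written against the log format shown in this thread):

```python
# Hedged sketch: compute a combined multi-GPU iteration rate from the
# per-device "Device statistics" lines Iray writes to the Daz Studio log.
import re

STATS = re.compile(
    r"CUDA device (\d+) \((.+?)\): (\d+) iterations, [\d.]+s init, ([\d.]+)s render")

def combined_rate(log_text: str) -> float:
    devices = STATS.findall(log_text)
    total_iterations = sum(int(n) for _, _, n, _ in devices)
    slowest_render = max(float(t) for _, _, _, t in devices)
    return total_iterations / slowest_render

sample = """
CUDA device 0 (NVIDIA RTX A5000): 897 iterations, 2.047s init, 52.334s render
CUDA device 1 (NVIDIA RTX A5000): 903 iterations, 1.926s init, 52.389s render
"""
print(round(combined_rate(sample), 3))  # ~34.358 it/s, matching the dual-A5000 TCC run
```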
Long story short, in terms of Iray rendering performance specifically, this upgrade amounts to an approximate 12% gain when using the two A5000s alone (no Titan RTX active) versus before the watercooling upgrade (see here), and an approximately 3% gain when using all three GPUs versus prior. Other than that, everything else is pretty much flat (e.g. individual GPUs, when used for rendering solo, are within margin of error of how they performed pre-upgrade.)
However, where this upgrade has had a huge impact across the board is regarding (unsurprisingly) thermals. Here's a set of before/after graphs depicting the running temperatures of various key elements in this machine's GPU pipeline during a timed stress test (this thread's benchmarking scene, modified to run for 60 minutes rather than 1800 iterations):
Note that the temperatures given in these graphs are relative to ambient room temperature (as determined by a temperature probe set up in the case's lower front air intake and sampled once per second.) This means the approximately +75 degree C VRAM temps seen on the left were in reality approximately 100 degrees C as read directly from the GPU's sensors. As a quick reminder of the unusual case design of the Tower 900 in play here, here are before/after upgrade pics:
Why are these temperature reductions important given how little effect they demonstrably have on actual operational performance? Two reasons:
This is eventually going to be a 4-GPU system (motherboard upgrade permitting.)
Although GPU/CPU/GDDR6 processing dies may be rated for 100+ degrees C continuous use, the chips surrounding them may or may not be.
Having this headroom means that I basically never have to worry about operating temperatures ever again (assuming no component failures), especially for weeks-long render/AI inferencing/protein folding sessions.
Some notes on NVLink memory pooling functionality:
All three GPUs in this system (two A5000s, one Titan RTX) have a single functional physical NVLink connector onboard (which, fwiw, are not inter-generationally physically compatible), meaning that a maximum of two cards from the same generation may be linked together for VRAM sharing and the like. So if you have a system like mine with an odd number of GPUs, there is no way to use all of them in a functional NVLink configuration. In fact, the only way to get NVLink to boot successfully is by disabling any non-NVLinked GPUs at the system level (disabling them via Device Manager is sufficient.) Otherwise you get error messages from Iray like this:
2022-01-22 15:53:57.940 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA RTX A5000): compute capability 8.6, 23.988 GiB total, 22.833 GiB available
2022-01-22 15:53:57.944 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA RTX A5000): compute capability 8.6, 23.988 GiB total, 22.835 GiB available
2022-01-22 15:53:57.948 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (NVIDIA TITAN RTX): compute capability 7.5, 24.000 GiB total, 22.877 GiB available
2022-01-22 15:53:57.953 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [ERROR] - IRAY:RENDER :: 1.0 IRAY rend error: Invalid NVLINK peer group size (2) for 3 devices
2022-01-22 15:53:57.953 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Failed to establish NVLINK peer group size 2, retrying old size (0)
And an NVLink peer group size of 0 is the same as turning it off. So...
Also, as has been reported by others such as PugetSystems, getting NVLink to function properly - at least on A-series GPUs (consumer 3000 series GPUs are said to be different...) - requires the Nvidia driver mode on those cards to be set to TCC. Interestingly, though, this does not prevent the problem of an odd number of NVLink-capable GPUs in the system causing NVLink setup to fail. I.e. even with the two A5000s physically linked together with an NVLink bridge and set to TCC driver mode, and the Titan RTX neither physically linked nor set to TCC mode (driver mode is changeable on a per-card basis under Windows), the errors about failing to establish a link between all 3 GPUs persist. Makes me think there really is some arbitrary feature siloing going on at the Nvidia driver level. Shocking, I know...
ETA: One other thing to keep in mind about NVLink on A-series GPUs: since getting VRAM pooling to work means having TCC mode active on those GPUs, there is no way to use them for other things like running dForce simulations without switching driver modes and rebooting your system in between.
And finally, a note about waterblocking the RTX A5000. Despite being built on the same base board layout as the RTX 3080 reference design, it has several additional connectors on its front edge (right under where a full-coverage GPU waterblock's in/out ports are usually located) that make it incompatible with certain waterblock designs. I know for a fact that the EK-Classic GPU Water Block RTX 3080/3090 D-RGB is not compatible. igor'sLAB has apparently had success using the Alphacool Eisblock Aurora Plexi GPX-N RTX 3090/3080. I can personally attest that the Bitspower Classic VGA Water Block for GeForce RTX 3080 Reference Design fits it almost like a glove.
Great data and very helpful. I found NVLink to be a pain point. With four cards and 2x pairs in NVLink mode, this was especially difficult to work with. Perhaps I will revisit it one day; for now I am just using bigger cards. I would add that your 32 GB of system RAM might create a bottleneck here, even without the pooling. If you are getting ready for a (4) card mainboard though, it makes perfect sense to wait until you switch that out before upgrading the modules.
Very impressive temps by the way. I'm interested to see how you eventually run the 4 card loop.
System/Motherboard: Gigabyte X299X
CPU: I9-10980XE @ 4.6 GHZ
GPU: RTX A6000 x5 (+OC)
System Memory: 96 GB Corsair Dominator Platinum DDR4-3466 (@ 2133 mhz)
OS Drive: Intel 670p M.2 1 TB NVMe
Asset Drive: Intel Optane 380 GB (905p)
Operating System: Win 11 Pro
Nvidia Drivers Version: 511.09 DCH
Daz Studio Version: 4.16
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (NVIDIA RTX A6000): 361 iterations, 2.218s init, 18.854s render
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA RTX A6000): 354 iterations, 1.986s init, 18.100s render
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA RTX A6000): 347 iterations, 2.377s init, 18.011s render
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 3 (NVIDIA RTX A6000): 347 iterations, 2.326s init, 18.006s render
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 4 (NVIDIA RTX A6000): 369 iterations, 1.865s init, 18.785s render
2022-01-25 23:23:17.240 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 22 iterations, 1.375s init, 18.691s render
Total Rendering Time: 24.51 seconds
Loading Time: 24.51-18.854 = 5.656 seconds
Rendering Performance: 1800/18.854 = 95.47 iterations/second
5th A6000 installed. Running it off the NVMe PCIe 4x bridge. The CPU keeps throttling though when I crank up the cards - just a 120mm AIO for the CPU cooler and no room left inside the chassis. I can see the render progress bar go sluggish when the CPU throttles. If I clock down the CPU my scores drop considerably, even without the CPU involved in the render. I think with a better cooler on the CPU, this config could get close to 100 iterations/second.
Yeah, currently this system isn't actually capable of loading a scene into Daz Studio big enough to need 24+ GB of VRAM for rendering. IMO it's a very bad time right now to be breaking into the 3+ PCI-E x16 slot motherboard market. Between PCI-E 4.0 x16 4-slot boards with iGPU processor compatibility - much less DDR5 support - not yet being a thing, and the general price craziness surrounding all the new platforms right now, I'm pretty much set on picking up a 2nd 32GB set of DDR4 sticks (the max this motherboard will allow) on the used market to tide things over for the time being.
My long-term upgrade plan is to eventually replace the Titan RTX with two more A5000-class or better pro-level GPUs (possibly of the RTX 4000 series - we'll see how things land with that later this year) with matching additional Bitspower waterblocks, switch to a 4-slot evenly spaced PCI-E x16 motherboard/CPU/RAM combo, and simply resort to some variation of these to get the GPU cooling loop under control in as elegant a way as possible.
Simple, really. Just a matter of waiting for the appropriate amount of money/products to fit the task to accumulate...
Wouldn't be surprised. While rendering with all 5 A6000s active and the CPU inactive, if you pop open Task Manager and look at per-core CPU activity you should notice 5 cores exhibiting very similar usage patterns. This is because Iray's built-in scheduler, during rendering, dedicates an entire CPU thread per active GPU in the system to actively managing that GPU's contribution to the final render. This means that poor overall CPU performance is very much capable of adversely affecting GPU rendering performance even when the CPU itself is not officially being used as a rendering device.
Ok, this is very helpful. I ran the test scene at a higher resolution w/5000 iterations to see the CPU cores in action (byproduct 4k render attached). The i9-10980xe has 18 cores/36 threads with 48 PCIE lanes total. Looking at the idle cores, if I chill the CPU a bit, I should be able to do 5 more cards @ 4 lanes each.
See this link for the official nitty-gritty of how Iray plays out here, particularly the following:
Generally, Iray Photoreal uses all available CPU and GPU resources by default. Iray Photoreal employs a dynamic load balancing scheme that aims at making optimal use of a heterogeneous set of compute units. This includes balancing work to GPUs with different performance characteristics and control over the use of a display GPU.
Iray Photoreal uses one CPU core per GPU to manage rendering on the GPU. When the CPU has been enabled for rendering, all remaining unused CPU cores are used for rendering on the CPU. Note that Iray Photoreal can only manage as many GPUs as there are CPU cores in the system. If the number of enabled GPU resources exceeds the number of available CPU cores in a system, Iray Photoreal will ignore the excess GPUs.
System Configuration
System/Motherboard: Asus Tuf Gaming A15 (FA506IU) Laptop
CPU: AMD Ryzen 4600H @ Stock
GPU: Nvidia Geforce GTX 1660 Ti @ Stock
System Memory: Corsair Vengeance 32GB @ 3000MHz
OS Drive: XPG SX8200 Pro - 1TB (#1)
Asset Drive: XPG SX8200 Pro - 1TB (#2)
Power Supply: Laptop 180w Power Brick (48 Wh battery)
Operating System: Windows 10 Home - Version 20H2
Nvidia Drivers Version: 511.09 (Studio)
Daz Studio Version: 4.16.0.3 Pro (64 Bit)
Benchmark Results - 1660 Ti
DAZ_STATS: Total Rendering Time: 10 minutes 9.89 seconds
IRAY_STATS: CUDA device 0 (NVIDIA GeForce GTX 1660 Ti): 1800 iterations, 2.465s init, 604.703s render
Iteration Rate: (1800/604.703) = 2.976 iterations per second
Loading Time: ((0*3600 + 10*60 + 9.89)-604.703) = 5.187 seconds
Benchmark Results - 4600H
DAZ_STATS: Total Rendering Time: 2 hours 22 minutes 53.19 seconds
IRAY_STATS: CPU: 1800 iterations, 2.415s init, 8568.140s render
Iteration Rate: (1800/8568.140) = 0.210 iterations per second
Loading Time: ((2*3600 + 22*60 + 53.19) - 8568.140) = 5.05 seconds
Results:
The 1660Ti (Mobile) managed about 3 iterations per second, compared to about 0.2 for the CPU; allowing for rounding, the GPU was about 14 times faster. Loading time was about the same on both, at around 5 seconds.
Notes:
I've been using it to learn Daz3D for about a year. I bought it when I was only a programmer & sysadmin (still am), so it wasn't intended for this anyway, but I figured I'd give it one final torture test - one run on the GPU, then one on the CPU - just to add to the database, even if I wouldn't recommend anyone use these for Daz3D. Obviously this is my laptop, but it's about to be replaced by my workstation, which I'll post results for soon.
Can you do a combined benchmark? 1660ti + 4600H
System Configuration
System/Motherboard: Asus Tuf Gaming A15 (FA506IU) Laptop
CPU: AMD Ryzen 4600H @ Stock
GPU: Nvidia Geforce GTX 1660 Ti @ Stock
System Memory: Corsair Vengeance 32GB @ 3000MHz
OS Drive: XPG SX8200 Pro - 1TB (#1)
Asset Drive: XPG SX8200 Pro - 1TB (#2)
Power Supply: Laptop 180w Power Brick (48 Wh battery)
Operating System: Windows 10 Home - Version 20H2
Nvidia Drivers Version: 511.09 (Studio)
Daz Studio Version: 4.16.0.3 Pro (64 Bit)
Benchmark Results - 1660Ti + 4600H
DAZ_STATS: Total Rendering Time: 9 minutes 42.77 seconds
IRAY_STATS:
Combined Iteration Rate: (1705+95) / 577.623 = 3.116 iterations per second
Loading Time: ((0*3600 + 9*60 + 42.77) - 577.623) = 5.147 seconds
Results:
Combining the 1660Ti and the 4600H on a laptop did technically increase iterations per second compared to the 1660Ti alone, but the combined rate is lower than the two individual rates added together, so it's diminishing returns. Also, being a laptop, I'd never do it. Why? Simple: most laptops will thermal throttle and/or get way too hot to be worth the small extra gains. On this specific laptop, even though the 1660Ti has a TDP about double this CPU's, it cools much better, even while half sharing its cooling solution. The only CPU I'd consider doing this on is maybe a Threadripper, where it has GPU-like performance and you have the system RAM to spare.
System Configuration
Benchmark Results - 1080Ti
Benchmark Results - A6000 - WDDM Mode (ECC On)
Benchmark Results - A6000 - WDDM Mode (ECC Off)
Benchmark Results - A6000 - TCC Mode (ECC On)
Benchmark Results - A6000 - TCC Mode (ECC off)
Notes:
Testing ECC on/off and TCC/WDDM mode, all 4 combinations showed quite a large performance difference between them. Both turning off ECC and switching to TCC mode had a positive impact on iterations per second and loading times. Mostly, having ECC off was a huge increase in performance, so it's a good mental note to only have it on as and when needed. Loading times were within margin of error - TCC might be better, but not by much. As you can see, in theory, if you wanted to reduce loading times, go with TCC mode; it definitely is better. But you are looking at 2.8 seconds compared to 3.7 seconds, so you'll only notice whether this matters at scale. TCC mode also increases iterations and frees up some memory, though for this GPU that's not a concern. I was impressed by how the 1080 Ti overclocks and performs in general, but it is in pristine condition and has had almost no use since purchase.
In case you're wondering, the CPU is watercooled with an Arctic Liquid Freezer II 360, so along with the 1080 Ti and the extra fan on its radiator, it doesn't go over 45 C. The A6000, with its adjusted fan curve, stays under 65 C; I use a temp probe to kick in the case fans when the A6000, as well as other components on the motherboard, gets warm. It handles all-night renders really well. I'm still getting my settings right, so this is far from the best setup.
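For anyone wanting to repeat the ECC permutations above, the toggle lives in nvidia-smi (run elevated; a reboot is needed for it to apply). A hedged sketch of the calls as I understand them - confirm against `nvidia-smi -h` for your driver version:

```python
# Hedged sketch: query and toggle ECC per GPU via nvidia-smi. Flag names
# (-e, --query-gpu=ecc.mode.current) are as I understand them on current
# drivers - verify with `nvidia-smi -h` first. Run elevated; reboot to apply.
import subprocess

def ecc_state() -> str:
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True).stdout

def set_ecc(gpu_index: int, enabled: bool) -> None:
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-e", "1" if enabled else "0"],
        check=True)

print(ecc_state())
# set_ecc(0, False)  # e.g. turn ECC off before a long render session
```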
Thank you. This was very helpful. This info encouraged me to combine (2) systems and clear up some clutter.
System/Motherboard: Gigabyte X299X Aorus Master
CPU: I9-10920X @ 5GHZ - All Cores (w/Hyperthreading disabled)
GPU: 4xA5000 & 5xA6000 (4x PCIe bandwidth per card via active bifurcation/36 lanes)
System Memory: 128 GB Corsair Dominator Platinum DDR4-3466 @ 2933 mhz (XMP disabled)
OS Drive: Intel Optane 905p 380GB
Asset Drive: (Same) Intel Optane 905p 380GB
Operating System: Win 11 Pro
Nvidia Drivers Version: 511.79 DCH
Daz Studio Version: 4.15
PSU: 2xCorsair 1600i & 1xCorsair 1500i
2022-02-17 22:58:46.055 Finished Rendering
2022-02-17 22:58:46.077 Total Rendering Time: 18.30 seconds
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA RTX A6000): 206 iterations, 3.285s init, 11.077s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA RTX A6000): 203 iterations, 3.252s init, 11.056s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 2 (NVIDIA RTX A6000): 229 iterations, 1.999s init, 12.282s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 3 (NVIDIA RTX A6000): 203 iterations, 3.182s init, 11.078s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 5 (NVIDIA RTX A5000): 179 iterations, 3.273s init, 11.399s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 6 (NVIDIA RTX A5000): 180 iterations, 3.019s init, 11.325s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 7 (NVIDIA RTX A5000): 192 iterations, 2.414s init, 12.238s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 8 (NVIDIA RTX A5000): 171 iterations, 3.197s init, 11.112s render
2022-02-17 22:58:55.058 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 4 (NVIDIA RTX A6000): 230 iterations, 1.914s init, 12.196s render
Loading Time: 6.018
Rendering Performance: 146.56 iterations per second
Peak Draw: 3256 Watts
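For anyone checking the math, the headline figure follows the convention used throughout this thread: the scene's 1800 total iterations divided by the render time of the slowest device. A quick sketch using the per-device render times from the log above:

```python
# Thread convention (as I read it): rate = total iterations / slowest device's render time.
# The values below are the per-device "render" times from the log above.
render_times = [11.077, 11.056, 12.282, 11.078, 11.399, 11.325, 12.238, 11.112, 12.196]
TOTAL_ITERATIONS = 1800  # fixed iteration cap of the benchmark scene

rate = TOTAL_ITERATIONS / max(render_times)
print(f"{rate:.2f} iterations per second")  # ~146.56, matching the figure above
```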
9 GPUs? How do you connect them, with all that PCIe lane drama?
Most systems have somewhere between 16 and 24 PCIe lanes. Cascade Lake-X chips like the i9-10920X used here (and the i9-10980XE) have up to 48. On this board you can use all 48 lanes through four PCIe slots (x16/x8/x16/x8). The board also allows splitting the lanes into groups of four via active bifurcation, which lets you run each card at roughly 4 GB/s, essentially the same bandwidth as a Gen 3 M.2 interface.
To get 4x speeds for each card, I used two 4x4 MaxCloudOn daughterboards. For now, I have the 9th card attached to an 8x port. I did a test run with 11 cards, adding two 3090s into the mix, but I ran into stability issues. I may have another go at 11 or 12 cards sometime. I think 10 A-series cards should be very manageable, skipping the bifurcation on cards 9 and 10. This hinges on me tuning the CPU overclock properly, or just moving the cards to a different platform.
The active bifurcation used here is more common on server and HEDT boards; I think it is a bit rare on consumer boards since you don’t have that many lanes to work with. On a consumer motherboard, you could achieve something similar using 1x4 PCIe splitters and 1x risers, but you would be capped at 1 lane per GPU (1 GB/s transfer speeds).
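For reference, the bandwidth figures above fall out of simple PCIe Gen 3 arithmetic: 8 GT/s per lane with 128b/130b encoding works out to roughly 1 GB/s of usable bandwidth per lane. A small sketch of that math:

```python
# Rough PCIe Gen 3 bandwidth arithmetic behind the figures above.
# Gen 3 signals at 8 GT/s per lane with 128b/130b encoding, so usable
# throughput is about 8 * (128/130) / 8 bits-per-byte ~= 0.985 GB/s per lane.
GEN3_GBPS_PER_LANE = 8 * (128 / 130) / 8  # GB/s per lane (one direction)

for lanes in (1, 4, 8, 16):
    print(f"x{lanes:<2}: ~{lanes * GEN3_GBPS_PER_LANE:.2f} GB/s")
# x4  ~= 3.94 GB/s -> the "about 4 GB/s per card" bifurcation figure (same width as a Gen 3 M.2 slot)
# x1  ~= 0.99 GB/s -> the roughly 1 GB/s cap for 1x riser setups
```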
I remember some were curious about the A100 at one point. While those hopes were shot down by Octane benchmarks, I just thought I would chime in with a Linus Tech Tips video on the A100. At one point he tests it with Blender: in CUDA mode it is close to a 3090 (but still slower), while in OptiX mode, which Iray uses, the A100 was way off the pace of the 3090. So while not an Iray bench, it is something we can compare performance to.
Additionally there were lots of other nagging issues with the A100 that would make it very impractical for GPU rendering. Without ray tracing cores it just makes no sense. Like I said previously it is a card built for more specific machine learning applications. Still, it is a cool video for people who like seeing this kind of hardware. He takes it apart, too.
Linus also has a mining GPU based on the A100 chip in his possession, if anyone was curious. But that card only has 8 GB.
Also, I may be late to comment since I don't get notifications anymore, but it is pretty cool that the Threadripper 3970X can hit close to 3 iterations per second. It does match up with a GTX 1080, and may even be a touch faster. That is pretty neat, until you consider that every RTX card in existence is significantly faster than that. Ray tracing cores have totally changed the game, and the A100 only makes this even more clear.
My Threadripper results at 4.1 GHz were without any sort of extreme overclock or significant power limit increase. In other benchmarks, I had most of the cores stable at 4.60 or 4.65 GHz and a few cores at 4.5 GHz with just ambient water cooling. I forgot to run the Daz benchmark in that configuration. However, in Cinebench R23 I managed to hit almost 52,000 points with it.
You know what's insane right now? The street price of the RTX 3080 is so high you can buy an A5000 for very close to the same money here in the good old USA. Completely nuts.
Yes. But this may be changing. Prices are starting to come down a bit, at least in the used market. At retail, we are starting to see cards actually in stock more often, but they are sitting on shelves longer because people are not as interested in paying those sky high prices like they were last year...when crypto was at its peak and trucking along. Now crypto is on a downward trend, and has been for about 2 or 3 months. There are those who are trying to wait it out, but if things do not bounce back they will be forced to reduce inventory as well.
If this happens (and I stress I am not claiming it will, just that it could), we may start to see a reset on pricing. At some point the retail units have to compete with the used market, and up until now the used market has been insane. Then everybody in the supply chain, like the AIBs and distributors, wanted a piece of this swelling pie. These guys are charging crazy prices to the retailers, so in some situations the retailers are not in a great position to just drop prices. They could end up losing money on some cards, unless the AIBs and/or distributors start to give them some form of rebate to work with. So this is why prices may not drop as fast as some people hope when crypto crashes: the companies involved have become used to this inflation and are not so willing to dial it back.
But this does not impact the second hand market the same way, as there is not the same pressure to keep a price high...with the exception of scalpers. However, scalpers are limited in scope. While they certainly have hurt the market, they also have not gobbled that many cards.
I have a video about the second-hand market that some may find interesting. This fellow pulled a lot of data together to try and see how much scalping has been going on. The thing I want you to remember is that while the numbers in the video are indeed high, Nvidia has shipped and sold tens of millions of new GPUs over the period this data covers. For example, Nvidia stated they sold 9.1 million GPUs in January of 2021. Ampere was just 3 months old! Since then they have sold a good 8 to 9 million every 3 months, and that is being conservative. All of his numbers are based on US eBay, but even so that is the largest second-hand market by far, dwarfing all the rest, possibly even all of them combined. It is like comparing California's economy to the rest of the US, where California alone would be the 5th largest economy in the world.
Another thing to put his numbers into perspective: one single mining firm in China was running 485,000 GPUs. They were older RX 470s, yes, but that shows the massive, mind-boggling scale these mining operations can reach. And that was just ONE of them. There are many more. Link: https://www.tomshardware.com/news/china-supreme-court-sides-genesis-mining-gpu-battle
So that one single firm had more GPUs than ALL of the scalped GPUs sold on US Ebay.
Anyhow, I don't want to go too far off topic in the benchmark thread. But maybe there is at least a ray of hope that things will improve and we will be able to buy these things again at more normal pricing. If anybody reading this has waited this long, unless you absolutely need it, I would suggest waiting. This market may finally be starting to stabilize, and prices might come down more. Besides that, the next generation of GPUs is just around the corner. I know it seems like this is said all the time, but the new cards are expected to be on their way late this year, only a handful of months from now. If you have held out this long, why not wait a bit more? Hopefully when the 4000 series arrives (if that is the name), it will at the very least cause much-needed price drops for the 3000 series. The 4000 series may double performance, and if that is true, things could get wild because a lot of people will be upgrading, which means a lot of used cards will be hitting the market too. As long as crypto doesn't explode again; that is the wild card we are going to have to deal with now.
Hope you are enjoying that A6000. Re: that CPU cooler, I have an Arctic Liquid Freezer II 420 in push-pull (with 2K fans) on my i9-10980XE. I would say it is probably the best AIO cooler you can get.
System/Motherboard: SuperMicro X12
CPU: 2x Xeon Gold 6348 @ Stock 3.5 GHZ
GPU: PNY A6000 (Idle)
System Memory: 32 GB ECC @ 3200 MHz - Single Channel
OS Drive: Intel 670p M.2 1 TB NVMe
Asset Drive: Intel 670p M.2 1 TB NVMe (same)
Operating System: Win 11 Pro
Nvidia Drivers Version: 511.09 DCH
Daz Studio Version: 4.20
2022-03-21 23:04:26.934 [INFO] :: Finished Rendering
2022-03-21 23:04:26.984 [INFO] :: Total Rendering Time: 10 minutes 16.30 seconds
2022-03-21 23:04:33.310 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2022-03-21 23:04:33.310 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 1800 iterations, 3.781s init, 608.982s render
Loading time: 7.318
Performance: 2.956 iterations per second
Finally doing a Gen 4 setup. CPUs only here.
It looks like the 2x Xeon Gold 6348 and the Threadripper 3970X have similar performance.
I still haven't had time to test the 3970X in its different modes (stock, PBO, different power limits, etc.) to see if there is any significant difference.
The dual Ice Lake build is a tough one. For the Intel fanboys, Puget advocates sticking with a single workstation-class Xeon. Having just walked the dual-Xeon tightrope, I completely agree with them. Tons of setup and anxiety.
Anyone want to guess the RTX 3090 Ti Daz rendering performance?
20.5 iterations per second (give or take 0.25)
In gaming, the 3090 Ti is between 7 and 11% faster than the 3090. Iray tends to see larger gains than gaming does, so I would wager the 3090 Ti will render around 15 to 20% faster than a 3090; exactly how much will depend a lot on the specific model. The 3090 Ti also has faster VRAM, which does well in this bench scene.
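To put rough numbers on that guess, here is a hedged back-of-the-envelope sketch. The 3090 baseline below is a placeholder, not a measured result from this thread, so plug in whatever 3090 figure you trust:

```python
# Back-of-the-envelope estimate for the 3090 Ti's Iray rate.
# BASELINE_3090 is a hypothetical placeholder, not a measured result; replace
# it with an actual 3090 figure from this thread before trusting the output.
BASELINE_3090 = 17.5          # iterations/s (placeholder)
EXPECTED_GAIN = (0.15, 0.20)  # assumed 15-20% Iray uplift over the 3090

low = BASELINE_3090 * (1 + EXPECTED_GAIN[0])
high = BASELINE_3090 * (1 + EXPECTED_GAIN[1])
print(f"Estimated 3090 Ti: {low:.1f} - {high:.1f} iterations per second")
```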
Cooling and power will be absolutely critical to any 3090 Ti build. The card is just insane. I personally do not recommend it, certainly not at its current price. The original 3090 was more logical because of the 24 GB of VRAM that makes it stand out; the 3090 Ti does not offer any additional VRAM. The price and especially the power are real concerns here. You have to have a very good power supply to even consider a 3090 Ti. Maybe 450 watts doesn't sound so bad by itself, but power spikes could be an issue that lesser power supplies cannot handle, and these cards can go beyond 500 watts. We joked about small space heaters with the 3090, but at 500 watts the 3090 Ti is exactly what small space heaters in the US can be. So people need to understand that even if they water cool, that heat is still getting displaced into the room. A small space heater from Walmart will put out about the same amount of heat that a 3090 Ti ejects into your room; it is very capable of heating the space.
I am running a 3090 with a 3060, so I can attest to the heat that can be generated by such a set up. I have managed it so far. And actually, a 3090 plus a 3060 is going to be much faster than a single 3090ti, and also cheaper. The caveat of course is that the 3060 only has 12gb of VRAM and so can only help up to a point.
The 3090 Ti is more like an experiment right now. The next generation of GPUs is expected to be very power hungry as Nvidia and AMD fight hard for the performance crown. We do not know what they will be called yet, but the fastest card is expected to have a similar power draw to the 3090 Ti, though it will be much faster (maybe even...twice as fast...maybe). This is coming in either Q3 or Q4 this year. After all, Ampere released in 2020 and will be 2 years old; the next gen is due, and you don't need a crystal ball for that. I know that creates a tricky situation for those who are dying for a new GPU now, because GPUs are finally starting to come down in price from this horrible market condition and are also in stock. So I can totally understand the difficulty in making a decision, plus for all we know the market could blow up again when the new stuff releases, like it did last time. There is so much uncertainty. I do think GPUs will get cheaper in the short term.
Microcenter is advertising 3060s for $490, and 3070tis for $700. These are prices we have not seen since...well actually ever. They were available for less the day they launched, but sold out in seconds. Now they are in stock for a good while before getting sold. They are sold out now, but it took a bit. My MC currently has 98 different SKUs of GPUs in stock as I type this, with many showing "25+" in stock. They have 3050s for $380, though I still think that is pretty high, they have 25+ of just one SKU of 3050.
About half of a typical data center's costs are related to cooling. I personally feel that both AMD and Nvidia should be focusing on reducing GPU power draw and increasing efficiency, much like Intel is doing with its new efficiency cores on CPUs, especially at a time when energy costs are skyrocketing. Hopper is aiming for 600 watts, which is a step in the wrong direction. In other words, if the next generation of GPU pulled only 65% of the power from the wall at the same or modestly better performance, that would be a more holistic improvement in product design.
I must also agree that different versions of the cards will create variances in terms of relative performance increases. Regarding the 3090 Ti, to help answer this question objectively, I will sample the EVGA 3090 FTW3 vs EVGA GeForce RTX 3090 Ti FTW3 with stock BIOS in the same PC. These two cards appear similar enough that we can get a reasonable measure of the differences. I should have data points next weekend.
The FE editions may offer an even clearer picture if anyone happens to have both cards that they can test in the same system (3090 FE and 3090 Ti FE).