Effects of PCI-E bandwidth on load and render times...

GaryH Posts: 66
edited June 2017 in The Commons

I'm in the process of squeezing every last bit of Iray performance out of my circa 2011 computer platform. I'm providing this info for those who might want to do the same, and because as far as I can tell no one has measured this in DS Iray before. My processor is an Intel 2600K with a mild 4GHz OC running on an Asus P8Z68-V Pro motherboard. The DDR3 RAM is maxed out at 32GB, which so far seems fine for what I do. The 2600K has just 16 PCI-E 2.0 lanes and the Z68 provides another 4. The motherboard has three PCI-E x16 slots. If you put GPUs in the first two slots then they run in x8 / x8 mode. The third slot can run in x4 (by losing other peripheral ports like front panel USB 3.0) or x1. I searched the web to no avail for info on whether a modern Nvidia GPU could run at PCI-E 2.0 x4, or even x1. And if it could, what would the Iray performance be like? I was hopeful based on companies like Amfeltec selling x4 splitters and render clusters that run at x1 off the motherboard.

After reading MEC4D's description of her Titan X upgrade I decided that going with a dedicated GPU for the display and two powerful render GPUs would be ideal. I already had two GTX 970 Strix cards, so the first step was to add a Titan X Pascal into the mix. After determining that the GPU ran well in the third x4/x1 slot I completed the upgrade by adding the new Titan Xp. The second 970 is in storage.

I'm using Dine on the Orient Express as the benchmark because it's a complicated scene with lots of lights (60), geometry, and big textures.  The scene takes about a minute to load into the GPUs (depending on the PCI-E bandwidth as can be seen below) and requires around 10,000 iterations to look really good.  I ran it to 4000 iterations.

The GPUs are set up as follows:

Slot 2, PCI-E 2.0 x8:  GTX 970 Strix (display only)

Slot 5, PCI-E 2.0 x8:  Titan Xp

Slot 7, PCI-E 2.0 x4 or x1 (set in BIOS):  Titan X Pascal

I ran three benchmarks: one with the Titan Xp alone, one with the Titan Xp and Titan X Pascal at x4, and one with the Titan Xp and Titan X Pascal at x1. First observation: the log's GPU "init" time varies based on the time it takes to load the GPU on the slowest PCI-E slot. As the init times of GPUs on slots with different bandwidth are almost identical, I suspect the software is waiting for each card to acknowledge that a chunk of the scene has been successfully loaded.

So here are the results. Remember, the Titan Xp is always at x8:

Titan X Pascal at x1:

2017-06-13 10:45:01.452 Total Rendering Time: 6 minutes 11.91 seconds
2017-06-13 10:47:59.394 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Device statistics:
2017-06-13 10:47:59.394 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (TITAN Xp): 2122 iterations, 63.247s init, 298.921s render
2017-06-13 10:47:59.395 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 2 (TITAN X (Pascal)): 1878 iterations, 63.223s init, 299.617s render

Titan X Pascal at x4:

2017-06-09 14:27:16.537 Total Rendering Time: 6 minutes 0.4 seconds
2017-06-09 14:31:55.502 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Device statistics:
2017-06-09 14:31:55.502 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (TITAN Xp): 2114 iterations, 54.323s init, 297.076s render
2017-06-09 14:31:55.502 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 2 (TITAN X (Pascal)): 1886 iterations, 54.323s init, 296.991s render

Titan Xp alone at x8:

2017-06-13 11:37:01.565 Total Rendering Time: 10 minutes 23.66 seconds
2017-06-13 11:39:34.677 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : Device statistics:
2017-06-13 11:39:34.677 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (TITAN Xp): 4000 iterations, 51.673s init, 563.397s render
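If you want to pull these numbers out of your own log instead of copying them by hand, something like the following might work. It's only a sketch: the regular expression is inferred from the device lines pasted above, so other Iray or DS versions may format them differently, and `parse_device_stats` is just a name I made up for illustration.

```python
import re

# Pattern for the per-device summary lines shown above. The field layout is
# inferred from these sample lines only (an assumption); other versions of
# the log format may need a different pattern.
DEVICE_LINE = re.compile(
    r"CUDA device (?P<index>\d+) \((?P<name>.+)\): "
    r"(?P<iterations>\d+) iterations, "
    r"(?P<init>[\d.]+)s init, "
    r"(?P<render>[\d.]+)s render"
)

def parse_device_stats(log_text):
    """Return one dict per 'CUDA device' summary line found in log_text."""
    stats = []
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            stats.append({
                "index": int(match.group("index")),
                "name": match.group("name"),
                "iterations": int(match.group("iterations")),
                "init_s": float(match.group("init")),
                "render_s": float(match.group("render")),
            })
    return stats

# Two lines copied from the x1 run above.
sample = """\
2017-06-13 10:47:59.394 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 0 (TITAN Xp): 2122 iterations, 63.247s init, 298.921s render
2017-06-13 10:47:59.395 Iray INFO - module:category(IRAY:RENDER):   1.0   IRAY   rend info : CUDA device 2 (TITAN X (Pascal)): 1878 iterations, 63.223s init, 299.617s render
"""

for device in parse_device_stats(sample):
    print(device["name"], device["iterations"], device["init_s"], device["render_s"])
```

The device name is matched greedily so that names containing parentheses, like "TITAN X (Pascal)", come through whole.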

 

As the load times are based on the slowest PCI-E slot (with a rendering GPU) and not the GPU itself, we can use the Titan Xp's load time to give us an x8 slot load time. So to summarize, here is the performance hit you take on load time based on the slowest PCI-E slot in a multi-GPU configuration:

PCI-E 2.0 x8 slot scene load time: 51.673s

PCI-E 2.0 x4 slot scene load time: 54.323s   -5.128%

PCI-E 2.0 x1 slot scene load time: 63.223s   -22.35% 
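For anyone who wants to check the arithmetic or plug in their own numbers, here is a minimal sketch of how the percentages above can be reproduced. The three init times are taken straight from the logs; the function name `percent_slower` is just illustrative. Note that it prints the hit as a positive "percent longer" figure, whereas the summary above writes it with a minus sign.

```python
# Init (scene load) times in seconds, copied from the log excerpts above.
load_times = {"x8": 51.673, "x4": 54.323, "x1": 63.223}

def percent_slower(baseline_s, measured_s):
    """Extra time relative to the baseline, as a percentage."""
    return (measured_s - baseline_s) / baseline_s * 100.0

for width in ("x4", "x1"):
    hit = percent_slower(load_times["x8"], load_times[width])
    print(f"{width} vs x8: {hit:.2f}% longer load")

# x4 compared directly against x1, using x4 as the baseline.
print(f"x1 vs x4: {percent_slower(load_times['x4'], load_times['x1']):.2f}% longer load")
```

With the numbers above this gives roughly 5.13%, 22.35%, and 16.38%.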

So there is certainly a substantial hit in load times at x1 compared to x4 or x8, something that those creating animations should note. However, for this particular render, while the load time hit between x4 and x1 is -16.38%, the hit to overall render time is minimal at just 11.51 seconds (6 minutes 11.91 seconds versus 6 minutes 0.4 seconds) or -3.19%, and this percentage would only decrease with more iterations. Also note that if you have a newer motherboard and processor that support PCI-E 3.0 lanes, then your bandwidth per lane is roughly double what I've reported here. My x8 is your x4.
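To put rough numbers on the "my x8 is your x4" comparison, here is a small sketch of the theoretical per-direction bandwidth for each configuration. It uses the published per-lane signalling rates and line encodings for PCI-E 2.0 and 3.0; these are ceilings, not what Iray actually achieves when loading a scene, and the function name is only for illustration.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency for each
# PCI Express generation. These are theoretical figures; real-world
# throughput is lower because of protocol overhead.
GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}

def theoretical_gb_per_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s for a given link."""
    rate_gt_s, efficiency = GENERATIONS[generation]
    return rate_gt_s * efficiency / 8 * lanes  # divide by 8: bits to bytes

for gen, lanes in [("2.0", 1), ("2.0", 4), ("2.0", 8), ("3.0", 4)]:
    print(f"PCI-E {gen} x{lanes}: {theoretical_gb_per_s(gen, lanes):.2f} GB/s")
```

On paper, PCI-E 2.0 x8 comes out at 4.00 GB/s and PCI-E 3.0 x4 at about 3.94 GB/s, so the two are close to equivalent.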

Dine on the Orient Express (Titan Xp, Titan X (x1 mode), 4000).png
2560 x 1440 - 2M
Dine on the Orient Express (Titan Xp, 4000).png
2560 x 1440 - 2M
