SNC/NPS Tuning For Ryzen Threadripper 7000 Series To Further Boost Performance

Written by Michael Larabel in Software on 1 December 2023 at 10:20 AM EST. Page 4 of 4.
ClickHouse benchmark with settings of 100M Rows Hits Dataset, First Run / Cold Cache. Default - Disabled was the fastest.
ClickHouse benchmark with settings of 100M Rows Hits Dataset, Third Run. Default - Disabled was the fastest.
PostgreSQL benchmark with settings of Scaling Factor: 100, Clients: 1000, Mode: Read Write. Default - Disabled was the fastest.
PostgreSQL benchmark with settings of Scaling Factor: 100, Clients: 1000, Mode: Read Write, Average Latency. Default - Disabled was the fastest.
PostgreSQL benchmark with settings of Scaling Factor: 1000, Clients: 1000, Mode: Read Write. Default - Disabled was the fastest.
PostgreSQL benchmark with settings of Scaling Factor: 1000, Clients: 1000, Mode: Read Write, Average Latency. Default - Disabled was the fastest.

Database software tended to perform the best in the default (disabled) mode.

Graph500 benchmark with settings of Scale: 26. SNC4 was the fastest.

The Graph500 HPC benchmark had some stellar improvements in SNC2 and then SNC4 modes.

PyTorch benchmark with settings of Device: CPU, Batch Size: 16, Model: Efficientnet_v2_l. Default - Disabled was the fastest.
PyTorch benchmark with settings of Device: CPU, Batch Size: 32, Model: ResNet-152. Default - Disabled was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 16, Model: ResNet-50. Default - Disabled was the fastest.
TensorFlow benchmark with settings of Device: CPU, Batch Size: 64, Model: ResNet-50. Default - Disabled was the fastest.

PyTorch and TensorFlow didn't benefit from the NUMA topology adjustments...

OpenVINO benchmark with settings of Model: Face Detection FP16, Device: CPU. SNC4 was the fastest.
OpenVINO benchmark with settings of Model: Person Detection FP16, Device: CPU. SNC4 was the fastest.
OpenVINO benchmark with settings of Model: Person Detection FP32, Device: CPU. SNC4 was the fastest.
OpenVINO benchmark with settings of Model: Vehicle Detection FP16, Device: CPU. SNC2 was the fastest.
OpenVINO benchmark with settings of Model: Face Detection FP16-INT8, Device: CPU. SNC4 was the fastest.

But the OpenVINO AI toolkit did benefit from the Sub-NUMA Clustering controls in this HP workstation BIOS. The throughput gains in these AI benchmarks were minor; the really dramatic improvement was the lower latency during the inference tests.

PETSc benchmark with settings of Test: Streams. SNC4 was the fastest.

Meanwhile, for software like PETSc there was no measurable difference.

Whether adjusting the Sub-NUMA Clustering / Nodes Per Socket default is a wise idea comes down to the particular software you use on your AMD Ryzen Threadripper workstation. For NUMA-aware software this can mean some very nice performance gains, as shown in cases like OpenVINO, OpenFOAM, Graph500, LULESH, and code compilation workloads. Those upgrading to an AMD Ryzen Threadripper 7000 series system and wanting to see all 196 benchmarks I ran for this SNC2/SNC4 comparison can find the data via this result page. NPS adjustments are a common consideration in the EPYC server/HPC space, but for Threadripper processors too this can be a very beneficial setting worth proper consideration.
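
For anyone experimenting with the setting, it can help to confirm what topology the operating system actually sees after a BIOS change. The following is a minimal sketch, not part of the testing in this article: it assumes a Linux system that exposes NUMA nodes under `/sys/devices/system/node`, and the function names `parse_cpulist` and `numa_topology` are hypothetical helpers made up for illustration. With the default (disabled) mode one would expect a single node to be listed, and more than one in the SNC2/SNC4 modes.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8,10-11" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            # An empty list (for example a memory-only node) has no CPUs.
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} read from sysfs, or {} if nothing is found."""
    topology = {}
    for node_dir in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", node_dir)
        if not match:
            continue
        try:
            with open(os.path.join(node_dir, "cpulist")) as handle:
                topology[int(match.group(1))] = parse_cpulist(handle.read())
        except OSError:
            # Skip nodes whose cpulist cannot be read.
            continue
    return topology


if __name__ == "__main__":
    nodes = numa_topology()
    print(f"{len(nodes)} NUMA node(s) visible to the OS")
    for node_id in sorted(nodes):
        print(f"  node{node_id}: {len(nodes[node_id])} CPUs")
```

Running the script before and after changing the BIOS option gives a quick sanity check that the new mode took effect before any benchmarks are re-run.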

Thanks to HP for supplying the HP Z6 G5 A workstation for review on Phoronix that has made all of this Ryzen Threadripper PRO 7995WX testing possible.
