SNC/NPS Tuning For Ryzen Threadripper 7000 Series To Further Boost Performance

Written by Michael Larabel in Software on 1 December 2023 at 10:20 AM EST. Page 2 of 4.
NAS Parallel Benchmarks benchmark with settings of Test / Class: CG.C. SNC4 was the fastest.
NAS Parallel Benchmarks benchmark with settings of Test / Class: EP.C. SNC4 was the fastest.

For NUMA-aware software, NPS/SNC tuning can be an easy performance win.
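After changing the NPS/SNC option in the BIOS, it can help to confirm how many NUMA nodes the operating system actually sees. The following is a minimal sketch, assuming a Linux system that exposes its topology under `/sys/devices/system/node`; the helper names `parse_cpulist` and `numa_layout` are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_layout(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} as exposed by the kernel, or {} if unavailable."""
    layout = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return layout
    for node_dir in sorted(root.glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            # Directory names look like 'node0', 'node1', ...
            layout[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return layout


if __name__ == "__main__":
    for node, cpus in sorted(numa_layout().items()):
        print(f"node {node}: {len(cpus)} CPUs")
```

If the setting took effect, the number of nodes printed should change accordingly between reboots; a single node suggests the default configuration is still active.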

NAS Parallel Benchmarks benchmark with settings of Test / Class: FT.C. Default - Disabled was the fastest.
NAS Parallel Benchmarks benchmark with settings of Test / Class: IS.D. SNC2 was the fastest.
CloverLeaf benchmark with settings of Input: clover_bm16. Default - Disabled was the fastest.
Rodinia benchmark with settings of Test: OpenMP Leukocyte. Default - Disabled was the fastest.
Rodinia benchmark with settings of Test: OpenMP Streamcluster. Default - Disabled was the fastest.

But it's not a universal win, so whether it's a worthwhile adjustment on your HEDT/workstation really comes down to which workloads you run most frequently.

Algebraic Multi-Grid Benchmark. SNC4 was the fastest.
Xcompact3d Incompact3d benchmark with settings of Input: input.i3d 129 Cells Per Direction. SNC4 was the fastest.
Xcompact3d Incompact3d benchmark with settings of Input: input.i3d 193 Cells Per Direction. SNC4 was the fastest.
OpenFOAM benchmark with settings of Input: drivaerFastback, Medium Mesh Size, Mesh Time. SNC4 was the fastest.
OpenFOAM benchmark with settings of Input: drivaerFastback, Medium Mesh Size, Execution Time. SNC4 was the fastest.

For various real-world workloads, though, this simple BIOS adjustment can deliver a worthwhile performance benefit, such as for computational fluid dynamics (CFD) with the OpenFOAM software.

Quantum ESPRESSO benchmark with settings of Input: AUSURF112. SNC4 was the fastest.
SPECFEM3D benchmark with settings of Model: Layered Halfspace. SNC4 was the fastest.
LULESH benchmark. SNC4 was the fastest.

For those using AMD Ryzen Threadripper 7000 series processors in production, it's certainly worth evaluating the best Sub-NUMA Clustering / Nodes Per Socket configuration for your environment prior to deployment.
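One simple way to run that evaluation is to record your own workloads under each BIOS setting and tally which configuration wins most often. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the workload names are generic, and the numbers are made-up placeholders rather than results from this article.

```python
def best_config(scores, lower_is_better=True):
    """Return the configuration name with the best score for one workload."""
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)


def tally_wins(runs):
    """Count how often each configuration wins.

    runs maps workload name -> (scores dict, lower_is_better flag).
    """
    wins = {}
    for scores, lower_is_better in runs.values():
        winner = best_config(scores, lower_is_better)
        wins[winner] = wins.get(winner, 0) + 1
    return wins


if __name__ == "__main__":
    # Placeholder values only; substitute measurements from your own system.
    my_runs = {
        "workload_a_seconds": ({"Default": 100.0, "SNC2": 95.0, "SNC4": 90.0}, True),
        "workload_b_ops_per_sec": ({"Default": 500.0, "SNC2": 480.0, "SNC4": 470.0}, False),
    }
    print(tally_wins(my_runs))
```

The `lower_is_better` flag matters because some benchmarks report run time while others report throughput; weighting the tally by how often each workload runs would be a natural refinement.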

