Extensive Benchmarks Looking At AMD Znver1 GCC 9 Performance, EPYC Compiler Tuning

Written by Michael Larabel in Software on 20 February 2019 at 11:26 AM EST.

With the GCC 9 compiler due to be officially released as stable in the next month or two, we've been benchmarking the near-final state of the GNU Compiler Collection on a diverse range of processors. In recent weeks that has included extensive compiler benchmarks on a dozen x86_64 systems, POWER9 compiler testing on the Talos II, and AArch64 compiler performance across recent releases of GCC and LLVM Clang. This latest installment of our GCC 9 benchmarking is an extensive look at AMD EPYC (Znver1) performance across several GCC releases, as well as at the various optimization levels of this new compiler on the Znver1 processor.

First up in this article is a comparison of GCC 6.5, GCC 7.4, GCC 8.2, and GCC 9.0.1 performance. These four most recent release series of the GNU Compiler Collection were each tested with the compiler flags set to "-O3 -march=znver1" and to "-O3 -march=x86-64", comparing code optimized specifically for first-generation AMD Zen processors against generic x86-64 code. This multi-way comparison shows how the Znver1 tuning has evolved since it was originally introduced with GCC 6 in 2016, as well as the overall direction of GCC's x86_64 performance.
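That first comparison is a four-by-two matrix: four GCC release series, each building every benchmark with two flag sets, for eight builds per test. The minimal Python sketch below simply enumerates those builds. The versioned driver names (gcc-6 through gcc-9), the `bench.c` source file, the output naming, and the `build_matrix` helper are all hypothetical illustrations, not how the Phoronix Test Suite actually drove these runs.

```python
import itertools
import shlex

# Assumed versioned GCC driver names (Ubuntu-style packaging); these are
# hypothetical and should be adjusted to what the system actually provides.
COMPILERS = ["gcc-6", "gcc-7", "gcc-8", "gcc-9"]

# The two flag sets compared in the article: Zen-tuned vs. generic x86-64.
FLAG_SETS = {
    "znver1": ["-O3", "-march=znver1"],
    "generic": ["-O3", "-march=x86-64"],
}


def build_matrix(compilers, flag_sets):
    """Return a (compiler, label, argv) tuple for every compiler/flag-set pair."""
    matrix = []
    for cc, (label, flags) in itertools.product(compilers, flag_sets.items()):
        # "bench.c" is a hypothetical benchmark source file.
        argv = [cc, *flags, "bench.c", "-o", f"bench-{cc}-{label}"]
        matrix.append((cc, label, argv))
    return matrix


if __name__ == "__main__":
    # Print the compile commands rather than running them.
    for cc, label, argv in build_matrix(COMPILERS, FLAG_SETS):
        print(f"{cc:6} {label:8} {shlex.join(argv)}")
```

Printing the commands instead of executing them keeps the sketch free of side effects; each resulting binary would then be benchmarked separately.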

GCC 9.0 Znver1 x86-64 Linux Compiler Benchmarks

Following that multi-way compiler comparison are follow-up tests using GCC 9.0.1 (the 20190210 snapshot) at various compiler optimization levels. The tested optimization levels include:

-O0 (No optimizations)
-Og (Basic optimizations that don't hurt the debuggability of the binary)
-O1 (Optimize)
-O2 (More optimizations)
-O2 -ftree-vectorize -ftree-slp-vectorize (More optimizations plus vectorization; these vectorization options might be enabled by default at -O2 in GCC 10)
-O2 -march=znver1 (The mid-tier optimization level plus Znver1 targeting)
-O2 -flto (The mid-tier optimization level plus using Link-Time Optimizations)
-O3 (The optimization level most often pursued for aggressive performance)
-O3 -march=znver1 (The aggressive optimization level plus Znver1 targeting)
-O3 -march=znver1 -flto (Aggressive optimizations, Znver1 targeting, and Link-Time Optimizations)
-Ofast -march=znver1 (The aggressive optimizations plus ones that break strict standards compliance, such as potentially unsafe math, along with Znver1 targeting)
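For readers who want to see what these levels actually switch on, GCC can report its own optimizer state: `gcc -Q --help=optimizers` combined with a level such as -O2 lists the -f optimization options and whether each is enabled. The Python sketch below writes the eleven tested configurations as flag lists and diffs the options -O3 enables beyond -O2 on whatever GCC is installed. The configuration labels and the `parse_optimizers` / `enabled_optimizers` helpers are hypothetical, and the exact output format of `--help=optimizers` can vary between GCC versions.

```python
import shutil
import subprocess

# The eleven tested configurations as GCC flag lists (labels are hypothetical).
CONFIGS = {
    "-O0": ["-O0"],
    "-Og": ["-Og"],
    "-O1": ["-O1"],
    "-O2": ["-O2"],
    "-O2 vectorize": ["-O2", "-ftree-vectorize", "-ftree-slp-vectorize"],
    "-O2 znver1": ["-O2", "-march=znver1"],
    "-O2 LTO": ["-O2", "-flto"],
    "-O3": ["-O3"],
    "-O3 znver1": ["-O3", "-march=znver1"],
    "-O3 znver1 LTO": ["-O3", "-march=znver1", "-flto"],
    "-Ofast znver1": ["-Ofast", "-march=znver1"],
}


def parse_optimizers(text):
    """Parse `gcc -Q --help=optimizers` output into {option: enabled?}.

    Only -f options ending in [enabled] or [disabled] are kept; options
    that take a value (e.g. "-falign-functions=") are skipped.
    """
    state = {}
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) >= 2 and parts[0].startswith("-f")
                and parts[-1] in ("[enabled]", "[disabled]")):
            state[parts[0]] = parts[-1] == "[enabled]"
    return state


def enabled_optimizers(flags, cc="gcc"):
    """Ask the compiler which -f optimizers a given flag list turns on."""
    result = subprocess.run([cc, "-Q", "--help=optimizers", *flags],
                            capture_output=True, text=True)
    return {opt for opt, on in parse_optimizers(result.stdout).items() if on}


if __name__ == "__main__" and shutil.which("gcc"):
    o2 = enabled_optimizers(CONFIGS["-O2"])
    o3 = enabled_optimizers(CONFIGS["-O3"])
    # Options that -O3 enables beyond -O2 for the installed GCC version.
    for opt in sorted(o3 - o2):
        print(opt)
```

Running the diff against different GCC releases is one way to see how a given optimization level changes from version to version.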

For those wondering how the GCC 9 optimization levels affect performance, this article should provide answers for the latest GCC 9 snapshot of what will be released as GCC 9.1.0 in either late March or April.

This is our largest AMD EPYC compiler benchmarking roundup in a while. All of this GCC benchmarking was done on a Dell PowerEdge R7425 server with two AMD EPYC 7601 processors and 512GB of RAM, leaving plenty of memory for the LTO tests. The system ran Ubuntu 18.04 x86_64 with a Linux 5.0 Git kernel snapshot.

All of the tested GCC compilers were built in their release/optimized modes, and all CFLAGS/CXXFLAGS were kept the same except where otherwise noted. A wide range of benchmarks were carried out via the Phoronix Test Suite. First up is a look at GCC 6.5 through GCC 9.0 performance for generic x86-64 binaries and Znver1-tuned binaries.

