IF YOU ARE INTERESTED IN THIS POST, PLEASE ALSO SEE: viewtopic.php?f=29&t=4665
I hope some of you find this helpful. I needed a computer for a new graduate student to run models and do data analysis. For $1500 I built a Ryzen 1800X box. Since this is a new and popular CPU, I thought others might be interested in benchmarks. The Ryzen has 8 physical cores and can run up to 16 threads. I compare my results to runs on a dual-CPU Xeon X5650 system with 12 cores, purchased in 2011 for $5400. Some of my understanding of the results is grounded in the discussion in viewtopic.php?f=17&t=2001&p=7771#p7714; this is a useful read for anyone trying to make ROMS run faster.
I ran the benchmark test case with the largest domain (ocean_benchmark3.in); the results are the same with the next smaller domain (ocean_benchmark2.in) when an appropriate number of tiles is chosen. For the large domain I used a 4x32 tiling, which I found empirically to be optimal (on the Xeon system I used 4x30). I used gfortran 6.3; I will install ifort when my student arrives and can get a student license, and I will update this post then. The model was parallelized with OpenMP, and I used the default flags in Linux-gfortran.mk; compiling with -march=znver1 made no difference.
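For anyone new to ROMS tiling, here is a minimal sketch of what a 4x32 tiling means for the benchmark3 horizontal grid. It assumes the 4 is NtileI (along the 2048-point direction) and the 32 is NtileJ (along the 256-point direction), and it assumes a near-even split; ROMS's own bounds code may place leftover points differently. The helper names split_points and tile_report are hypothetical, and the tiles-per-thread figure assumes the OpenMP threads share the tiles among themselves.

```python
from typing import List


def split_points(n_points: int, n_tiles: int) -> List[int]:
    """Split n_points into n_tiles near-equal chunks.

    Hypothetical helper: leftover points go to the first tiles here,
    which is an assumption, not necessarily what ROMS does.
    """
    base, extra = divmod(n_points, n_tiles)
    return [base + 1 if k < extra else base for k in range(n_tiles)]


def tile_report(lm: int, mm: int, ntile_i: int, ntile_j: int, threads: int) -> dict:
    """Summarize tile shapes and the (assumed) share of tiles per OpenMP thread."""
    sizes_i = split_points(lm, ntile_i)
    sizes_j = split_points(mm, ntile_j)
    n_tiles = ntile_i * ntile_j
    return {
        "tiles": n_tiles,
        "i_sizes": sorted(set(sizes_i)),
        "j_sizes": sorted(set(sizes_j)),
        # If tiles are shared among threads, an uneven share leaves threads idle.
        "tiles_per_thread": n_tiles / threads,
        "evenly_shared": n_tiles % threads == 0,
    }


if __name__ == "__main__":
    # Benchmark3 horizontal grid (2048 x 256) with the 4x32 tiling used on the Ryzen box.
    print(tile_report(2048, 256, 4, 32, threads=8))
    # The 4x30 tiling used on the 12-core Xeon.
    print(tile_report(2048, 256, 4, 30, threads=12))
```

Under these assumptions the Ryzen run has 128 tiles of 512 by 8 points, or 16 tiles per thread with 8 threads; the Xeon's 4x30 tiling gives 120 tiles, which divide evenly over 12 threads.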
On the Ryzen system, the memory speed was set either to the default of 2400 MHz or to 3200 MHz, the maximum supported speed of the memory I purchased.
I attach two plots. The first shows the time to compute one grid point for one timestep for various numbers of threads; to get the time for one timestep, multiply by the grid size of 2048*256. This figure illustrates the following:
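The conversion from the plotted per-point time to a per-timestep time is a single multiplication. Here is a minimal sketch with hypothetical function names that also extends it to a whole run; the per-point value would be read off the plot, in whatever time unit the plot uses, and the run estimate ignores I/O and startup overhead.

```python
# Benchmark3 horizontal grid size used to normalize the plot (2048 * 256).
GRID_POINTS = 2048 * 256


def time_per_step(time_per_point: float, grid_points: int = GRID_POINTS) -> float:
    """Time for one model timestep, given the per-grid-point time from the plot."""
    return time_per_point * grid_points


def run_time(time_per_point: float, n_steps: int, grid_points: int = GRID_POINTS) -> float:
    """Rough estimate for n_steps timesteps (ignores I/O and startup overhead)."""
    return time_per_step(time_per_point, grid_points) * n_steps
```

For example, an arbitrary illustrative value of 1e-7 s per point (not a measurement) corresponds to about 0.052 s per timestep.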
- The new CPU is about twice as fast as the old one; time marches on. Despite the comments below, the Ryzen box is always faster than the (old!) dual-Xeon setup.
- ROMS on Ryzen in this application does not show much performance increase beyond 4 threads; there is some marginal increase in performance up to 8 threads.
- Hyperthreading (any threads beyond 8) hurts ROMS performance on Ryzen. This is not true for compiling or for some of my biology Python codes, but it is certainly true for ROMS. The Intel chip does gain (some) from virtual threads.
- Overclocking the Ryzen CPU made almost no difference to ROMS run speeds, suggesting that memory is the bottleneck. I do not show the CPU overclocking results, since they would be visually indistinguishable from the other results.
- ROMS on Ryzen shows somewhat better scaling with faster memory. Raising the memory speed from 2400 to 3200 MHz increases performance by 7% for a 1-thread job and by 17% for an 8-thread job.
- ROMS on the dual Xeon scales better with increasing numbers of threads. Given how strongly Ryzen's scaling depends on memory speed (see above), I strongly suspect the Xeon scales better because each X5650 has three memory channels (six across the two sockets), while Ryzen has only two.
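The scaling comparisons above can be put in numbers with the standard speedup and parallel-efficiency ratios, computed from the per-point times in the first plot. This is a minimal sketch with hypothetical function names. The last function just redoes the memory arithmetic from the bullets: 2400 to 3200 MHz is a one-third clock increase, and comparing the observed 7% and 17% gains to it is only a crude indicator of how memory-bound each case is. It ignores memory timings, and on first-generation Ryzen the Infinity Fabric clock also follows the memory clock, so two effects are mixed.

```python
from typing import Dict, Tuple


def speedup(t_one_thread: float, t_n_threads: float) -> float:
    """Speedup S(n) = T(1) / T(n), from per-grid-point times at 1 and n threads."""
    return t_one_thread / t_n_threads


def parallel_efficiency(t_one_thread: float, t_n_threads: float, n_threads: int) -> float:
    """Parallel efficiency E(n) = S(n) / n; 1.0 means perfect linear scaling."""
    return speedup(t_one_thread, t_n_threads) / n_threads


def scaling_table(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map thread count -> (speedup, efficiency); times must include the 1-thread case."""
    t1 = times[1]
    return {n: (speedup(t1, t), parallel_efficiency(t1, t, n)) for n, t in sorted(times.items())}


def memory_gain_fraction(observed_gain: float, slow_mhz: float, fast_mhz: float) -> float:
    """Fraction of a memory-clock increase that shows up as a speed increase.

    Crude indicator only: values near 1 suggest the run is limited by memory
    speed, values near 0 that it is not.
    """
    clock_gain = fast_mhz / slow_mhz - 1.0
    return observed_gain / clock_gain


if __name__ == "__main__":
    # Gains reported above for 2400 -> 3200 MHz memory on the Ryzen box.
    for threads, gain in ((1, 0.07), (8, 0.17)):
        frac = memory_gain_fraction(gain, 2400, 3200)
        print(f"{threads} thread(s): {frac:.2f} of the 33% memory-clock increase")
```

With the reported numbers, the 1-thread job realizes about 0.21 of the clock increase and the 8-thread job about 0.51, consistent with memory mattering more as threads are added.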
I have $7k that I need to spend on a system; does anybody have a new Intel system to compare against?
I also welcome comments (Sacha?) on what I am being stupid about...
Jamie