I am running an online nesting configuration and finding a tremendous slow-down in computational speed when transitioning from offline to online nesting. Here are some numbers I have calculated:
Run                                       # of time steps   # of nodes            Total run time
                                                            (16 cores per node)
Parent grid                               1000              1                     ~37 min
Parent grid                               1000              4                     ~13 min
Child grid                                1000              1                     ~26 min
Child grid                                1000              4                     ~8 min
Online nesting (parent and child grids)   100               1                     ~53 min
Online nesting (parent and child grids)   100               4                     ~51 min
However, when I switch to the online nesting configuration, there is very little difference in run time between a single node and several nodes (16 processes on 1 node versus 64 on 4 nodes). Additionally, the individual parent and child runs for 1000 time steps sum to ~63 min on a single node, whereas the online run for both grids, scaled up linearly from 100 to 1000 time steps on a single node, would take almost 9 hours (~530 min). The online nesting therefore takes about 8 times longer than running the parent and child in sequence. It appears that the parallelization of online nesting, particularly the fine2coarse and coarse2fine steps, may have a bottleneck that significantly slows the computation.
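For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. The timings are the approximate values from the table above. Scaling the 100-step nested run linearly to 1000 steps is an assumption, and the names `summarize` and `speedup` are just illustrative helpers, not anything from ROMS.

```python
# Approximate wall-clock times in minutes, copied from the table above.
# Dictionary keys are the number of nodes (16 cores per node).
parent_1000 = {1: 37.0, 4: 13.0}   # parent grid alone, 1000 time steps
child_1000 = {1: 26.0, 4: 8.0}     # child grid alone, 1000 time steps
nested_100 = {1: 53.0, 4: 51.0}    # online nesting, only 100 time steps

# Assumption: nested run time grows linearly with the number of time steps.
STEP_SCALE = 1000 / 100


def summarize(nodes):
    """Return (sequential minutes, extrapolated nested minutes, slowdown ratio)."""
    sequential = parent_1000[nodes] + child_1000[nodes]
    nested = nested_100[nodes] * STEP_SCALE
    return sequential, nested, nested / sequential


def speedup(times):
    """Speedup from going from 1 node to 4 nodes (ideal would be 4)."""
    return times[1] / times[4]


for nodes in (1, 4):
    seq, nest, ratio = summarize(nodes)
    print(f"{nodes} node(s): parent + child {seq:.0f} min, "
          f"nested {nest:.0f} min ({nest / 60:.1f} h), slowdown {ratio:.1f}x")

for name, times in (("parent", parent_1000), ("child", child_1000),
                    ("nested", nested_100)):
    print(f"{name}: 1 -> 4 node speedup {speedup(times):.2f}x")
```

With these inputs the single-node slowdown comes out near 8.4 and the 4-node slowdown near 24, because the standalone runs speed up roughly 3x on 4 nodes while the nested run barely changes.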
We don’t think this is a memory issue as the nested configuration on 1 node only takes up about 60% of the total node memory.
Is anyone else finding similar results? I may have made a mistake in my model configuration. Any guidance would be greatly appreciated.
Thank you ROMS developers for continuing to improve the online nesting capabilities!