Online Nesting slow

Report or discuss software problems and other woes

Moderators: arango, robertson

ablowe
Posts: 3
Joined: Tue Sep 24, 2013 2:22 am
Location: University of California Santa Cruz

Online Nesting slow

#1 Unread post by ablowe »

Hi all,
I am running an online nesting configuration and finding a tremendous slow-down in computational speed when transitioning from offline to online nesting. Here are some numbers I have calculated:

Code: Select all

Run                # of time steps   # of nodes            Total run time
                                     (16 cores per node)
Parent grid        1000               1                    ~ 37 min
                                      4                    ~ 13 min
Child grid         1000               1                    ~ 26 min
                                      4                    ~ 8 min
Online nesting     100                1                    ~ 53 min
with parent and                       4                    ~ 51 min
child grids
When running the parent grid or child grid individually, increasing the number of nodes used by a factor of 4 decreases computational time by roughly a factor of 3, as expected.

However, when I switch to the online nesting configuration, there is very little difference in run time between a single node and several nodes (16 processes on 1 node versus 64 on 4 nodes). Additionally, the run times of the individual parent and child runs for 1000 time steps sum to ~ 63 min on a single node, whereas scaling the online run for both grids up to 1000 time steps on a single node would take almost 9 hours (~ 530 min). The online nesting therefore takes more than 8 times longer than running the parent and child in sequence. It appears that the parallelization of online nesting, particularly the fine2coarse and coarse2fine steps, may have a bottleneck that significantly increases the run time.
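
The arithmetic behind these figures can be reproduced with a short Python sketch. The timings are taken from the table above; the only assumption is that the cost per time step of the nested run stays constant when extrapolating from 100 to 1000 steps. Variable and function names are illustrative.

```python
# Timings copied from the table above, in minutes.
# Keys are (run, number of nodes); each node has 16 cores.
timings = {
    ("parent", 1): 37.0,
    ("parent", 4): 13.0,
    ("child", 1): 26.0,
    ("child", 4): 8.0,
}


def speedup(run):
    """Speedup of a stand-alone run when going from 1 node to 4 nodes."""
    return timings[(run, 1)] / timings[(run, 4)]


parent_speedup = speedup("parent")
child_speedup = speedup("child")

# Nested run: 100 steps took ~53 min on 1 node and ~51 min on 4 nodes.
nested_100_steps_1node = 53.0
nested_100_steps_4node = 51.0
nested_speedup = nested_100_steps_1node / nested_100_steps_4node

# Assumption: cost per step is constant, so 1000 steps cost 10x as much.
nested_1000_steps = nested_100_steps_1node * 10
sequential_1000_steps = timings[("parent", 1)] + timings[("child", 1)]
slowdown = nested_1000_steps / sequential_1000_steps

print(f"parent speedup on 4 nodes: {parent_speedup:.2f}")
print(f"child speedup on 4 nodes:  {child_speedup:.2f}")
print(f"nested speedup on 4 nodes: {nested_speedup:.2f}")
print(f"nested, 1000 steps: {nested_1000_steps:.0f} min "
      f"({nested_1000_steps / 60:.1f} h)")
print(f"nested vs. parent + child in sequence: {slowdown:.1f}x slower")
```

On these numbers the stand-alone runs speed up by about 2.85x and 3.25x on 4 nodes, the nested run by only about 1.04x, and the nested run is about 8.4x slower than the parent and child run in sequence.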

We don’t think this is a memory issue as the nested configuration on 1 node only takes up about 60% of the total node memory.

Is anyone else finding similar results? I may have made a mistake in my model configuration. Any guidance would be greatly appreciated.

Thank you ROMS developers for continuing to improve the online nesting capabilities! :D

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Online Nesting slow

#2 Unread post by arango »

Yes, we are aware of this and it is on our to-do list. The issue is that in two-way nesting (the default) we currently do a global gather of data (full 2D and 3D arrays) across all the MPI nodes to make the fine-to-coarse averaging easy to compute. This causes a bottleneck because of the excessively long communications between nodes. Notice that ONE_WAY is faster because we don't need this type of communication. We coded it like this because it was much easier and our priority was to have the nesting working correctly. The problem gets worse when users become greedy with the number of partitions for a particular application. We need to experiment and always use the optimal partition!

I already know what to do but it is complex. It is very tricky to get an efficient manipulation of the summation (averaging) within the coarser donor grid cell in the presence of parallel partitions. I will start looking at this when I get a chance. I am very busy now with my other projects.
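
To make the difficulty concrete, here is a toy Python sketch of fine-to-coarse averaging with a refinement factor of 3. It is not ROMS code, and it is not necessarily the approach the developers have in mind; the function names `fine_to_coarse` and `partial_sums` are hypothetical. It only shows that when a tile boundary cuts through a coarse cell, each tile holds just part of the sum for that cell, so the partial sums have to be combined across tiles (in a real MPI code that combination would be a communication step).

```python
def fine_to_coarse(fine, refine):
    """Average each refine x refine block of fine cells into one coarse cell."""
    nj, ni = len(fine), len(fine[0])
    coarse = []
    for jc in range(nj // refine):
        row = []
        for ic in range(ni // refine):
            block = [fine[jc * refine + dj][ic * refine + di]
                     for dj in range(refine) for di in range(refine)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def partial_sums(fine, refine, i_start, i_end):
    """Per-coarse-cell sums using only fine columns i_start..i_end-1 (one tile)."""
    nj, ni = len(fine), len(fine[0])
    sums = [[0.0] * (ni // refine) for _ in range(nj // refine)]
    for j in range(nj):
        for i in range(i_start, i_end):
            sums[j // refine][i // refine] += fine[j][i]
    return sums


refine = 3
# A 6 x 6 fine grid covering a 2 x 2 patch of coarse cells.
fine = [[float(j * 6 + i) for i in range(6)] for j in range(6)]

# Reference result computed with the whole array in one place
# (the analogue of gathering the full array).
reference = fine_to_coarse(fine, refine)

# Two tiles whose shared boundary (fine column 4) cuts through a coarse cell.
tile_a = partial_sums(fine, refine, 0, 4)
tile_b = partial_sums(fine, refine, 4, 6)

# Combining the partial sums stands in for the inter-tile communication.
combined = [[(a + b) / refine ** 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(tile_a, tile_b)]

print(reference)
print(combined)
```

Both printed arrays agree, but neither tile could have produced the right-hand coarse column on its own.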

ablowe
Posts: 3
Joined: Tue Sep 24, 2013 2:22 am
Location: University of California Santa Cruz

Re: Online Nesting slow

#3 Unread post by ablowe »

Hi all,
Thank you for your response. I have run the same model using both 2-way and 1-way nesting. The 1-way nesting code does not seem to speed up the model run time very much.

Code: Select all

                  # time steps  # of nodes     run time
                                (16 cpu/node)
2-way nesting     100            1             ~ 54 min
1-way nesting     100            1             ~ 52 min
Is anyone else finding similar results in timing? I may have made a mistake in my model configuration. My model has a donor grid encompassing the US west coast and a refined nest in the central California region. To switch from 2-way to 1-way nesting, I recompiled with the CPP option ONE_WAY defined and ran the model with the same input file. Am I missing a step to run a 1-way nesting configuration?

CPP Definitions defined:
2-way nesting: NESTING, TIME_INTERP_FLUX
1-way nesting: NESTING, TIME_INTERP_FLUX, ONE_WAY

Thank you very much for your help!

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Online Nesting slow

#4 Unread post by arango »

What is the size of your grids? Also, what partitions are you using? I want to be sure that you are not using an excessive number of parallel nodes. There is always an optimal partition in ROMS. This is computer engineering, and we should refrain from using a lot of processes just because they are available. There must be a balance between the tile size, cache, and the number of MPI communications. Sometimes a smaller number of parallel nodes is more efficient.

By the way, you should not use TIME_INTERP_FLUX. This is a developer's option and should not be activated by users. The mass transport from the coarse grid into the refined grid should be persisted over all finer grid steps for a single coarse grid time-step and not time-interpolated. This option will be removed in the future. I just have it there for debugging and testing.
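
A toy Python illustration of the difference between persisting and time-interpolating the transport, under assumptions that are not stated in the post: the coarse grid provides one transport value for the current coarse step (`f_coarse`) and one for the next (`f_next`), the refinement factor is 3, and all numbers are made up. This is not ROMS code, and the function names are hypothetical. It only shows that persisting the value makes the volume passed over the fine sub-steps match the coarse step exactly, while interpolating toward the next value does not.

```python
def boundary_transports(f_coarse, f_next, refine, interpolate):
    """Transport applied at the fine-grid boundary on each of `refine` sub-steps."""
    if interpolate:
        # Linear interpolation in time between the two coarse values.
        return [f_coarse + (f_next - f_coarse) * k / refine for k in range(refine)]
    # Persist the coarse-step value over every fine sub-step.
    return [f_coarse] * refine


def volume(transports, dt_fine):
    """Volume passed through the boundary over all fine sub-steps."""
    return sum(f * dt_fine for f in transports)


refine = 3                      # refinement factor (3, as in this thread)
dt_coarse = 90.0                # illustrative coarse time step, seconds
dt_fine = dt_coarse / refine
f_coarse, f_next = 100.0, 130.0  # illustrative transports, m^3/s

persisted = boundary_transports(f_coarse, f_next, refine, interpolate=False)
interpolated = boundary_transports(f_coarse, f_next, refine, interpolate=True)

coarse_volume = f_coarse * dt_coarse
print("persisted mismatch:   ", volume(persisted, dt_fine) - coarse_volume)
print("interpolated mismatch:", volume(interpolated, dt_fine) - coarse_volume)
```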

ablowe
Posts: 3
Joined: Tue Sep 24, 2013 2:22 am
Location: University of California Santa Cruz

Re: Online Nesting slow

#5 Unread post by ablowe »

Both grids are large. The nest has a refinement factor of 3.
Parent grid rho points: 556 x 541
Nested grid rho points: 542 x 407

I was running on a single node with 16 processors (4x4). I removed TIME_INTERP_FLUX and the timing has remained approximately the same. Thank you for your help!

Code: Select all

! Number of nested grids.

      Ngrids =  2

! Number of grid nesting layers.  This parameter is used to allow refinement
! and composite grid combinations.

  NestLayers =  2

! Number of grids in each nesting layer [1:NestLayers].

GridsInLayer =  1 1

! Grid dimension parameters. See notes below in the Glossary for how to set
! these parameters correctly.

          Lm == 554 540         ! Number of I-direction INTERIOR RHO-points
          Mm == 539 405         ! Number of J-direction INTERIOR RHO-points
           N ==  42  42       ! Number of vertical levels

        Nbed =  0             ! Number of sediment bed layers

         NAT =  2             ! Number of active tracers (usually, 2)
         NPT =  0             ! Number of inactive passive tracers
         NCS =  0             ! Number of cohesive (mud) sediment tracers
         NNS =  0             ! Number of non-cohesive (sand) sediment tracers

! Domain decomposition parameters for serial, distributed-memory or
! shared-memory configurations used to determine tile horizontal range
! indices (Istr,Iend) and (Jstr,Jend), [1:Ngrids].

      NtileI == 04 04                            ! I-direction partition
      NtileJ == 04 04                            ! J-direction partition
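
For a rough feel of what a given partition means per tile, the following Python sketch estimates the tile size and the halo-to-interior ratio for the Lm, Mm and NtileI x NtileJ values above. It is only an approximation: ROMS computes the actual tile bounds internally, the halo width of 2 points is an assumption, and the 8x8 partition is included purely for comparison. The function names are hypothetical.

```python
import math


def tile_shape(lm, mm, ntile_i, ntile_j):
    """Approximate largest interior tile size (I, J) for an Lm x Mm grid."""
    return math.ceil(lm / ntile_i), math.ceil(mm / ntile_j)


def halo_fraction(tile_i, tile_j, halo=2):
    """Halo points divided by interior points for one tile.

    The halo width of 2 is an assumption made for illustration.
    """
    interior = tile_i * tile_j
    with_halo = (tile_i + 2 * halo) * (tile_j + 2 * halo)
    return (with_halo - interior) / interior


# Interior RHO-point counts (Lm, Mm) from the input file above.
grids = {"parent": (554, 539), "child": (540, 405)}

for name, (lm, mm) in grids.items():
    for ntile in (4, 8):  # 4x4 as in the input file; 8x8 for comparison only
        ti, tj = tile_shape(lm, mm, ntile, ntile)
        frac = halo_fraction(ti, tj)
        print(f"{name:6s} {ntile}x{ntile}: tile ~{ti} x {tj}, "
              f"halo/interior = {frac:.3f}")
```

Smaller tiles carry proportionally more halo points, which is one simple way to see why adding processes eventually stops paying off.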

Tomasz
Posts: 23
Joined: Tue Oct 07, 2008 11:27 am
Location: Marine Institute, Ireland

Re: Online Nesting slow

#6 Unread post by Tomasz »

Hi, you may want to have a look at the figures I posted last year:
viewtopic.php?f=17&t=3471&p=13125&hilit=Tomasz#p13125
