Opened 4 years ago

Closed 4 years ago

#861 closed upgrade (Done)

VERY IMPORTANT: Accelerating nested applications with refinement

Reported by: arango Owned by:
Priority: major Milestone: Release ROMS/TOMS 3.9
Component: Nonlinear Version: 3.9
Keywords: Cc:

Description

John Warner brought to my attention that the nesting algorithms were computing the vertical interpolation weights (Vweight) at every timestep for each nested grid.

Currently, the vertical interpolation weights are used in composite grids because their grids are usually not coincident. They are not needed in refinement grids because the donor and receiver grids have the same number of vertical levels and matching bathymetry. However, future configurations may require vertical interpolation weights in refinement. The switch get_Vweights is introduced to control whether such weights are computed. When it is false, the call to z_weights is skipped, which accelerates the computations because fewer distributed-memory communications are needed.

Therefore, in nesting.F we now have:

     IF ((isection.eq.nzwgt).and.get_Vweights) THEN
       DO tile=last_tile(ng),first_tile(ng),-1
         CALL z_weights (ng, model, tile)
       END DO
       RETURN
     END IF

Here, get_Vweights is a new logical switch added in mod_nesting.F. It is initialized in set_contact.F as:

!
!  Set the switch to compute vertical interpolation weights. Currently,
!  they are only needed in non-coincident composite grids.
!
!
      IF (.not.ANY(Lcoincident).and.ANY(Lcomposite)) THEN
        get_Vweights=.TRUE.
      ELSE
        get_Vweights=.FALSE.
      END IF

where the coincident and composite variables are read from the nested grid configuration NetCDF file and are computed by the Matlab script contact.m.
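
To make the behavior of the switch concrete, the following stand-alone sketch evaluates the same logical expression for a few contact-region configurations. It is not ROMS source: the helper need_vweights, the subroutine check, and the sample flag values are hypothetical and exist only for illustration. Only the expression itself is taken from set_contact.F as quoted above.

```fortran
! Illustrative sketch only, not ROMS source. The helper need_vweights,
! the subroutine check, and the sample flag values are hypothetical;
! only the logical expression comes from the ticket.
program vweights_switch_demo
  implicit none
  logical :: ok

  ok = .true.

  ! No composite contact regions (refinement only): ANY(Lcomposite) is
  ! false, so the switch is false whatever Lcoincident holds.
  call check('no composite regions', &
             [.false., .false.], [.false., .false.], .false., ok)

  ! Composite regions, none coincident: weights are needed.
  call check('composite, none coincident', &
             [.false., .false.], [.true., .true.], .true., ok)

  ! Composite regions, all coincident: weights are skipped.
  call check('composite, all coincident', &
             [.true., .true.], [.true., .true.], .false., ok)

  ! Mixed case: the test is global, so a single coincident region
  ! makes ANY(Lcoincident) true and turns the switch off.
  call check('composite, one coincident', &
             [.true., .false.], [.true., .true.], .false., ok)

  if (.not. ok) error stop 'unexpected switch value'

contains

  ! Same expression as in set_contact.F, wrapped in a function.
  pure function need_vweights(Lcoincident, Lcomposite) result(get_Vweights)
    logical, intent(in) :: Lcoincident(:), Lcomposite(:)
    logical :: get_Vweights
    get_Vweights = .not.any(Lcoincident) .and. any(Lcomposite)
  end function need_vweights

  ! Print the switch value for one configuration and record a mismatch.
  subroutine check(label, Lcoincident, Lcomposite, expected, ok)
    character(len=*), intent(in) :: label
    logical, intent(in) :: Lcoincident(:), Lcomposite(:)
    logical, intent(in) :: expected
    logical, intent(inout) :: ok
    logical :: got
    got = need_vweights(Lcoincident, Lcomposite)
    print '(a,t32,a,l1)', label, 'get_Vweights = ', got
    if (got .neqv. expected) ok = .false.
  end subroutine check

end program vweights_switch_demo
```

Because the switch is a single scalar evaluated over all contact regions, it applies to the whole application rather than to each contact region separately, as the mixed case shows.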


Profiling:

Several runs were made to measure the improvement in nested applications with refinement grids (no composite grids):

  • The LAKE_JERSEY test case (grids a and d) with one refinement grid runs 21.11% faster on 12 processes.
  • The LAKE_JERSEY test case (grids a, c, d, and e) with three refinement grids runs 36.83% faster on 12 processes.
  • Our operational US East Coast application (grids DOPPIO, PIONEER, and ARRAY) with two telescoping refinement grids runs 24.77% faster on 12 processes.

That's quite an improvement for such a simple change in the code. Obviously, as the number of nested grids is increased, the improvement is greater.

Change History (1)

comment:1 by arango, 4 years ago

Resolution: Done
Status: new → closed