In ROMS, we have the option to impose volume conservation in applications with open boundaries when tidal forcing is not enabled:
Code:
! Set lateral open boundary edge volume conservation switch for
! nonlinear model and adjoint-based algorithms. Usually activated
! with radiation boundary conditions to enforce global mass
! conservation, except if tidal forcing is enabled. [1:Ngrids].
VolCons(west) == T ! western boundary
VolCons(east) == T ! eastern boundary
VolCons(south) == T ! southern boundary
VolCons(north) == F ! northern boundary
ad_VolCons(west) == T ! western boundary
ad_VolCons(east) == T ! eastern boundary
ad_VolCons(south) == T ! southern boundary
ad_VolCons(north) == F ! northern boundary
The computation of bc_flux and bc_area is simple, but it is subject to round-off in parallel and serial applications with tile partitions. Both values are usually large numbers, especially in big domains with deep bathymetry, which makes the round-off problem worse.
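To see why a large accumulator amplifies round-off, here is a minimal Python illustration (not ROMS code; the values are made up to mimic a big accumulated area/flux and one small cell contribution):

```python
# Illustration only: in double precision, adding a small per-cell
# contribution to a very large running sum can lose it entirely,
# because the spacing between representable numbers near 1e16 is 2.
big = 1.0e16     # hypothetical accumulated bc_area/bc_flux magnitude
small = 1.0      # hypothetical single-cell contribution

print((big + small) - big)   # prints 0.0, not 1.0: the contribution vanished
```

The larger the accumulator, the coarser the representable spacing, so deep, large domains lose more low-order bits per addition.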
In parallel applications, special code is needed to carry out the global reduction operation (summation):
Code:
      IF (ANY(VolCons(:,ng))) THEN
#ifdef DISTRIBUTE
        NSUB=1                          ! distributed-memory
#else
        IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                      &
     &      DOMAIN(ng)%NorthEast_Corner(tile)) THEN
          NSUB=1                        ! non-tiled application
        ELSE
          NSUB=NtileX(ng)*NtileE(ng)    ! tiled application
        END IF
#endif
!$OMP CRITICAL (OBC_VOLUME)
        IF (tile_count.eq.0) THEN
          bc_flux=0.0_r8
          bc_area=0.0_r8
        END IF
        bc_area=bc_area+my_area
        bc_flux=bc_flux+my_flux
        tile_count=tile_count+1
        IF (tile_count.eq.NSUB) THEN
          tile_count=0
#ifdef DISTRIBUTE
          buffer(1)=bc_area
          buffer(2)=bc_flux
          op_handle(1)='SUM'
          op_handle(2)='SUM'
          CALL mp_reduce (ng, iNLM, 2, buffer, op_handle)
          bc_area=buffer(1)
          bc_flux=buffer(2)
#endif
          ubar_xs=bc_flux/bc_area
        END IF
!$OMP END CRITICAL (OBC_VOLUME)
      END IF
Notice that on any computer, (A + B + C + D) will not, in general, give the same result as (A + D + C + B). Therefore, when we do floating-point parallel reduction operations, the order of the reduction between chunks of numbers matters. This order is not deterministic in routines like mpi_allreduce, which is called by the ROMS routine mp_reduce. As a matter of fact, mp_reduce has three different ways to compute this reduction, using either mpi_allreduce, mpi_allgather, or mpi_isend/mpi_irecv. All three methods are subject to round-off problems.
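The non-associativity is easy to demonstrate; here is a small Python illustration (not ROMS code; the four values are contrived so that the effect is visible):

```python
# Floating-point addition is not associative, so the order in which
# a parallel reduction combines partial sums changes the answer.
a, b, c, d = 1.0e16, 1.0, 1.0, -1.0e16

left_to_right = ((a + b) + c) + d   # small terms absorbed by the big one -> 0.0
reordered     = ((a + d) + b) + c   # big terms cancel first              -> 2.0

print(left_to_right, reordered)     # prints: 0.0 2.0
```

With values of ordinary magnitude the discrepancy is only in the last bits, but in byte-by-byte comparisons of output files even one differing bit shows up.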
What To Do:
- Currently, there is not much that we can do. Sasha suggested carrying out the summation by pairs to reduce the round-off error, so that only numbers of comparable magnitude from the chunk of parallel data are added together.
- If you are forcing the boundaries with frequent data from a larger-scale model, you can turn off volume conservation and see what happens. In particular, you need frequent free-surface (zeta) data. You may get into trouble when running for long (multi-year) periods.
- Impose tidal forcing at the open boundaries, since volume conservation must be avoided in that case anyway.
- Notice that we can no longer compare NetCDF files byte by byte to check for parallel bugs when volume conservation is activated. To compute binary differences, you need to activate the CPP options DEBUGGING, OUT_DOUBLE, and POSITIVE_ZERO:
Code:
% diff r1/ocean_avg.nc r2/ocean_avg.nc
Binary files r1/ocean_avg.nc and r2/ocean_avg.nc differ
% diff r1/ocean_his.nc r2/ocean_his.nc
Binary files r1/ocean_his.nc and r2/ocean_his.nc differ
% diff r1/ocean_rst.nc r2/ocean_rst.nc
Binary files r1/ocean_rst.nc and r2/ocean_rst.nc differ
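The summation-by-pairs idea mentioned above can be sketched in a few lines. This is a plain Python illustration (not ROMS code), assuming the chunk of parallel data has already been gathered into one array; pairwise (cascade) summation recursively halves the array so that, in typical data, partial sums of comparable magnitude are added, reducing the worst-case accumulated rounding error from growing linearly in n to growing logarithmically:

```python
def pairwise_sum(x):
    """Recursive pairwise (cascade) summation of a list of floats.

    Splitting the data in half and summing each half separately tends
    to add partial sums of comparable magnitude, which accumulates far
    less round-off than a single left-to-right running sum.
    """
    n = len(x)
    if n <= 2:
        return sum(x)          # base case: at most one addition
    mid = n // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

print(pairwise_sum([1.0, 2.0, 3.0, 4.0, 5.0]))   # prints 15.0
```

Note that pairwise summation reduces, but does not eliminate, round-off; for bit-for-bit reproducible parallel sums the reduction order across tiles would also have to be fixed.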