I've got some more stuff on OBC_VOLCONS, including (I think) a real bug this time. To refresh your memory, I've attached the message I sent last time where all I did was increase the precision in some of the volume conservation calculations. I was using the beta version 2d (and not the current release) at that point, but I don't think that matters much.
When I was testing the UPWELLING case with open boundaries, OBC_VOLCONS, and the code change from my previous message, I got the same answer with:
OMP, 4 tiles, 4 processors
OMP, 4 tiles, 2 processors
MPI, 4 tiles, 4 processors
However, I was recently checking something with the current release and found that I still get different answers when I change the tiling. The problem is actually in set_DUV_bc_tile. First, look at the following code (just the DUon part) taken from step2d.F.
Code:
      DO j=-2+J_RANGE+1
        DO i=-1+I_RANGE+1
          DUon(i,j)=0.5_r8*(Drhs(i,j)+Drhs(i-1,j))*ubar(i,j,krhs)*     &
     &              on_u(i,j)
        END DO
      END DO
#ifdef DISTRIBUTE
!
!  Do a special exchange to avoid having three ghost points for the
!  high-order numerical stencil.
!
      CALL exchange_u2d_tile (ng, Istr, Iend, Jstr, Jend,              &
     &                        Istr-3, Iend+3, Jstr-3, Jend+3,          &
     &                        DUon)
#endif
#undef I_RANGE
#undef J_RANGE
#ifdef OBC_VOLCONS
!
!  Set vertically integrated mass fluxes DUon and DVom along the open
!  boundaries in such a way that the integral volume is conserved.
!
      CALL set_DUV_bc_tile (ng, Istr, Iend, Jstr, Jend,                &
     &                      LBi, UBi, LBj, UBj,                        &
     &                      krhs,                                      &
# ifdef MASKING
     &                      umask, vmask,                              &
# endif
     &                      om_v, on_u, ubar, vbar,                    &
     &                      Drhs, DUon, DVom)
#endif
Now compare that with the western-edge section of set_DUV_bc_tile. It resets DUon only over j = Jstr-1 to JendR, which is not the same range used above, so which points get the corrected values depends on how the domain is tiled.
Code:
      IF (WESTERN_EDGE) THEN
        DO j=Jstr-1,JendR
          DUon(Istr,j)=0.5_r8*(Drhs(Istr,j)+Drhs(Istr-1,j))*           &
     &                 (ubar(Istr,j,krhs)-ubar_xs)*                    &
     &                 on_u(Istr,j)
# ifdef MASKING
          DUon(Istr,j)=DUon(Istr,j)*umask(Istr,j)
# endif
        END DO
      END IF
To fix this, I just changed the range of all the calculations in set_DUV_bc_tile to match the ranges used in step2d.F. This includes calling exchange_u2d_tile and exchange_v2d_tile (where necessary) for the MPI code. I don't know if what I did is the best way to handle this, but I've attached a code snippet for you to look at.
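To give a rough idea, here is a minimal sketch of the western-edge DUon piece only (the attached snippet covers all four boundaries and DVom as well). It assumes the same J_RANGE macro and the special DISTRIBUTE exchange from step2d.F are available in set_DUV_bc_tile, so treat the exact ranges as illustrative rather than definitive:
Code:
!  Sketch only: recompute DUon along the western edge over the same
!  extended j range used in step2d.F (J_RANGE macro assumed), so every
!  tile fills the same points no matter how the domain is partitioned.
      IF (WESTERN_EDGE) THEN
        DO j=-2+J_RANGE+1
          DUon(Istr,j)=0.5_r8*(Drhs(Istr,j)+Drhs(Istr-1,j))*           &
     &                 (ubar(Istr,j,krhs)-ubar_xs)*                    &
     &                 on_u(Istr,j)
# ifdef MASKING
          DUon(Istr,j)=DUon(Istr,j)*umask(Istr,j)
# endif
        END DO
      END IF
#ifdef DISTRIBUTE
!  Repeat the special exchange after resetting the boundary values,
!  just as step2d.F does after its DUon/DVom loops.
      CALL exchange_u2d_tile (ng, Istr, Iend, Jstr, Jend,              &
     &                        Istr-3, Iend+3, Jstr-3, Jend+3,          &
     &                        DUon)
#endif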
Now, with this fix combined with what I did last time, I get the same answer with:
OMP, 4 tiles, 4 processors
OMP, 4 tiles, 2 processors
OMP, 2 tiles, 2 processors
MPI, 4 tiles, 4 processors
MPI, 2 tiles, 2 processors
Let me know if you have any questions.
Mike
msd@ccpo.odu.edu