There's always a 1st time...1st time...1st time...1st time

Bug reports, work arounds and fixes

Moderators: arango, robertson

Post Reply
Message
Author
User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

There's always a 1st time...1st time...1st time...1st time

#1 Unread post by m.hadfield »

In a 4-thread OpenMP run of the UPWELLING case I'm getting the header line for the time stepping (the one that begins with "STEP") printed four times. The first occurrence is where it should be and the remaining 3 occur at irregular intervals, eg.

Code: Select all

   STEP   Day HH:MM:SS  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
          C => (i,j,k)       Cu            Cv            Cw         Max Speed

      0     0 00:00:00  0.000000E+00  6.585677E+02  6.585677E+02  3.884376E+11
            (00,00,00)  0.000000E+00  0.000000E+00  0.000000E+00  0.000000E+00
      DEF_HIS   - creating history file, Grid 01: ocean_his.nc
      WRT_HIS   - wrote history  fields (Index=1,1) into time record = 0000001
      DEF_AVG   - creating average file, Grid 01: ocean_avg.nc
      DEF_DIAGS - creating diagnostics file, Grid 01: ocean_dia.nc
      1     0 00:05:00  6.479408E-13  6.585677E+02  6.585677E+02  3.884376E+11
            (01,01,01)  1.069435E-09  1.950204E-06  0.000000E+00  9.216783E-06
...
      5     0 00:25:00  5.085264E-10  6.585677E+02  6.585677E+02  3.884376E+11
            (01,80,16)  7.514633E-05  6.587518E-06  5.217479E-06  2.538854E-04

   STEP   Day HH:MM:SS  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
          C => (i,j,k)       Cu            Cv            Cw         Max Speed

      6     0 00:30:00  1.053310E-09  6.585677E+02  6.585677E+02  3.884376E+11
            (01,79,16)  1.041251E-04  9.436421E-06  6.285301E-06  3.542437E-04
...
     17     0 01:25:00  5.366855E-08  6.585677E+02  6.585677E+02  3.884376E+11
            (01,79,16)  6.092481E-04  5.508986E-05  1.661070E-05  2.067027E-03

   STEP   Day HH:MM:SS  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
          C => (i,j,k)       Cu            Cv            Cw         Max Speed

     18     0 01:30:00  6.609198E-08  6.585677E+02  6.585677E+02  3.884376E+11
            (01,79,16)  6.680899E-04  6.145463E-05  1.795638E-05  2.267042E-03
...
     24     0 02:00:00  1.869037E-07  6.585677E+02  6.585677E+02  3.884376E+11
            (01,79,16)  1.057119E-03  1.077470E-04  1.640725E-05  3.592006E-03

   STEP   Day HH:MM:SS  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
          C => (i,j,k)       Cu            Cv            Cw         Max Speed

     25     0 02:05:00  2.163735E-07  6.585677E+02  6.585677E+02  3.884376E+11
            (01,79,16)  1.127504E-03  1.168364E-04  1.674078E-05  3.832213E-03
..
The problem is that the printing of the line is controlled by the first_time variable (diag.F, lines 420-421) which is declared $OMP THREADPRIVATE (mod_scalars.F, line 505). The code in question is executed by the tile that matches the condition tilecount.eq.NSUB (diag.F, line 366) and--somewhat to my surprise--that tile is not the same on every time step. Every thread eventually comes to execute the code and, since first_time is thread-private, every thread prints the header line once.

The solution appears to be to remove the $OMP THREADPRIVATE declaration. which also necessitates moving the initialisation of first_time (mod_scalars.F, line 1668) outside of its enclosing $OMP PARALLEL block.

User avatar
shchepet
Posts: 188
Joined: Fri Nov 14, 2003 4:57 pm

Re: There's always a 1st time...1st time...1st time...1st ti

#2 Unread post by shchepet »

There are in principle two way to handle this situation, which should not be mixed together.

1. The first one is as you suggested: switch first_time should have exactly the same status
as any global summation variable, which mans that it should be shared, that is not declared
as threadprivate, and should be initialized to .true. outsaide of the parallel reagion. Then,
whichever thread prints the line STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG ...
sets this variable to .false. in diag.F right a way at the moment when printing is done
(the code already have all the necessary syncronization/locking mechanism to protect
it from being modified simultaneously by multiple threads).

In principle Fortran 90 allows ititialization of a variable at the declaration,

LOGICAL first_time = .TRUE.

which can be used in this context.

A bit tricky rule to remember is that initialization at declaration makes the variable be
statically allocated, which also makes it shared amon OpenMP threads, e.g., if a variable
is declared and initialized inside a subroutine without even placing it into a module, it is
static and shared, thus retains its value from one call to the subroutine to another, and if
the subroutine is called by by multiple threads, they are aware of the changes in its value.


2. The second way is to have it threadprivate and initialize inside the parallel region.
However in this case not just printing thread, but all the threads mist both initialize it
to .true. and set it back to .false. after first call to diag.

This is fairly straightforward in a code with one thread-one-tile, but in ROMS code
it would take an effort to figure out when a thread processes its last tile durin the
first call to diag. this is doable, but requires extra logic.

The second way its generally more efficient is accessing of this variable would be
the only thing what is going on, however in this particular context it would not matter
because, there are other shared variables assesed by myltiple threads at the very
same moment.

Post Reply