Custom Query (964 matches)

Results (607 - 609 of 964)

Ticket Owner Reporter Resolution Summary
#734 arango Fixed Important: Corrected bug in dateclock.F (routine caldate)
Description

Corrected a bug in the routine caldate (module dateclock.F) when computing the fractional hour and fractional minutes. This only affects applications that use the DIURNAL_SRFLUX option, which modulates the shortwave radiation SRFLX (read and interpolated elsewhere) by the local diurnal cycle (a function of longitude, latitude, and day-of-year) using the Albedo equations.

We need to call caldate for the Albedo equations:

    CALL caldate (tdays(ng), yd_r8=yday, h_r8=hour)

Here the fractional hour of the date was not computed correctly.
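For reference, here is a minimal sketch of the kind of computation involved; the variable names and the sample day counter are illustrative, not the actual caldate code:

      PROGRAM frac_hour_sketch
!
!  Illustrative sketch only (not the actual caldate code): deriving a
!  fractional hour and fractional minutes from a floating-point day
!  counter such as tdays(ng).
!
        IMPLICIT NONE
        INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
        REAL(r8) :: tdays, DayFraction, hour, minutes

        tdays=8500.755_r8                      ! days since reference date
        DayFraction=tdays-AINT(tdays)          ! fraction of the current day
        hour=DayFraction*24.0_r8               ! fractional hours (about 18.12)
        minutes=(hour-AINT(hour))*60.0_r8      ! fractional minutes (about 7.2)
        PRINT *, hour, minutes
      END PROGRAM frac_hour_sketch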

I was tracking down why ROMS was not producing the same solution as before the Calendar/Date/Clock update (src:ticket:724). It turns out that the difference in the solution is due to round-off.

The new routines compute more exact date variables because we now use a floating-point rounding function based on a fuzzy (tolerant) floor function:

     seconds=DayFraction*86400.0_r8
     CT=3.0_r8*EPSILON(seconds)             ! comparison tolerance
     seconds=ROUND(seconds, CT)             ! tolerant round function

The ROUND function eliminates round-off errors by improving the floating-point representation of the value in the computer.
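For illustration only, here is a minimal sketch of the idea behind a tolerant (fuzzy) round; it is not ROMS's actual ROUND function. A value that lies within the comparison tolerance of the nearest whole number is snapped to that whole number:

      PROGRAM tolerant_round_sketch
!
!  Sketch of a tolerance-based round (illustrative, not ROMS code).
!  0.7 of a day is exactly 60480 seconds, but the floating-point
!  product carries a small round-off error that the snap removes.
!
        IMPLICIT NONE
        INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
        REAL(r8) :: DayFraction, seconds, CT

        DayFraction=0.7_r8
        seconds=DayFraction*86400.0_r8         ! not exactly 60480 (round-off)
        CT=3.0_r8*EPSILON(seconds)             ! comparison tolerance
        IF (ABS(seconds-ANINT(seconds)).le.                             &
     &      CT*MAX(ABS(seconds),1.0_r8)) THEN
          seconds=ANINT(seconds)               ! snap to nearest whole number
        END IF
        PRINT *, seconds                       ! 60480.000000000000
      END PROGRAM tolerant_round_sketch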

So, if you are using the DIURNAL_SRFLUX option, the solutions are not reproducible with respect to previous versions because of round-off. The solutions with the new code are more precise!

#735 arango Done Very IMPORTANT: ROMS Profiling Overhaul
Description

This is an important update because I revised the entire profiling before starting to look for ways to improve computational efficiency. In particular, I have been experimenting with ways to accelerate the ROMS nesting algorithms. Currently, I am concentrating on routines mp_assemble and mp_aggregate in distribute.F.

This update also includes a correction to the management of ntend(ng) and a few changes to the arguments of routines wclock_on and wclock_off.

What is new?

  • The routine mp_assemble is a multidimensional version of mp_collect; they are used in the nesting and 4D-Var algorithms, respectively. Both routines assemble/collect elements of arrays from all the MPI nodes. Each node processes parts of these arrays computed from tiled state variables.

The assembly/collection operation can be coded with high-level MPI functions like mpi_allgather or mpi_allreduce (a summation, since all arrays are initialized to zero). Alternatively, one can use the lower-level routines mpi_irecv, mpi_isend, and mpi_bcast, similar to what is done in the tile-halo exchanges (mp_exchange.F). It turns out that the lower-level functions are actually more efficient than the higher-level ones when using generic MPI libraries (like OpenMPI); the high-level functions are usually optimized by vendors for multimillion-dollar supercomputers and their compilers. A minimal sketch of the assembly operation is shown after the CPP options below.

Notice that at the top of distribute.F, we have the following internal CPP options to select the desired communication scheme. The default is to have:

# undef  ASSEMBLE_ALLGATHER /* use mpi_allgather in mp_assemble */
# undef  ASSEMBLE_ALLREDUCE /* use mpi_allreduce in mp_assemble */
# define BOUNDARY_ALLREDUCE /* use mpi_allreduce in mp_boundary */
# undef  COLLECT_ALLGATHER  /* use mpi_allgather in mp_collect  */
# undef  COLLECT_ALLREDUCE  /* use mpi_allreduce in mp_collect  */
# define REDUCE_ALLGATHER   /* use mpi_allgather in mp_reduce   */
# undef  REDUCE_ALLREDUCE   /* use mpi_allreduce in mp_reduce   */
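For illustration, here is a minimal, self-contained sketch (not ROMS code) of the summation-based assembly that the ALLREDUCE options perform: every rank holds a zero-initialized copy of the array, fills only its own portion, and a global sum recovers the complete array on all ranks. The default low-level path replaces this single collective with mpi_irecv/mpi_isend/mpi_bcast exchanges.

      PROGRAM assemble_sketch
!
!  Illustrative sketch of the assembly operation (not ROMS code).
!
        USE mpi
        IMPLICIT NONE
        INTEGER, PARAMETER :: n = 8
        INTEGER :: ierr, rank, nproc, i, istr, iend
        DOUBLE PRECISION :: A(n)

        CALL MPI_INIT (ierr)
        CALL MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
        CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nproc, ierr)

        A=0.0d0                                ! zero everywhere
        istr=rank*n/nproc+1                    ! this rank's portion
        iend=(rank+1)*n/nproc
        DO i=istr,iend
          A(i)=DBLE(i)                         ! locally computed values
        END DO

        CALL MPI_ALLREDUCE (MPI_IN_PLACE, A, n, MPI_DOUBLE_PRECISION,   &
     &                      MPI_SUM, MPI_COMM_WORLD, ierr)

        IF (rank.eq.0) PRINT *, A              ! assembled array (identical
                                               ! on every rank)
        CALL MPI_FINALIZE (ierr)
      END PROGRAM assemble_sketch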
  • The ROMS internal profiling was modified to include more regions (Pregions) in mod_strings.F:
          character (len=50), dimension(Nregion) :: Pregion =             &
       &    (/'Allocation and array initialization ..............',       & !01
       &      'Ocean state initialization .......................',       & !02
       &      'Reading of input data ............................',       & !03
       &      'Processing of input data .........................',       & !04
       &      'Processing of output time averaged data ..........',       & !05
       &      'Computation of vertical boundary conditions ......',       & !06
       &      'Computation of global information integrals ......',       & !07
       &      'Writing of output data ...........................',       & !08
       &      'Model 2D kernel ..................................',       & !09
       &      'Lagrangian floats trajectories ...................',       & !10
       &      'Tidal forcing ....................................',       & !11
       &      '2D/3D coupling, vertical metrics .................',       & !12
       &      'Omega vertical velocity ..........................',       & !13
       &      'Equation of state for seawater ...................',       & !14
       &      'Biological module, source/sink terms .............',       & !15
       &      'Sediment transport module, source/sink terms .....',       & !16
       &      'Atmosphere-Ocean bulk flux parameterization ......',       & !17
       &      'KPP vertical mixing parameterization .............',       & !18
       &      'GLS vertical mixing parameterization .............',       & !19
       &      'My2.5 vertical mixing parameterization ...........',       & !20
       &      '3D equations right-side terms ....................',       & !21
       &      '3D equations predictor step ......................',       & !22
       &      'Pressure gradient ................................',       & !23
       &      'Harmonic mixing of tracers, S-surfaces ...........',       & !24
       &      'Harmonic mixing of tracers, geopotentials ........',       & !25
       &      'Harmonic mixing of tracers, isopycnals ...........',       & !26
       &      'Biharmonic mixing of tracers, S-surfaces .........',       & !27
       &      'Biharmonic mixing of tracers, geopotentials ......',       & !28
       &      'Biharmonic mixing of tracers, isopycnals .........',       & !29
       &      'Harmonic stress tensor, S-surfaces ...............',       & !30
       &      'Harmonic stress tensor, geopotentials ............',       & !31
       &      'Biharmonic stress tensor, S-surfaces .............',       & !32
       &      'Biharmonic stress tensor, geopotentials ..........',       & !33
       &      'Corrector time-step for 3D momentum ..............',       & !34
       &      'Corrector time-step for tracers ..................',       & !35
       &      'Nesting algorithm ................................',       & !36
       &      'Bottom boundary layer module .....................',       & !37
       &      'GST Analysis eigenproblem solution ...............',       & !38
       &      'Two-way coupling to Atmosphere Model .............',       & !39
       &      'Two-way coupling to Sea Ice Model ................',       & !40
       &      'Two-way coupling to Wave Model ...................',       & !41
       &      'Reading model state vector .......................',       & !42
       &      '4D-Var minimization solver .......................',       & !43
       &      'Background error covariance matrix ...............',       & !44
       &      'Posterior error covariance matrix ................',       & !45
       &      'Unused 01 ........................................',       & !46
       &      'Unused 02 ........................................',       & !47
       &      'Unused 03 ........................................',       & !48
       &      'Unused 04 ........................................',       & !49
       &      'Unused 05 ........................................',       & !50
       &      'Unused 06 ........................................',       & !51
       &      'Unused 07 ........................................',       & !52
       &      'Unused 08 ........................................',       & !53
       &      'Unused 09 ........................................',       & !54
       &      'Unused 10 ........................................',       & !55
       &      'Unused 11 ........................................',       & !56
       &      'Unused 12 ........................................',       & !57
       &      'Unused 13 ........................................',       & !58
       &      'Unused 14 ........................................',       & !59
       &      'Message Passage: 2D halo exchanges ...............',       & !60
       &      'Message Passage: 3D halo exchanges ...............',       & !61
       &      'Message Passage: 4D halo exchanges ...............',       & !62
       &      'Message Passage: lateral boundary exchanges ......',       & !63
       &      'Message Passage: data broadcast ..................',       & !64
       &      'Message Passage: data reduction ..................',       & !65
       &      'Message Passage: data gathering ..................',       & !66
       &      'Message Passage: data scattering..................',       & !67
       &      'Message Passage: boundary data gathering .........',       & !68
       &      'Message Passage: point data gathering ............',       & !69
       &      'Message Passage: nesting point data gathering ....',       & !70
       &      'Message Passage: nesting array data gathering ....',       & !71
       &      'Message Passage: synchronization barrier .........',       & !72
       &      'Message Passage: multi-model coupling ............'/)        !73
    

Notice that we now have 73 regions, including 14 unused regions for later use. We need to separate the Message Passage (MPI) regions from the rest, and it was tedious to renumber all the MPI regions away from the other algorithms. The MPI regions need to be located at indices Mregion=60 to Nregion=72. In wclock_off, we have:

# ifdef DISTRIBUTE
          DO imodel=1,4
            DO iregion=Mregion,Nregion
              ...
            END DO
          END DO
# endif

to process all the MPI regions. Notice that region indices 36, 39, 40, 41, 42, 43, 44, 45, 70, 71, and 72 are the new regions introduced here to help profiling and identify bottleneck areas.

  • There are two additional arguments to routines wclock_on and wclock_off:
         SUBROUTINE wclock_on  (ng, model, region, line, routine)
         SUBROUTINE wclock_off (ng, model, region, line, routine)
    
    so in the calling routine, we have for example:
         CALL wclock_on  (ng, iNLM, 9, __LINE__, __FILE__)
         CALL wclock_off (ng, iNLM, 9, __LINE__, __FILE__)
    
    and the C-preprocessing code will yield:
         CALL wclock_on  (ng, iNLM, 9, 39,  "ROMS/Nonlinear/step2d_LF_AM3.h")
         CALL wclock_off (ng, iNLM, 9, 116, "ROMS/Nonlinear/step2d_LF_AM3.h")
    
     The new arguments line and routine will be used in the future for more elaborate profiling with third-party libraries.

Below are the profiling statistics for a 5-day simulation with two nested grids, using the low-level mpi_irecv/mpi_isend/mpi_bcast in routines mp_assemble and mp_collect. The simulation was run on my latest Mac using 4 CPUs.

 Elapsed CPU time (seconds):

 Node   #  0 CPU:    4150.677
 Node   #  3 CPU:    4209.386
 Node   #  1 CPU:    4209.369
 Node   #  2 CPU:    4209.324
 Total:             16778.756

 Nonlinear model elapsed CPU time profile, Grid: 01

  Allocation and array initialization ..............         1.185  ( 0.0071 %)
  Ocean state initialization .......................         0.837  ( 0.0050 %)
  Reading of input data ............................        63.005  ( 0.3755 %)
  Processing of input data .........................        27.495  ( 0.1639 %)
  Processing of output time averaged data ..........        91.075  ( 0.5428 %)
  Computation of vertical boundary conditions ......         0.636  ( 0.0038 %)
  Computation of global information integrals ......        18.845  ( 0.1123 %)
  Writing of output data ...........................       103.600  ( 0.6174 %)
  Model 2D kernel ..................................       405.983  ( 2.4196 %)
  Tidal forcing ....................................        23.336  ( 0.1391 %)
  2D/3D coupling, vertical metrics .................        58.658  ( 0.3496 %)
  Omega vertical velocity ..........................        35.253  ( 0.2101 %)
  Equation of state for seawater ...................        44.661  ( 0.2662 %)
  Atmosphere-Ocean bulk flux parameterization ......        53.266  ( 0.3175 %)
  GLS vertical mixing parameterization .............       851.479  ( 5.0747 %)
  3D equations right-side terms ....................        62.280  ( 0.3712 %)
  3D equations predictor step ......................       148.728  ( 0.8864 %)
  Pressure gradient ................................        45.963  ( 0.2739 %)
  Harmonic mixing of tracers, geopotentials ........        96.407  ( 0.5746 %)
  Harmonic stress tensor, S-surfaces ...............        38.824  ( 0.2314 %)
  Corrector time-step for 3D momentum ..............        79.510  ( 0.4739 %)
  Corrector time-step for tracers ..................       105.353  ( 0.6279 %)
  Nesting algorithm ................................       205.556  ( 1.2251 %)
  Reading model state vector .......................         0.785  ( 0.0047 %)
                                              Total:      2562.721   15.2736

 Nonlinear model message Passage profile, Grid: 01

  Message Passage: 2D halo exchanges ...............        51.440  ( 0.3066 %)
  Message Passage: 3D halo exchanges ...............        93.536  ( 0.5575 %)
  Message Passage: 4D halo exchanges ...............        36.680  ( 0.2186 %)
  Message Passage: data broadcast ..................       117.041  ( 0.6976 %)
  Message Passage: data reduction ..................         1.395  ( 0.0083 %)
  Message Passage: data gathering ..................        20.711  ( 0.1234 %)
  Message Passage: data scattering..................         0.912  ( 0.0054 %)
  Message Passage: boundary data gathering .........         0.904  ( 0.0054 %)
  Message Passage: point data gathering ............         0.573  ( 0.0034 %)
  Message Passage: nesting point data gathering ....       708.861  ( 4.2248 %)
                                              Total:      1032.054    6.1510

 Nonlinear model elapsed CPU time profile, Grid: 02

  Allocation and array initialization ..............         1.185  ( 0.0071 %)
  Ocean state initialization .......................         0.851  ( 0.0051 %)
  Reading of input data ............................         6.918  ( 0.0412 %)
  Processing of input data .........................        24.180  ( 0.1441 %)
  Processing of output time averaged data ..........       610.139  ( 3.6364 %)
  Computation of vertical boundary conditions ......         3.645  ( 0.0217 %)
  Computation of global information integrals ......        93.566  ( 0.5576 %)
  Writing of output data ...........................       187.852  ( 1.1196 %)
  Model 2D kernel ..................................      2680.264  (15.9742 %)
  Tidal forcing ....................................         0.038  ( 0.0002 %)
  2D/3D coupling, vertical metrics .................       175.925  ( 1.0485 %)
  Omega vertical velocity ..........................       131.463  ( 0.7835 %)
  Equation of state for seawater ...................       213.727  ( 1.2738 %)
  Atmosphere-Ocean bulk flux parameterization ......       274.219  ( 1.6343 %)
  GLS vertical mixing parameterization .............      4496.748  (26.8002 %)
  3D equations right-side terms ....................       414.284  ( 2.4691 %)
  3D equations predictor step ......................       758.085  ( 4.5181 %)
  Pressure gradient ................................       242.797  ( 1.4471 %)
  Harmonic mixing of tracers, geopotentials ........       503.073  ( 2.9983 %)
  Harmonic stress tensor, S-surfaces ...............       219.733  ( 1.3096 %)
  Corrector time-step for 3D momentum ..............       362.818  ( 2.1624 %)
  Corrector time-step for tracers ..................       418.174  ( 2.4923 %)
  Nesting algorithm ................................       842.104  ( 5.0189 %)
  Reading model state vector .......................         1.443  ( 0.0086 %)
                                              Total:     12663.231   75.4718

 Nonlinear model message Passage profile, Grid: 02

  Message Passage: 2D halo exchanges ...............       223.769  ( 1.3336 %)
  Message Passage: 3D halo exchanges ...............       254.173  ( 1.5149 %)
  Message Passage: 4D halo exchanges ...............       113.998  ( 0.6794 %)
  Message Passage: data broadcast ..................       115.124  ( 0.6861 %)
  Message Passage: data reduction ..................         5.276  ( 0.0314 %)
  Message Passage: data gathering ..................        34.649  ( 0.2065 %)
  Message Passage: data scattering..................         0.336  ( 0.0020 %)
  Message Passage: point data gathering ............         0.348  ( 0.0021 %)
  Message Passage: nesting point data gathering ....       565.746  ( 3.3718 %)
  Message Passage: nesting array data gathering ....       485.481  ( 2.8934 %)
                                              Total:      1798.901   10.7213

  Unique code regions profiled .....................     15225.952   90.7454 %
  Residual, non-profiled code ......................      1552.804    9.2546 %


 All percentages are with respect to total time =        16778.756

Notice that the most expensive algorithms in this particular profiling are the GLS vertical mixing parameterization (31.6%), the 2D kernel (19.2%), and nesting (6.2%). I don't think there is much we can do about the GLS since it involves several fractional powers that are very expensive.

On average, the low-level MPI functions yield 1-3% faster code than using mpi_allreduce in mp_assemble and mp_aggregate. Similarly, the low-level MPI functions yield 6-9% faster code than using mpi_allgather in mp_assemble and mp_aggregate.

One must be careful when examining these numbers. They will depend on the type of computer hardware, compiler, number of parallel nodes, intra-node connectivity, node speed, and so on. We always need to investigate the optimal number of nodes for a particular ROMS application; too many nodes may slow the computations because of communication overhead.

Therefore, the default is to use the low-level MPI functions in routines mp_assemble, mp_aggregate, and mp_collect. See top of distribute.F.

#736 arango Done Important: tidal forcing processing and vertical mixing limiter
Description

WARNING:

The standard input script ocean.in was modified and includes new configuration parameters.


This ticket includes two critical updates to ROMS: tidal forcing processing, and an upper limiter on the vertical diffusion and viscosity computed from the vertical mixing parameterizations.

  • The tidal forcing file was removed from the FRCNAME filenames list in ocean.in. It is now entered separately with a new TIDENAME keyword:
    ! Input tidal forcing file name
    
      TIDENAME == ../om/doppio_tide_7km.nc
    
    If all the forcing files are the same for all nested grids and the data is in its native resolution, we only need to specify one set of filenames instead of Ngrids copies. ROMS replicates those files internally to the remaining grids using the plural keyword protocol. Keeping the tidal forcing filename in that list would not allow a single set because, in nested applications, the tidal forcing is needed only by the main coarser grid (RefineScale(ng)=0), and replicating the tidal forcing file gives errors due to the different nested grid dimensions.

For example, in a three-grid nesting application we just need:

 NFFILES == 8 ! number of unique forcing files

 FRCNAME == ../om/lwrad_down_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/lwrad_down_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/Pair_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/Pair_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/Qair_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/Qair_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/rain_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/rain_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/swrad_daily_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/swrad_daily_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/Tair_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/Tair_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/Uwind_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/Uwind_nam_3hourly_MAB_and_GoM_2015.nc \

            ../om/Vwind_nam_3hourly_MAB_and_GoM_2014.nc |
            ../om/Vwind_nam_3hourly_MAB_and_GoM_2015.nc

instead of three copies of this set of atmospheric forcing files.

Notice that some idealized applications may have a single file for all the forcing fields (say, atmospheric and tidal data). In such a case, we just need to specify the same filename in both the TIDENAME and FRCNAME keywords, and ROMS will read the fields it needs in each case.
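As a sketch (the filename here is hypothetical), both keywords would simply point to the same file:

  TIDENAME == ../om/idealized_forcing.nc

   NFFILES == 1        ! number of unique forcing files

   FRCNAME == ../om/idealized_forcing.nc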

  • A new output NetCDF file (HARNAME keyword) is added to store the least-squares detiding harmonic coefficients when the AVERAGES_DETIDE option is activated:
     HARNAME == doppio_har.nc  \
                pioneer_har.nc \
                array_har.nc
    
    for a three-grid nested application. Because of nesting, the time-accumulated harmonic coefficients are no longer written into the input tidal forcing NetCDF file: we need a harmonics NetCDF file for each nested grid. Even though the tidal forcing is applied only to the coarser grid, the detiding is performed in all nested grids.
  • Introduced switches LprocessTides and LprocessOBC to determine which nested grids need input tidal data and open boundary condition data. These switches are set in inp_par.F:
    !
    ! Determine the switch to process input open boundary conditions data.
    !
    ! In nesting applications, the lateral boundary conditions data is
    ! needed only by the main coarser grid (RefineScale(ng)=0).
    !
          DO ng=1,Ngrids
            IF (.not.(RefinedGrid(ng).and.RefineScale(ng).gt.0)) THEN
              LprocessOBC(ng)=.TRUE.
            END IF
          END DO
    
    #if defined SSH_TIDES || defined UV_TIDES
    !
    ! Determine the switch to process input tidal forcing data.
    !
    ! In nesting applications, the tides are processed only by the main
    ! coarser grid (RefineScale(ng)=0) and the other grids get tidal
    ! forcing from the contact areas.
    !
          DO ng=1,Ngrids
            IF (.not.(RefinedGrid(ng).and.RefineScale(ng).gt.0)) THEN
              LprocessTides(ng)=.TRUE.
            END IF
          END DO
    #endif 
    
  • Introduced C-preprocessing options LIMIT_VDIFF and LIMIT_VVISC to limit the upper values of vertical diffusion and vertical viscosity, respectively, computed from vertical mixing parameterizations (GLS_MIXING, LMD_MIXING, and MY25_MIXING). Sometimes, these parameterizations yield high mixing values under stable water column conditions, and the threshold variables Akt_limit and Akv_limit are used as an upper limiter. However, if the water column is unstable, high values of vertical mixing are needed to eliminate vertical static instability numerically. So, please use these limiters with caution. The standard input script ocean.in was modified to include these upper limiting values:
    ! Upper threshold values to limit vertical mixing coefficients computed
    ! from vertical mixing parameterizations. Although this is an engineering
    ! fix, the vertical mixing values inferred from ocean observations are
    ! rarely higher than this upper limit value.
    
      AKT_LIMIT == 1.0d-3 1.0d-3 ! m2/s
    
      AKV_LIMIT == 1.0d-3 ! m2/s
    
    For example, in gls_corstep.F we have:
    # if defined LIMIT_VDIFF || defined LIMIT_VVISC
    !
    ! Limit vertical mixing coefficients with the upper threshold value.
    ! This is an engineering fix, but it can be based on the fact that
    ! vertical mixing in the ocean from indirect observations are not
    ! higher than the threshold value.
    !
          DO k=0,N(ng)
    # ifdef LIMIT_VDIFF
            DO itrc=1,NAT
              Akt(i,j,k,itrc)=MIN(Akt_limit(itrc,ng), Akt(i,j,k,itrc))
            END DO
    # endif
    # ifdef LIMIT_VVISC
            Akv(i,j,k)=MIN(Akv_limit(ng), Akv(i,j,k))
    # endif
          END DO
    # endif
    
     My experience with the vertical mixing parameterizations is that they often yield high vertical diffusion/viscosity values in trouble spots. Usually, their values are 3 to 4 orders of magnitude greater than those inferred from ocean observations. I have talked with various observationalists at meetings, and the consensus is that vertical diffusion/viscosity in the ocean rarely exceeds a nominal value of 0.001 m2/s. I have been worried about this for years, and I mentioned it to Hans Burchard when he visited us several years ago. Apparently, the turbulence algorithm goes unstable. I suspected that this was due to the nonlinear 3D advection of the turbulent state variables. Hans mentioned that they rarely activate 3D advection in GOTM (General Ocean Turbulence Model). I tried to turn off the 3D advection in the GLS with CPP options, but the model blew up. I didn't spend much time debugging; it is still an item on my extensive priority list.