Custom Query (969 matches)

Results (757 - 759 of 969)

Ticket Owner Reporter Resolution Summary
#733 arango Done Hypoxia Simple Respiration Model
Description

Added the Hypoxia Simple Respiration Model. It is activated with the CPP option HYPOXIA_SRM. It needs the total respiration rate as input, which can be read from a forcing NetCDF file or set with an analytical function (ANA_RESPIRATION) in ana_respiration.h.

The metadata for the respiration rate is as follows:

        double respiration_time(respiration_time) ;
                respiration_time:long_name = "respiration date time" ;
                respiration_time:units = "days since 1992-01-01 00:00:00" ;
        double respiration(respiration_time, s_rho, eta_rho, xi_rho) ;
                respiration:long_name = "respiration rate" ;
                respiration:units = "day-1" ;
                respiration:time = "respiration_time" ;

You can set the respiration rate to zero in places with no respiration, like very deep water; recall that hypoxia occurs in coastal and shallow waters, such as estuaries. Use frc_respiration.cdl as a guideline to create the input NetCDF file for respiration. The file can contain multiple time records, in which case ROMS will interpolate in time between snapshots.
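
For illustration, here is a small stand-alone program (not part of ROMS) that writes such a forcing file with the netCDF Fortran-90 API. The dimension sizes, the two time records, and the uniform 0.1 day-1 rate are made-up values for the sketch; frc_respiration.cdl remains the authoritative template for a real application.

      PROGRAM write_respiration
!
!  Minimal sketch: create "frc_respiration.nc" with the metadata shown
!  above and fill it with a spatially uniform rate.  Error checking of
!  the netCDF return codes is omitted for brevity.
!
      USE netcdf
      IMPLICIT NONE
      INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      INTEGER, PARAMETER :: Nt=2, Ns=2, Neta=3, Nxi=4   ! made-up sizes
      INTEGER :: ncid, tdim, sdim, edim, xdim, tid, rid, status
      REAL(r8) :: rtime(Nt)
      REAL(r8) :: resp(Nxi,Neta,Ns,Nt)

      rtime=(/ 0.0_r8, 30.0_r8 /)           ! days since 1992-01-01 00:00:00
      resp=0.1_r8                           ! uniform respiration rate (1/day)

      status=nf90_create('frc_respiration.nc', NF90_CLOBBER, ncid)
      status=nf90_def_dim(ncid, 'respiration_time', NF90_UNLIMITED, tdim)
      status=nf90_def_dim(ncid, 's_rho',   Ns,   sdim)
      status=nf90_def_dim(ncid, 'eta_rho', Neta, edim)
      status=nf90_def_dim(ncid, 'xi_rho',  Nxi,  xdim)
      status=nf90_def_var(ncid, 'respiration_time', NF90_DOUBLE,        &
     &                    (/ tdim /), tid)
      status=nf90_put_att(ncid, tid, 'long_name', 'respiration date time')
      status=nf90_put_att(ncid, tid, 'units',                           &
     &                    'days since 1992-01-01 00:00:00')
      status=nf90_def_var(ncid, 'respiration', NF90_DOUBLE,             &
     &                    (/ xdim, edim, sdim, tdim /), rid)
      status=nf90_put_att(ncid, rid, 'long_name', 'respiration rate')
      status=nf90_put_att(ncid, rid, 'units', 'day-1')
      status=nf90_put_att(ncid, rid, 'time', 'respiration_time')
      status=nf90_enddef(ncid)
      status=nf90_put_var(ncid, tid, rtime)
      status=nf90_put_var(ncid, rid, resp)
      status=nf90_close(ncid)
      END PROGRAM write_respiration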

The model code follows ROMS design and is added as include files:

ROMS/Nonlinear/Biology/hypoxia_srm_def.h
ROMS/Nonlinear/Biology/hypoxia_srm_inp.h
ROMS/Nonlinear/Biology/hypoxia_srm_mod.h
ROMS/Nonlinear/Biology/hypoxia_srm_var.h
ROMS/Nonlinear/Biology/hypoxia_srm_wrt.h

The input parameters are specified in hypoxia_srm.in.

CPP Options:

  • ANA_RESPIRATION: use an analytical respiration rate. If not activated, the respiration rate is read from the input forcing NetCDF file (see Data/ROMS/CDL/frc_respiration.cdl for metadata).
  • HYPOXIA_SRM: activates the Hypoxia Simple Respiration Model (see the header sketch after this list).
  • OCMIP_OXYGEN_SC: use the Schmidt number from Keeling et al. (1998) to compute surface dissolved oxygen. Otherwise, the Schmidt number from Wanninkhof (1992) is used. This option needs to be used in conjunction with SURFACE_DO_SATURATION.
  • SURFACE_DO_SATURATION: use dissolved oxygen saturation at the model surface level instead of a surface dissolved oxygen flux.
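
For orientation, here is a hypothetical fragment of an application header showing one way these options could be combined to run the model with an analytical respiration rate. Only the option names come from the list above; the header itself and any remaining application options are assumptions and depend on your configuration.

# define HYPOXIA_SRM            /* activate the Hypoxia Simple Respiration Model */
# define ANA_RESPIRATION        /* analytical respiration rate (ana_respiration.h) */
# define SURFACE_DO_SATURATION  /* surface DO saturation instead of a DO flux */
# define OCMIP_OXYGEN_SC        /* Keeling et al. (1998) Schmidt number */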

References:

Scully, M.E., 2010: Wind Modulation of Dissolved Oxygen in the Chesapeake Bay, Estuaries and Coasts, 33, 1164-1175.

Scully, M.E., 2013: Physical control on hypoxia in the Chesapeake Bay: A numerical modeling study, J. Geophys. Res., 118, 1239-1256.


The original model was written by Malcolm Scully. It was adapted to follow the ROMS design and the numerical algorithms of its ecological models. Many thanks to Marjy Friedrichs, Aaron Bever, John Wilkin, and others for helping to debug and test this model.

#734 arango Fixed Important: Corrected bug in dateclock.F (routine caldate)
Description

Corrected a bug in the routine caldate (module dateclock.F) when computing the fractional hour and fractional minutes. This only affects applications that use the DIURNAL_SRFLUX option, which modulates the shortwave radiation SRFLX (read and interpolated elsewhere) by the local diurnal cycle (a function of longitude, latitude, and day-of-year) using the Albedo equations.

We need to call caldate for the Albedo equations:

    CALL caldate (tdays(ng), yd_r8=yday, h_r8=hour)

Here the fractional hour of the date was not computed correctly.

I was tracking why ROMS was not producing the same solution as before the change to the Calendar/Date/Clock update (src:ticket:724). It turns out that the difference in the solution is due to round-off.

The new routines compute more exact date variables because we now use a floating-point rounding function based on a Fuzzy or Tolerant Floor:

     seconds=DayFraction*86400.0_r8
     CT=3.0_r8*EPSILON(seconds)             ! comparison tolerance
     seconds=ROUND(seconds, CT)             ! tolerant round function

The ROUND function eliminates round-off errors by improving the floating-point representation of the value in the computer.
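
For illustration, here is a stand-alone sketch of the tolerant-rounding idea (the actual ROMS ROUND function may differ in detail): a value that lies within the comparison tolerance of a whole number is snapped to that whole number, so that, for example, 10799.999999999998 seconds becomes 10800 before the hours and minutes are split off.

      MODULE round_sketch
        IMPLICIT NONE
        INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      CONTAINS
        FUNCTION tolerant_round (value, CT) RESULT (rounded)
!
!  If "value" lies within the (relative) tolerance CT of a whole
!  number, snap it to that whole number; otherwise leave it alone.
!
          REAL(r8), intent(in) :: value, CT
          REAL(r8) :: rounded, whole
          whole=ANINT(value)                      ! nearest whole number
          IF (ABS(value-whole).le.CT*MAX(ABS(value),1.0_r8)) THEN
            rounded=whole                         ! difference is round-off
          ELSE
            rounded=value                         ! keep the fractional part
          END IF
        END FUNCTION tolerant_round
      END MODULE round_sketch

With the seconds snapped this way, the hour (seconds/3600) comes out whole when it should, and the fractional minutes follow cleanly from the remainder.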

So if you are using the DIURNAL_SRFLUX option, the solutions are not reproducible because of the round-off. The solutions with the new code are more precise!

#735 arango Done Very IMPORTANT: ROMS Profiling Overhaul
Description

This is an important update because I revised the entire profiling before starting to look for ways to improve the computational efficiency. In particular, I have been experimenting with ways to accelerate the ROMS nesting algorithms. Currently, I am concentrating on the routines mp_assemble and mp_aggregate of distribute.F.

This update also includes a correction to the management of ntend(ng) and a few changes to the arguments of the routines wclock_on and wclock_off.

What is new?

  • The routine mp_assemble is a multidimensional version of mp_collect; they are used in the nesting and 4D-Var algorithms, respectively. Both routines assemble/collect elements of arrays from all the MPI nodes. Each node processes parts of these arrays computed from tiled state variables.

The assembly/collection operation can be coded with high-level MPI functions like mpi_allgather or mpi_allreduce (a summation, since all arrays are initialized to zero). Alternatively, one could use the lower-level routines mpi_irecv, mpi_isend, and mpi_bcast, similar to what is done in the tile-halo exchanges (mp_exchange.F). It turns out that the lower-level functions are actually more efficient than the higher-level ones. This is the case for us, since we use generic MPI libraries (like OpenMPI); the high-level functions are usually optimized by the vendors of multimillion-dollar supercomputers and their compilers. A stand-alone sketch contrasting the two strategies follows the option list below.

Notice that at the top of distribute.F, we have the following internal CPP options to select the desired communication scheme. The default is to have:

# undef  ASSEMBLE_ALLGATHER /* use mpi_allgather in mp_assemble */
# undef  ASSEMBLE_ALLREDUCE /* use mpi_allreduce in mp_assemble */
# define BOUNDARY_ALLREDUCE /* use mpi_allreduce in mp_boundary */
# undef  COLLECT_ALLGATHER  /* use mpi_allgather in mp_collect  */
# undef  COLLECT_ALLREDUCE  /* use mpi_allreduce in mp_collect  */
# define REDUCE_ALLGATHER   /* use mpi_allgather in mp_reduce   */
# undef  REDUCE_ALLREDUCE   /* use mpi_allreduce in mp_reduce   */
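
As promised above, here is a stand-alone sketch (not the actual mp_assemble code) contrasting the two assembly strategies for a tiled one-dimensional array: a single high-level collective versus explicit nonblocking receives at a root plus a broadcast. The array length, tile partition, and message tag are made-up, and error checking is omitted.

      PROGRAM assemble_sketch
!
!  Each rank owns one tile of a global array that is zero everywhere
!  else.  Strategy 1 assembles the array with a summation collective;
!  Strategy 2 uses nonblocking point-to-point messages to the root
!  followed by a broadcast.  Assumes Npts is divisible by the number
!  of ranks.
!
      USE mpi
      IMPLICIT NONE
      INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      INTEGER, PARAMETER :: Npts = 8              ! global array length
      INTEGER :: rank, nproc, ierr, chunk, i, is, sreq
      INTEGER, ALLOCATABLE :: req(:)
      REAL(r8) :: Awrk(Npts), Aout(Npts)

      CALL MPI_INIT (ierr)
      CALL MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
      CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nproc, ierr)
      chunk=Npts/nproc
      is=rank*chunk+1                             ! start of the local tile
      Awrk=0.0_r8
      Awrk(is:is+chunk-1)=REAL(rank+1,r8)         ! fill only the local tile
!
!  Strategy 1: high-level collective.  Every element outside the local
!  tile is zero, so a global summation assembles the full array.
!
      CALL MPI_ALLREDUCE (Awrk, Aout, Npts, MPI_DOUBLE_PRECISION,       &
     &                    MPI_SUM, MPI_COMM_WORLD, ierr)
!
!  Strategy 2: low-level point-to-point.  Each rank sends its tile to
!  the root, the root receives every tile into place, and the result
!  is broadcast back to all ranks.
!
      IF (rank.eq.0) THEN
        ALLOCATE ( req(nproc-1) )
        DO i=1,nproc-1
          CALL MPI_IRECV (Awrk(i*chunk+1), chunk, MPI_DOUBLE_PRECISION, &
     &                    i, 99, MPI_COMM_WORLD, req(i), ierr)
        END DO
        IF (nproc.gt.1) THEN
          CALL MPI_WAITALL (nproc-1, req, MPI_STATUSES_IGNORE, ierr)
        END IF
      ELSE
        CALL MPI_ISEND (Awrk(is), chunk, MPI_DOUBLE_PRECISION, 0, 99,   &
     &                  MPI_COMM_WORLD, sreq, ierr)
        CALL MPI_WAIT (sreq, MPI_STATUS_IGNORE, ierr)
      END IF
      CALL MPI_BCAST (Awrk, Npts, MPI_DOUBLE_PRECISION, 0,              &
     &                MPI_COMM_WORLD, ierr)
      CALL MPI_FINALIZE (ierr)
      END PROGRAM assemble_sketch
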
  • The ROMS internal profiling was modified to include more regions (Pregions) in mod_strings.F:
          character (len=50), dimension(Nregion) :: Pregion =             &
       &    (/'Allocation and array initialization ..............',       & !01
       &      'Ocean state initialization .......................',       & !02
       &      'Reading of input data ............................',       & !03
       &      'Processing of input data .........................',       & !04
       &      'Processing of output time averaged data ..........',       & !05
       &      'Computation of vertical boundary conditions ......',       & !06
       &      'Computation of global information integrals ......',       & !07
       &      'Writing of output data ...........................',       & !08
       &      'Model 2D kernel ..................................',       & !09
       &      'Lagrangian floats trajectories ...................',       & !10
       &      'Tidal forcing ....................................',       & !11
       &      '2D/3D coupling, vertical metrics .................',       & !12
       &      'Omega vertical velocity ..........................',       & !13
       &      'Equation of state for seawater ...................',       & !14
       &      'Biological module, source/sink terms .............',       & !15
       &      'Sediment transport module, source/sink terms .....',       & !16
       &      'Atmosphere-Ocean bulk flux parameterization ......',       & !17
       &      'KPP vertical mixing parameterization .............',       & !18
       &      'GLS vertical mixing parameterization .............',       & !19
       &      'My2.5 vertical mixing parameterization ...........',       & !20
       &      '3D equations right-side terms ....................',       & !21
       &      '3D equations predictor step ......................',       & !22
       &      'Pressure gradient ................................',       & !23
       &      'Harmonic mixing of tracers, S-surfaces ...........',       & !24
       &      'Harmonic mixing of tracers, geopotentials ........',       & !25
       &      'Harmonic mixing of tracers, isopycnals ...........',       & !26
       &      'Biharmonic mixing of tracers, S-surfaces .........',       & !27
       &      'Biharmonic mixing of tracers, geopotentials ......',       & !28
       &      'Biharmonic mixing of tracers, isopycnals .........',       & !29
       &      'Harmonic stress tensor, S-surfaces ...............',       & !30
       &      'Harmonic stress tensor, geopotentials ............',       & !31
       &      'Biharmonic stress tensor, S-surfaces .............',       & !32
       &      'Biharmonic stress tensor, geopotentials ..........',       & !33
       &      'Corrector time-step for 3D momentum ..............',       & !34
       &      'Corrector time-step for tracers ..................',       & !35
       &      'Nesting algorithm ................................',       & !36
       &      'Bottom boundary layer module .....................',       & !37
       &      'GST Analysis eigenproblem solution ...............',       & !38
       &      'Two-way coupling to Atmosphere Model .............',       & !39
       &      'Two-way coupling to Sea Ice Model ................',       & !40
       &      'Two-way coupling to Wave Model ...................',       & !41
       &      'Reading model state vector .......................',       & !42
       &      '4D-Var minimization solver .......................',       & !43
       &      'Background error covariance matrix ...............',       & !44
       &      'Posterior error covariance matrix ................',       & !45
       &      'Unused 01 ........................................',       & !46
       &      'Unused 02 ........................................',       & !47
       &      'Unused 03 ........................................',       & !48
       &      'Unused 04 ........................................',       & !49
       &      'Unused 05 ........................................',       & !50
       &      'Unused 06 ........................................',       & !51
       &      'Unused 07 ........................................',       & !52
       &      'Unused 08 ........................................',       & !53
       &      'Unused 09 ........................................',       & !54
       &      'Unused 10 ........................................',       & !55
       &      'Unused 11 ........................................',       & !56
       &      'Unused 12 ........................................',       & !57
       &      'Unused 13 ........................................',       & !58
       &      'Unused 14 ........................................',       & !59
       &      'Message Passage: 2D halo exchanges ...............',       & !60
       &      'Message Passage: 3D halo exchanges ...............',       & !61
       &      'Message Passage: 4D halo exchanges ...............',       & !62
       &      'Message Passage: lateral boundary exchanges ......',       & !63
       &      'Message Passage: data broadcast ..................',       & !64
       &      'Message Passage: data reduction ..................',       & !65
       &      'Message Passage: data gathering ..................',       & !66
       &      'Message Passage: data scattering..................',       & !67
       &      'Message Passage: boundary data gathering .........',       & !68
       &      'Message Passage: point data gathering ............',       & !69
       &      'Message Passage: nesting point data gathering ....',       & !70
       &      'Message Passage: nesting array data gathering ....',       & !71
       &      'Message Passage: synchronization barrier .........',       & !72
       &      'Message Passage: multi-model coupling ............'/)        !73
    

Notice that we now have 73 regions, including 14 unused regions reserved for later use. We need to separate the Message Passage (MPI) regions from the rest, and it was tedious to renumber all of them. The MPI regions need to be located at indices Mregion=60 to Nregion=72. In wclock_off, we have:

# ifdef DISTRIBUTE
          DO imodel=1,4
            DO iregion=Mregion,Nregion
              ...
            END DO
          END DO
# endif

to process all the MPI regions. Notice that region indices 36, 39, 40, 41, 42, 43, 44, 45, 70, 71, and 72 are the new regions introduced here to help the profiling and identify bottleneck areas.

  • There are two additional arguments to routines wclock_on and wclock_off:
         SUBROUTINE wclock_on  (ng, model, region, line, routine)
         SUBROUTINE wclock_off (ng, model, region, line, routine)
    
    so in the calling routine, we have for example:
         CALL wclock_on  (ng, iNLM, 9, __LINE__, __FILE__)
         CALL wclock_off (ng, iNLM, 9, __LINE__, __FILE__)
    
    and the C-preprocessing code will yield:
         CALL wclock_on  (ng, iNLM, 9, 39,  "ROMS/Nonlinear/step2d_LF_AM3.h")
         CALL wclock_off (ng, iNLM, 9, 116, "ROMS/Nonlinear/step2d_LF_AM3.h")
    
     The new arguments line and routine will be used in the future for more elaborate profiling using third-party libraries.

Below are the profiling statistics for a 5-day simulation with two nested grids, using the low-level mpi_irecv/mpi_isend/mpi_bcast in the routines mp_assemble and mp_collect. The simulation was run on my latest Mac on 4 CPUs.

 Elapsed CPU time (seconds):

 Node   #  0 CPU:    4150.677
 Node   #  3 CPU:    4209.386
 Node   #  1 CPU:    4209.369
 Node   #  2 CPU:    4209.324
 Total:             16778.756

 Nonlinear model elapsed CPU time profile, Grid: 01

  Allocation and array initialization ..............         1.185  ( 0.0071 %)
  Ocean state initialization .......................         0.837  ( 0.0050 %)
  Reading of input data ............................        63.005  ( 0.3755 %)
  Processing of input data .........................        27.495  ( 0.1639 %)
  Processing of output time averaged data ..........        91.075  ( 0.5428 %)
  Computation of vertical boundary conditions ......         0.636  ( 0.0038 %)
  Computation of global information integrals ......        18.845  ( 0.1123 %)
  Writing of output data ...........................       103.600  ( 0.6174 %)
  Model 2D kernel ..................................       405.983  ( 2.4196 %)
  Tidal forcing ....................................        23.336  ( 0.1391 %)
  2D/3D coupling, vertical metrics .................        58.658  ( 0.3496 %)
  Omega vertical velocity ..........................        35.253  ( 0.2101 %)
  Equation of state for seawater ...................        44.661  ( 0.2662 %)
  Atmosphere-Ocean bulk flux parameterization ......        53.266  ( 0.3175 %)
  GLS vertical mixing parameterization .............       851.479  ( 5.0747 %)
  3D equations right-side terms ....................        62.280  ( 0.3712 %)
  3D equations predictor step ......................       148.728  ( 0.8864 %)
  Pressure gradient ................................        45.963  ( 0.2739 %)
  Harmonic mixing of tracers, geopotentials ........        96.407  ( 0.5746 %)
  Harmonic stress tensor, S-surfaces ...............        38.824  ( 0.2314 %)
  Corrector time-step for 3D momentum ..............        79.510  ( 0.4739 %)
  Corrector time-step for tracers ..................       105.353  ( 0.6279 %)
  Nesting algorithm ................................       205.556  ( 1.2251 %)
  Reading model state vector .......................         0.785  ( 0.0047 %)
                                              Total:      2562.721   15.2736

 Nonlinear model message Passage profile, Grid: 01

  Message Passage: 2D halo exchanges ...............        51.440  ( 0.3066 %)
  Message Passage: 3D halo exchanges ...............        93.536  ( 0.5575 %)
  Message Passage: 4D halo exchanges ...............        36.680  ( 0.2186 %)
  Message Passage: data broadcast ..................       117.041  ( 0.6976 %)
  Message Passage: data reduction ..................         1.395  ( 0.0083 %)
  Message Passage: data gathering ..................        20.711  ( 0.1234 %)
  Message Passage: data scattering..................         0.912  ( 0.0054 %)
  Message Passage: boundary data gathering .........         0.904  ( 0.0054 %)
  Message Passage: point data gathering ............         0.573  ( 0.0034 %)
  Message Passage: nesting point data gathering ....       708.861  ( 4.2248 %)
                                              Total:      1032.054    6.1510

 Nonlinear model elapsed CPU time profile, Grid: 02

  Allocation and array initialization ..............         1.185  ( 0.0071 %)
  Ocean state initialization .......................         0.851  ( 0.0051 %)
  Reading of input data ............................         6.918  ( 0.0412 %)
  Processing of input data .........................        24.180  ( 0.1441 %)
  Processing of output time averaged data ..........       610.139  ( 3.6364 %)
  Computation of vertical boundary conditions ......         3.645  ( 0.0217 %)
  Computation of global information integrals ......        93.566  ( 0.5576 %)
  Writing of output data ...........................       187.852  ( 1.1196 %)
  Model 2D kernel ..................................      2680.264  (15.9742 %)
  Tidal forcing ....................................         0.038  ( 0.0002 %)
  2D/3D coupling, vertical metrics .................       175.925  ( 1.0485 %)
  Omega vertical velocity ..........................       131.463  ( 0.7835 %)
  Equation of state for seawater ...................       213.727  ( 1.2738 %)
  Atmosphere-Ocean bulk flux parameterization ......       274.219  ( 1.6343 %)
  GLS vertical mixing parameterization .............      4496.748  (26.8002 %)
  3D equations right-side terms ....................       414.284  ( 2.4691 %)
  3D equations predictor step ......................       758.085  ( 4.5181 %)
  Pressure gradient ................................       242.797  ( 1.4471 %)
  Harmonic mixing of tracers, geopotentials ........       503.073  ( 2.9983 %)
  Harmonic stress tensor, S-surfaces ...............       219.733  ( 1.3096 %)
  Corrector time-step for 3D momentum ..............       362.818  ( 2.1624 %)
  Corrector time-step for tracers ..................       418.174  ( 2.4923 %)
  Nesting algorithm ................................       842.104  ( 5.0189 %)
  Reading model state vector .......................         1.443  ( 0.0086 %)
                                              Total:     12663.231   75.4718

 Nonlinear model message Passage profile, Grid: 02

  Message Passage: 2D halo exchanges ...............       223.769  ( 1.3336 %)
  Message Passage: 3D halo exchanges ...............       254.173  ( 1.5149 %)
  Message Passage: 4D halo exchanges ...............       113.998  ( 0.6794 %)
  Message Passage: data broadcast ..................       115.124  ( 0.6861 %)
  Message Passage: data reduction ..................         5.276  ( 0.0314 %)
  Message Passage: data gathering ..................        34.649  ( 0.2065 %)
  Message Passage: data scattering..................         0.336  ( 0.0020 %)
  Message Passage: point data gathering ............         0.348  ( 0.0021 %)
  Message Passage: nesting point data gathering ....       565.746  ( 3.3718 %)
  Message Passage: nesting array data gathering ....       485.481  ( 2.8934 %)
                                              Total:      1798.901   10.7213

  Unique code regions profiled .....................     15225.952   90.7454 %
  Residual, non-profiled code ......................      1552.804    9.2546 %


 All percentages are with respect to total time =        16778.756

Notice that the most expensive algorithms in this particular profiling are the GLS vertical mixing parameterization (31.6%), the 2D kernel (19.2%), and nesting (6.2%). I don't think that there is much we can do about the GLS since it involves several fractional powers that are very expensive.

On average, the low-level MPI functions yield 1-3% faster code than using mpi_allreduce in mp_assemble and mp_aggregate. Similarly, the low-level MPI functions yield 6-9% faster code than using mpi_allgather in mp_assemble and mp_aggregate.

One must be careful when examining these numbers. It will depend on the type of computer hardware, compiler, the number of parallel nodes, intra-node connectivity, node speed, etc. We always need to investigate the optimal number of nodes for a particular ROMS application. Too many nodes may slow the computations because of the communications overhead.

Therefore, the default is to use the low-level MPI functions in routines mp_assemble, mp_aggregate, and mp_collect. See top of distribute.F.
