Custom Query (969 matches)
Ticket | Resolution | Summary |
---|---|---|
#733 | Done | Hypoxia Simple Respiration Model |
Description:
Added the Hypoxia Simple Respiration Model. It can be activated with the CPP option HYPOXIA_SRM. It needs the total respiration rate as input, which can be read from a forcing NetCDF file or set with analytical functions (ANA_RESPIRATION) in ana_respiration.h (a hedged sketch of such an analytical field is given after the references below). The metadata for the respiration rate is as follows:

      double respiration_time(respiration_time) ;
              respiration_time:long_name = "respiration date time" ;
              respiration_time:units = "days since 1992-01-01 00:00:00" ;
      double respiration(respiration_time, s_rho, eta_rho, xi_rho) ;
              respiration:long_name = "respiration rate" ;
              respiration:units = "day-1" ;
              respiration:time = "respiration_time" ;

You can set the respiration rate to zero in places with no respiration, like very deep water. Recall that hypoxia occurs in coastal and shallow waters, like estuaries. Use frc_respiration.cdl as a guideline to create the input NetCDF file for respiration. The file can have multiple time records; if it does, ROMS will interpolate in time between snapshots.

The model code follows the ROMS design and is added as include files:

      ROMS/Nonlinear/Biology/hypoxia_srm_def.h
      ROMS/Nonlinear/Biology/hypoxia_srm_inp.h
      ROMS/Nonlinear/Biology/hypoxia_srm_mod.h
      ROMS/Nonlinear/Biology/hypoxia_srm_var.h
      ROMS/Nonlinear/Biology/hypoxia_srm_wrt.h

The input parameters are specified in hypoxia_srm.in. CPP Options:
References:

      Scully, M.E., 2010: Wind Modulation of Dissolved Oxygen in the Chesapeake Bay, Estuaries and Coasts, 33, 1164-1175.
      Scully, M.E., 2013: Physical control on hypoxia in the Chesapeake Bay: A numerical modeling study, J. Geophys. Res., 118, 1239-1256.

The original model was written by Malcolm Scully. It was adapted to follow the ROMS design and the numerical algorithms of the other ecological models. Many thanks to Marjy Friedrichs, Aaron Bever, John Wilkin, and others for helping debug and test this model.
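As promised above, here is a rough, standalone sketch of how an analytical respiration field could be filled for the ANA_RESPIRATION option: a constant rate in shallow water and zero in deep water. This is not the distributed ana_respiration.h; the program, array names, toy grid dimensions, the 0.1 day-1 rate, and the 50 m cutoff are illustrative assumptions only.

      PROGRAM ana_respiration_demo
!  Hypothetical sketch (not ROMS code): fill a respiration-rate array with a
!  constant value where the water is shallower than 50 m and zero elsewhere.
        IMPLICIT NONE
        INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
        INTEGER, PARAMETER :: Im=10, Jm=8, N=5          ! toy grid dimensions
        REAL(r8) :: h(Im,Jm)                            ! bathymetry (m)
        REAL(r8) :: resp(Im,Jm,N)                       ! respiration rate (day-1)
        INTEGER :: i, j, k

        h=RESHAPE([(10.0_r8*REAL(i,r8), i=1,Im*Jm)], SHAPE(h))
        DO k=1,N
          DO j=1,Jm
            DO i=1,Im
              IF (h(i,j).le.50.0_r8) THEN
                resp(i,j,k)=0.1_r8                      ! shallow, estuarine water
              ELSE
                resp(i,j,k)=0.0_r8                      ! deep water: no respiration
              END IF
            END DO
          END DO
        END DO
        PRINT *, 'maximum respiration rate =', MAXVAL(resp), ' day-1'
      END PROGRAM ana_respiration_demo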
#734 | Fixed | Important: Corrected bug in dateclock.F (routine caldate) |
Description:
Corrected a bug in the routine caldate (module dateclock.F) when computing the fractional hour and fractional minutes. This only affects applications that use DIURNAL_SRFLUX, which modulates the shortwave radiation SRFLX (read and interpolated elsewhere) by the local diurnal cycle (a function of longitude, latitude, and day-of-year) using the Albedo equations. We need to call caldate for the Albedo equations:

      CALL caldate (tdays(ng), yd_r8=yday, h_r8=hour)

Here the fractional hour of the date was not computed correctly. I was tracking why ROMS was not producing the same solution as before the Calendar/Date/Clock update (src:ticket:724). It turns out that the difference in the solution is due to round-off. The new routines compute more exact date variables because we now use a floating-point rounding function with a fuzzy or tolerant floor:

      seconds=DayFraction*86400.0_r8
      CT=3.0_r8*EPSILON(seconds)            ! comparison tolerance
      seconds=ROUND(seconds, CT)            ! tolerant round function

The ROUND function eliminates round-off by improving the floating-point representation in the computer. So if you are using the option DIURNAL_SRFLUX, the solutions are not reproducible because of this round-off. The solutions with the new code are more precise!
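For readers who want to see what a tolerant (fuzzy) rounding step looks like, below is a minimal, self-contained sketch in the spirit of the ROUND call above. It is not the ROMS implementation: the module and function names and the relative-tolerance test are assumptions made for illustration.

      MODULE tolerant_round_mod
!  Minimal sketch of a fuzzy/tolerant rounding helper (illustrative only).
        IMPLICIT NONE
        INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      CONTAINS
        FUNCTION tolerant_round (x, tol) RESULT (rounded)
!  If "x" lies within a relative tolerance "tol" of a whole number, snap it to
!  that whole number to remove accumulated round-off; otherwise return it
!  unchanged.
          REAL(r8), INTENT(in) :: x, tol
          REAL(r8) :: rounded
          REAL(r8) :: whole
          whole=ANINT(x)
          IF (ABS(x-whole).le.tol*MAX(ABS(x),1.0_r8)) THEN
            rounded=whole
          ELSE
            rounded=x
          END IF
        END FUNCTION tolerant_round
      END MODULE tolerant_round_mod

With such a helper, the conversion above would read seconds=tolerant_round(DayFraction*86400.0_r8, 3.0_r8*EPSILON(1.0_r8)).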
#735 | Done | Very IMPORTANT: ROMS Profiling Overhaul |
Description:
This is an important update because I revised the entire profiling before starting to look for ways to improve the computational efficiency. In particular, I have been experimenting with ways to accelerate the ROMS nesting algorithms. Currently, I am concentrating on routines mp_assemble and mp_aggregate of distribute.F. This update also includes a correction to the management of ntend(ng) and a few changes to the arguments of routines wclock_on and wclock_off. What is new?
Below are the profiling statistics for a 5-day simulation with two nested grids using the low-level mpi_irecv/mpi_isend/mpi_bcast in routines mp_assemble and mp_collect. The simulation was run on my latest Mac on 4 CPUs.

      Elapsed CPU time (seconds):

        Node # 0 CPU:     4150.677
        Node # 3 CPU:     4209.386
        Node # 1 CPU:     4209.369
        Node # 2 CPU:     4209.324
        Total:           16778.756

      Nonlinear model elapsed CPU time profile, Grid: 01

        Allocation and array initialization ..............     1.185  ( 0.0071 %)
        Ocean state initialization .......................      0.837  ( 0.0050 %)
        Reading of input data ............................      63.005  ( 0.3755 %)
        Processing of input data .........................      27.495  ( 0.1639 %)
        Processing of output time averaged data ..........      91.075  ( 0.5428 %)
        Computation of vertical boundary conditions ......       0.636  ( 0.0038 %)
        Computation of global information integrals ......      18.845  ( 0.1123 %)
        Writing of output data ...........................     103.600  ( 0.6174 %)
        Model 2D kernel ..................................     405.983  ( 2.4196 %)
        Tidal forcing ....................................      23.336  ( 0.1391 %)
        2D/3D coupling, vertical metrics .................      58.658  ( 0.3496 %)
        Omega vertical velocity ..........................      35.253  ( 0.2101 %)
        Equation of state for seawater ...................      44.661  ( 0.2662 %)
        Atmosphere-Ocean bulk flux parameterization ......      53.266  ( 0.3175 %)
        GLS vertical mixing parameterization .............     851.479  ( 5.0747 %)
        3D equations right-side terms ....................      62.280  ( 0.3712 %)
        3D equations predictor step ......................     148.728  ( 0.8864 %)
        Pressure gradient ................................      45.963  ( 0.2739 %)
        Harmonic mixing of tracers, geopotentials ........      96.407  ( 0.5746 %)
        Harmonic stress tensor, S-surfaces ...............      38.824  ( 0.2314 %)
        Corrector time-step for 3D momentum ..............      79.510  ( 0.4739 %)
        Corrector time-step for tracers ..................     105.353  ( 0.6279 %)
        Nesting algorithm ................................     205.556  ( 1.2251 %)
        Reading model state vector .......................       0.785  ( 0.0047 %)
        Total: ...........................................    2562.721   15.2736

      Nonlinear model message Passage profile, Grid: 01

        Message Passage: 2D halo exchanges ...............      51.440  ( 0.3066 %)
        Message Passage: 3D halo exchanges ...............      93.536  ( 0.5575 %)
        Message Passage: 4D halo exchanges ...............      36.680  ( 0.2186 %)
        Message Passage: data broadcast ..................     117.041  ( 0.6976 %)
        Message Passage: data reduction ..................       1.395  ( 0.0083 %)
        Message Passage: data gathering ..................      20.711  ( 0.1234 %)
        Message Passage: data scattering .................       0.912  ( 0.0054 %)
        Message Passage: boundary data gathering .........       0.904  ( 0.0054 %)
        Message Passage: point data gathering ............       0.573  ( 0.0034 %)
        Message Passage: nesting point data gathering ....     708.861  ( 4.2248 %)
        Total: ...........................................    1032.054    6.1510

      Nonlinear model elapsed CPU time profile, Grid: 02

        Allocation and array initialization ..............       1.185  ( 0.0071 %)
        Ocean state initialization .......................       0.851  ( 0.0051 %)
        Reading of input data ............................       6.918  ( 0.0412 %)
        Processing of input data .........................      24.180  ( 0.1441 %)
        Processing of output time averaged data ..........     610.139  ( 3.6364 %)
        Computation of vertical boundary conditions ......       3.645  ( 0.0217 %)
        Computation of global information integrals ......      93.566  ( 0.5576 %)
        Writing of output data ...........................     187.852  ( 1.1196 %)
        Model 2D kernel ..................................    2680.264  (15.9742 %)
        Tidal forcing ....................................       0.038  ( 0.0002 %)
        2D/3D coupling, vertical metrics .................     175.925  ( 1.0485 %)
        Omega vertical velocity ..........................     131.463  ( 0.7835 %)
        Equation of state for seawater ...................     213.727  ( 1.2738 %)
        Atmosphere-Ocean bulk flux parameterization ......     274.219  ( 1.6343 %)
        GLS vertical mixing parameterization .............    4496.748  (26.8002 %)
        3D equations right-side terms ....................     414.284  ( 2.4691 %)
        3D equations predictor step ......................     758.085  ( 4.5181 %)
        Pressure gradient ................................     242.797  ( 1.4471 %)
        Harmonic mixing of tracers, geopotentials ........     503.073  ( 2.9983 %)
        Harmonic stress tensor, S-surfaces ...............     219.733  ( 1.3096 %)
        Corrector time-step for 3D momentum ..............     362.818  ( 2.1624 %)
        Corrector time-step for tracers ..................     418.174  ( 2.4923 %)
        Nesting algorithm ................................     842.104  ( 5.0189 %)
        Reading model state vector .......................       1.443  ( 0.0086 %)
        Total: ...........................................   12663.231   75.4718

      Nonlinear model message Passage profile, Grid: 02

        Message Passage: 2D halo exchanges ...............     223.769  ( 1.3336 %)
        Message Passage: 3D halo exchanges ...............     254.173  ( 1.5149 %)
        Message Passage: 4D halo exchanges ...............     113.998  ( 0.6794 %)
        Message Passage: data broadcast ..................     115.124  ( 0.6861 %)
        Message Passage: data reduction ..................       5.276  ( 0.0314 %)
        Message Passage: data gathering ..................      34.649  ( 0.2065 %)
        Message Passage: data scattering .................       0.336  ( 0.0020 %)
        Message Passage: point data gathering ............       0.348  ( 0.0021 %)
        Message Passage: nesting point data gathering ....     565.746  ( 3.3718 %)
        Message Passage: nesting array data gathering ....     485.481  ( 2.8934 %)
        Total: ...........................................    1798.901   10.7213

      Unique code regions profiled .......................   15225.952   90.7454 %
      Residual, non-profiled code ........................    1552.804    9.2546 %

      All percentages are with respect to total time = 16778.756

Notice that the most expensive algorithms in this particular profiling are the GLS vertical mixing parameterization (31.6%), the 2D kernel (19.2%), and nesting (6.2%). I don't think there is much we can do about the GLS since it involves several fractional powers that are very expensive.

On average, the low-level MPI functions yield 1-3% faster code than using mpi_allreduce in mp_assemble and mp_aggregate. Similarly, the low-level MPI functions yield 6-9% faster code than using mpi_allgather in mp_assemble and mp_aggregate. One must be careful when examining these numbers: they depend on the type of computer hardware, compiler, number of parallel nodes, intra-node connectivity, node speed, and so on. We always need to investigate the optimal number of nodes for a particular ROMS application; too many nodes may slow the computations because of the communications overhead. Therefore, the default is to use the low-level MPI functions in routines mp_assemble, mp_aggregate, and mp_collect. See the top of distribute.F.
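To make the low-level versus collective trade-off concrete, here is a hedged, standalone sketch of two ways a segmented array can be assembled on every rank. It is not the ROMS mp_assemble/mp_aggregate code; the program, array names, and message tag are assumptions, and only standard MPI calls (mpi_irecv, mpi_isend, mpi_waitall, mpi_allgather) are used.

      PROGRAM assemble_demo
!  Standalone sketch (not ROMS code): assemble a distributed 1D array on every
!  rank, either with nonblocking point-to-point calls (the "low-level"
!  strategy) or with a single collective (mpi_allgather).
        USE mpi
        IMPLICIT NONE
        INTEGER, PARAMETER :: dp = KIND(1.0d0)
        LOGICAL, PARAMETER :: LowLevel=.TRUE.        ! choose the strategy
        INTEGER, PARAMETER :: Npts=4                 ! points owned per rank
        INTEGER :: ierr, rank, nprocs, r, nreq
        INTEGER, ALLOCATABLE :: req(:)
        REAL(dp) :: Aseg(Npts)
        REAL(dp), ALLOCATABLE :: Afull(:)

        CALL MPI_INIT (ierr)
        CALL MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
        CALL MPI_COMM_SIZE (MPI_COMM_WORLD, nprocs, ierr)
        ALLOCATE ( Afull(Npts*nprocs), req(2*(nprocs-1)) )
        Aseg=REAL(rank+1,dp)                         ! each rank owns a segment

        IF (LowLevel) THEN
!  Low-level strategy: post a receive for every remote segment and a send of
!  the local segment, then wait for all requests to complete.
          Afull(rank*Npts+1:(rank+1)*Npts)=Aseg      ! copy own segment locally
          nreq=0
          DO r=0,nprocs-1
            IF (r.ne.rank) THEN
              nreq=nreq+1
              CALL MPI_IRECV (Afull(r*Npts+1), Npts, MPI_DOUBLE_PRECISION,   &
                              r, 10, MPI_COMM_WORLD, req(nreq), ierr)
              nreq=nreq+1
              CALL MPI_ISEND (Aseg, Npts, MPI_DOUBLE_PRECISION,              &
                              r, 10, MPI_COMM_WORLD, req(nreq), ierr)
            END IF
          END DO
          CALL MPI_WAITALL (nreq, req, MPI_STATUSES_IGNORE, ierr)
        ELSE
!  Collective strategy: one call, the MPI library picks the exchange pattern.
          CALL MPI_ALLGATHER (Aseg,  Npts, MPI_DOUBLE_PRECISION,             &
                              Afull, Npts, MPI_DOUBLE_PRECISION,             &
                              MPI_COMM_WORLD, ierr)
        END IF

        IF (rank.eq.0) PRINT *, 'assembled array:', Afull
        CALL MPI_FINALIZE (ierr)
      END PROGRAM assemble_demo

Whether the point-to-point variant actually wins depends on the hardware and MPI library, which is exactly why the update leaves both communication paths selectable at the top of distribute.F.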