Below are the profiling numbers for the East Australia Current application (EAC_4), 64x80x30 grid points, on 4 CPUs with a 1×4 partition on my Linux box:
Linux 2.6.12-15mdksmp #1 SMP Mon Jan 9 23:35:18 MST 2006 x86_64 Dual Core AMD Opteron(tm) Processor 265
The nonlinear model is run for 800 time-steps as a standalone driver with the same CPP options as the tangent linear, adjoint and representer models. The vertical mixing parameterization is set to a constant for all models.
Nonlinear model elapsed time profile:
Initialization ................................... 4.587 ( 0.7179 %)
Reading of input data ............................ 3.979 ( 0.6226 %)
Processing of input data ......................... 7.522 ( 1.1772 %)
Computation of vertical boundary conditions ...... 0.625 ( 0.0978 %)
Computation of global information integrals ...... 8.183 ( 1.2807 %)
Writing of output data ........................... 7.926 ( 1.2404 %)
Model 2D kernel .................................. 359.438 (56.2512 %)
2D/3D coupling, vertical metrics ................. 12.438 ( 1.9465 %)
Omega vertical velocity .......................... 16.498 ( 2.5818 %)
Equation of state for seawater ................... 13.606 ( 2.1292 %)
3D equations right-side terms .................... 23.653 ( 3.7016 %)
3D equations predictor step ...................... 57.813 ( 9.0476 %)
Pressure gradient ................................ 22.302 ( 3.4901 %)
Harmonic stress tensor, S-surfaces ............... 9.827 ( 1.5379 %)
Corrector time-step for 3D momentum .............. 37.238 ( 5.8277 %)
Corrector time-step for tracers .................. 37.273 ( 5.8331 %)
Total: 622.907 97.4834
Nonlinear model message Passage profile:
Message Passage: 2D halo exchanges ............... 140.839 (22.0409 %)
Message Passage: 3D halo exchanges ............... 29.981 ( 4.6920 %)
Message Passage: 4D halo exchanges ............... 11.629 ( 1.8199 %)
Message Passage: data broadcast .................. 6.781 ( 1.0612 %)
Message Passage: data reduction .................. 1.470 ( 0.2301 %)
Message Passage: data gathering .................. 2.517 ( 0.3939 %)
Message Passage: data scattering.................. 3.362 ( 0.5262 %)
Total: 196.579 30.7641
All percentages are with respect to total time = 638.988
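As a quick check of how each percentage relates to the total time, one profile line can be parsed and its share recomputed. This is only a minimal Python sketch: the regular expression and the helper name `parse_profile_line` are illustrative assumptions and not part of the model code.

```python
import re

# Assumed layout of one profile line: label, dot leader, seconds, "(percent %)".
LINE = re.compile(
    r"^(?P<label>.+?)\s*\.{2,}\s*(?P<sec>[\d.]+)\s*\(\s*(?P<pct>[\d.]+)\s*%\)"
)

def parse_profile_line(line):
    """Return (label, seconds, percent), or None if the line does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return m["label"].strip(), float(m["sec"]), float(m["pct"])

# Total time reported for the nonlinear run above.
total_time = 638.988

sample = "Model 2D kernel .................................. 359.438 (56.2512 %)"
label, seconds, percent = parse_profile_line(sample)

# Recompute the share from the elapsed seconds and the total time.
recomputed = 100.0 * seconds / total_time
print(f"{label}: reported {percent:.4f} %, recomputed {recomputed:.4f} %")
```

The same helper can be applied to the message passage lines, since they follow the same layout.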
The tangent linear and adjoint models are run together using the SANITY_CHECK driver, so the total elapsed time accounts for both models.
Tangent linear model elapsed time profile:
Initialization ................................... 14.835 ( 0.4350 %)
Reading of input data ............................ 173.861 ( 5.0979 %)
Processing of input data ......................... 95.638 ( 2.8043 %)
Computation of vertical boundary conditions ...... 0.875 ( 0.0256 %)
Computation of global information integrals ...... 8.092 ( 0.2373 %)
Writing of output data ........................... 4.161 ( 0.1220 %)
Model 2D kernel .................................. 828.082 (24.2810 %)
2D/3D coupling, vertical metrics ................. 18.079 ( 0.5301 %)
Omega vertical velocity .......................... 31.976 ( 0.9376 %)
Equation of state for seawater ................... 27.216 ( 0.7980 %)
3D equations right-side terms .................... 44.738 ( 1.3118 %)
3D equations predictor step ...................... 98.420 ( 2.8859 %)
Pressure gradient ................................ 44.543 ( 1.3061 %)
Harmonic stress tensor, S-surfaces ............... 13.495 ( 0.3957 %)
Corrector time-step for 3D momentum .............. 74.287 ( 2.1782 %)
Corrector time-step for tracers .................. 72.838 ( 2.1358 %)
Total: 1551.133 45.4823
Tangent linear model message Passage profile:
Message Passage: 2D halo exchanges ............... 223.804 ( 6.5624 %)
Message Passage: 3D halo exchanges ............... 55.990 ( 1.6417 %)
Message Passage: 4D halo exchanges ............... 7.083 ( 0.2077 %)
Message Passage: data broadcast .................. 10.551 ( 0.3094 %)
Message Passage: data reduction .................. 1.248 ( 0.0366 %)
Message Passage: data gathering .................. 1.241 ( 0.0364 %)
Message Passage: data scattering.................. 132.319 ( 3.8799 %)
Message Passage: point data gathering ............ 0.004 ( 0.0001 %)
Total: 432.240 12.6741
Adjoint model elapsed time profile:
Initialization ................................... 6.466 ( 0.1896 %)
Reading of input data ............................ 47.553 ( 1.3944 %)
Processing of input data ......................... 93.203 ( 2.7329 %)
Computation of vertical boundary conditions ...... 1.597 ( 0.0468 %)
Computation of global information integrals ...... 8.488 ( 0.2489 %)
Writing of output data ........................... 5.610 ( 0.1645 %)
Model 2D kernel .................................. 1008.202 (29.5625 %)
2D/3D coupling, vertical metrics ................. 31.938 ( 0.9365 %)
Omega vertical velocity .......................... 45.539 ( 1.3353 %)
Equation of state for seawater ................... 42.038 ( 1.2326 %)
3D equations right-side terms .................... 55.763 ( 1.6351 %)
3D equations predictor step ...................... 138.419 ( 4.0587 %)
Pressure gradient ................................ 68.866 ( 2.0193 %)
Harmonic stress tensor, S-surfaces ............... 20.567 ( 0.6031 %)
Corrector time-step for 3D momentum .............. 80.219 ( 2.3522 %)
Corrector time-step for tracers .................. 98.440 ( 2.8865 %)
Total: 1752.908 51.3987
Adjoint model message Passage profile:
Message Passage: 2D halo exchanges ............... 255.896 ( 7.5034 %)
Message Passage: 3D halo exchanges ............... 71.025 ( 2.0826 %)
Message Passage: 4D halo exchanges ............... 15.459 ( 0.4533 %)
Message Passage: data broadcast .................. 8.647 ( 0.2535 %)
Message Passage: data reduction .................. 1.842 ( 0.0540 %)
Message Passage: data gathering .................. 1.750 ( 0.0513 %)
Message Passage: data scattering.................. 42.078 ( 1.2338 %)
Total: 396.695 11.6319
All percentages are with respect to total time = 3410.412
The representer tangent linear model is run as a separate driver using TLM_DRIVER.
Representer model elapsed time profile:
Initialization ................................... 4.597 ( 0.2799 %)
Reading of input data ............................ 38.600 ( 2.3505 %)
Processing of input data ......................... 95.962 ( 5.8436 %)
Computation of vertical boundary conditions ...... 1.309 ( 0.0797 %)
Computation of global information integrals ...... 8.371 ( 0.5098 %)
Writing of output data ........................... 4.301 ( 0.2619 %)
Model 2D kernel .................................. 1007.191 (61.3323 %)
2D/3D coupling, vertical metrics ................. 18.687 ( 1.1379 %)
Omega vertical velocity .......................... 32.856 ( 2.0007 %)
Equation of state for seawater ................... 33.136 ( 2.0178 %)
3D equations right-side terms .................... 49.279 ( 3.0008 %)
3D equations predictor step ...................... 101.300 ( 6.1686 %)
Pressure gradient ................................ 46.778 ( 2.8485 %)
Harmonic stress tensor, S-surfaces ............... 14.770 ( 0.8994 %)
Corrector time-step for 3D momentum .............. 76.202 ( 4.6403 %)
Corrector time-step for tracers .................. 75.721 ( 4.6110 %)
Total: 1609.061 97.9828
Representer model message Passage profile:
Message Passage: 2D halo exchanges ............... 241.656 (14.7155 %)
Message Passage: 3D halo exchanges ............... 58.278 ( 3.5488 %)
Message Passage: 4D halo exchanges ............... 13.696 ( 0.8340 %)
Message Passage: data broadcast .................. 3.486 ( 0.2123 %)
Message Passage: data reduction .................. 1.426 ( 0.0868 %)
Message Passage: data scattering.................. 33.392 ( 2.0334 %)
Total: 351.935 21.4308
All percentages are with respect to total time = 1642.187
If we take the timings for the kernel, ignore the rest, and normalize with respect to the nonlinear model, we get:
Tangent linear model: 1551/622 = 2.49
Representer model: 1609/622 = 2.59
Adjoint model: 1752/622 = 2.82
That is, the tangent linear model is approximately 2.5 times more expensive than the nonlinear model. The representer model is about 2.6 times more expensive than the nonlinear model and around 4 percent more expensive than the tangent linear model. The adjoint model is about 2.8 times more expensive than the nonlinear model and around 13 percent more expensive than the tangent linear model.
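The normalization above can be reproduced with a few lines of Python. This is only a sketch: the dictionary `totals` and the helper `cost_ratio` are hypothetical names, and the values are the "Total" lines of the four elapsed time profiles. Because it uses the unrounded totals, the ratios can differ in the last digit from the rounded figures quoted above.

```python
# "Total" elapsed time (seconds) from each elapsed time profile above.
totals = {
    "nonlinear": 622.907,
    "tangent linear": 1551.133,
    "representer": 1609.061,
    "adjoint": 1752.908,
}

def cost_ratio(model, reference):
    """Elapsed time of `model` divided by that of `reference`."""
    return totals[model] / totals[reference]

for model in ("tangent linear", "representer", "adjoint"):
    vs_nl = cost_ratio(model, "nonlinear")
    vs_tl = cost_ratio(model, "tangent linear")
    # Relative overhead with respect to the tangent linear model, in percent.
    overhead = 100.0 * (vs_tl - 1.0)
    print(f"{model:15s} {vs_nl:.2f} x nonlinear, {overhead:+.1f} % vs tangent linear")
```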