Context Navigation

← Previous Ticket
Next Ticket →

#789 closed upgrade (Done)

VERY IMPORTANT: ROMS single precision numerical kernel

Reported by:	arango	Owned by:
Priority:	major	Milestone:	Release ROMS/TOMS 3.7
Component:	Nonlinear	Version:	3.7
Keywords:		Cc:

Description (last modified by arango)

ROMS uses double-precision arithmetic in its numerical kernel by default. I have been asked several times what is needed to run in single precision to accelerate the computations. I always answered that it is straightforward since ROMS defines the numerical precision with KIND type parameters in mod_kinds.F globally. That is, of course, very far from reality because we need to look at the accuracy in various computations like:

time managing variables,
bounded time interpolation between snapshots,
vertical stretching of terrain-following coordinates,
weighting coefficients for time averaging of barotropic fields,
area and volume conservation variables,
time variables representation in output NetCDF files, and
processing of ROMS unique namelist like parameters from input scripts

Therefore, implementing a single-precision numerical kernel was not as trivial as I thought since we need to keep/compute a few variables in double precision with lots of files being modified.

What Is New:

A new C-preprocessing SINGLE_PRECISION is introduced to convert the numerical kernel to single precision. In globaldef.h, we now have:

/*
** Turn ON/OFF double precision arithmetic in numerical kernel (default)
** and floating-point type variables and associated intrinsic functions.
*/

#ifdef SINGLE_PRECISION
# ifdef OUT_DOUBLE
#   undef OUT_DOUBLE
# endif
# ifndef RST_SINGLE
#   define RST_SINGLE
# endif
#else
# define DOUBLE_PRECISION
#endif

Notice that if OUT_DOUBLE and RST_SINGLE are activated, they are turned off for consistency in the numerical kernel.

In mod_kinds.F, we now have

        integer, parameter :: i1b= SELECTED_INT_KIND(1)        !  8-bit
        integer, parameter :: i2b= SELECTED_INT_KIND(2)        !  8-bit
        integer, parameter :: i4b= SELECTED_INT_KIND(4)        ! 16-bit
        integer, parameter :: i8b= SELECTED_INT_KIND(8)        ! 32-bit
        integer, parameter :: c8 = SELECTED_REAL_KIND(6,30)    ! 32-bit
        integer, parameter :: dp = SELECTED_REAL_KIND(12,300)  ! 64-bit
        integer, parameter :: r4 = SELECTED_REAL_KIND(6,30)    ! 32-bit
# ifdef SINGLE_PRECISION
        integer, parameter :: r8 = SELECTED_REAL_KIND(6,30)    ! 32-bit
# else
        integer, parameter :: r8 = SELECTED_REAL_KIND(12,300)  ! 64-bit
# endif

Notice that the r8 parameter is defined for 32-bit (4-bytes) floating-point variables in single precision computations. A new parameter, dp, is introduced to keep few variable in double precision. For example, in mod_scalars.F we have:

       real(dp), allocatable :: tdays(:)                ! days
       real(dp), allocatable :: time(:)                 ! seconds

       real(dp), allocatable :: dt(:)                   ! seconds
       real(dp), allocatable :: dtfast(:)               ! seconds

       real(dp), allocatable :: TimeEnd(:)              ! seconds
       real(dp), allocatable :: AVGtime(:)              ! seconds
       real(dp), allocatable :: DIAtime(:)              ! seconds
       real(dp), allocatable :: IMPtime(:)              ! seconds
       real(dp), allocatable :: ObsTime(:)              ! seconds
       real(dp), allocatable :: FrcTime(:)              ! seconds

       real(dp) :: dstart = 0.0_dp                      ! days
       real(dp) :: io_time = 0.0_dp                     ! seconds
       real(dp) :: run_time = 0.0_dp                    ! seconds
       real(dp) :: tide_start = 0.0_dp                  ! days

A new module, inp_decode.F, is introduced to process and decode ROMS KeyWord parameters from input script files. It has the following module procedures:

      INTERFACE load_i
        MODULE PROCEDURE load_1d_i       ! 1D integer array
        MODULE PROCEDURE load_2d_i       ! 2D integer array
        MODULE PROCEDURE load_3d_i       ! 3D integer array
      END INTERFACE load_i

      INTERFACE load_l
        MODULE PROCEDURE load_1d_l       ! 1D logical array
        MODULE PROCEDURE load_2d_l       ! 2D logical array
        MODULE PROCEDURE load_3d_l       ! 3D logical array
      END INTERFACE load_l

      INTERFACE load_r
#ifdef SINGLE_PRECISION
        MODULE PROCEDURE load_1d_dp      ! 1D real(dp) array
        MODULE PROCEDURE load_2d_dp      ! 2D real(dp) array
        MODULE PROCEDURE load_3d_dp      ! 3D real(dp) array
#endif
        MODULE PROCEDURE load_1d_r8      ! 1D real(r8) array
        MODULE PROCEDURE load_2d_r8      ! 2D real(r8) array
        MODULE PROCEDURE load_3d_r8      ! 3D real(r8) array
      END INTERFACE load_r

to process 1D, 2D, and 3D variables since we are using the strong typing properties of modules. Before we didn't use modules because all the arrays arguments were reshaped to 1D arrays (F77 style). This part is very tricky but more robust since we need to consider single and double precision variables that are written to input scripts with 8-bytes double representation. I considered polymorphic objects (available in Fortran 2003, 2008, 2018), but I rejected it because of backward compatibility with older Fortran compilers.

WARNING: The input script processing routines (read_asspar.F, read_biopar.F, read_phypar.F, and read_stapar.F) were modified, and the functions load_i, load_l, and load_r have a different number of arguments depending if we are processing 1D, 2D, or 3D variables. For Example for load_r we can have:
           CASE ('DT')
             Npts=load_r(Nval, Rval, Ngrids, dt)
           CASE ('TNU2')
             Npts=load_r(Nval, Rval, NAT+NPT, Ngrids, Rtracer)
           CASE ('HdecayB(isTvar)')
             Npts=load_r(Nval, Rval, 4, MT, Ngrids, Rboundary)
for processing dt(:), Rtracer(:,:,:), and Rboundary(:,:,:) respectively.

NOTICE: on principle single-precision computations run twice as fast as double-precision computations. However, since some double precision computations are carried out for accuracy in SINGLE_PRECISION solutions the gain is less than 50 percent. For example, in a 3-nested grid application on 4 CPUs, I get

Elapsed CPU time (seconds):

Node   #    0 CPU:    4923.242
Node   #    1 CPU:    4953.730
Node   #    2 CPU:    4953.504
Node   #    3 CPU:    4953.539
Total:               19784.015    for double-precision computations

Elapsed CPU time (seconds):

Node   #    0 CPU:    2628.964
Node   #    1 CPU:    2683.894
Node   #    3 CPU:    2683.867
Node   #    2 CPU:    2683.818
Total:               10680.543    for single-precision computations

That is, the single-precision is 46 percent faster than the double-precision simulation.

In 4D-Var simulations, this number get much smaller because the computations are dominated by I/O.

Change History (1)

comment:1 by arango, 7 years ago

Description:	modified (diff)
Resolution:	→ Done
Status:	new → closed

Note: See TracTickets for help on using tickets.

Download in other formats: