Opened 6 years ago

Closed 6 years ago

#783 closed upgrade (Done)

VERY IMPORTANT: ROMS Dynamic, Automatic, and Static Memory Requirements

Reported by: arango Owned by:
Priority: major Milestone: Release ROMS/TOMS 3.7
Component: Nonlinear Version: 3.7
Keywords: Cc:

Description

Currently, ROMS uses primarily dynamic and automatic memory, which is allocated at run time. Only a small amount of static memory is allocated at compile time.

The dynamic memory is associated with the ocean state arrays. It is allocated at run time and persists until ROMS terminates.

The automatic arrays appear in subroutines and functions for temporary local computations. They are created on entry to the routine for intermediate computations and disappear on exit. Automatic (non-static) arrays are allocated on either heap or stack memory. With the ifort compiler, the option -heap-arrays directs the compiler to put automatic arrays on the heap instead of the stack, although this may slow down the computations. If stack memory is used, the application needs enough of it to avoid unexplained segmentation faults during execution. On Linux operating systems, unlimited stack memory is possible by setting:

  ulimit -s unlimited              in your .bashrc
  limit stacksize unlimited        in your .cshrc, .tcshrc
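
The current stack limit can also be checked from a script before launching a run. The following is a minimal Python sketch, not part of ROMS. It relies on the standard-library resource module (Unix only), and the helper name describe_stack_limit is hypothetical.

```python
import resource


def describe_stack_limit():
    """Return the soft stack limit of this process as a readable string."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes; `ulimit -s` reports units of 1024 bytes.
    return f"{soft // 1024} KiB"


if __name__ == "__main__":
    print("stack size:", describe_stack_limit())
```

A child process started from this script inherits the same limit, so anything other than "unlimited" here is what ROMS would see if launched from the same shell.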

The static arrays are allocated at compilation time, and the memory reserved can be neither increased nor decreased. Only a few static arrays are used in ROMS, mainly for I/O processing in the mod_netcdf routines.

In serial and shared-memory (OpenMP) applications, the dynamic memory associated with the ocean state is for full, global arrays. In contrast, in distributed-memory (MPI) applications, the dynamic memory related to the ocean state is for the smaller tiled arrays with global indices. Recall that the tiling is only done in the horizontal I- and J-dimensions and not in the vertical dimension.
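
The effect of horizontal tiling on the size of a single 3D state array can be sketched as follows. This is an illustration only, not the ROMS partition algorithm: it assumes an even split of the horizontal points, a hypothetical halo of ghost points on each side of a tile, and it treats the 240x104x40 dimensions of Grid 01 in the report further down as the horizontal and vertical extents. The function name tile_array_mb and the parameter halo are hypothetical.

```python
import math

BYTES_R8 = 8  # one double-precision (r8) element


def tile_array_mb(Lm, Mm, N, ntile_i=1, ntile_j=1, halo=2):
    """Approximate size in decimal MB of one 3D r8 array on the largest tile.

    Assumes an even horizontal split plus `halo` ghost points on each side.
    The vertical dimension N is never split.
    """
    ni = math.ceil(Lm / ntile_i) + 2 * halo
    nj = math.ceil(Mm / ntile_j) + 2 * halo
    return ni * nj * N * BYTES_R8 / 1.0e6


full = tile_array_mb(240, 104, 40)          # serial or OpenMP: whole grid
tile = tile_array_mb(240, 104, 40, 2, 2)    # MPI with 2x2 tiling
print(f"full array: {full:.2f} MB, one 2x2 tile: {tile:.2f} MB")
```

Because of the halo, a 2x2 tile holds slightly more than one quarter of the full array.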

Almost all the ocean state arrays are dereferenced pointers and are allocated after processing the ROMS standard input parameters. Recall that an array occupies a contiguous linear sequence of memory. The pointer indicates the beginning of the state variable in the memory block.

ROMS now computes an estimate of the dynamic and automatic memory needed to run an application. The automatic memory is difficult to estimate since it is volatile. Its maximum is computed by looking at step2d.F, step3d_t.F, and the I/O routines. Check mod_arrays.F to see how it is done. Additional information is provided in ROMS/memory.txt.


We can use the memory requirements to optimize partitions in parallel computers by examining the memory available for each Persistent Execution Thread (PET) or CPU. We need to make sure that the memory required by each distributed-memory tile fits on the PET to accelerate computations and optimize the computer resources.
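
A check of this kind can be sketched in a few lines of Python. The per-tile values are the Grid 03 USAGE numbers from the nested example in this ticket, while the per-PET memory budgets and the function name tiles_over_budget are hypothetical and only serve the illustration.

```python
def tiles_over_budget(usage_mb, budget_mb):
    """Return the tile indices whose estimated usage exceeds the per-PET budget."""
    return [tile for tile, mb in enumerate(usage_mb) if mb > budget_mb]


# Per-tile USAGE (MB) for Grid 03 of the nested example in this ticket.
grid03 = [380.66, 378.34, 378.13, 375.83]

print(tiles_over_budget(grid03, budget_mb=512.0))  # every tile fits
print(tiles_over_budget(grid03, budget_mb=378.0))  # some tiles do not fit
```

If any tile exceeds the budget, a finer partition (more tiles) or nodes with more memory per PET would be the options to consider.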

The memory requirements are written to the standard output file after the activated CPP options report. For example, for a nested application with three grids on four distributed-memory PETs, we get:

 Process Information:

 Node #    0 (pid=    7420) is active.
 Node #    3 (pid=    7423) is active.
 Node #    1 (pid=    7421) is active.
 Node #    2 (pid=    7422) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  240x104x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           145.33            16.94           162.27
        1           146.47            16.94           163.42
        2           147.88            16.94           164.83
        3           149.05            16.94           165.99

      SUM           588.73            67.78           656.51

 Dynamic and Automatic memory (MB) usage for Grid 02:  204x216x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           217.32            29.46           246.78
        1           215.32            29.46           244.78
        2           215.43            29.46           244.89
        3           213.45            29.46           242.91

      SUM           861.52           117.84           979.36

 Dynamic and Automatic memory (MB) usage for Grid 03:  276x252x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           334.33            46.32           380.66
        1           332.02            46.32           378.34
        2           331.81            46.32           378.13
        3           329.51            46.32           375.83

      SUM          1327.67           185.29          1512.95

    TOTAL          2777.92           370.90          3148.82

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Notice that the information is provided in decimal megabytes (MB) for each PET (tile). The USAGE column is the sum of the dynamic and automatic memory requirements for each PET (tile). The report is given for each nested grid. The TOTAL row provides the memory requirements for all three nested grids. Its value is slightly underestimated, but it gives a guideline for the amount to request in supercomputer batch queue jobs. This application needs around 3.5 GB if we want a nice rounded number.
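
For scripting a batch request, the per-tile rows can be pulled out of the standard output with a small parser. The following Python sketch is not part of ROMS. It assumes the row layout shown in the report above, and the function names parse_tiles and request_gb as well as the 10% default headroom are hypothetical choices.

```python
import re

# A data row: tile number followed by three floating-point columns.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_tiles(report):
    """Return a list of (tile, dynamic, automatic, usage) tuples in MB."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # header, SUM, and TOTAL lines do not match
            values = tuple(float(x) for x in m.group(2, 3, 4))
            rows.append((int(m.group(1)),) + values)
    return rows


def request_gb(rows, headroom=0.10):
    """Total USAGE in decimal GB, padded by a fractional headroom."""
    total_mb = sum(r[3] for r in rows)
    return total_mb * (1.0 + headroom) / 1000.0


grid01 = """
     tile          Dynamic        Automatic            USAGE

        0           145.33            16.94           162.27
        1           146.47            16.94           163.42
        2           147.88            16.94           164.83
        3           149.05            16.94           165.99

      SUM           588.73            67.78           656.51
"""

rows = parse_tiles(grid01)
print(len(rows), "tiles,", f"{request_gb(rows):.2f} GB to request")
```

The headroom accounts for the estimate being on the low side; the right margin depends on the machine and the application.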

In a shared-memory 2x2 partition for the UPWELLING test case with the BIO_FENNEL ecosystem model, we get:

 Process Information:

 Thread #    3 (pid=   70227) is active.
 Thread #    0 (pid=   70227) is active.
 Thread #    1 (pid=   70227) is active.
 Thread #    2 (pid=   70227) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           216.11             2.93           219.04
        1             0.00             2.81             2.81
        2             0.00             2.93             2.93
        3             0.00             2.81             2.81

    TOTAL           216.11            11.48           227.59

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Since it is a shared-memory application, the dynamic memory requirements are reported only for PET (tile) zero.
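
The TOTAL row of the shared-memory report can be reproduced by counting the dynamic memory once and adding the automatic memory of every tile. A minimal Python sketch with the values from the report above (the function name shared_memory_total is hypothetical):

```python
def shared_memory_total(dynamic_mb, automatic_mb_per_tile):
    """Dynamic memory is counted once; automatic memory once per tile."""
    return dynamic_mb + sum(automatic_mb_per_tile)


total = shared_memory_total(216.11, [2.93, 2.81, 2.93, 2.81])
print(f"{total:.2f} MB")
```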

Identical values are obtained in a serial 2x2 partition:

 Process Information:

 Thread #    0 (pid=   36227) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           216.11             2.93           219.04
        1             0.00             2.81             2.81
        2             0.00             2.93             2.93
        3             0.00             2.81             2.81

    TOTAL           216.11            11.48           227.59

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

and similar values in a serial 1x1 partition:

 Process Information:

 Thread #    0 (pid=   18037) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 1x1

     tile          Dynamic        Automatic            USAGE

        0           216.11             9.75           225.86

    TOTAL           216.11             9.75           225.86

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

WARNING:

The memory requirement values are reported in megabytes (MB) as defined in the International System of Units (SI):

In ROMS:

   one r4 array element = 32 bits = 4 bytes    (single-precision)
   one r8 array element = 64 bits = 8 bytes    (double-precision)

In the metric decimal system (SI):

   1 byte               8 bits

   1 kilobyte (kB)      1E+3  bytes (1000)
   1 megabyte (MB)      1E+6  bytes (1000^2)
   1 gigabyte (GB)      1E+9  bytes (1000^3)
   1 terabyte (TB)      1E+12 bytes (1000^4)
   1 petabyte (PB)      1E+15 bytes (1000^5)

In the binary system (deprecated):

   1 kibibyte (KiB)     1024              bytes (2^10)
   1 mebibyte (MiB)     1,048,576         bytes (2^20, 1024^2)
   1 gibibyte (GiB)     1,073,741,824     bytes (2^30, 1024^3)
   1 tebibyte (TiB)     1,099,511,627,776 bytes (2^40, 1024^4)
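
The difference between the two conventions matters when comparing the ROMS report with tools that count in binary units. The following Python sketch (not part of ROMS; the names array_bytes and mb_to_mib are hypothetical) computes the size of a single array from its shape and converts decimal megabytes to mebibytes.

```python
BYTES_PER = {"r4": 4, "r8": 8}   # ROMS single and double precision
MB = 1000 ** 2                   # decimal megabyte (SI)
MIB = 1024 ** 2                  # binary mebibyte


def array_bytes(shape, kind="r8"):
    """Number of bytes occupied by an array of the given shape and kind."""
    n = 1
    for extent in shape:
        n *= extent
    return n * BYTES_PER[kind]


def mb_to_mib(mb):
    """Convert decimal megabytes to binary mebibytes."""
    return mb * MB / MIB


nbytes = array_bytes((41, 80, 16), "r8")   # UPWELLING grid dimensions
print(nbytes, "bytes =", nbytes / MB, "MB")
print(f"3148.82 MB = {mb_to_mib(3148.82):.2f} MiB")
```

The same amount of memory is always a smaller number in MiB than in MB, by a factor of about 0.954.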

Change History (1)

comment:1 by arango, 6 years ago

Resolution: Done
Status: new → closed