Ticket Owner Reporter Resolution Summary
#783 arango Done VERY IMPORTANT: ROMS Dynamic, Automatic, and Static Memory Requirements
Description

Currently, ROMS primarily uses dynamic and automatic memory, which is allocated at run time. It uses only a small amount of static memory, allocated at compile time.

The dynamic memory is associated with the ocean state arrays. It is allocated at run time and persists until ROMS terminates execution.

The automatic arrays appear in subroutines and functions for temporary local computations. They are created on entry to the subroutine for intermediate computations and disappear on exit. The automatic (that is, non-static) arrays are allocated on either heap or stack memory. With the ifort compiler, the option -heap-arrays directs the compiler to put automatic arrays on the heap instead of the stack; however, this may slow down the computations. If using stack memory, the application needs enough of it to avoid obscure segmentation faults during execution. In Linux operating systems, unlimited stack memory can be requested by setting:

  ulimit -s unlimited              in your .bashrc
  limit stacksize unlimited        in your .cshrc, .tcshrc
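
For illustration, here is a minimal, hypothetical subroutine (not actual ROMS code) showing what an automatic array looks like: "work" is dimensioned from the dummy arguments on entry, lives on the stack (or on the heap with ifort's -heap-arrays), and disappears on exit. The r8 kind is defined locally here; in ROMS it comes from mod_kinds.

      SUBROUTINE lap_filter (Istr, Iend, Jstr, Jend, f)
!
!  Hypothetical example of an automatic work array.
!
      implicit none
      integer, parameter :: r8 = selected_real_kind(12,300)
      integer,  intent(in)    :: Istr, Iend, Jstr, Jend
      real(r8), intent(inout) :: f(Istr:Iend,Jstr:Jend)
!
      real(r8) :: work(Istr:Iend,Jstr:Jend)          ! automatic array
      integer  :: i, j
!
      work=f                                         ! temporary local copy
      DO j=Jstr+1,Jend-1
        DO i=Istr+1,Iend-1
          f(i,j)=0.25_r8*(work(i-1,j)+work(i+1,j)+                      &
     &                    work(i,j-1)+work(i,j+1))
        END DO
      END DO
      RETURN
      END SUBROUTINE lap_filter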

The static arrays are allocated at compile time, and the memory reserved can be neither increased nor decreased. Only a few static arrays are used in ROMS, mainly for I/O processing in the mod_netcdf routines.

In serial and shared-memory (OpenMP) applications, the dynamic memory associated with the ocean state holds the full, global variables. In contrast, in distributed-memory (MPI) applications, the dynamic memory related to the ocean state holds the smaller tiled arrays with global indices. Recall that the tiling is only done in the horizontal I- and J-dimensions and not in the vertical dimension.

Almost all the ocean state arrays are dereferenced pointers and are allocated after ROMS standard input parameters are processed. Recall that an array occupies a contiguous linear block of memory; the pointer indicates where the state variable begins within that block.
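
The pattern is roughly the following. This is a simplified, self-contained sketch with illustrative names and bounds, not the actual declarations in the ROMS state modules:

      PROGRAM state_alloc_sketch
!
!  Simplified illustration of the ROMS state-array pattern: a derived
!  type with pointer components that are allocated at run time, after
!  the input parameters (here, just the tile bounds) are known.  The
!  names and bounds are illustrative only.
!
      implicit none
      integer, parameter :: r8 = selected_real_kind(12,300)
      TYPE T_STATE
        real(r8), pointer :: zeta(:,:) => null()     ! free-surface
      END TYPE T_STATE
      TYPE (T_STATE) :: S
      integer :: LBi, UBi, LBj, UBj
!
      LBi=0; UBi=121; LBj=0; UBj=53                  ! tile bounds at run time
      allocate ( S % zeta(LBi:UBi,LBj:UBj) )         ! pointer marks the start
      print *, 'zeta elements allocated: ', SIZE(S % zeta)
      END PROGRAM state_alloc_sketch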

ROMS has been updated to compute an estimate of the dynamic and automatic memory needed to run an application. The automatic memory is difficult to estimate since it is volatile; the maximum automatic memory is estimated by examining step2d.F, step3d_t.F, and the I/O routines. Check mod_arrays.F to see how it is done. Additional information is provided in ROMS/memory.txt.
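
As a rough illustration of the bookkeeping idea (not the actual code in mod_arrays.F), the dynamic part of the estimate amounts to counting r8 array elements as they are allocated and converting to decimal megabytes:

      PROGRAM memory_estimate_sketch
!
!  Rough, illustrative bookkeeping for the dynamic-memory estimate:
!  accumulate r8 element counts as arrays are allocated and convert
!  to decimal megabytes (8 bytes per element, 1 MB = 1.0E+6 bytes).
!  The sizes below are hypothetical; the real accounting is in
!  mod_arrays.F and documented in ROMS/memory.txt.
!
      implicit none
      integer, parameter :: r8 = selected_real_kind(12,300)
      integer  :: Im, Jm, N
      real(r8) :: Dmem
!
      Im=126; Jm=58; N=40                  ! hypothetical tile dimensions
      Dmem=0.0_r8
      Dmem=Dmem+REAL(Im*Jm*3,r8)           ! a 2D array with 3 time levels
      Dmem=Dmem+REAL(Im*Jm*N*2,r8)         ! a 3D array with 2 time levels
      print '(a,f8.2,a)', ' Dynamic memory ~', 8.0E-6_r8*Dmem, ' MB'
      END PROGRAM memory_estimate_sketch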


We can use the memory requirements to optimize partitions on parallel computers by examining the memory available to each Persistent Execution Thread (PET) or CPU. We need to make sure that the memory required by each distributed-memory tile fits on its PET to accelerate computations and make the best use of the computer resources.

The memory requirements are written to the standard output file after the report of activated CPP options. For example, for a three-grid nested application on four distributed-memory PETs, we get:

 Process Information:

 Node #    0 (pid=    7420) is active.
 Node #    3 (pid=    7423) is active.
 Node #    1 (pid=    7421) is active.
 Node #    2 (pid=    7422) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  240x104x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           145.33            16.94           162.27
        1           146.47            16.94           163.42
        2           147.88            16.94           164.83
        3           149.05            16.94           165.99

      SUM           588.73            67.78           656.51

 Dynamic and Automatic memory (MB) usage for Grid 02:  204x216x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           217.32            29.46           246.78
        1           215.32            29.46           244.78
        2           215.43            29.46           244.89
        3           213.45            29.46           242.91

      SUM           861.52           117.84           979.36

 Dynamic and Automatic memory (MB) usage for Grid 03:  276x252x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           334.33            46.32           380.66
        1           332.02            46.32           378.34
        2           331.81            46.32           378.13
        3           329.51            46.32           375.83

      SUM          1327.67           185.29          1512.95

    TOTAL          2777.92           370.90          3148.82

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Notice that the information is reported in decimal megabytes (MB) for each PET (tile). The USAGE column is the sum of the dynamic and automatic memory requirements for each PET (tile). The report is repeated for each nested grid. The TOTAL row gives the memory requirements for all three nested grids. Its value is slightly underestimated, but it provides a guideline for the amounts to request in supercomputer batch queueing jobs. This application needs around 3.5 GB if we want a nicely rounded number.
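
As a rough cross-check of these numbers, summing the USAGE column for tile 0 over the three grids gives 162.27 + 246.78 + 380.66 ≈ 790 MB per PET, so the four PETs together need about 3.16 GB, consistent with the 3148.82 MB TOTAL; rounding the request up to 3.5 GB leaves headroom for the memory that is not accounted for.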

For a shared-memory 2x2 partition of the UPWELLING test case with the BIO_FENNEL ecosystem model, we get:

 Process Information:

 Thread #    3 (pid=   70227) is active.
 Thread #    0 (pid=   70227) is active.
 Thread #    1 (pid=   70227) is active.
 Thread #    2 (pid=   70227) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           216.11             2.93           219.04
        1             0.00             2.81             2.81
        2             0.00             2.93             2.93
        3             0.00             2.81             2.81

    TOTAL           216.11            11.48           227.59

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Since it is a shared-memory application, the dynamic memory requirements are reported only for PET (tile) zero.

Identical values are obtained in a serial run with 2x2 partitions:

 Process Information:

 Thread #    0 (pid=   36227) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

     tile          Dynamic        Automatic            USAGE

        0           216.11             2.93           219.04
        1             0.00             2.81             2.81
        2             0.00             2.93             2.93
        3             0.00             2.81             2.81

    TOTAL           216.11            11.48           227.59

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

and similar values in a serial run with a 1x1 partition:

 Process Information:

 Thread #    0 (pid=   18037) is active.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 1x1

     tile          Dynamic        Automatic            USAGE

        0           216.11             9.75           225.86

    TOTAL           216.11             9.75           225.86

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

WARNING:

The memory requirement values are reported using the International System of Units (SI) definition of the megabyte (MB):

In ROMS:

   one r4 array element = 32 bits = 4 bytes    (single-precision)
   one r8 array element = 64 bits = 8 bytes    (double-precision)

In the metric decimal system (SI):

   1 byte               8 bits

   1 kilobyte (kB)      1E+3  bytes (1000)
   1 megabyte (MB)      1E+6  bytes (1000^2)
   1 gigabyte (GB)      1E+9  bytes (1000^3)
   1 terabyte (TB)      1E+12 bytes (1000^4)
   1 petabyte (PB)      1E+15 bytes (1000^5)

In the binary system (deprecated):

   1 kibibyte (KiB)     1024              bytes (2^10)
   1 mebibyte (MiB)     1,048,576         bytes (2^20, 1024^2)
   1 gibibyte (GiB)     1,073,741,824     bytes (2^30, 1024^3)
   1 tebibyte (TiB)     1,099,511,627,776 bytes (2^40, 1024^4)
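
If the batch scheduler counts memory in binary units, the reported decimal values can be converted. A small, self-contained sketch using the 3148.82 MB TOTAL from the nested example above:

      PROGRAM mb_to_mib
!
!  Convert the decimal megabytes reported by ROMS into binary units.
!  For the 3148.82 MB TOTAL above this gives about 3003 MiB (2.93 GiB).
!
      implicit none
      integer, parameter :: r8 = selected_real_kind(12,300)
      real(r8) :: MB, MiB
!
      MB =3148.82_r8
      MiB=MB*1.0E+6_r8/1024.0_r8**2
      print *, MB, ' MB = ', MiB, ' MiB = ', MiB/1024.0_r8, ' GiB'
      END PROGRAM mb_to_mib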

#785 arango Done Updated dynamic and automatic memory reporting
Description

In src:ticket:783, I introduced the reporting of dynamic and automatic memory estimates for a particular ROMS application. There are still some memory requirements that are not accounted for.

The reporting is now done at the end of the computations to allow for previously unaccounted automatic memory. A new variable, BmemMax(ng), is introduced to track the maximum automatic buffer size used in distributed-memory (MPI) exchanges. In distributed-memory applications with serial I/O, the size of the automatic, temporary buffers needed for scattering/gathering data grows as the ROMS grid size increases. It can become a memory bottleneck as the number of tile partitions increases, since every parallel CPU allocates a full copy of the data array to process. The temporary buffers are automatically allocated on the stack or the heap. The user has the option to activate INLINE_2DIO to process 3D and 4D arrays as 2D slabs and reduce the memory requirements. Alternatively, one can activate PARALLEL_IO if such hardware infrastructure is available.
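
Schematically, the tracking amounts to recording the size of the temporary buffer every time a gather/scatter allocates one; the buffer name and units below are illustrative, not the actual code in the ROMS distribution routines:

!
!  Illustrative sketch: each MPI process allocates a temporary buffer
!  big enough to hold the global array being gathered or scattered,
!  and the largest such buffer (here in bytes, 8 bytes per r8 element)
!  is recorded in BmemMax(ng) so it can be reported at the end of the
!  run.  "Arecv" and "Npts" are hypothetical names.
!
      allocate ( Arecv(Npts) )
      BmemMax(ng)=MAX(BmemMax(ng), REAL(8*Npts,r8))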

A new subroutine, memory.F, is introduced to compute and report ROMS dynamic and automatic memory requirements. It is called from ROMS_finalize, that is, at the end of execution:

!
!  Report dynamic memory and automatic memory requirements.
!
!$OMP PARALLEL
      CALL memory
!$OMP END PARALLEL
!
!  Close IO files.
!
      CALL close_out

      RETURN
      END SUBROUTINE ROMS_finalize

For a three-grid nested MPI application, I get:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  240x104x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE      MPI-Buffers

        0           229.42            16.94           246.36             9.83
        1           230.56            16.94           247.51             9.83
        2           231.97            16.94           248.92             9.83
        3           233.14            16.94           250.08             9.83

      SUM           925.09            67.78           992.87            39.32

 Dynamic and Automatic memory (MB) usage for Grid 02:  204x216x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE      MPI-Buffers

        0           382.54            35.84           418.38            35.84
        1           380.54            35.84           416.38            35.84
        2           380.65            35.84           416.48            35.84
        3           378.66            35.84           414.50            35.84

      SUM          1522.40           143.34          1665.74           143.34

 Dynamic and Automatic memory (MB) usage for Grid 03:  276x252x40  tiling: 2x2

     tile          Dynamic        Automatic            USAGE      MPI-Buffers

        0           406.61            55.21           461.82            55.21
        1           404.30            55.21           459.50            55.21
        2           404.09            55.21           459.29            55.21
        3           401.79            55.21           456.99            55.21

      SUM          1616.79           220.83          1837.61           220.83

    TOTAL          4064.28           431.95          4496.23           403.50

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Notice that the last column reports the maximum size of the MPI-Buffers, computed from BmemMax, for each nested grid. They are the limiting factor in Grids 02 and 03, since the same value appears in the Automatic column.

We will investigate third-party memory-profiling software to see how accurate the reported memory estimates are.

#786 jcwarner Fixed Reading forcing data with DT < 1 second
Description

I am using a NetCDF forcing data file with a baroclinic DT of 0.2 sec to drive a lab test case, but ROMS does not interpolate the data correctly because in set_ngfld (and similarly in set_2dfld and set_3dfld) we have:

     fac1=ANINT(Tintrp(it2,ifield,ng)-time(ng),r8)
     fac2=ANINT(time(ng)-Tintrp(it1,ifield,ng),r8)

which rounds the time interpolation weights to the nearest whole second.

We got it to work for baroclinic time steps smaller than one second by using:

      fac1=ANINT((Tintrp(it2,ifield,ng)-time(ng))*SecScale,r8)
      fac2=ANINT((time(ng)-Tintrp(it1,ifield,ng))*SecScale,r8)

where SecScale=1000. That is, the time interpolation weights are rounded to the nearest millisecond instead.

The following statements at full precision did not work:

     fac1=Tintrp(it2,ifield,ng)-time(ng)
     fac2=time(ng)-Tintrp(it1,ifield,ng)

because roundoff can make fac1 slightly negative, and then the interpolation is rejected by

      ELSE IF (((fac1*fac2).ge.0.0_r8).and.(fac1+fac2).gt.0.0_r8) THEN
...

indicating unbounded interpolants.
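
A small, self-contained illustration of the issue with hypothetical times (forcing snapshots one second apart and a model time in between, not actual ROMS data): rounding to whole seconds collapses the weights to 1 and 0, while millisecond scaling preserves the 0.8/0.2 ratio:

      PROGRAM interp_weights_demo
!
!  Hypothetical example: forcing snapshots at t1=10 s and t2=11 s, and
!  a model time of 10.2 s.  Rounding the weights to whole seconds
!  yields 1 and 0 (a step function); scaling to milliseconds first
!  preserves the correct 0.8/0.2 linear-interpolation weights.
!
      implicit none
      integer,  parameter :: r8 = selected_real_kind(12,300)
      real(r8), parameter :: SecScale = 1000.0_r8    ! seconds to milliseconds
      real(r8) :: t1, t2, time, fac1, fac2
!
      t1=10.0_r8; t2=11.0_r8; time=10.2_r8
      fac1=ANINT(t2-time,r8)                         ! old code: 1.0
      fac2=ANINT(time-t1,r8)                         ! old code: 0.0
      print *, 'whole seconds: ', fac1/(fac1+fac2), fac2/(fac1+fac2)
      fac1=ANINT((t2-time)*SecScale,r8)              ! new code: 800.0
      fac2=ANINT((time-t1)*SecScale,r8)              ! new code: 200.0
      print *, 'milliseconds:  ', fac1/(fac1+fac2), fac2/(fac1+fac2)
      END PROGRAM interp_weights_demo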


WARNING:

Notice that we will no longer get solutions identical to those from previous versions because of very small differences in the time-interpolated fields. The differences are on the order of roundoff, so they do not matter much, but users need to be aware of this fact.
