Difference between revisions of "Parallelization"

From WikiROMS
(copying stuff from the manual, phase I)
Line 1: Line 1:
<div class="title">Parallelization</div>
 
<wikitex>
[[Image:parallel_tile_map.png|center|frame|<center>Parallel Tile Partitions</center>]]
<div style="clear:both;"></div>
ROMS supports serial, OpenMP, and MPI computations, with the user choosing among them at compile time. The serial code can also take advantage of multiple small tiles, which can be sized to fit in cache. All three modes rely on domain decomposition in the horizontal. Because the horizontal operations are all explicit and have a relatively small footprint, tiling is a logical choice. Some goals in the parallel design of ROMS were:
* Minimize code changes.
* Don't hard-code the number of processes.
* MPI and OpenMP share the same basic structure.
* Don't break the serial optimizations.
* Same result as serial code for any number of processes.
* Portability - able to run on any (Unix) system.


First, some [[C Preprocessor]] options. If we're compiling for MPI, the option '''-DMPI''' gets added to the argument list for '''cpp'''. Then, in [[globaldefs.h]], we have:
<div class="box"><code>#if defined MPI<br /># define DISTRIBUTE<br />#endif</code></div>
The rest of the code uses '''DISTRIBUTE''' to identify distributed memory jobs. The OpenMP case is more straightforward, with '''-D_OPENMP''' getting passed to '''cpp''' and '''_OPENMP''' being the tag to check within ROMS.


The whole horizontal ROMS grid is shown here:
[[Image:Whole_grid.png|center|frame|<center>Whole grid</center>]]
<div style="clear:both;"></div>
The computations are done over the cells inside the darker line; the cells are numbered 1 to '''Lm''' in the $\xi$-direction and 1 to '''Mm''' in the $\eta$-direction. Those planning to run in parallel would be wise to choose values of '''Lm''' and '''Mm''' that include factors of two, so that the interior cells divide evenly among the tiles. ROMS will run in parallel with any values of '''Lm''' and '''Mm''', but the computations might not be load-balanced.
</wikitex>
===ROMS internal numbers===
<wikitex>
</wikitex>
===Other figures===
[[Image:tile.png|center|frame|<center>Parallel Tile</center>]]
<div style="clear:both;"></div>

Revision as of 19:57, 12 November 2009

Parallelization

<wikitex>

Parallel Tile Partitions

ROMS supports serial, OpenMP, and MPI computations, with the user choosing among them at compile time. The serial code can also take advantage of multiple small tiles, which can be sized to fit in cache. All three modes rely on domain decomposition in the horizontal. Because the horizontal operations are all explicit and have a relatively small footprint, tiling is a logical choice. Some goals in the parallel design of ROMS were:

  • Minimize code changes.
  • Don't hard-code the number of processes.
  • MPI and OpenMP share the same basic structure.
  • Don't break the serial optimizations.
  • Same result as serial code for any number of processes.
  • Portability - able to run on any (Unix) system.

First, some C Preprocessor options. If we're compiling for MPI, the option -DMPI gets added to the argument list for cpp. Then, in globaldefs.h, we have:

#if defined MPI
# define DISTRIBUTE
#endif

The rest of the code uses DISTRIBUTE to identify distributed memory jobs. The OpenMP case is more straightforward, with -D_OPENMP getting passed to cpp and _OPENMP being the tag to check within ROMS.
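
To make the macro logic concrete, here is a minimal C sketch of how a source file might branch on these two tags. The helper parallel_mode is hypothetical and is not a ROMS routine; only the names MPI, DISTRIBUTE, and _OPENMP come from the text above. Giving DISTRIBUTE precedence when both tags are defined is an assumption of this sketch.

```c
/* Mirrors the globaldefs.h fragment above: MPI implies DISTRIBUTE. */
#if defined MPI
# define DISTRIBUTE
#endif

/* Hypothetical helper (not part of ROMS): reports which parallel
 * mode the preprocessor selected for this compilation. */
static const char *parallel_mode(void)
{
#if defined DISTRIBUTE
    return "distributed-memory (MPI)";   /* built with -DMPI */
#elif defined _OPENMP
    return "shared-memory (OpenMP)";     /* _OPENMP is defined */
#else
    return "serial";                     /* neither tag is defined */
#endif
}
```

Because the choice is made by the preprocessor, only one of the three return statements reaches the compiler, which is what "choosing at compile time" means in practice.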

The whole horizontal ROMS grid is shown here:

Whole grid

The computations are done over the cells inside the darker line; the cells are numbered 1 to Lm in the $\xi$-direction and 1 to Mm in the $\eta$-direction. Those planning to run in parallel would be wise to choose values of Lm and Mm that include factors of two, so that the interior cells divide evenly among the tiles. ROMS will run in parallel with any values of Lm and Mm, but the computations might not be load-balanced. </wikitex>
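
The load-balance point can be illustrated with a small C sketch. It assumes a simplified partitioning in which each tile receives the ceiling of n / ntiles cells along one direction and the last tile takes whatever remains; this is not the actual ROMS partitioning code, and the function names tile_width and imbalance are hypothetical.

```c
/* Hypothetical helper: number of interior cells owned by tile `tile`
 * (0-based) when `n` cells are split among `ntiles` tiles.
 * Simplified scheme, NOT the actual ROMS partitioning. */
static int tile_width(int n, int ntiles, int tile)
{
    int chunk = (n + ntiles - 1) / ntiles;   /* ceiling division */
    int start = tile * chunk;
    int end = start + chunk;
    if (start > n) start = n;                /* clip to the grid */
    if (end > n) end = n;
    return end - start;
}

/* Widest tile minus narrowest tile; 0 means perfectly balanced. */
static int imbalance(int n, int ntiles)
{
    int min = n, max = 0;
    for (int t = 0; t < ntiles; t++) {
        int w = tile_width(n, ntiles, t);
        if (w < min) min = w;
        if (w > max) max = w;
    }
    return max - min;
}
```

Under this scheme, imbalance(128, 4) is 0 because every tile gets 32 cells, whereas imbalance(130, 4) is 2 because three tiles get 33 cells and the last gets 31. The run still works in the second case, but some processes wait on others at each step.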

ROMS internal numbers

<wikitex> </wikitex>

Other figures

Parallel Tile


ROMS Nonlinear East-West Communications


ROMS Nonlinear North-South Communications


ROMS Adjoint East-West Communications


ROMS Adjoint North-South Communications