Just wondering about the minimization problem in ROMS (for example, ad_rpcg_lanczos.F) and would like to hear your opinion.
For example, if we are using the weak-constraint formulation (I think that is the trend), then everything is done in observation space.
In that case RPCG runs only on the single MASTER node, and in the modern era we use a large number of observations, so there is a big loop over Ndatum (in my case ~1 million). I know that standard ROMS parallelization is done using tiles (domain decomposition), but this big loop over Ndatum, running only on the master while all workers sit idle, looks like a good candidate for MPI parallelization (I hope).
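To make the idea concrete: the expensive kernels in a conjugate-gradient/Lanczos iteration are typically obs-space dot products over Ndatum, and those split naturally into per-rank partial sums combined with an MPI_Allreduce. Here is a minimal sketch (my assumption about the kernel shape; MPI is simulated serially, with the reduce replaced by a plain sum over the per-rank partials):

```python
import numpy as np

Ndatum = 1_000_000
rng = np.random.default_rng(0)
a = rng.standard_normal(Ndatum)
b = rng.standard_normal(Ndatum)

# Serial "master-only" dot product, as in the current code path.
serial = float(a @ b)

# Simulated distribution over nranks: each rank owns a contiguous slice
# of the Ndatum loop and computes a partial sum; summing the partials
# stands in for MPI_Allreduce.
nranks = 8
bounds = np.linspace(0, Ndatum, nranks + 1).astype(int)
partials = [float(a[lo:hi] @ b[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])]
parallel = sum(partials)

# Same result up to floating-point summation order.
assert abs(serial - parallel) < 1e-6
```

The partials differ from the serial result only by summation order, so the minimization should converge to the same solution up to round-off.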
What if we used the standard tile-type parallelization that already exists? We could easily add a pointer for each observation (1:Ndatum) to the tile it belongs to. The minimization could then work in "standard" parallel (looping over tiles), each tile using only those observations that fall inside it (via the pointer), and when done, just broadcast/gather back to the full obs-space vector at the end of each minimization iteration.
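The obs-to-tile pointer could be built once from the observation coordinates. A minimal sketch, assuming a regular NtileI x NtileJ split of an Lm x Mm grid (names mirror ROMS conventions but the numbers here are illustrative):

```python
import numpy as np

# Hypothetical domain: Lm x Mm grid split into NtileI x NtileJ tiles.
Lm, Mm = 400, 200
NtileI, NtileJ = 4, 2
Ndatum = 1_000

rng = np.random.default_rng(1)
Xobs = rng.uniform(0, Lm, Ndatum)   # obs fractional grid coordinates
Yobs = rng.uniform(0, Mm, Ndatum)

# "Pointer" from each observation to the tile that owns it.
ti = np.minimum((Xobs / (Lm / NtileI)).astype(int), NtileI - 1)
tj = np.minimum((Yobs / (Mm / NtileJ)).astype(int), NtileJ - 1)
obs_tile = ti * NtileJ + tj

# Each tile (MPI rank) then loops only over its own observations;
# a final gather/broadcast reassembles the full obs-space vector.
tile_index = {t: np.where(obs_tile == t)[0] for t in range(NtileI * NtileJ)}

# The index lists form an exact partition of 1:Ndatum.
assert sum(len(v) for v in tile_index.values()) == Ndatum
```

One caveat I can think of: observations sitting exactly on a tile boundary (or interpolated from points spanning two tiles) need a tie-breaking rule so each obs is counted exactly once.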
It could speed up the computations, since we typically use ~10-30 iterations of the loop, 1-10 million observations, and ~100 cores...
It looks too easy to be true, so possibly I got something wrong?
Ivica
parallel RPCG minimization (Andy & Hernan & Brian)?
- jivica
Re: parallel RPCG minimization (Andy & Hernan & Brian)?
Add-on:
That way (with the standard tile parallelization), memory allocation could be even more efficient.
The MASTER node wouldn't need to allocate all the observations in memory and do all the heavy lifting; instead, each worker would know only about the observations falling into its specific tile and do the calculations there. Does it make sense?
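A quick sanity check of the memory argument: if each rank allocates only its own observations, no single process ever holds the full Ndatum arrays, and an allgatherv-style step can reassemble the full vector on demand. A hedged sketch (again simulating the MPI ranks serially; the obs-to-rank ownership array stands in for the tile pointer):

```python
import numpy as np

Ndatum = 1_000
nranks = 4
rng = np.random.default_rng(2)
owner = rng.integers(0, nranks, Ndatum)   # obs-to-rank "pointer"
full = rng.standard_normal(Ndatum)        # what the master holds today

# Each rank allocates only its own slice (~Ndatum/nranks values).
local = {r: full[owner == r] for r in range(nranks)}

# An allgatherv-style step reassembles the full vector when needed.
rebuilt = np.empty(Ndatum)
for r in range(nranks):
    rebuilt[owner == r] = local[r]

assert np.array_equal(rebuilt, full)           # nothing lost
assert max(len(v) for v in local.values()) < Ndatum  # no rank holds it all
```

So the per-rank memory footprint drops roughly by a factor of the rank count, at the price of one collective per iteration to rebuild the full obs-space vector.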