mpirun problem?
Hi all,
I have recently moved my cases to a new machine, and I use
mpirun -np 4 ./oceanM ocean_prereal_grd2_wrt_8.in > out.log
and the following errors occur (the node number is 0 for every process) in the log file:
Process Information:
Node # 0 (pid= 4314) is active.
Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:
Node # 0 (pid= 4315) is active.
Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:
Node # 0 (pid= 4316) is active.
Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:
Node # 0 (pid= 4317) is active.
Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Wind-Driven Upwelling/Downwelling over a Periodic Channel
Operating system : Linux
CPU/hardware : x86_64
Compiler system : gfortran
Compiler command : /opt/mpich2/gnu/bin/mpif90
Compiler flags : -frepack-arrays -O3 -ffast-math -ffree-form -ffree-line-length-none
Input Script : ocean_prereal_grd2_wrt_8.in
SVN Root URL : https://www.myroms.org/svn/src/trunk
SVN Revision :
Local Root : /home/zutt/source/roms148m63_obc1
Header Dir : /home/zutt/ttcase/PREreal_grd2_wrt_8/Forward
Header file : prereal_grd2_wrt_8.h
Analytical Dir: /home/zutt/source/roms148m63_obc1/ROMS/Functionals
Resolution, Grid 01: 0398x0198x030, Parallel Nodes: 1, Tiling: 002x002
ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.
Tile partition information for Grid 01: 0398x0198x0030 tiling: 002x002
tile Istr Iend Jstr Jend Npts
0 1 199 1 99 591030
1 200 398 1 99 591030
2 1 199 100 198 591030
3 200 398 100 198 591030
Maximum halo size in XI and ETA directions:
HaloSizeI(1) = 630
HaloSizeJ(1) = 330
TileSide(1) = 204
TileSize(1) = 21216
.
.
.
Also, the output information in the out.log file is not in the right sequence; it seems each node is writing into the log file without communicating with the others.
The new machine is:
Linux cluster.hpc.cc 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
I use gfortran to compile ROMS, and it compiles successfully without errors. The NetCDF library is the precompiled binary (binary-netcdf-3.6.3_nc3_gfortran_gfortran_g++.tar.gz) from Unidata.
The previous machine was:
Linux hqlx75.ust.hk 2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:35:07 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
where I used PGI to compile ROMS.
Exactly the same case works fine on the previous machine.
I am wondering if there is something wrong with the mpirun command (see the sketch below). Any comments and suggestions are appreciated!
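For reference, this is how I read the check that is failing. The tiling in the input script has to multiply out to the number of processes that mpirun starts; in my case the two already agree, which is why I suspect mpirun itself. A sketch with my settings:

# In ocean_prereal_grd2_wrt_8.in the tiling is:
#   NtileI == 2
#   NtileJ == 2
# NtileI * NtileJ = 4, so the launcher must start exactly 4 processes:
mpirun -np 4 ./oceanM ocean_prereal_grd2_wrt_8.in > out.log
# The numbers agree, yet ROMS reports "Parallel Nodes: 1", because each
# process sees an MPI communicator of size 1, i.e. four independent
# serial runs instead of one four-process run.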
Re: mpirun problem?
It sure sounds like a problem in the MPI. Each processor runs its own copy of the exact same code, which queries the MPI library to find out which node it is, from 0 to 3. If they all think they are node zero, there will be trouble: they all think they are the master and will act accordingly. The master is also in charge of writing to stdout, so yes, they would all be writing.
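A quick way to test this outside of ROMS is a minimal MPI program that only asks for its rank and the communicator size. A sketch, to be compiled and launched with the same mpif90/mpirun pair used for ROMS (mpi_hello.f90 is just a throwaway name):

cat > mpi_hello.f90 <<'EOF'
program mpi_hello
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print *, 'rank', rank, 'of', nprocs
  call MPI_Finalize(ierr)
end program mpi_hello
EOF
mpif90 mpi_hello.f90 -o mpi_hello
mpirun -np 4 ./mpi_hello
# A healthy installation prints ranks 0, 1, 2, 3 of 4 (in any order).
# If every line reads "rank 0 of 1", the mpirun launcher does not match
# the MPI library the program was linked against.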
Re: mpirun problem?
Thanks for your reply. The problem was solved by adding a .mpd.conf file to my home directory, executing mpd &, and then using the same mpirun command.
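For anyone searching later, the MPD setup looks roughly like this. The secret word is an arbitrary string of your own choosing, and mpdtrace is only a check that the daemon is up:

echo "secretword=pick-anything" > ~/.mpd.conf  # MPD refuses to start without this file
chmod 600 ~/.mpd.conf                          # and it must not be readable by others
mpd &                                          # start the MPD daemon on this host
mpdtrace                                       # should print the local hostname
mpirun -np 4 ./oceanM ocean_prereal_grd2_wrt_8.in > out.log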
Re: mpirun problem?
I am trying to get MPI running on a cluster, and I wonder if you are familiar with these problems or know someone who can help me.
ROMS compiles and runs fine in serial mode, and it also compiles cleanly for parallel runs (oceanM), with either ifort or gfortran. But at run time there is a partition error I cannot track down: all nodes report as zero and the number of parallel nodes is always 1.
I have tried intel/mpich, gcc/mpich, and gcc/mpich2, with no luck; also the default ROMS sample cases, the idealized ones with idealized bathymetry and parameters that can be compiled in serial or parallel.
I have also tried an smpd.conf file (for MPICH2) in my home directory, without success. It seems like an environment error: perhaps a wrong initialization, missing variables, ...
out.log file:
Process Information:
Process Information:
Process Information:
Process Information:
Node # 0 (pid= 2687) is active.
Node # 0 (pid= 2686) is active.
Node # 0 (pid= 2688) is active.
Node # 0 (pid= 2689) is active.
Model Input Parameters: ROMS/TOMS version 3.4
Thursday - May 20, 2010 - 10:57:01 AM
-----------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 3.4
Thursday - May 20, 2010 - 10:57:01 AM
Model Input Parameters: ROMS/TOMS version 3.4
Thursday - May 20, 2010 - 10:57:01 AM
Model Input Parameters: ROMS/TOMS version 3.4
Thursday - May 20, 2010 - 10:57:01 AM
-----------------------------------------------------------------------------
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Lake Signell Sediment Test Case
Operating system : Linux
CPU/hardware : x86_64
Compiler system : gfortran
Compiler command : /cvos/shared/apps/mpich/ge/gcc/64/1.2.7/bin/mpif90
Compiler flags : -frepack-arrays -O3 -ffast-math -ffree-form -ffree-line-length-none
Input Script : /home/primare/pl/raulg/roms/Projects/lake_signell/ocean_lake_signell.in
SVN Root URL : https://www.myroms.org/svn/src/trunk
SVN Revision : 448M
Local Root : /home/primare/pl/raulg/roms/trunk
Header Dir : /home/primare/pl/raulg/roms/Projects/lake_signell
Header file : lake_signell.h
Analytical Dir: /home/primare/pl/raulg/roms/Projects/lake_signell
Resolution, Grid 01: 0100x0020x008, Parallel Nodes: 1, Tiling: 002x002
ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.
...
All percentages are with respect to total time = ************
ROMS/TOMS - Output NetCDF summary for Grid 01:
ROMS/TOMS - Partition error ......... exit_flag: 6
ERROR: Illegal domain partition.
Thanks for any reply.
RaulG
Re: mpirun problem?
Barbara wrote: Thanks for your reply. The problem was solved by adding a .mpd.conf file to my home directory, executing mpd &, and then using the same mpirun command.
Sorry for misleading: it seemed the problem was solved when I posted that reply, but it failed again later, so I gave up on MPICH2 and switched to OpenMPI instead.
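Switching libraries also means rebuilding oceanM so that mpif90 comes from OpenMPI. With the usual ROMS build-script settings that is roughly the following (a sketch, assuming the stock build.bash; your build script may differ):

export USE_MPI=on          # distributed-memory parallelism
export USE_MPIF90=on       # compile with the mpif90 wrapper
export which_MPI=openmpi   # pick OpenMPI rather than MPICH/MPICH2
which mpif90               # verify the wrapper now comes from OpenMPI
./build.bash               # clean rebuild so oceanM links against OpenMPI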
Re: mpirun problem?
Good day to all,
I am new to ROMS, and I get the same error when I run ROMS:
-----------------------------------------------------------------------------
Resolution, Grid 01: 0398x0498x001, Parallel Nodes: 1, Tiling: 004x004
-----------------------------------------------------------------------------
ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.
Elapsed CPU time (seconds):
ROMS/TOMS - Output NetCDF summary for Grid 01:
ROMS/TOMS - Partition error ......... exit_flag: 6
ERROR: Illegal domain partition.
-----------------------------------------------------------------------------
I use distributed memory:
export USE_MPI=on # distributed-memory parallelism
export USE_MPIF90=on # compile with mpif90 script
#export USE_OpenMP=on # shared-memory parallelism
#export which_MPI=mpich # compile with MPICH library
#export which_MPI=mpich2 # compile with MPICH2 library
export which_MPI=openmpi # compile with OpenMPI library
When I set NtileI=1 and NtileJ=1, the model runs smoothly with
$MPI -np 16 /home/badria/ROMS3.5/FORWARD/oceanM /home/badria/ROMS3.5/FORWARD/oman.in > /home/badria/ROMS3.5/oman_GONU.out
Has anyone solved this problem before?
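Worth noting: with Tiling 004x004 the product NtileI*NtileJ is already 16, which matches -np 16, so the input script itself is consistent; "Parallel Nodes: 1" means MPI_Comm_size returned 1 in every process. A couple of quick checks, as a sketch (reusing the mpi_hello test from earlier in the thread):

which mpirun mpif90       # both should resolve to the same MPI installation
mpirun -np 4 hostname     # launcher sanity check: the host name 4 times
mpirun -np 16 ./mpi_hello
# Expect "rank 0..15 of 16"; sixteen copies of "rank 0 of 1" confirm a
# launcher/library mismatch, and would mean the NtileI=1, NtileJ=1 "success"
# is really 16 identical serial runs writing over each other.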
Re: mpirun problem?
Upon moving to a new machine I am experiencing this exact problem. I wish the other users who experienced this problem had reported back with their solution...
Re: mpirun problem?
Greetings,
I recently gained access to an HPC system and had the same problem. The system has various MPI libraries (OpenMPI and Intel MPI).
With OpenMPI 1.8.5 I had no luck with any of the available compilers (GNU, Intel), but once I started using OpenMPI 1.8.7 I had no problem at all.
I suggest changing your OpenMPI library and trying again.
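On systems managed with environment modules, the switch usually looks like the following (the module names here are hypothetical; check what your site actually installs):

module avail 2>&1 | grep -i openmpi   # list the installed OpenMPI builds
module unload openmpi/1.8.5           # hypothetical module names
module load openmpi/1.8.7
which mpirun && mpirun --version      # confirm which launcher is first in PATH
# ROMS then needs a clean recompile against the newly loaded library.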
Giannis