cannot work parallel nodes

kobl1201
Posts: 60
Joined: Tue Nov 04, 2014 8:29 pm
Location: Kongju National University

cannot work parallel nodes

#1 Post by kobl1201

Hi,

When I run the ROMS model, I try to use 2 CPUs, but the model keeps running with only 1.

When I set the number of parallel nodes to 1, the model runs fine.

Which option should I check?

---------------------------------
This is my run.sh file, followed by the output it produces:

#!/bin/bash

#PBS -V
#PBS -q batch
#PBS -l nodes=2:ppn=1
#PBS -N ROMS

ulimit -s unlimited
ulimit -c unlimited

cd $PBS_O_WORKDIR


cp -f $PBS_NODEFILE .

echo $PBS_NODEFILE > ./test.prt
echo "job started"
date
NP=`/usr/bin/wc -l $PBS_NODEFILE | awk '{ print $1 }'`

echo "$NP" >> test.prt
mpirun -np 2 -machinefile $PBS_NODEFILE ./romsM ocean_ub_3k_new.in >> test.prt

date
echo "job finished"

---------------------------------

--------------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 4.2
Sunday - August 6, 2023 - 5:40:38 PM
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 4.2
Sunday - August 6, 2023 - 5:40:39 PM
--------------------------------------------------------------------------------

Ulleung Basin 1 km-ETOPO5

Operating system : Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /usr/local/mpi/intel18/mvapich2-2.2/bin/mpif90
Compiler flags : -O3
OCN Communicator : 1140850688, PET size = 1

Input Script : ocean_ub_3k_new.in

SVN Root URL : https://www.myroms.org/svn/src/trunk
SVN Revision : 1177

Local Root : /ghome/kobl1201/ROMS/trunk
Header Dir : /ghome/kobl1201/ROMS/trunk/Projects
Header file : ulleungbasin.h
Analytical Dir : /ghome/kobl1201/ROMS/trunk/Projects

Resolution, Grid 01: 240x225x31, Parallel Nodes: 1, Tiling: 1x2

ROMS/TOMS: Wrong choice of grid 01 partition or number of parallel nodes.
NtileI * NtileJ = 2
must be equal to the number of parallel processes = 1
Change -np value to mpirun or
change domain partition in input script.
Found Error: 06 Line: 210 Source: ROMS/Utility/inp_par.F
Found Error: 06 Line: 125 Source: ROMS/Drivers/nl_roms.h, ROMS_initialize

Elapsed wall CPU time for each process (seconds):


Ulleung Basin 1 km-ETOPO5

Operating system : Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /usr/local/mpi/intel18/mvapich2-2.2/bin/mpif90
Compiler flags : -O3
OCN Communicator : 1140850688, PET size = 1

Input Script : ocean_ub_3k_new.in

SVN Root URL : https://www.myroms.org/svn/src/trunk
SVN Revision : 1177

Local Root : /ghome/kobl1201/ROMS/trunk
Header Dir : /ghome/kobl1201/ROMS/trunk/Projects
Header file : ulleungbasin.h
Analytical Dir : /ghome/kobl1201/ROMS/trunk/Projects

Resolution, Grid 01: 240x225x31, Parallel Nodes: 1, Tiling: 1x2

ROMS/TOMS: Wrong choice of grid 01 partition or number of parallel nodes.
NtileI * NtileJ = 2
must be equal to the number of parallel processes = 1
Change -np value to mpirun or
change domain partition in input script.
Found Error: 06 Line: 210 Source: ROMS/Utility/inp_par.F
Found Error: 06 Line: 125 Source: ROMS/Drivers/nl_roms.h, ROMS_initialize

Elapsed wall CPU time for each process (seconds):

-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[56596,1],0]
Exit code: 128
--------------------------------------------------------------------------

jcwarner
Posts: 1182
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: cannot work parallel nodes

#2 Post by jcwarner

ROMS/TOMS: Wrong choice of grid 01 partition or number of parallel nodes.
NtileI * NtileJ = 2
must be equal to the number of parallel processes = 1
Change -np value to mpirun or
change domain partition in input script.


If you run with -np X, where X is an integer, then you need to set the values of NtileI and NtileJ in your ocean.in so that
NtileI * NtileJ = X

For example:
-np 2
NtileI=1
NtileJ=2
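
The rule above can be checked before calling mpirun. What follows is only a minimal sketch, not part of ROMS: `read_tile` and `check_tiling` are hypothetical helper names, and the parsing assumes the input script writes the parameters as `NtileI == 1` with spaces around the `==` and an optional `!` comment. Adjust the pattern if your file is laid out differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of ROMS): print the value of one tiling
# parameter from a ROMS input script.
# Assumes lines of the form "NtileI == 1   ! comment".
read_tile() {
    # $1 = parameter name (NtileI or NtileJ), $2 = input script
    awk -v key="$1" '
        { sub(/!.*/, "") }                          # drop trailing "!" comment
        $1 == key && $2 ~ /^=+$/ { print $3; exit } # first value after = or ==
    ' "$2"
}

# Hypothetical helper: succeed only if NtileI * NtileJ equals the
# requested number of MPI processes.
check_tiling() {
    # $1 = input script, $2 = number of MPI processes (the -np value)
    local ni nj
    ni=$(read_tile NtileI "$1")
    nj=$(read_tile NtileJ "$1")
    if [ -z "$ni" ] || [ -z "$nj" ]; then
        echo "could not read NtileI/NtileJ from $1" >&2
        return 2
    fi
    if [ $((ni * nj)) -ne "$2" ]; then
        echo "NtileI * NtileJ = $((ni * nj)), but -np is $2" >&2
        return 1
    fi
    echo "tiling ${ni}x${nj} matches -np $2"
}
```

In a job script like the one in the first post, one could call `check_tiling ocean_ub_3k_new.in "$NP"` just before the mpirun line and pass the same `$NP` to `-np`. Note that this only compares the input script with the requested process count; it does not tell you how many processes the executable actually sees at run time.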
