automatic restart in pbs script.

#1 · **sankaras** » Thu Apr 09, 2009 12:31 am

Hello,

I am running ROMS on a Unix cluster that has an 8-hour walltime limit. My ROMS simulation is 270000 time steps long, and only about 26000 time steps complete within the 8-hour limit. So I wrote a script (given below) that runs the model up to 10 times in a loop; after each run it copies the output files to another directory and moves ocean_rst.nc to ocean_ini.nc. The script gets through the first run; the model then exits on an interrupt once the 8-hour walltime limit is reached, and the script is terminated as well, giving the following error:

p0_7176: p4_error: interrupt SIGx: 15
p5_7219: p4_error: net_recv read: probable EOF on socket: 1
rm_l_5_7221: (28843.296875) net_send: could not write to fd=10, errno = 32
p9_7228: p4_error: interrupt SIGx: 13

I can still manually copy ocean_rst.nc to ocean_ini.nc and restart the simulation, and that works fine. Is there a way to trap this walltime interruption and still continue executing the loop in the script?

Thanks,

Sankar

#!/bin/bash
#$ -S /bin/bash
#PBS -N ROMS1
#PBS -V
#PBS -l nodes=8:ppn=8
# -o /home/subbayya/roms/projects/upwelling/upwelling.log
#PBS -e /home/subbayya/roms/projects/upwelling/upwelling.err
MAXRUNS=10
STARTRUN=1
JOBDIR=/home/subbayya/roms-3.0/projects/jet_obcs
#RESULTS_DIR=${JOBDIR}/Results
RUNNUMFILE=${JOBDIR}/runnum.txt
cd $PBS_O_WORKDIR

if [ ! -f "${RUNNUMFILE}" ]; then
  echo 1 > "${RUNNUMFILE}"
fi

while true
do
  RUNNUM=$(cat "${RUNNUMFILE}")
  if [ "${RUNNUM}" -le "${MAXRUNS}" ]; then
    echo "Starting job ${RUNNUM} of ${MAXRUNS} on $(date)"
    time mpirun -machine vapi ./oceanM ocean_jet_obcs.in > jet.log
    rundir=${JOBDIR}/RUN${RUNNUM}
    mkdir -p "${rundir}"
    cp *.nc "${rundir}/."
    mv ocean_rst.nc ocean_ini.nc
    cp jet.log "${rundir}/."
    echo $((RUNNUM + 1)) > "${RUNNUMFILE}"
  else
    break
  fi
done

echo run ended at `date`

#2 · **kate** » Thu Apr 09, 2009 5:14 pm

I think you have to do what's known as job chaining - having the first script submit the second script, and so on. I'm sure it's been written up in the ARSC HPC newsletter.
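
For illustration, here is a minimal sketch of what such a chain could look like, reusing the file names from the script in the first post. It is not taken from the ARSC newsletter. Each PBS job runs a single segment and queues its successor with `qsub -W depend=afterany:...`, so the next job starts once the current one has ended, whether it finished on its own or was killed at the walltime limit. The names `chain.pbs`, `SCRIPT`, `QSUB`, `MODEL_CMD` and `chain_step` are hypothetical, and the sketch assumes that jobs are allowed to call `qsub` from the compute nodes and that the ROMS input file is already set up to restart from ocean_ini.nc.

```shell
#!/bin/bash
#PBS -N ROMS1
#PBS -V
#PBS -l nodes=8:ppn=8
# Job-chaining sketch: each PBS job runs ONE segment of the simulation.
# Hypothetical names (not from the original post): chain.pbs, SCRIPT, QSUB,
# MODEL_CMD, chain_step.

MAXRUNS=${MAXRUNS:-10}
JOBDIR=${JOBDIR:-${PBS_O_WORKDIR:-$PWD}}
SCRIPT=${SCRIPT:-${JOBDIR}/chain.pbs}   # assumed file name of this script
QSUB=${QSUB:-qsub}                      # overridable, e.g. for a dry run
MODEL_CMD=${MODEL_CMD:-"mpirun -machine vapi ./oceanM ocean_jet_obcs.in"}

chain_step() {
    cd "${JOBDIR}" || return 1
    local runnumfile=runnum.txt
    [ -f "${runnumfile}" ] || echo 1 > "${runnumfile}"
    local runnum
    runnum=$(cat "${runnumfile}")

    # Archive what the previous segment left behind, then promote its
    # restart file to be the initial file of this segment.
    if [ "${runnum}" -gt 1 ] && [ -f ocean_rst.nc ]; then
        local prevdir=RUN$((runnum - 1))
        mkdir -p "${prevdir}"
        cp ./*.nc "${prevdir}/"
        if [ -f jet.log ]; then
            cp jet.log "${prevdir}/"
        fi
        mv ocean_rst.nc ocean_ini.nc
    fi

    if [ "${runnum}" -gt "${MAXRUNS}" ]; then
        echo "All ${MAXRUNS} segments done on $(date)"
        return 0
    fi

    # Queue the successor BEFORE starting the model, so a walltime kill
    # cannot prevent it. afterany: start once this job has ended,
    # whatever its exit status.
    ${QSUB} -W depend=afterany:"${PBS_JOBID}" "${SCRIPT}"

    # Record progress before the model starts; the job may be killed mid-run.
    echo $((runnum + 1)) > "${runnumfile}"

    echo "Starting segment ${runnum} of ${MAXRUNS} on $(date)"
    ${MODEL_CMD} > jet.log
}

# Run the step only when executing inside a PBS job.
if [ -n "${PBS_JOBID:-}" ]; then
    chain_step
fi
```

Submitting it once (`qsub chain.pbs`) would start the chain; the job after the last segment only archives the final output and stops. Because `afterany` ignores the exit status, a model that crashes immediately would still trigger the remaining jobs, so checking jet.log after the first segment or two is a sensible precaution.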

Ocean Modeling Discussion
