I ran into a new problem after upgrading to ROMS 3.1. The model quits after several months (in some cases after more than one year) without any blow-up messages. Comparing the results, ROMS 3.1 behaves better at the open boundaries than ROMS 3.0 and yields beautiful patterns over the whole domain. But I really don't know why it stops unexpectedly.
The model was restarted from the 4th-year RST file and ran from day 1440 to day 1897, then quit. The messages at the end of the log file are:
791830 1897 02:13:30 3.629872E-02 8.952483E+03 8.952519E+03 9.528867E+14
791840 1897 02:48:00 3.516587E-02 8.953228E+03 8.953263E+03 9.529380E+14
791850 1897 03:22:30 3.441360E-02 8.953581E+03 8.953615E+03 9.530236E+14
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000002
MPI process terminated unexpectedly
Exit code -5 signaled from node34
Killing remote processes...DONE
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
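A quick way to scan a long log for non-finite diagnostics, instead of searching it by eye, is a small Python sketch like the one below. It assumes the diagnostic lines have the layout shown in the excerpt (step, day, HH:MM:SS, then four numeric columns); the column meanings and the helper names `parse_diagnostics` and `summarize` are illustrative, not part of ROMS.

```python
import math
import re

# Assumption: diagnostic lines look like the excerpt above:
#   STEP  DAY  HH:MM:SS  VALUE1  VALUE2  VALUE3  VALUE4
# (the layout is read off the log, not taken from the ROMS source).
DIAG = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d\d):(\d\d):(\d\d)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def to_float(text):
    """Parse a Fortran number; overflow fields such as '*****' become NaN."""
    try:
        return float(text)  # float() also accepts 'NaN' and 'Infinity'
    except ValueError:
        return math.nan


def parse_diagnostics(lines):
    """Return (step, elapsed_seconds, values) for every diagnostic line."""
    records = []
    for line in lines:
        m = DIAG.match(line)
        if m is None:
            continue  # WRT_HIS, MPI and signal messages are skipped
        step, day, hh, mm, ss = (int(g) for g in m.groups()[:5])
        seconds = day * 86400 + hh * 3600 + mm * 60 + ss
        values = [to_float(g) for g in m.groups()[5:]]
        records.append((step, seconds, values))
    return records


def summarize(records):
    """Report steps with non-finite values and the time step(s) implied by the log."""
    bad_steps = [s for s, _, v in records if not all(map(math.isfinite, v))]
    dts = sorted({(t2 - t1) / (s2 - s1)
                  for (s1, t1, _), (s2, t2, _) in zip(records, records[1:])
                  if s2 != s1})
    return bad_steps, dts


log = """\
791830 1897 02:13:30 3.629872E-02 8.952483E+03 8.952519E+03 9.528867E+14
791840 1897 02:48:00 3.516587E-02 8.953228E+03 8.953263E+03 9.529380E+14
791850 1897 03:22:30 3.441360E-02 8.953581E+03 8.953615E+03 9.530236E+14
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000002
MPI process terminated unexpectedly
"""
records = parse_diagnostics(log.splitlines())
bad_steps, dts = summarize(records)
print("non-finite steps:", bad_steps)  # non-finite steps: []
print("implied dt [s]:", dts)          # implied dt [s]: [207.0]
```

On this excerpt the sketch finds no non-finite values, and the step/time pairs imply a constant 207 s time step. For example, 791830 × 207 s = 163,908,810 s, which is exactly day 1897 02:13:30, so the step counter and the model clock agree up to the point where the run stopped.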
For the climatological forcing and boundary input, it runs for 457 days; for the other two kinds of forcing, it runs for 381 or 457 days.
My configuration:
Operating system : Linux
CPU/hardware : x86_64
Compiler system : pgi
Compiler command : /opt/mpi/mvapich/1.1/gcc.pgf90/bin/mpif90
Compiler flags : -O3 -tp k8-64 -Mfree
I asked our cluster administrator, and he said it's not a system problem.
My first guess was that the model blew up, but there are no NaNs in the log file.
My second guess was a problem with cycle_length (my cycle_length is 360 days), but the model had already run for more than 14 months, i.e. past one full cycle, before it quit.
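The cycle arithmetic behind that reasoning can be sketched in Python. This is a hypothetical helper that only reproduces the modular arithmetic of a 360-day periodic forcing cycle; it is not how ROMS itself interpolates climatological records.

```python
# Hypothetical helper: only the arithmetic of a periodic forcing cycle,
# not the actual ROMS time-interpolation code.
CYCLE_LENGTH = 360.0  # days, as in the post


def cycle_position(model_day, cycle_length=CYCLE_LENGTH):
    """Return (completed cycles, day within the current cycle)."""
    completed, position = divmod(model_day, cycle_length)
    return int(completed), position


start_day, stop_day = 1440.0, 1897.0
print(cycle_position(start_day))  # (4, 0.0)
print(cycle_position(stop_day))   # (5, 97.0)
print(stop_day - start_day)       # 457.0
```

The restart at day 1440 falls exactly on the start of a cycle (4 × 360), the run crosses the next wrap at day 1800, and it stops about 97 days into the following cycle. The 1440 to 1897 span is 457 days, the same length quoted above for the climatological case.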
Any suggestions are welcome! Thank you!
zhou