Runtime error with MPI

Report or discuss software problems and other woes

Moderators: arango, robertson

francis
Posts: 21
Joined: Wed Jun 13, 2007 1:22 pm
Location: Indian National Center for Ocean Information Servi

Runtime error with MPI

#1 by francis

Hi,
When I ran ROMS-2.2 with the MPI option, my application ran and produced output. The next day, when I ran the same application again, it failed with a runtime error. The test cases, such as upwelling, still run fine. The error is:
Process Information:

Node # 0 (pid= 11457) is active.
Node # 1 (pid= 11458) is active.

Model Input Parameters: ROMS/TOMS version 2.2
Monday - May 12, 2008 - 1:02:41 PM
-----------------------------------------------------------------------------
forrtl: severe (408): fort: (3): Subscript #1 of the array CVAL has value -1789532123 which is less than the lower bound of 1

Image PC Routine Line Source
myindia_2proc 0000000000A4E74E Unknown Unknown Unknown
myindia_2proc 0000000000A4D94E Unknown Unknown Unknown
myindia_2proc 0000000000A0970E Unknown Unknown Unknown
myindia_2proc 00000000009D5DD5 Unknown Unknown Unknown
myindia_2proc 00000000009D5034 Unknown Unknown Unknown
myindia_2proc 0000000000837DAC Unknown Unknown Unknown
myindia_2proc 0000000000810918 Unknown Unknown Unknown
myindia_2proc 0000000000809C19 Unknown Unknown Unknown
myindia_2proc 0000000000404679 Unknown Unknown Unknown
myindia_2proc 000000000040450C Unknown Unknown Unknown
myindia_2proc 00000000004043EE Unknown Unknown Unknown
libc.so.6 0000002A95A701AE Unknown Unknown Unknown
myindia_2proc 000000000040432A Unknown Unknown Unknown
MPI Application rank 0 exited before MPI_Finalize() with status 152
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libc.so.6 0000002A95A82720 Unknown Unknown Unknown
libmpi.so.1 0000002A956E7FBA Unknown Unknown Unknown

Can anybody help me with this problem? I am still confused about why it ran one day and did not run the next.

Thanks
Francis.

kate
Posts: 4089
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Runtime error with MPI

#2 by kate

francis wrote: forrtl: severe (408): fort: (3): Subscript #1 of the array CVAL has value -1789532123 which is less than the lower bound of 1
Do you know which CVAL this is? It looks like it hasn't gotten to running inp_par yet, or at least not to the printing in it, well before the first MPI broadcast and the big memory allocations. One thing that can change from one day to the next is what other jobs are on the machine.

francis

#3 by francis

Using debugging options, I found that the CVAL referred to is in the inp_par.F file. But I don't understand what it is used for or why it is causing a problem. I am using the bulk flux parameterization. Moreover, the test cases run with MPI, and my application runs without MPI. At first, my application also ran successfully with MPI. I believe it is due to some Fortran variable allocation problem, but I don't know how to fix it. Can anybody shed light on this?
Any help is appreciated.
Thanks.

regards,
Francis.

kate

#4 by kate

Cval is used here: status=decode_line(line, KeyWord, Nval, Cval, Rval)
and inside the decode_line function itself. I'd add print statements in there or use a debugger to see what is happening. Is it the first time decode_line gets called? But first, have you tried without optimization on your compiler?

francis

#5 by francis

Yes, it is the first time it is called.
kate wrote: But first, have you tried without optimization on your compiler?
What do you mean by this? I didn't get your point.
I am using the ifort compiler.

With regards,
Francis.

kate

#6 by kate

Do you have the USE_DEBUG option on? This will turn off the compiler option that optimizes your code. Many compiler bugs are in the optimization phase of compiling, so that you get the right answer with say -O2 and the wrong answer with -O3.
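
To make the distinction above concrete, here is a minimal sketch that only assembles and prints two candidate ifort flag sets, one optimized and one for debugging. It does not invoke the compiler. The helper name fflags_for and the exact flag combinations are assumptions for illustration; they are not necessarily what the ROMS makefiles set when USE_DEBUG is on.

```shell
#!/bin/sh
# Hypothetical helper: print an ifort flag set for a given build mode.
# The flag combinations are illustrative, not the ROMS defaults.
fflags_for() {
  case "$1" in
    debug) echo "-O0 -g -traceback" ;;  # no optimization, keep symbols and tracebacks
    opt)   echo "-O3" ;;                # aggressive optimization
    *)     echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Show the compile command each mode would lead to (printed, not run).
for mode in opt debug; do
  echo "$mode: ifort $(fflags_for "$mode") ..."
done
```

If a failure disappears when moving from the optimized set to the debug set, that points toward an optimization-related problem rather than a problem in the input files.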

francis

#7 by francis

I didn't turn on the debug option; I am using the '-check all -traceback' options for debugging.

francis.

francis

#8 by francis

I was able to solve the problem. I think it is a Fortran bug. When I first printed the NVAL value, it showed that big value, but when I ran the same code a second time, that value was gone and it gave the appropriate values. So that problem was solved. But now, when I run the same code, it gives me a runtime error while reading, as below:
GET_STATE - Read state initial conditions, t = 15.0000
(File: roms_io4_ini.nc, Rec=0001, Index=1)
- free-surface
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated u-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated v-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- u-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- v-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- potential temperature
(Min = 1.26576555E+01 Max = 3.08079072E+01)
- salinity
(Min = 2.60195697E+01 Max = 4.10422279E+01)

forrtl: severe (193): Run-Time Check Failure. The variable 'get_ngfld_$NREC' is being used without being defined
Image PC Routine Line Source
myindia4 0000000000897A46 Unknown Unknown Unknown
myindia4 0000000000896C46 Unknown Unknown Unknown
myindia4 0000000000852A4E Unknown Unknown Unknown
myindia4 000000000081F115 Unknown Unknown Unknown
myindia4 000000000082078A Unknown Unknown Unknown
myindia4 000000000070B585 get_ngfld_ 105 get_ngfld.f90
myindia4 000000000063E20B get_data_ 75 get_data.f90
myindia4 0000000000405F07 initial_ 197 initial.f90
myindia4 0000000000404ED8 ocean_control_mod 106 ocean_control.f90
myindia4 00000000004044EF MAIN__ 86 master.f90
myindia4 00000000004043EE Unknown Unknown Unknown
libc.so.6 0000002A95B641C1 Unknown Unknown Unknown
myindia4 000000000040432A Unknown Unknown Unknown
forrtl: severe (193): Run-Time Check Failure. The variable 'get_ngfld_$NREC' is being used without being defined
Image PC Routine Line Source
myindia4 0000000000897A46 Unknown Unknown Unknown
myindia4 0000000000896C46 Unknown Unknown Unknown
myindia4 0000000000852A4E Unknown Unknown Unknown
myindia4 000000000081F115 Unknown Unknown Unknown
myindia4 000000000082078A Unknown Unknown Unknown
myindia4 000000000070B585 get_ngfld_ 105 get_ngfld.f90
myindia4 000000000063E20B get_data_ 75 get_data.f90
myindia4 0000000000405F07 initial_ 197 initial.f90
myindia4 0000000000404ED8 ocean_control_mod 106 ocean_control.f90
myindia4 00000000004044EF MAIN__ 86 master.f90
myindia4 00000000004043EE Unknown Unknown Unknown
libc.so.6 0000002A95B641C1 Unknown Unknown Unknown
myindia4 000000000040432A Unknown Unknown Unknown
forrtl: severe (193): Run-Time Check Failure. The variable 'get_ngfld_$NREC' is being used without being defined
Image PC Routine Line Source
myindia4 0000000000897A46 Unknown Unknown Unknown
myindia4 0000000000896C46 Unknown Unknown Unknown
myindia4 0000000000852A4E Unknown Unknown Unknown
myindia4 000000000081F115 Unknown Unknown Unknown
myindia4 000000000082078A Unknown Unknown Unknown
myindia4 000000000070B585 get_ngfld_ 105 get_ngfld.f90
myindia4 000000000063E20B get_data_ 75 get_data.f90
myindia4 0000000000405F07 initial_ 197 initial.f90
myindia4 0000000000404ED8 ocean_control_mod 106 ocean_control.f90
myindia4 00000000004044EF MAIN__ 86 master.f90
myindia4 00000000004043EE Unknown Unknown Unknown
libc.so.6 0000002A95B641C1 Unknown Unknown Unknown
myindia4 000000000040432A Unknown Unknown Unknown
MPI Application rank 1 exited before MPI_Finalize() with status 193
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libc.so.6 0000002A95B76660 Unknown Unknown Unknown
libmpi.so.1 0000002A956C3DD5 Unknown Unknown Unknown
libmpi.so.1 0000002A956AEE55 Unknown Unknown Unknown
libmpi.so.1 0000002A95711304 Unknown Unknown Unknown
libmpi.so.1 0000002A95707825 Unknown Unknown Unknown
libmpi.so.1 0000002A9571F941 Unknown Unknown Unknown
myindia4 00000000006EB910 mp_bcasti_ 144 distribute.f90
myindia4 000000000070B5AA get_ngfld_ 106 get_ngfld.f90
myindia4 000000000063E20B get_data_ 75 get_data.f90
myindia4 0000000000405F07 initial_ 197 initial.f90
myindia4 0000000000404ED8 ocean_control_mod 106 ocean_control.f90
myindia4 00000000004044EF MAIN__ 86 master.f90
myindia4 00000000004043EE Unknown Unknown Unknown
libc.so.6 0000002A95B641C1 Unknown Unknown Unknown
myindia4 000000000040432A Unknown Unknown Unknown
MPI Application rank 2 exited before MPI_Finalize() with status 193
It appears unable to read the bulk forcing file, but I checked that file and it is OK. I also went through the get_ngfld.f90 file, and nrec is defined there. The code reaches line 105 only when Iend is zero, but my file has 441 I values.
I don't understand why it reports this error, especially since the same code ran several times before with the same file without any errors. Can anybody help me with this?

Thanks.

with regards,
Francis.

francis

Re: Runtime error with MPI

#9 by francis

I found that this is due to an ifort bug that appears when '-check all' is given in the Fortran flags. When I removed it and used '-CB' instead, it ran successfully.

Thanks
Francis.
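
For readers comparing the two flag sets mentioned in this thread, the sketch below reports whether a given flag string turns on ifort's uninitialized-variable runtime check, which is the check behind the "used without being defined" message (severe 193). It assumes ifort's documented behavior that -check all includes the uninit check and that -CB is equivalent to -check bounds only. The helper name has_uninit_check is hypothetical, and the script inspects strings only; it does not run the compiler.

```shell
#!/bin/sh
# Hypothetical helper: report whether a set of ifort flags enables the
# uninitialized-variable runtime check.
# Assumption: only the simple spellings below are recognized; overrides
# such as "-check nouninit" are not handled.
has_uninit_check() {
  case " $1 " in
    *" -check all "*|*" -check uninit "*) echo yes ;;
    *) echo no ;;
  esac
}

echo "-check all -traceback : $(has_uninit_check "-check all -traceback")"
echo "-CB -traceback        : $(has_uninit_check "-CB -traceback")"
```

Switching from -check all to -CB therefore removes the uninit check along with the others, which is consistent with the run succeeding afterwards. The thread alone does not establish whether the original report was a compiler bug or a variable that is genuinely unset on some MPI ranks before the broadcast.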
