Signed Integer Overflow

Report or discuss software problems and other woes

Moderators: arango, robertson

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

Signed Integer Overflow

#1 Unread post by Fatima »

Dear all,
When I run my model, I get this error after the first timestep:

ran1.f90:150:51: runtime error: signed integer overflow: -1832521919 + -2138245891 cannot be represented in type 'integer(kind=4)'


But my model kept running. After 20 timesteps it blew up with this error:
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP

Error termination. Backtrace:
#0 0x7f339e8687c2 in ???
#1 0x7f339e869289 in ???
#2 0x7f339e86a6a7 in ???
#3 0x405443 in myroms
at /home/sharifi/roms/test/upwelling/cilander/Build_romsG/master.f90:104
#4 0x40549a in main
at /home/sharifi/roms/test/upwelling/cilander/Build_romsG/master.f90:50

=================================================================
==95272==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 260930880 byte(s) in 1 object(s) allocated from:
#0 0x7f339ec4491f in __interceptor_malloc (/lib64/libasan.so.6+0xae91f)
#1 0x2c9d728 in __mod_diags_MOD_allocate_diags
/home/sharifi/roms/test/upwelling/cilander/Build_romsG/mod_diags.f90:143
#2 0x29d4bff in __mod_arrays_MOD_roms_allocate_arrays
/home/sharifi/roms/test/upwelling/cilander/Build_romsG/mod_arrays.f90:112
#3 0x40a539 in __roms_kernel_mod_MOD_roms_initialize
/home/sharifi/roms/test/upwelling/cilander/Build_romsG/roms_kernel.f90:122
#4 0x404cd1 in myroms
/home/sharifi/roms/test/upwelling/cilander/Build_romsG/master.f90:75
#5 0x40549a in main
/home/sharifi/roms/test/upwelling/cilander/Build_romsG/master.f90:50
#6 0x7f339dbb555f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

Could you please help me fix this problem?
Best regards,

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

Re: Signed Integer Overflow

#2 Unread post by Fatima »

Dear all,
I had a runtime error when I ran my model. The error happened after the first timestep, but the model did not blow up. Could you please help me fix it?
ran1.f90:150:51: runtime error: signed integer overflow: -1832521919 + -2138245891 cannot be represented in type 'integer(kind=4)'
best regards,
Fatima

robertson
Site Admin
Posts: 219
Joined: Wed Feb 26, 2003 3:12 pm
Location: IMCS, Rutgers University

Re: Signed Integer Overflow

#3 Unread post by robertson »

What compiler are you using? This error is almost certainly due to ROMS taking advantage of so-called wrap-around addition for random number generation. Line 150 of your ran1.f90 is likely:

Code:

      ranv(1:n)=ieor(nran(1:n),ranv(1:n))+mran(1:n)

Some compilers treat overflows as an error (usually for security reasons) rather than simply wrapping around from INT_MAX to INT_MIN (or vice versa) when the upper or lower limit of a number type (in this case integer(kind=4)) is reached.

If you are using gfortran, you can try adding the -fno-range-check compile option to FFLAGS in your Compilers/Linux-gfortran.mk. This is untested, and the gfortran documentation suggests that the result will be -Inf or +Inf, which is likely to cause other issues, but it might be worth a shot.
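
As a rough illustration of the wrap-around addition described above, the following Python sketch emulates signed 32-bit two's-complement arithmetic on the two operands from the error message. Python integers never overflow, so the wrapping is done by hand; the helper names wrap_int32 and xor_add_wrapped are hypothetical and are not part of ROMS, and the second helper only mirrors the shape of the line quoted above under the assumption that the generator relies on wrapping.

```python
def wrap_int32(value: int) -> int:
    """Reduce an arbitrary Python int to the signed 32-bit range (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def xor_add_wrapped(nran: int, ranv: int, mran: int) -> int:
    """Scalar analogue of ieor(nran, ranv) + mran with 32-bit wrap-around (illustrative)."""
    return wrap_int32((nran ^ ranv) + mran)


# Operands taken from the runtime error message in this thread.
a = -1832521919
b = -2138245891

exact = a + b               # mathematically exact sum, outside the 32-bit range
wrapped = wrap_int32(exact)  # value a wrapping 32-bit addition would produce

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
print("exact sum:        ", exact)
print("fits in 32 bits:  ", INT32_MIN <= exact <= INT32_MAX)
print("wrapped 32-bit sum:", wrapped)
```

The exact sum lies below the smallest 32-bit integer, which is what the runtime check reports; a wrapping addition would instead continue with the reduced value.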

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Signed Integer Overflow

#4 Unread post by arango »

Many posts in this forum do not provide adequate information, or their authors do not read the error carefully. As a consequence, messages are ignored since we are very busy. With a little curiosity, you will notice that routine ran1.F and module ran_state.F exclusively use 32-bit integers through the kind parameter i8b defined in ROMS/Modules/mod_kinds.F. Did you change that definition to use a 16-bit integer representation instead? Or did the compiler change it because a 32-bit integer representation is not supported with SELECTED_INT_KIND(8)?

This is not a ROMS issue but a compiler/architecture issue. If you look carefully at the error, you will notice the phrase "cannot be represented in type 'integer(kind=4)'". There are no such integers used in ran1.F or module ran_state.F.

What compiler and computer are you using?
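
For readers unsure what SELECTED_INT_KIND(8) asks for, here is a small Python sketch of the rule as the Fortran standard describes it: the smallest integer kind able to hold every n with -10^R < n < 10^R. It assumes the common set of 1, 2, 4, and 8-byte two's-complement integers (gfortran numbers its integer kinds by byte size); the function name smallest_int_bytes is hypothetical.

```python
def smallest_int_bytes(r: int, sizes=(1, 2, 4, 8)):
    """Smallest byte width whose signed range covers -10**r < n < 10**r, else None.

    Mirrors the selection rule of Fortran's SELECTED_INT_KIND(r), assuming
    two's-complement integers of the given byte sizes are available.
    """
    largest_needed = 10**r - 1
    for nbytes in sizes:
        max_value = 2 ** (8 * nbytes - 1) - 1
        if max_value >= largest_needed:
            return nbytes
    return None  # no available integer type is wide enough


for r in (2, 4, 8, 10):
    print(f"R = {r:2d} -> {smallest_int_bytes(r)} byte(s)")
```

Under these assumptions a request for 8 decimal digits resolves to a 4-byte (32-bit) integer, since a 2-byte integer tops out at 32767, so the integer(kind=4) named in the error message is consistent with a 32-bit i8b.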

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

Re: Signed Integer Overflow

#5 Unread post by Fatima »

Dear Arango,
Thank you for your answer. I am using gfortran on Fedora 35 Linux. I did not change the definition. How can I solve the problem?
I also have access to Intel Fortran on a server. Do you think I should switch to the server and Intel Fortran, or can I solve this problem on my current setup?
Best regards,
Fatima

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Signed Integer Overflow

#6 Unread post by arango »

There is no solution other than updating the gfortran compiler or using a different compiler. If you have access to another computer try to see what happens.

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

Re: Signed Integer Overflow

#7 Unread post by Fatima »

Dear Arango,
I switched to the HLRN server and ran my model with
export USE_MPI=on
export USE_MPIF90=on
export FORT=gfortran
I still have this problem. I can't understand why, after the first timestep, KEchar = NaN and PEchar = NaN. Could you please help me fix this problem?



NL ROMS/TOMS: started time-stepping: (Grid: 01 TimeSteps: 000000000001 - 000000000288)


TIME-STEP YYYY-MM-DD hh:mm:ss.ss KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME
C => (i,j,k) Cu Cv Cw Max Speed

0 0001-01-01 00:00:00.00 0.000000E+00 1.699491E+03 1.699491E+03 2.340735E+10
(000,000,00) 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00
DEF_HIS_NF90 - creating history file, Grid 01: roms_his.nc
WRT_HIS_NF90 - wrote history fields (Index=1,1) in record = 1
DEF_QUICK_NF90 - creating quicksave file, Grid 01: roms_qck.nc
WRT_QUICK_NF90 - wrote quicksave fields (Index=1,1) in record = 1
DEF_AVG_NF90 - creating average file, Grid 01: roms_avg.nc
DEF_DIAGS_NF90 - creating diagnostics file, Grid 01: roms_dia.nc
DEF_RST_NF90 - creating restart file, Grid 01: roms_rst.nc
1 0001-01-01 00:03:20.00 NaN NaN NaN NaN
(001,051,30) 2.382794E-07 Infinity 0.000000E+00 -1.000000E+20
Found Error: 01 Line: 321 Source: ROMS/Nonlinear/main3d.F
Found Error: 01 Line: 298 Source: ROMS/Drivers/nl_roms.h, ROMS_run

Blowing-up: Saving latest model state into RESTART file
REASON: KEchar = NaN, PEchar = NaN

WRT_RST_NF90 - wrote re-start fields (Index=1,2) in record = 1

Elapsed wall CPU time for each process (seconds):

Node # 8 CPU: 1.224
Node # 1 CPU: 1.224
Node # 2 CPU: 1.224
Node # 3 CPU: 1.224
Node # 4 CPU: 1.224
Node # 5 CPU: 1.224
Node # 7 CPU: 1.224
Node # 0 CPU: 0.676
Node # 6 CPU: 1.224
Total: 10.471
Average: 1.163
Minimum: 0.676
Maximum: 1.224

Nonlinear model elapsed CPU time profile, Grid: 01

Allocation and array initialization .............. 0.788 ( 7.5230 %)
Ocean state initialization ....................... 5.241 (50.0518 %)
Reading of input data ............................ 0.000 ( 0.0012 %)
Processing of input data ......................... 0.004 ( 0.0417 %)
Processing of output time averaged data .......... 0.000 ( 0.0036 %)
Computation of vertical boundary conditions ...... 0.001 ( 0.0114 %)
Computation of global information integrals ...... 0.039 ( 0.3684 %)
Writing of output data ........................... 2.332 (22.2670 %)
Model 2D kernel .................................. 0.475 ( 4.5406 %)
2D/3D coupling, vertical metrics ................. 0.094 ( 0.9005 %)
Omega vertical velocity .......................... 0.034 ( 0.3231 %)
Equation of state for seawater ................... 0.108 ( 1.0311 %)
3D equations right-side terms .................... 0.076 ( 0.7259 %)
3D equations predictor step ...................... 0.225 ( 2.1527 %)
Pressure gradient ................................ 0.033 ( 0.3113 %)
Harmonic stress tensor, S-surfaces ............... 0.027 ( 0.2542 %)
Corrector time-step for 3D momentum .............. 0.189 ( 1.8044 %)
Corrector time-step for tracers .................. 0.152 ( 1.4477 %)
Total: 9.817 93.7595 %

Unique kernel(s) regions profiled ................ 9.817 93.7595 %
Residual, non-profiled code ...................... 0.653 6.2405 %


All percentages are with respect to total time = 10.471


MPI communications profile, Grid: 01

Message Passage: 2D halo exchanges ............... 0.202 ( 1.9273 %)
Message Passage: 3D halo exchanges ............... 0.123 ( 1.1762 %)
Message Passage: 4D halo exchanges ............... 0.083 ( 0.7889 %)
Message Passage: data broadcast .................. 6.406 (61.1827 %)
Message Passage: data reduction .................. 0.014 ( 0.1315 %)
Message Passage: data gathering .................. 0.537 ( 5.1327 %)
Message Passage: data scattering.................. 0.549 ( 5.2416 %)
Message Passage: point data gathering ............ 0.000 ( 0.0032 %)
Message Passage: synchronization barrier ......... 0.001 ( 0.0088 %)
Total: 7.915 75.5930 %

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Dynamic and Automatic memory (MB) usage for Grid 01: 198x198x30 tiling: 3x3

tile Dynamic Automatic USAGE MPI-Buffers

0 153.24 19.84 173.08 12.22
1 0.00 19.84 19.84 0.00
2 0.00 19.84 19.84 0.00
3 0.00 19.84 19.84 0.00
4 0.00 19.84 19.84 0.00
5 0.00 19.84 19.84 0.00
6 0.00 19.84 19.84 0.00
7 0.00 19.84 19.84 0.00
8 0.00 19.84 19.84 0.00

TOTAL 153.24 178.56 331.80 12.22

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

ROMS/TOMS - Output NetCDF summary for Grid 01:
number of time records written in HISTORY file = 1
number of time records written in RESTART file = 1

Analytical header files used:

ROMS/Functionals/ana_drag.h
ROMS/Functionals/ana_btflux.h
ROMS/Functionals/ana_smflux.h
ROMS/Functionals/ana_stflux.h
ROMS/Functionals/ana_vmix.h

MAIN: Abnormal termination: BLOWUP.
REASON: KEchar = NaN, PEchar = NaN
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_OVERFLOW_FLAG
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_OVERFLOW_FLAG
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_OVERFLOW_FLAG
ERROR STOP

Error termination. Backtrace:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO
ERROR STOP

Error termination. Backtrace:
#0 0x403db5 in ???
#1 0x403a0c in ???
#2 0x2aaaac6ea554 in ???
#3 0x403a5b in ???
#4 0xffffffffffffffff in ???
#0 0x403db5 in ???
#1 0x403a0c in ???
#2 0x2aaaac6ea554 in ???
#3 0x403a5b in ???
#4 0xffffffffffffffff in ???
#0 0x403db5 in ???
#1 0x403a0c in ???
#2 0x2aaaac6ea554 in ???
#3 0x403a5b in ???
#4 0xffffffffffffffff in ???
#0 0x403db5 in ???
#1 0x403a0c in ???
#2 0x2aaaac6ea554 in ???
#3 0x403a5b in ???
#4 0xffffffffffffffff in ???
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[20958,1],0]
Exit code: 1
--------------------------------------------------------------------------
[bcn1007:327393] 10 more processes have sent help message help-opal-shmem-mmap.txt / mmap on nfs
[bcn1007:327393] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
best regards,
Fatima

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Signed Integer Overflow

#8 Unread post by jcwarner »

How about you post the whole output, not just the last part?
There might be something wrong with the initial conditions.

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

Re: Signed Integer Overflow

#9 Unread post by Fatima »

Dear John,
Thank you for your answer. Here is my complete output for both Fedora Linux and the server.
Best regards,
Fatima
Attachments
server.log
(39.4 KiB) Downloaded 195 times
model.log
(27.46 KiB) Downloaded 214 times

jcwarner
Posts: 1172
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Signed Integer Overflow

#10 Unread post by jcwarner »

I am not sure what the difference is between those two output files.

the server.log has
- bathymetry at RHO-points: h
(Grid = 01, File: bowl_grid.nc)
(Min = 2.00018817E-02 Max = 1.00000000E+03)

Minimum Z-grid spacing, DZmin = 6.53658218E-04 m

That is a very small dz somewhere. You have theta_s = 10. Maybe adjust the vertical grid spacing so the cells are not so small to start with.
I am not sure whether that is the reason for the blowup.

The other file, model.log, seems to be running.
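
To put the DZmin value in perspective, here is a rough Python sketch of the vertical Courant number Cw = |w| dt / dz for a single cell. The 200 s time step is read off the log in post #7 (step 1 is stamped 00:03:20); the sample vertical velocity of 1e-4 m/s is purely an illustrative assumption and does not come from either log. This is a simplified explicit-advection estimate, not the exact stability criterion ROMS applies.

```python
def vertical_courant(w: float, dt: float, dz: float) -> float:
    """Vertical Courant number |w| * dt / dz for one grid cell."""
    return abs(w) * dt / dz


dz_min = 6.53658218e-04   # m, DZmin reported in server.log
dt = 200.0                # s, inferred from the log (step 1 at 00:03:20)
w_sample = 1.0e-4         # m/s, hypothetical vertical velocity for illustration

cw = vertical_courant(w_sample, dt, dz_min)
w_max = dz_min / dt       # largest |w| that keeps Cw <= 1 in this cell

print(f"Cw for |w| = {w_sample:g} m/s: {cw:.1f}")
print(f"|w| must stay below {w_max:.2e} m/s for Cw <= 1")
```

With a cell this thin, even a very modest vertical velocity pushes the Courant number far above 1, which is why such a small dz deserves a second look.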

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Signed Integer Overflow

#11 Unread post by arango »

It seems that you created grid and initial conditions NetCDF files for the UPWELLING test case. However, your grid is incorrect, especially the bathymetry. You are unable to understand the information that ROMS is providing you. You need to master the basics of the ROMS vertical grid to design an application. There is plenty of information in wikiROMS and the literature.

You have 30 vertical levels and the shallowest bathymetry value is 1.75019817E-02. That is about 1.75 centimeters. How in the world of modeling with terrain-following coordinates are you going to fit 30 vertical levels in 1.75 cm of water column thickness, and get a stable solution that does not violate the vertical CFL condition? In the original UPWELLING test case, the bathymetry ranges from hmin=25 to hmax=150 meters.

Please do your homework and investigate with curiosity the information that ROMS is providing for you. We are really busy and don't have the time to explain and solve your modeling issues at every step. We are aware that the ROMS learning curve is steep, but it is with failures, curiosity, and patience that the learning starts. We have around 30 idealized and realistic test cases of various complexities for you to understand and practice with. Check at your institution for courses or training in geophysical numerical modeling.
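
The arithmetic behind that objection can be sketched in a few lines of Python. This assumes the water column is divided evenly among the levels, which is a simplification: with vertical stretching the actual ROMS layers are not uniform, and some will be thinner than this average. The function name mean_layer_thickness is hypothetical.

```python
def mean_layer_thickness(depth_m: float, n_levels: int) -> float:
    """Average layer thickness if the water column is split evenly into n_levels."""
    return depth_m / n_levels


N_LEVELS = 30
h_reported = 1.75019817e-02   # m, shallowest depth quoted in the post above
h_upwelling_min = 25.0        # m, hmin of the original UPWELLING test case

thin = mean_layer_thickness(h_reported, N_LEVELS)
ref = mean_layer_thickness(h_upwelling_min, N_LEVELS)

print(f"{thin * 1000:.2f} mm per layer at h = {h_reported} m")
print(f"{ref:.3f} m per layer at h = {h_upwelling_min} m")
print(f"ratio: about {ref / thin:.0f}x")
```

The average layer in the shallowest column is well under a millimeter thick, more than a thousand times thinner than in the original test case.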
