It is failing when reading "u", specifically when reading the floating-point attributes of "u". This is a new initial file that I made the same way as the last one, which ROMS has read many times. The above failure was with ifort; trying again with gfortran doesn't fail at all, so I'm chalking it up to a compiler bug.
Hi Kate,
In my experience, the compiler has never been the one at fault. A bug that shows up with one compiler but not with the other almost always means there is a bug in the code.
To detect the bug with gfortran, you can use the compilation options -fcheck=all -fsanitize=address -fsanitize=undefined.
For the Intel Fortran Compiler, the options are -check all -warn interfaces,nouncalled -gen-interfaces.
There are other options for detecting NaN in the computation, for example -ffpe-trap=invalid,zero,overflow for gfortran and -fpe0 for ifort.
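The checking flags listed above could be collected in a small shell helper so that they are easy to switch per compiler. This is only a sketch: the function name debug_flags is hypothetical, -g is added here to get line numbers in backtraces, and the Intel option is spelled -gen-interfaces as in the Intel documentation. How the flags reach the ROMS build (for example through an FFLAGS variable) is an assumption, not something stated in this thread.

```shell
# Hypothetical helper: print run-time checking flags for a given compiler.
debug_flags() {
  case "$1" in
    gfortran)
      echo "-g -fcheck=all -fsanitize=address -fsanitize=undefined" ;;
    ifort)
      echo "-g -check all -warn interfaces,nouncalled -gen-interfaces" ;;
    *)
      echo "unknown compiler: $1" >&2
      return 1 ;;
  esac
}

# Pick the compiler from the environment, defaulting to gfortran,
# and show the command prefix that would be used.
FC=${FC:-gfortran}
FFLAGS=$(debug_flags "$FC")
echo "$FC $FFLAGS"
```

These checks slow the model down considerably, so they are meant for a short debugging run, not for production.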
Thanks - those "check all" flags are scary! Both compilers warn about creating temporary arrays when reading parameter files (read_phypar, read_stapar, etc).
Ifort still fails in nf_fread3d when calling netcdf_get_fatt.
gfortran now fails in wclock_on because it is a nonrecursive procedure being called recursively (from the mp_barrier in there).
The temporary arrays are created when you pass an array section such as A(1,:) to a subroutine. Since the values are not contiguous in memory, the compiler has to copy them into a new array, which of course slows things down. But it is no problem if this is done only when reading the input parameters.
It is of course a problem if wclock_on is called recursively. The solution is to declare it as a "RECURSIVE SUBROUTINE".
The fact that the error occurs in netcdf_get_fatt means that the failure shows up in the netcdf routine. So, there are two possibilities:
(A) The bug is in the netcdf routine itself (rather unlikely). Then one needs to compile netcdf itself with check all, which is hard work.
(B) The bug is somewhere before the call. To check this, print the input to the function netcdf_get_fatt. A long time ago I had random errors occurring because of pointers erased by a previous call to a function. This pointer erasure can happen before the call to netcdf_get_fatt and create the problem. Since compilers are free to organize memory as they want, this can explain why it works with gfortran but not with ifort.
I'm happy to ignore warnings during initialization.
Thanks, would have gotten to adding the recursive modifier, but had to leave yesterday. The gfortran case is now running past that.
The netcdf_get_fatt thing happens in the debugger when stepping into netcdf_get_fatt from nf_fread3d.
I can see the values of all eight arguments to netcdf_get_fatt and they are all fine. netcdf_get_fatt is a ROMS routine, so I should be able to step into it but no, that's when the error occurs for ifort.
I've been around long enough to believe in compiler bugs, no question.
Hernan, a remark on your point about "wrap-around integer". Actually, in Fortran (and for signed integers in C/C++), integer overflow is undefined behavior. See for example https://stackoverflow.com/questions/405 ... r-overflow
So, gfortran is right to stop at that.
Hi Kate,
Have you managed to run this app with ifort?
I am building metroms on a recently deployed supercomputer (Betzy, in Norway), and I run into exactly the same error in the same place (reading attributes for u).
The toolchain I use on the new supercomputer is the same as on the previous one.
Another question: I remember you also use metroms. Have you tried to build metroms with gfortran as well?
Oh gosh, that was two and a half years ago! I have no memory of it whatsoever.
As for metroms, I haven't played with that lately either. I might have to go back to it if I can't get this other monster (CESM) working on our supercomputer.
OK, back to the case.
In my case this error was due to the stack size limit on our new supercomputer,
so it is solved by setting the limit to unlimited:
ulimit -s unlimited
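For anyone checking whether the stack limit is the culprit, the following is a minimal sketch of how the limit could be inspected and raised in the shell or job script that launches the model. It assumes a bash-like shell where ulimit accepts -s, -S and -H; the fallback to the hard limit is an addition here, for systems where "unlimited" is not permitted.

```shell
# Show the current soft and hard stack limits (KiB, or "unlimited").
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# Try to raise the soft limit to unlimited. If the hard limit forbids
# that, raise the soft limit as far as the hard limit allows instead.
if ! ulimit -S -s unlimited 2>/dev/null; then
  ulimit -S -s "$(ulimit -H -s)"
fi

echo "stack limit now: $(ulimit -S -s)"
# Launch the model from this same shell so that it inherits the limit.
```

The limit applies only to the current shell and the processes started from it, so in a batch job it has to be set inside the job script before the model is launched.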
Hi
I am trying to run my model on Fedora 30, using gfortran without MPI. I get this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f92562bfd51 in ???
#1 0x7f92562bef15 in ???
#2 0x7f92555fbf3f in ???
#3 0x7f92565b17f7 in ???
#4 0x319998a in __mod_netcdf_MOD_netcdf_create
at /home/obuntooo/roms/upwelling1/Build_romsG/mod_netcdf.f90:5908
#5 0x2010b19 in def_his_nf90
at /home/obuntooo/roms/upwelling1/Build_romsG/def_his.f90:121
#6 0x20955b0 in __def_his_mod_MOD_def_his
at /home/obuntooo/roms/upwelling1/Build_romsG/def_his.f90:57
#7 0x52a648 in output_
at /home/obuntooo/roms/upwelling1/Build_romsG/output.f90:141
#8 0x41561e in main3d_
at /home/obuntooo/roms/upwelling1/Build_romsG/main3d.f90:235
#9 0x408b39 in __roms_kernel_mod_MOD_roms_run
at /home/obuntooo/roms/upwelling1/Build_romsG/roms_kernel.f90:175
#10 0x40531b in myroms
at /home/obuntooo/roms/upwelling1/Build_romsG/master.f90:86
#11 0x405462 in main
at /home/obuntooo/roms/upwelling1/Build_romsG/master.f90:50
Segmentation fault (core dumped)
Please help me fix it. Thank you.