ROMS Code failing in between when it is creating new NetCDF file to write the output

koushik
Posts: 12
Joined: Mon Aug 12, 2019 3:29 pm
Location: IISC


Hello all,

I am facing the following issue.

My ROMS run sometimes fails partway through, during the phase where it defines the NetCDF files.
Some runs complete successfully; others fail while defining a new quicksave file after some number of time steps (a different time step in each run).

------------------------- Log file ------------------------
DEF_QUICK - creating quicksave file, Grid 01: /mnt/lustre/cds/cdssen/roms_model/new_roms_model/gb/gb_output_data_parallel/qckspin_trunk_test_par_0009.nc
comment nf90_netcdf4 is called ! Comments Added
comment parallel IO is called ! Comments Added
comment hdf5 is called ! Comments Added
comment distribute is called ! Comments Added
mod_netcdf -> netcdf_create called ! Comments Added
mod_netcdf -> netcdf_enddef called ! Comments Added
mod_netcdf -> netcdf_inq_var called ! Comments Added
mod_netcdf -> netcdf_put_ivar_0d called ! Comments Added
mod_netcdf -> netcdf_put_lvar_0d called ! Comments Added
mod_netcdf -> netcdf_put_lvar_0d called ! Comments Added
-- FAILED HERE ----


----------------------- Error File --------------------------
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libifcoremt.so.5 00002AAAAC444555 for__signal_handl Unknown Unknown
libpthread-2.26.s 00002AAAAE7C9360 Unknown Unknown Unknown
libpthread-2.26.s 00002AAAAE7C7E42 write Unknown Unknown
libmpich_intel.so 00002AAAAB820AB3 ADIOI_CRAY_WriteC Unknown Unknown
libmpich_intel.so 00002AAAAB8063B3 MPIOI_File_write Unknown Unknown
libmpich_intel.so 00002AAAAB8092E1 MPI_File_write_at Unknown Unknown
libhdf5_parallel_ 00002AAAB08DBBDE Unknown Unknown Unknown
libhdf5_parallel_ 00002AAAB0693F77 H5FD_write Unknown Unknown
libhdf5_parallel_ 00002AAAB066DB6F H5F__accum_write Unknown Unknown
libhdf5_parallel_ 00002AAAB07A1C81 H5PB_write Unknown Unknown
libhdf5_parallel_ 00002AAAB067AB69 H5F_block_write Unknown Unknown
libhdf5_parallel_ 00002AAAB061E69F Unknown Unknown Unknown
libhdf5_parallel_ 00002AAAB088D54F H5VM_opvv Unknown Unknown
libhdf5_parallel_ 00002AAAB061D832 Unknown Unknown Unknown
libhdf5_parallel_ 00002AAAB06405C7 H5D__select_write Unknown Unknown
libhdf5_parallel_ 00002AAAB061D6C3 H5D__contig_write Unknown Unknown
libhdf5_parallel_ 00002AAAB06377AB H5D__write Unknown Unknown
libhdf5_parallel_ 00002AAAB0636E6F H5Dwrite Unknown Unknown
libnetcdf_paralle 00002AAAAF250221 NC4_put_vars Unknown Unknown
libnetcdf_paralle 00002AAAAF24FB16 NC4_put_vara Unknown Unknown
libnetcdf_paralle 00002AAAAF1D3080 nc_put_vara_int Unknown Unknown
libnetcdff_parall 00002AAAAB0FDDC2 nf_put_vara_int_ Unknown Unknown
libnetcdff_parall 00002AAAAB13B967 netcdf_mp_nf90_pu Unknown Unknown
oceanM 00000000008EBB0D mod_netcdf_mp_net 4393 mod_netcdf.f90
oceanM 0000000000880BB2 wrt_info_ 343 wrt_info.f90
oceanM 0000000000836CFE def_quick_ 828 def_quick.f90
oceanM 000000000071EBBD output_ 189 output.f90
oceanM 00000000004C21F0 main3d_ 233 main3d.f90
oceanM 0000000000403CB7 ocean_control_mod 179 ocean_control.f90
oceanM 0000000000404837 MAIN__ 108 master.f90
oceanM 0000000000403A52 Unknown Unknown Unknown
libc-2.26.so 00002AAAAEBF9F8A __libc_start_main Unknown Unknown
oceanM 000000000040396A Unknown Unknown Unknown


Can anybody tell me the probable cause of this error and suggest any changes that are needed?

Thanks,
Koushik
Attachments: mspin.txt (132.93 KiB), gbplume.h (1.47 KiB), build.sh (17.04 KiB)

jcwarner
Posts: 1223
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA


Looks like it is failing while trying to write a variable into the quick files.
What is on line 4393 of Build/mod_netcdf.f90?
oceanM 00000000008EBB0D mod_netcdf_mp_net 4393 mod_netcdf.f90
Look at that section of code and see what it is trying to write.
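
For anyone following along, here is a minimal Python sketch of that lookup. It prints the lines surrounding a given line number so the failing write call can be read in context. The path and line number come from the traceback above; the helper name `source_context` is made up for illustration and is not part of ROMS.

```python
from pathlib import Path


def source_context(path, lineno, radius=5):
    """Return (line_number, text) pairs for the lines around `lineno` (1-based)."""
    lines = Path(path).read_text(errors="replace").splitlines()
    start = max(lineno - radius, 1)
    end = min(lineno + radius, len(lines))
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


if __name__ == "__main__":
    # Assumed location of the preprocessed source, as named in the reply.
    target = Path("Build/mod_netcdf.f90")
    if target.exists():
        for n, text in source_context(target, 4393):
            marker = ">>" if n == 4393 else "  "
            print(f"{marker} {n:6d}  {text}")
```

The same helper can be pointed at the other frames that carry line numbers, for example line 343 of wrt_info.f90 or line 828 of def_quick.f90.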

Try to compile with debug=on; that will give you oceanG. Run that.

Maybe try turning off some of the variables being written to the quick file; if it then works, one of those variables is the culprit.
Etc.
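
A rough Python sketch of switching off selected quick-file variables, under one assumption: that the input file uses switches of the form `Qout(idName) == T`, as standard ROMS input files do. It only handles a single `T` value per switch (not forms with several values on one line), and the function name `disable_qout` is hypothetical. Work on a copy of the input file.

```python
import re

# Matches the start of a line such as "Qout(idFsur) == T", capturing the
# text before the logical value and the variable identifier.
QOUT = re.compile(r"^([ \t]*Qout\((\w+)\)[ \t]*==[ \t]*)T\b", re.MULTILINE)


def disable_qout(text, names):
    """Set 'Qout(name) == T' to F for each name in `names`; leave the rest alone."""
    def flip(match):
        if match.group(2) in names:
            return match.group(1) + "F"
        return match.group(0)
    return QOUT.sub(flip, text)


if __name__ == "__main__":
    sample = (
        "Qout(idFsur) == T   ! free-surface\n"
        "Qout(idUbar) == T   ! 2D U-velocity\n"
    )
    print(disable_qout(sample, {"idFsur"}))
```

Since the failure in this thread is intermittent, a single successful run with some variables off is weak evidence; repeating each configuration a few times gives a more reliable answer.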
