Parallel I/O fully working




Posted by arango

I am happy to announce that parallel I/O with the NetCDF-4/HDF5 libraries is now fully working in ROMS, svn revision 308 :P I was finally able to find and fix a bug that caused it to work in some applications and fail in others. It turns out that the value of the CPP_options global attribute, which is written to all output files, was not broadcast to all the MPI nodes in the group. This caused the HDF5 library to hang because the size of the global attribute differed between the parallel threads. Many thanks to MuQun Yang (HDF Group) for his help in finding this bug. The clues that he gave me were great :D
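To make the failure mode above concrete, here is a minimal pure-Python sketch. It does not use MPI or NetCDF at all: each MPI node is modelled as one entry in a list, and the function names and the sample option string are made up for illustration. It only shows why a collective attribute write needs every node to hold the same value, and how a broadcast from the master restores that.

```python
def attribute_sizes(values):
    """Return the set of attribute lengths (in bytes) seen across all ranks."""
    return {len(v.encode("ascii")) for v in values}


def broadcast(values, root=0):
    """Model a broadcast: every rank ends up with a copy of the root's value."""
    return [values[root]] * len(values)


# Hypothetical state before the fix: only rank 0 holds the CPP options string.
# The string itself is illustrative, not taken from a real ROMS build.
cpp_options = ["MPI, PARALLEL_IO, NETCDF4", "", "", ""]

# A collective write is only safe if all ranks agree on the attribute size.
consistent_before = len(attribute_sizes(cpp_options)) == 1
consistent_after = len(attribute_sizes(broadcast(cpp_options))) == 1

print("consistent before broadcast:", consistent_before)
print("consistent after broadcast: ", consistent_after)
```

In a real distributed-memory code the broadcast would be an MPI collective call issued before the attribute is written; the list copy here is just a stand-in for it.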

Check the following post for more details about parallel I/O in ROMS. The Unidata documentation says that parallel I/O is only possible with NetCDF-4/HDF5 format files. This is absolutely the case when creating and writing to a file. However, I found out that ROMS can read the old NetCDF-3 classic format files in parallel. This is because of the way that parallel I/O is implemented in the ROMS coarse-grained parallel structure. I still do not know how the NetCDF-4/HDF5 libraries are doing this. In all my tests, the data read is broadcast correctly to all nodes when reading non-tiled data. In parallel I/O, each MPI node in the group reads its own tile data.

It is somewhat difficult to support both parallel and serial I/O in ROMS at the same time. It requires a lot of internal checking and duplicate structures for serial and parallel data processing, with the appropriate distributed-memory communications.

However, on computers with a parallel I/O architecture it may be problematic or dangerous to read NetCDF-3 classic format files. So it is recommended to convert your input NetCDF-3 classic format files to NetCDF-4/HDF5 format. This is very easy to do with the ncdump and ncgen programs in Unix:

% ncdump4 | ncgen4 -b -o


% ncdump3 | ncgen4 -g -o

Here, ncdump3 is an alias to the ncdump program available in any of the netcdf-3.x libraries and ncdump4 is an alias for the same program in the NetCDF-4/HDF5 library. Similarly, ncgen4 is an alias to the ncgen program available in the NetCDF-4/HDF5 library. Recall that all the NetCDF libraries are backward compatible. The above conversion command may take some time depending on the size of the file. Maybe we need to get a faster converter. I found a python nc3tonc4 script in a Google search. I am not very familiar with python. I pretty much dislike what python does to MPICH2, so I am not a python fan.
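Before converting, it can help to find out which input files are still in a NetCDF-3 format. The sketch below does this by looking at the first bytes of a file: NetCDF-3 classic files start with `CDF\x01`, the 64-bit offset variant with `CDF\x02`, and HDF5 files with the 8-byte HDF5 signature. It is only a rough check under two assumptions: the HDF5 signature sits at offset 0 (no user block), and any HDF5 file is treated as NetCDF-4. The function names and the demo file names are hypothetical, and the demo files contain fake headers only.

```python
import os
import tempfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 file signature


def netcdf_format(path):
    """Classify a file by its leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic == HDF5_MAGIC:
        return "HDF5 (NetCDF-4)"
    if magic[:4] == b"CDF\x01":
        return "NetCDF-3 classic"
    if magic[:4] == b"CDF\x02":
        return "NetCDF-3 64-bit offset"
    return "unknown"


def needs_conversion(path):
    """True if the file looks like a NetCDF-3 file."""
    return netcdf_format(path).startswith("NetCDF-3")


# Demo on two fake files that contain nothing but a header signature.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old.nc")
    new = os.path.join(tmp, "new.nc")
    with open(old, "wb") as f:
        f.write(b"CDF\x01" + b"\x00" * 4)
    with open(new, "wb") as f:
        f.write(HDF5_MAGIC)
    demo = {os.path.basename(p): netcdf_format(p) for p in (old, new)}
    flags = [needs_conversion(old), needs_conversion(new)]

print(demo)
print(flags)
```

Only the files flagged by `needs_conversion` would then go through the ncdump/ncgen pipeline above.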

There are internal time stamps in NetCDF-4/HDF5 format files. Therefore, it is no longer possible to use the Unix diff command to check whether files created with different partitions are identical. That comparison is a nice trick to check for parallel bugs in a particular ROMS application. Recall that you first need to activate the ROMS C-preprocessing option DEBUGGING to avoid writing executable global attributes in output files. Fortunately, the HDF5 library has the h5diff tool for such a binary comparison. For example, we can compare:

% h5diff
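If you want to script this check over several partitions, a small wrapper around h5diff is enough. The sketch below assumes h5diff is on your PATH and relies on its documented exit status (0 for no differences, 1 for differences found, 2 for an error). The function names and the file names in the usage line are hypothetical.

```python
import shutil
import subprocess


def h5diff_command(file_a, file_b, tool="h5diff"):
    """Build the command line for comparing two files."""
    return [tool, file_a, file_b]


def interpret_status(returncode):
    """Map the h5diff exit status to a short verdict."""
    return {0: "identical", 1: "different"}.get(returncode, "error")


def compare(file_a, file_b):
    """Run h5diff on two files and return a verdict string."""
    if shutil.which("h5diff") is None:
        return "h5diff not found"
    result = subprocess.run(
        h5diff_command(file_a, file_b), capture_output=True, text=True
    )
    return interpret_status(result.returncode)


# Usage (file names are placeholders for output from two different partitions):
# print(compare("his_1x4.nc", "his_2x2.nc"))
```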

It is very likely that you will get the following error message after your run finishes:


ROMS/TOMS: DONE... Sunday - February 8, 2009 -  4:22:33 PM
Attempting to use an MPI routine after finalizing MPICH
Attempting to use an MPI routine after finalizing MPICH
Attempting to use an MPI routine after finalizing MPICH
Attempting to use an MPI routine after finalizing MPICH

This is not a ROMS error but a NetCDF-4/HDF5 library error. It seems that either the NetCDF-4 or the HDF5 library is calling MPI_Finalize() when closing the files and cleaning up. I hope that this is fixed soon. ROMS needs full control of the MPI_Finalize() call.
