Custom Query (964 matches)
Results (325 - 327 of 964)
Ticket | Owner | Reporter | Resolution | Summary |
---|---|---|---|---|
#424 | Done | Tuned parallel I/O capabilities | ||
Description |
As the NetCDF-4/HDF5 libraries continue to evolve, I continue to look at the parallel I/O interface in ROMS. NetCDF 4.1.1 made a few changes and optimizations. However, the MPI interface in ROMS is efficient, with low overhead, which makes serial I/O by the master thread very effective.
In the past, the efficiency of parallel I/O in ROMS has been affected by various non-tiled variables that are written into output NetCDF files. I looked at this issue again and discovered that it is very inefficient to write characters in parallel I/O. Therefore, I changed the character variables representing logical switches to integers (0=.FALSE. and 1=.TRUE.). This improved the performance: I get nearly the same performance whether or not I write these non-tiled variables.
I added a routine interface, netcdf_get_lvar, to module mod_netcdf.F to read logical variables into ROMS. It checks whether the input variable is an integer (0 or 1) or a character (F or T) and processes the data accordingly. The only logical variable that is needed in ROMS at input is the spherical switch.
I also set the default parallel access to collective for both non-tiled and tiled variables. Now, we have in mod_netcdf.F:
    integer, parameter :: IO_nontiled_access = 1 ! nf90_collective
    integer, parameter :: IO_tiled_access = 1 ! nf90_collective
The parallel access flags nf90_independent and nf90_collective were missing in module netcdf.mod in early versions of the NetCDF 4.x library. Usually,
    nf_independent = 0, nf90_independent = 0
    nf_collective = 1, nf90_collective = 1
Recall that two modes of parallel I/O access are possible: independent and collective. Independent I/O access means that the processing does not depend on, and is not affected by, other parallel processes (nodes). In contrast, collective I/O access implies that all parallel processes participate during processing. This is the case for tiled variables: each node in the group reads/writes its own tile data when parallel I/O is activated.
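The check described for netcdf_get_lvar (accept either an integer 0/1 or a character F/T) can be illustrated with a small sketch. This is not the ROMS Fortran routine: it is a Python illustration of the same decision logic, and the name `decode_logical` is hypothetical.

```python
import numbers


def decode_logical(value):
    """Decode a logical switch stored as an integer (0/1) or a character (F/T).

    Hypothetical helper that mirrors the check described for netcdf_get_lvar;
    it is not part of ROMS or of any NetCDF library.
    """
    # Character data read from a file may arrive as bytes.
    if isinstance(value, bytes):
        value = value.decode("ascii")

    # Older files: a single character, 'T' or 'F'.
    if isinstance(value, str):
        flag = value.strip()
        if flag == "T":
            return True
        if flag == "F":
            return False
        raise ValueError(f"unexpected character switch: {value!r}")

    # Newer files: an integer, 1 for .TRUE. and 0 for .FALSE.
    if isinstance(value, numbers.Integral):
        if value in (0, 1):
            return int(value) == 1
        raise ValueError(f"unexpected integer switch: {value!r}")

    raise ValueError(f"unsupported switch type: {type(value).__name__}")


if __name__ == "__main__":
    for raw in (1, 0, "T", b"F"):
        print(repr(raw), "->", decode_logical(raw))
```

Anything other than 0, 1, F, or T raises an error in this sketch, so a malformed switch is not silently read as false.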
I ran the ROMS benchmark with a 512x64x30 grid on my desktop Linux box (2 cores, Xeon chip) and 8 processors with a 4x2 partition, 500 steps, and I/O every 50 steps in both history and averages files. I get the following timings when the files are written on my desktop disk:
    serial I/O     817.344u 15.729s 1:45.21 791.8% 0+0k 0+0io 18pf+0w
    parallel I/O   833.523u 4.140s 1:51.01 754.5% 0+0k 0+0io 17pf+0w
Serial I/O is 5.80 seconds of elapsed time faster than parallel I/O. Now, if I write to another disk through a network file system, I get the following timings:
    serial I/O     892.785u 15.661s 1:55.96 783.4% 0+0k 0+0io 19pf+0w
    parallel I/O   1149.900u 4.899s 2:31.70 761.2% 0+0k 0+0io 20pf+0w
Serial I/O is 35.74 seconds of elapsed time faster than parallel I/O. These values may fluctuate depending on the network traffic. Therefore, when benchmarking parallel I/O in ROMS, you need to take the file system into account. |
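The elapsed-time differences can be recomputed from the third field of each timing line (minutes:seconds). The following is a minimal sketch of that arithmetic. The elapsed strings are copied from the timings above, and the function and variable names are illustrative only.

```python
def elapsed_seconds(stamp: str) -> float:
    """Convert an elapsed-time field such as '1:45.21' (m:ss.ss) to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        # Each additional field to the left is worth 60 times more.
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed (wall-clock) fields taken from the timing lines in the ticket.
runs = {
    "local disk": {"serial": "1:45.21", "parallel": "1:51.01"},
    "network file system": {"serial": "1:55.96", "parallel": "2:31.70"},
}


def parallel_penalty(run: dict) -> float:
    """Seconds by which the parallel I/O run is slower than the serial run."""
    return elapsed_seconds(run["parallel"]) - elapsed_seconds(run["serial"])


if __name__ == "__main__":
    for target, run in runs.items():
        print(f"{target}: parallel I/O slower by {parallel_penalty(run):.2f} s")
```

Only the elapsed field is compared here. The user and system CPU times (the `u` and `s` fields) are summed over all processes, so they do not measure wall-clock performance directly.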
|||
#425 | Fixed | Corrected typo in inp_par.F | ||
Description |
Corrected a typo in inp_par.F around line 3909. We need to have:
    WRITE (out,185) LtracerSrc(i,ng), 'LtracerSrc', itrc, &
   &    'Processing point sources/Sink on tracer ', itrc, &
   &    TRIM(Vname(1,idTvar(itrc)))
Many thanks to Bert Rubash for reporting this typo. |
|||
#426 | Fixed | small def_avg bug | ||
Description |
Restart failed because there is no potential vorticity in the file. Fix attached if I can. |