Custom Query (964 matches)

Results (325 - 327 of 964)

Ticket Owner Reporter Resolution Summary
#424 arango arango Done Tuned parallel I/O capabilities
Description

As the NetCDF-4/HDF5 libraries continue to evolve, I keep looking at the parallel I/O interface in ROMS. NetCDF 4.1.1 introduced a few changes and optimizations. However, the MPI interface in ROMS is efficient and has low overhead, which makes serial I/O by the master thread very effective.

In the past, the efficiency of parallel I/O in ROMS has been affected by various non-tiled variables that are written into output NetCDF files. I looked at this issue again and discovered that writing characters in parallel I/O is very inefficient. Therefore, I changed the character variables representing logical switches to integers (0=.FALSE. and 1=.TRUE.). This improved the performance. I now get nearly the same performance whether or not I write these non-tiled variables.

I added a routine interface, netcdf_get_lvar, to module mod_netcdf.F to read logical variables into ROMS. It checks whether the input variable is an integer (0 or 1) or a character (F or T) and processes the data accordingly. The only logical variable that ROMS needs at input is the spherical switch.
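The decoding rule described above can be sketched outside of ROMS. The following Python snippet is only an illustration of that rule, not the Fortran code in mod_netcdf.F; the function name decode_logical and its error handling are assumptions made for the example.

```python
def decode_logical(value):
    """Map a stored switch to a bool.

    Accepts the two encodings mentioned in the ticket: an integer
    (0 or 1) or a character ('F' or 'T'). Hypothetical helper, not
    part of ROMS.
    """
    # Character data read from a file may arrive as bytes.
    if isinstance(value, bytes):
        value = value.decode("ascii")

    if isinstance(value, str):
        # Only the first non-blank character matters: 'T' or 'F'.
        flag = value.strip().upper()[:1]
        if flag == "T":
            return True
        if flag == "F":
            return False
        raise ValueError(f"unrecognized character switch: {value!r}")

    if isinstance(value, int):
        # Integer encoding: 0 = .FALSE., 1 = .TRUE.
        if value in (0, 1):
            return bool(value)
        raise ValueError(f"unrecognized integer switch: {value!r}")

    raise TypeError(f"unsupported switch type: {type(value).__name__}")


# Older files store the spherical switch as a character, newer ones as an integer.
spherical_old = decode_logical("T")
spherical_new = decode_logical(1)
```

Both calls yield the same logical value, so a reader written this way accepts files produced before and after the change.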

I also set the default parallel access to collective for both non-tiled and tiled variables. Now, we have in mod_netcdf.F:

      integer, parameter :: IO_nontiled_access = 1   ! nf90_collective
      integer, parameter :: IO_tiled_access    = 1   ! nf90_collective

The parallel access flags nf90_independent and nf90_collective were missing from module netcdf.mod in early versions of the NetCDF 4.x library. Usually, their values are:

             nf_independent = 0,    nf90_independent = 0
             nf_collective  = 1,    nf90_collective  = 1

Recall that two modes of parallel I/O access are possible: Independent and Collective. Independent I/O access means that processing does not depend on, and is not affected by, other parallel processes (nodes). In contrast, Collective I/O access implies that all parallel processes participate during processing. This is the case for tiled variables: each node in the group reads/writes its own tile data when parallel I/O is activated.

I ran the ROMS benchmark with a 512x64x30 grid on my desktop Linux box (2 cores, Xeon chip) using 8 processors with a 4x2 partition, 500 steps, and I/O every 50 steps to both the history and averages files. I get the following timings when the files are written to my desktop disk:

serial I/O     817.344u 15.729s 1:45.21 791.8% 0+0k 0+0io 18pf+0w
parallel I/O   833.523u  4.140s 1:51.01 754.5% 0+0k 0+0io 17pf+0w

Serial I/O is 5.80 seconds faster in elapsed time than parallel I/O.

Now if I write to another disk through a network file system, I get the following timings:

serial I/O     892.785u 15.661s 1:55.96 783.4% 0+0k 0+0io 19pf+0w
parallel I/O  1149.900u  4.899s 2:31.70 761.2% 0+0k 0+0io 20pf+0w

Serial I/O is 35.74 seconds faster in elapsed time than parallel I/O. These values may vary depending on the network traffic.

Therefore, when benchmarking parallel I/O in ROMS you need to take into account the file system.
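For anyone repeating this comparison, the elapsed-time differences can be recomputed directly from the timing lines. The Python sketch below assumes the lines follow the csh/tcsh built-in time layout (user seconds, system seconds, elapsed time, CPU percentage); the helper names are made up for the example, and the four input strings are copied from the tables above.

```python
def elapsed_seconds(stamp):
    """Convert an elapsed-time field such as '1:45.21' to seconds."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60.0 + float(part)
    return total


def parse_time_line(line):
    """Extract user, system, and elapsed seconds from one timing line."""
    fields = line.split()
    return {
        "user": float(fields[0].rstrip("u")),
        "system": float(fields[1].rstrip("s")),
        "elapsed": elapsed_seconds(fields[2]),
    }


# Timing lines copied from the ticket, grouped by where the files were written.
runs = {
    "local disk": {
        "serial": "817.344u 15.729s 1:45.21 791.8% 0+0k 0+0io 18pf+0w",
        "parallel": "833.523u  4.140s 1:51.01 754.5% 0+0k 0+0io 17pf+0w",
    },
    "network file system": {
        "serial": "892.785u 15.661s 1:55.96 783.4% 0+0k 0+0io 19pf+0w",
        "parallel": "1149.900u  4.899s 2:31.70 761.2% 0+0k 0+0io 20pf+0w",
    },
}

differences = {}
for target, lines in runs.items():
    serial = parse_time_line(lines["serial"])
    parallel = parse_time_line(lines["parallel"])
    # Positive value: parallel I/O took longer than serial I/O.
    differences[target] = parallel["elapsed"] - serial["elapsed"]
    print(f"{target}: parallel - serial = {differences[target]:.2f} s elapsed")
```

Keeping the results keyed by file system makes it easy to add further targets, such as a parallel file system, when rerunning the benchmark.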

#425 arango arango Fixed Corrected typo in inp_par.F
Description

Corrected a typo in inp_par.F around line 3909. We need to have:

            WRITE (out,185) LtracerSrc(i,ng), 'LtracerSrc', itrc,       &
     &            'Processing point sources/Sink on tracer ', itrc,     &
     &            TRIM(Vname(1,idTvar(itrc)))

Many thanks to Bert Rubash for reporting this typo.

#426 arango kate Fixed small def_avg bug
Description

Restart failed because there is no potential vorticity in the file. Fix attached if I can.
