Opened 3 years ago

Closed 3 years ago

#884 closed upgrade (Done)

VERY IMPORTANT: Developments with Parallel I/O (PIO) library

Reported by: arango Owned by:
Priority: major Milestone: Release ROMS/TOMS 4.0
Component: Parallelism Version: 3.9
Keywords: Cc:

Description (last modified by arango)

This is a massive update to ROMS I/O that adds options to process input and output NetCDF files with either the Parallel-IO (PIO) library developed at NCAR (Hartnett and Edwards, 2021; unpublished paper) or the Software for Caching Output and Reads for Parallel I/O (SCORPIO) library available in E3SM (Sreepathi et al., 2013; doi:10.1109/HiPC.2013.6799128). The PIO and SCORPIO libraries are primarily intended for ROMS distributed-memory (MPI) applications running on a large number of processes on an HPC system with a Parallel File System (Lustre, GPFS, and so on) for high-performance I/O. Both libraries use the MPI-IO interface to partition the data across computational or dedicated I/O processes.

The SCORPIO library was forked from the PIO library several years ago and has evolved separately. The generic interface for parallel I/O in ROMS works with either the PIO or the SCORPIO library and is available by activating the PIO_LIB C-preprocessing option. However, we recommend using the PIO library because it processed I/O more efficiently than SCORPIO in our benchmark tests.

For more information about ROMS I/O and how to acquire and compile the PIO and SCORPIO libraries check the following pages in WikiROMS:

We expect our advanced users with access to supercomputers with a Parallel File System to help us benchmark and improve the new Parallel I/O capabilities in ROMS.


There are two modes of parallel I/O in PIO and SCORPIO:

  1. Synchronous: MPI intra-communications mode. A subset of the processes, or all of them, perform both I/O and computations. The user specifies the total number of I/O tasks and how they are distributed across the HPC nodes as a function of the ROMS MPI-communicator object, OCN_COMM_WORLD. It is often desirable to shift the first I/O task away from the first computation task since it has higher memory requirements than other processes. If the MPI processes are spread over several compute nodes, it is highly recommended to spread the I/O tasks over all nodes so that the I/O processes do not all occupy the same node.
  1. Asynchronous: MPI inter-communications mode. The I/O tasks are a disjoint set of dedicated I/O processes and do not perform computations. It is possible to have groups of computational units running separate models (coupling) where all the I/O data are sent to dedicated processes. In ROMS, the asynchronous mode is enabled by activating either ASYNCHRONOUS_PIO or ASYNCHRONOUS_SCORPIO. However, this capability is still under development and is not recommended for use at this time.
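
The placement advice for the synchronous mode can be explored with a small sketch. It assumes that I/O task i (counting from zero) sits at rank PIO_BASE + i*PIO_STRIDE, wrapped by the number of processes, and that ranks are assigned to nodes in consecutive blocks. Both are illustrative assumptions rather than documented ROMS or PIO behavior, and nprocs and cores_per_node are hypothetical values.

```fortran
! Sketch only: list the MPI ranks that would act as I/O tasks for a
! given PIO_IOTASKS, PIO_STRIDE, and PIO_BASE. The rank formula and the
! block assignment of ranks to nodes are assumptions for illustration.
program pio_iotask_layout
  implicit none
  integer, parameter :: nprocs         = 16  ! hypothetical size of OCN_COMM_WORLD
  integer, parameter :: cores_per_node = 4   ! hypothetical ranks per node
  integer, parameter :: iotasks        = 4   ! PIO_IOTASKS
  integer, parameter :: stride         = 4   ! PIO_STRIDE
  integer, parameter :: base           = 1   ! PIO_BASE
  integer :: i, io_rank, node

  do i = 0, iotasks-1
    io_rank = mod(base + i*stride, nprocs)   ! assumed placement rule
    node    = io_rank / cores_per_node       ! assumed block rank-to-node map
    print '(a,i0,a,i0,a,i0)', 'I/O task ', i, ': rank ', io_rank, ', node ', node
  end do
end program pio_iotask_layout
```

With these values the I/O tasks land on ranks 1, 5, 9, and 13, one per node, and rank 0 carries no I/O duties.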

In parallel I/O, writing is usually a more frequent and more complicated operation than reading. Generally, there are four strategies for writing (Mendez et al., 2019; Preprint):

  1. Single file, single writer: serial I/O. It is the default strategy in ROMS using the NetCDF3 or NetCDF4 libraries.
  1. Single file, multiple writers: parallel I/O with each tile writing its data to a single file. In ROMS, this capability is achieved by activating PARALLEL_IO and HDF5. It is only possible with the NetCDF4/HDF5 libraries.
  1. Single file, collective writers: parallel I/O with either one or a subset of processes performing I/O operations. The I/O operations can be synchronous or asynchronous. In ROMS, this capability uses the PIO or SCORPIO libraries and is available when PIO_LIB is activated.
  1. Multiple files, multiple writers: Each tile writes its own file, but post-processing is required. Currently, this capability is not available in ROMS, but it can be easily implemented with the changes introduced in this update.

What Is New:

The changes to ROMS are numerous but subtle, and around 285 files were changed to include additional code related to the PIO implementation.

  • Several files were renamed for uniqueness in coupled systems. The ocean descriptor is renamed to roms:

               OLD                               NEW
-----------------------------------  --------------------------------
Master/ocean.h                       Master/roms.h
Master/ocean_control.F               Master/roms_kernel.F
ROMS/Drivers/ad_ocean.h              ROMS/Drivers/ad_roms.h
ROMS/Drivers/adsen_ocean.h           ROMS/Drivers/adsen_roms.h
ROMS/Drivers/afte_ocean.h            ROMS/Drivers/afte_roms.h
ROMS/Drivers/fsv_ocean.h             ROMS/Drivers/fsv_roms.h
ROMS/Drivers/fte_ocean.h             ROMS/Drivers/fte_roms.h
ROMS/Drivers/hessian_op_ocean.h      ROMS/Drivers/hessian_op_roms.h
ROMS/Drivers/hessian_so_ocean.h      ROMS/Drivers/hessian_so_roms.h
ROMS/Drivers/i4dvar_ocean.h          ROMS/Drivers/i4dvar_roms.h
ROMS/Drivers/nl_ocean.h              ROMS/Drivers/nl_roms.h
ROMS/Drivers/op_ocean.h              ROMS/Drivers/op_roms.h
ROMS/Drivers/optobs_ocean.h          ROMS/Drivers/optobs_roms.h
ROMS/Drivers/pert_ocean.h            ROMS/Drivers/pert_roms.h
ROMS/Drivers/picard_ocean.h          ROMS/Drivers/picard_roms.h
ROMS/Drivers/r4dvar_ocean.h          ROMS/Drivers/r4dvar_roms.h
ROMS/Drivers/rbl4dvar_ocean.h        ROMS/Drivers/rbl4dvar_roms.h
ROMS/Drivers/rp_ocean.h              ROMS/Drivers/rp_roms.h
ROMS/Drivers/so_ocean.h              ROMS/Drivers/so_roms.h
ROMS/Drivers/so_semi_ocean.h         ROMS/Drivers/so_semi_roms.h
ROMS/Drivers/sp4dvar_ocean.h         ROMS/Drivers/sp4dvar_roms.h
ROMS/Drivers/split_i4dvar_ocean.h    ROMS/Drivers/split_i4dvar_roms.h
ROMS/Drivers/split_r4dvar_ocean.h    ROMS/Drivers/split_r4dvar_roms.h
ROMS/Drivers/split_rbl4dvar_ocean.h  ROMS/Drivers/split_rbl4dvar_roms.h
ROMS/Drivers/split_sp4dvar_ocean.h   ROMS/Drivers/split_sp4dvar_roms.h
ROMS/Drivers/tl_ocean.h              ROMS/Drivers/tl_roms.h
ROMS/Drivers/tl_r4dvar_ocean.h       ROMS/Drivers/tl_r4dvar_roms.h
ROMS/Drivers/tl_rbl4dvar_ocean.h     ROMS/Drivers/tl_rbl4dvar_roms.h
ROMS/Drivers/tlcheck_ocean.h         ROMS/Drivers/tlcheck_roms.h
  • Added the following include files for Biology and Sediment:

    • ROMS/Nonlinear/Biology/ecosim_def_pio.h
    • ROMS/Nonlinear/Biology/ecosim_wrt_pio.h
    • ROMS/Nonlinear/Biology/fennel_def_pio.h
    • ROMS/Nonlinear/Biology/fennel_wrt_pio.h
    • ROMS/Nonlinear/Biology/hypoxia_srm_def_pio.h
    • ROMS/Nonlinear/Biology/hypoxia_srm_wrt_pio.h
    • ROMS/Nonlinear/Biology/nemuro_def_pio.h
    • ROMS/Nonlinear/Biology/nemuro_wrt_pio.h
    • ROMS/Nonlinear/Biology/npzd_Franks_def_pio.h
    • ROMS/Nonlinear/Biology/npzd_Franks_wrt_pio.h
    • ROMS/Nonlinear/Biology/npzd_Powell_def_pio.h
    • ROMS/Nonlinear/Biology/npzd_Powell_wrt_pio.h
    • ROMS/Nonlinear/Biology/npzd_iron_def_pio.h
    • ROMS/Nonlinear/Biology/npzd_iron_wrt_pio.h
    • ROMS/Nonlinear/Biology/oyster_floats_def_pio.h
    • ROMS/Nonlinear/Biology/oyster_floats_wrt_pio.h
    • ROMS/Nonlinear/Biology/red_tide_def_pio.h
    • ROMS/Nonlinear/Biology/red_tide_wrt_pio.h
    • ROMS/Nonlinear/Sediment/sediment_def_pio.h
    • ROMS/Nonlinear/Sediment/sediment_wrt_pio.h
  • The standard input file roms.in was modified to include all the parameters necessary for Parallel I/O:
    ! Input and Output files processing library to use:
    !
    !   [1] Standard NetCDF-3 or NetCDF-4 library
    !   [2] Serial or Parallel I/O with Parallel-IO (PIO) library (MPI only)
    
         INP_LIB =  1
         OUT_LIB =  1
    
    ! PIO library methods for reading/writing NetCDF files:
    !
    !   [0] parallel read and write of PnetCDF (CDF-5, not recommended)
    !   [1] parallel read and write of NetCDF3 (64-bit offset)
    !   [2] serial   read and write of NetCDF3 (64-bit offset)
    !   [3] parallel read and serial write of NetCDF4/HDF5
    !   [4] parallel read and write of NetCDF4/HDF5
    
      PIO_METHOD =  2
    
    ! PIO library MPI processes set-up:
    
     PIO_IOTASKS =  1                 ! number of I/O tasks to define
      PIO_STRIDE =  1                 ! stride in the MPI-rank between I/O tasks
        PIO_BASE =  0                 ! offset for the first I/O task
      PIO_AGGREG =  1                 ! number of MPI-aggregators to use
    
    ! PIO library rearranger methods for moving data between computational and I/O
    ! processes:
    !
    !   [1] Box rearrangement
    !   [2] Subset rearrangement
    
       PIO_REARR =  1
    
    ! PIO library rearranger flag for MPI communications between computational
    ! and I/O processes:
    !
    !   [0] Point-to-Point (low-level communications)
    !   [1] Collective (high-level grouped communications)
    
    PIO_REARRCOM =  0
    
    ! PIO library rearranger flow control direction flag for MPI communications
    ! between computational and I/O processes:
    !
    !   [0] Enable computational to I/O processes, and vice versa
    !   [1] Enable computational to I/O processes only
    !   [2] Enable I/O to computational processes only
    !   [3] Disable flow control
    
    PIO_REARRDIR = 0
    
    ! PIO rearranger options for computational to I/O processes (C2I):
    
      PIO_C2I_HS = T                  ! Enable C2I handshake (T/F)
    PIO_C2I_Send = T                  ! Enable C2I Isends (T/F)
    PIO_C2I_Preq = 64                 ! Maximum pending C2I requests
    
    ! PIO rearranger options for I/O to computational processes (I2C):
    
      PIO_I2C_HS = T                  ! Enable I2C handshake (T/F)
    PIO_I2C_Send = T                  ! Enable I2C Isends (T/F)
    PIO_I2C_Preq = 65                 ! Maximum pending I2C requests
    
    Check the GLOSSARY for a detailed description of these parameters. Notice that ROMS supports the standard NetCDF library and Parallel I/O at the same time. The user chooses the library to use at run time for input (INP_LIB) and output (OUT_LIB) files. All these parameters are documented in the roms.in wiki page.
  • Added module set_pio.F to initialize and configure Parallel I/O, and module mod_pio_netcdf.F, which includes all the interfaces with same-name routine overloading, similar to mod_netcdf.F for the standard NetCDF library.
  • The T_IO structure in mod_param.F was modified to include new fields for file I/O type (IOtype), file descriptor (pioFile), and variable descriptors (pioVar(:) and pioTrc(:)).
  • The changes to all I/O routines (now modules) have the following design: (history file example)
    !
    !***********************************************************************
          SUBROUTINE def_his (ng, ldef)
    !***********************************************************************
    !
    !  Imported variable declarations.
    !
          logical, intent(in) :: ldef
    !
          integer, intent(in) :: ng
    !
    !  Local variable declarations.
    !
          character (len=*), parameter :: MyFile =                          &
         &  __FILE__
    !
    !-----------------------------------------------------------------------
    !  Create a new history file according to IO type.
    !-----------------------------------------------------------------------
    !
          SELECT CASE (HIS(ng)%IOtype)
            CASE (io_nf90)
              CALL def_his_nf90 (ng, ldef)
    
    #if defined PIO_LIB && defined DISTRIBUTE
            CASE (io_pio)
              CALL def_his_pio (ng, ldef)
    #endif
            CASE DEFAULT
              IF (Master) THEN
                WRITE (stdout,10) HIS(ng)%IOtype
      10        FORMAT (' DEF_HIS - Illegal output type, io_type = ',i0)
              END IF
              exit_flag=3
          END SELECT
          IF (FoundError(exit_flag, NoError, __LINE__, MyFile)) RETURN
    !
          RETURN
          END SUBROUTINE def_his
    
    The subroutine def_his_nf90 calls the standard NetCDF Fortran 90 API, whereas the subroutine def_his_pio calls the PIO Fortran 90 API.

Similarly, all the writing routines (now modules) have the following design:

!
!***********************************************************************
      SUBROUTINE wrt_his (ng, tile)
!***********************************************************************
!
!  Imported variable declarations.
!
      integer, intent(in) :: ng, tile
!
!  Local variable declarations.
!
#ifdef ADJUST_BOUNDARY
      integer :: LBij, UBij
#endif
      integer :: LBi, UBi, LBj, UBj
!
      character (len=*), parameter :: MyFile =                          &
     &  __FILE__
!
!-----------------------------------------------------------------------
!  Write out history fields according to IO type.
!-----------------------------------------------------------------------
!
#ifdef ADJUST_BOUNDARY
      LBij=BOUNDS(ng)%LBij
      UBij=BOUNDS(ng)%UBij
#endif
      LBi=BOUNDS(ng)%LBi(tile)
      UBi=BOUNDS(ng)%UBi(tile)
      LBj=BOUNDS(ng)%LBj(tile)
      UBj=BOUNDS(ng)%UBj(tile)
!
      SELECT CASE (HIS(ng)%IOtype)
        CASE (io_nf90)
          CALL wrt_his_nf90 (ng, tile,                                  &
#ifdef ADJUST_BOUNDARY
     &                       LBij, UBij,                                &
#endif
     &                       LBi, UBi, LBj, UBj)

#if defined PIO_LIB && defined DISTRIBUTE
        CASE (io_pio)
          CALL wrt_his_pio (ng, tile,                                   &
# ifdef ADJUST_BOUNDARY
     &                      LBij, UBij,                                 &
# endif
     &                      LBi, UBi, LBj, UBj)
#endif
        CASE DEFAULT
          IF (Master) WRITE (stdout,10) HIS(ng)%IOtype
          exit_flag=3
      END SELECT
      IF (FoundError(exit_flag, NoError, __LINE__, MyFile)) RETURN
!
  10  FORMAT (' WRT_HIS - Illegal output type, io_type = ',i0)
!
      RETURN
      END SUBROUTINE wrt_his
  • All readers (nf_fread*.F) and writers (nf_fwrite*.F) were enhanced to include the PIO reading and writing subroutines.
  • Added a new bitsum subroutine to get_hash.F. It is now the default method to compute the checksum. The floating-point data is converted to integers to facilitate the invariant order of the sum in tiled parallel applications (Hallberg and Adcroft, 2014; doi:10.1016/j.parco.2014.04.007). It allows MPI reduction operations for I/O processed with the PIO library.
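
The idea behind an order-invariant checksum can be illustrated with a minimal sketch. This is not the ROMS bitsum routine: it only shows that once floating-point values are converted to fixed-point integers, the sum no longer depends on the order of accumulation, which is the property that makes tiled or reduced sums reproducible. The scale factor fixscale and the sample data are hypothetical, and the conversion here resolves values only to about 2**(-40).

```fortran
! Sketch only: compare floating-point and fixed-point integer sums
! accumulated in two different orders. Not the ROMS bitsum code.
program order_invariant_sum
  use, intrinsic :: iso_fortran_env, only : int64, real64
  implicit none
  integer, parameter :: n = 5
  ! Hypothetical fixed-point scale; |x|*fixscale must stay within int64 range.
  real(real64), parameter :: fixscale = 2.0_real64**40
  real(real64)   :: x(n), fsum_fwd, fsum_rev
  integer(int64) :: isum_fwd, isum_rev
  integer :: i

  x = [1.0e3_real64, 0.1_real64, -1.0e3_real64, 0.2_real64, 0.3_real64]

  fsum_fwd = 0.0_real64
  isum_fwd = 0_int64
  do i = 1, n                        ! accumulate first to last
    fsum_fwd = fsum_fwd + x(i)
    isum_fwd = isum_fwd + nint(x(i)*fixscale, int64)
  end do

  fsum_rev = 0.0_real64
  isum_rev = 0_int64
  do i = n, 1, -1                    ! accumulate last to first
    fsum_rev = fsum_rev + x(i)
    isum_rev = isum_rev + nint(x(i)*fixscale, int64)
  end do

  ! Floating-point sums may differ in the last bits; integer sums cannot.
  print '(a,2es25.16)', 'floating-point sums:', fsum_fwd, fsum_rev
  print '(a,2(1x,i0))', 'integer sums:', isum_fwd, isum_rev
  if (isum_fwd /= isum_rev) error stop 'integer sums differ'
end program order_invariant_sum
```

Integer addition is exact and associative as long as no overflow occurs, so any partition of the data across tiles followed by a reduction would give the same total.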

Many thanks to Guangliang Liu and M.S. Yuhu Chen from the Shandong Computer Science Center (Jinan, China), who brought the SCORPIO library to my attention and provided their implementation for ROMS output files. After researching and checking the literature, I decided to start from scratch and implement the basic PIO API differently and generically. The PIO library is now implemented throughout ROMS I/O, for both reading and writing.

Change History (3)

comment:1 by arango, 3 years ago

Description: modified (diff)

comment:2 by arango, 3 years ago

Description: modified (diff)

comment:3 by arango, 3 years ago

Component: Nonlinear → Parallelism
Description: modified (diff)
Resolution: → Done
Status: new → closed