Big update due to IO parallelization and data assimilation

ROMS Code Release Announcements



Big update due to IO parallelization and data assimilation

#1 Post by arango »

This is a big update which will be tagged soon as ROMS/TOMS version 3.2. I have been working for a while to completely update the I/O interface to facilitate parallel I/O using the NetCDF4/HDF5 libraries. I just finished phase II of the parallel I/O implementation; the development will be completed soon in phase III. Since ROMS NetCDF I/O is quite complex, I built an umbrella on top of all the NetCDF library calls. The purpose is to replace almost all the nf90_*** calls in the code with netcdf_*** calls. All the netcdf_*** umbrella routines are defined in mod_netcdf.F:
  • netcdf_get_dim: inquires about all the dimension names and values in a file. It can also be used to inquire about a particular dimension and its value when the optional DimName argument is passed. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. In distributed-memory, all the information is broadcasted to all parallel nodes.
  • netcdf_check_dim: inquires about all the dimension names and values in a file. If a NetCDF ID for the file is not provided (optional ncid argument), it will open the requested file and close it when done. In distributed-memory, all the information is broadcasted to all parallel nodes. It then checks these values against the application dimension parameters for consistency. This is very important when running any application with input NetCDF files.
  • netcdf_close: closes the requested file. If appropriate, it performs additional tasks like updating global attributes, since the values of a few attributes are not known when ROMS creates the file.
  • netcdf_create: creates the requested file. This is a very important routine for serial or parallel I/O.
  • netcdf_inq_var: inquires about all the variables in a file, including names, IDs, external data types, number of variable dimensions and their IDs, and number of variable attributes. It can also be used to inquire about a particular variable when the optional myVarName argument is passed. In that case, the variable dimension names and values and the variable attribute names and values are also inquired. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. In distributed-memory, all the information is broadcasted to all parallel nodes.
  • netcdf_inq_varid: inquires ID of requested variable. In distributed-memory, the ID is broadcasted to all parallel nodes.
  • netcdf_get_fvar: reads the requested floating-point variable (scalar or array of rank 1 to 4). It is used to read non-tiled, floating-point data. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. It may be used to read all the values or a section of the data when the optional arguments start and total are passed. The start array contains the index where the first of the data values will be read along each dimension, whereas total contains the number of data values to be read along each dimension. If the optional arguments min_val and max_val are present, it will output the minimum and maximum values read. In distributed-memory, the variable data and information are broadcasted to all parallel nodes. A usage sketch is given right after this list.
  • netcdf_get_ivar: reads the requested integer variable (scalar or array of rank 1 to 2). It is used to read non-tiled, integer data. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. It may be used to read all the values or a section of the data when the optional arguments start and total are passed. The start array contains the index where the first of the data values will be read along each dimension, whereas total contains the number of data values to be read along each dimension. In distributed-memory, the variable data and information are broadcasted to all parallel nodes.
  • netcdf_get_svar: reads the requested character string variable (scalar or array of rank 1). If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. It may be used to read all the values or a section of the data when the optional arguments start and total are passed. The start array contains the index where the first of the data values will be read along each dimension, whereas total contains the number of data values to be read along each dimension. In distributed-memory, the variable data and information are broadcasted to all parallel nodes.
  • netcdf_open: opens an existing file for serial or parallel access. The open mode flag, omode, specifies the access mode: read-only (omode=0) or read and write (omode=1). In distributed-memory, the file ID ncid is broadcasted to all parallel nodes.
  • netcdf_openid: determines the association between pathnames and open file IDs. It is used to get the ID of an open file from its name, to avoid opening too many files simultaneously for the same dataset.
  • netcdf_put_fvar: writes the requested floating-point variable (scalar or array of rank 1 to 4). It is used to write non-tiled, floating-point data. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. If a variable ID (optional varid argument) is not present, it will inquire the file for it. It may be used to write all the values or a section of the data. The start array contains the index where the first of the data values will be written along each dimension, whereas total contains the number of data values to be written along each dimension. In distributed-memory, the processing information is broadcasted to all parallel nodes.
  • netcdf_put_ivar: writes the requested integer variable (scalar or array of rank 1 to 2). It is used to write non-tiled, integer data. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. If a variable ID (optional varid argument) is not present, it will inquire the file for it. It may be used to write all the values or a section of the data. The start array contains the index where the first of the data values will be written along each dimension, whereas total contains the number of data values to be written along each dimension. In distributed-memory, the processing information is broadcasted to all parallel nodes.
  • netcdf_put_svar: writes the requested character string variable (scalar or array of rank 1). It is used to write non-tiled, character string data. If a file ID is not provided (optional ncid argument), it will open the requested file and close it when done. If a variable ID (optional varid argument) is not present, it will inquire the file for it. It may be used to write all the values or a section of the data. The start array contains the index where the first of the data values will be written along each dimension, whereas total contains the number of data values to be written along each dimension. In distributed-memory, the processing information is broadcasted to all parallel nodes.
  • netcdf_sync: synchronizes the requested file to disk, flushing the in-memory buffer so that data are available to other processes immediately after they are written. Nowadays, it is recommended that the writer and readers open the file with the nf90_share flag to improve performance.
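To give a flavor of how these umbrella routines are called, here is a minimal usage sketch for reading non-tiled, floating-point data. The positional arguments (ng, model, ncname) follow the calls shown later in this post; Svalue, Tval, and Tindex are hypothetical local variables, and the exact keyword names of the optional arguments should be checked against mod_netcdf.F:

    Code: Select all

    !
    !  Read a scalar variable.  The file is opened and closed internally
    !  because the optional ncid argument is not passed.  "Svalue" is a
    !  hypothetical local variable.
    !
            CALL netcdf_get_fvar (ng, iNLM, ncname, 'theta_s', Svalue)
            IF (exit_flag.ne.NoError) RETURN
    !
    !  Read a single record of the time coordinate by passing the
    !  optional start/total arguments ("Tval" and "Tindex" are also
    !  hypothetical).
    !
            CALL netcdf_get_fvar (ng, iNLM, ncname, 'ocean_time', Tval,  &
     &                            start=(/Tindex/), total=(/1/))
            IF (exit_flag.ne.NoError) RETURN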
Notice that a new character string variable, SourceFile, is added to mod_iounits.F. It allows us to know where an error occurs when calling any of the above generic I/O routines. I am not using the CPP tokens __FILE__ or __LINE__ here because the resulting string can be very long for users with a deep directory structure, which can cause compilation problems and would require a lot of extra compilation conditionals. So I just fill the SourceFile variable by hand and provide more information when the call is made from a file with multiple routines.
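Here is a minimal sketch of the intended usage; the path and routine name assigned to SourceFile are made up for illustration:

    Code: Select all

    !
    !  Tag the calling file (and routine, when the file contains several
    !  routines) before calling the generic I/O layer, so any error can
    !  be traced back to its origin.  The path below is hypothetical.
    !
            SourceFile='ROMS/Utility/my_routine.F, my_routine'
    !
            CALL netcdf_inq_var (ng, iNLM, ncname)
            IF (exit_flag.ne.NoError) RETURN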


What is New:
  • Modified the number of arguments passed to routines mp_bcastf, mp_bcasti, mp_bcastl, and mp_bcasts. In order to avoid problems with different compilers due to the module interface in distribute.F, all these broadcast routines now have only three arguments: (ng, model, A). Warning: the argument giving the number of array elements to broadcast has been eliminated :!: Here ng is the nested grid number, model is the calling model identifier used for profiling, and A is the variable to broadcast. These routines broadcast the variable to all processors in the group; if A is an array, all of its elements are broadcasted. Currently, mp_bcastf can be used to broadcast non-tiled, floating-point variables (scalar or array of rank 1 to 4). In the same way, mp_bcasti is used to broadcast integer variables (scalar or array of rank 1 to 2), mp_bcastl broadcasts logical variables (scalar or array of rank 1), and mp_bcasts broadcasts character string variables (scalar or array of rank 1), as illustrated in the sketch below.
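     A hedged illustration of the new interface (Fvector here is a hypothetical non-tiled array):

    Code: Select all

    !
    !  New three-argument broadcast interface: (ng, model, A).  If A is
    !  an array, all of its elements are broadcast; the element count is
    !  no longer passed.
    !
    #ifdef DISTRIBUTE
            CALL mp_bcastf (ng, iNLM, Fvector)
            CALL mp_bcasti (ng, iNLM, exit_flag)
    #endif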
  • Corrected a bug in mp_bcasti_2d. The external data type argument to routine mpi_bcast was wrong: it had MP_FLOAT (an internal ROMS floating-point flag) instead of MPI_INTEGER. This was a nasty parallel bug that took several days to find; different compilers behaved chaotically. The corrected call is sketched below.
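     Sketched, the corrected broadcast looks as follows (the surrounding names Npts, MyMaster, OCN_COMM_WORLD, and MyError are assumed from the ROMS distributed-memory layer):

    Code: Select all

    !
    !  Broadcast an integer array: the MPI data type flag must be
    !  MPI_INTEGER, not the floating-point flag MP_FLOAT.
    !
            CALL mpi_bcast (A, Npts, MPI_INTEGER, MyMaster,              &
     &                      OCN_COMM_WORLD, MyError)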
  • Finally eliminated routine opencdf.F. I wanted to do this for a very long time :D This routine was complicated and had many unrelated capabilities :oops: It is eliminated to facilitate parallel I/O. The same capabilities can be achieved by calling, for example:

    Code: Select all

    !
    !  Inquire about the dimensions and check for consistency.
    !
            CALL netcdf_check_dim (ng, iNLM, ncname)
            IF (exit_flag.ne.NoError) RETURN
    !
    !  Inquire about the variables.
    !
            CALL netcdf_inq_var (ng, iNLM, ncname)
            IF (exit_flag.ne.NoError) RETURN
  • Eliminated obsolete variables associated with opencdf from mod_ncparam.F. They include nvdims, vdims, vflag, nvars, tsize, and type. Equivalent variables can be found in mod_netcdf.F:

    Code: Select all

    !
    !  Local dimension parameters.
    !
          integer, parameter :: Mdims = 50  ! maximum number of dimensions
          integer, parameter :: Mvars = 500 ! maximum number of variables
          integer, parameter :: NvarD = 5   ! number of variable dimensions
          integer, parameter :: NvarA = 10  ! number of variable attributes
    !
    !  Generic information about current NetCDF for all dimensions and
    !  all variables.
    !
          integer :: n_dim                  ! number of dimensions
          integer :: n_var                  ! number of variables
          integer :: n_gatt                 ! number of global attributes
          integer :: rec_id                 ! unlimited dimension ID
          integer :: rec_size               ! unlimited dimension value
          integer :: dim_id(Mdims)          ! dimensions ID
          integer :: dim_size(Mdims)        ! dimensions value
          integer :: var_id(Mvars)          ! variables ID
          integer :: var_natt(Mvars)        ! variables number of attributes
          integer :: var_flag(Mvars)        ! Variables water points flag
          integer :: var_type(Mvars)        ! variables external data type
          integer :: var_ndim(Mvars)        ! variables number of dimensions
          integer :: var_dim(NvarD,Mvars)   ! variables dimensions ID
    !
          character (len=40) :: dim_name(Mdims)      ! dimensions name
          character (len=40) :: var_name(Mvars)      ! variables name
    !
    !  Generic information about requested current variable.
    !
          integer :: n_vdim                 ! number of variable dimensions
          integer :: n_vatt                 ! number of variable attributes
          integer :: var_kind               ! external data type
          integer :: var_Dids(NvarD)        ! dimensions ID
          integer :: var_Dsize(NvarD)       ! dimensions values
          integer :: var_Aint(NvarA)        ! attribute integer values
          real(r8) :: var_Afloat(NvarA)     ! attribute float values
    !
          character (len=40) :: var_Aname(NvarA)     ! Attribute names
          character (len=40) :: var_Dname(NvarD)     ! dimension names
          character (len=80) :: var_Achar(NvarA)     ! Attribute char values
    Warning: The information in these variables is temporarily stored and will be overwritten on the next call to the inquiring routines :!:
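     For example (a hypothetical fragment, assuming netcdf_get_dim fills rec_size), any value that must survive a later inquiry should be copied right away:

    Code: Select all

    !
    !  Inquire about the dimensions, then save the unlimited dimension
    !  value before the next inquiring call overwrites the module
    !  variables in mod_netcdf.  "Nrecords" is a hypothetical local.
    !
            CALL netcdf_get_dim (ng, iNLM, ncname)
            IF (exit_flag.ne.NoError) RETURN
            Nrecords=rec_size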
  • Updated routines get_cycle, get_ngfld, get_ngfldr, get_2dfld, get_2dfldr, get_3dfld, and get_3dfldr. The calls to the opencdf routine were replaced by the new calls. In distributed-memory, all the I/O information variables are known to all the processors in the group. All these routines are fully parallel.
  • Renamed a few routines: frc_AD_adjust to Adjoint/ad_frc_adjust.F, frc_NL_adjust to Nonlinear/frc_adjust.F, frc_RP_adjust to Representer/rp_frc_adjust.F, and frc_TL_adjust to Tangent/tl_frc_adjust.F. These routines were previously located in Utility/frc_adjust.F; it makes more sense to have them in their respective directories. Notice that all of them are new routines.
  • Deleted the obsolete files frc_adjust.F and tl_ini_adjust.F and the routine load_TLforcing.
  • Added a new option, ADJUST_BOUNDARY, to adjust the open boundary conditions in any of the data assimilation algorithms. This required massive changes to all the data assimilation infrastructure, and there is still more work to be done: next I will code the spatial convolutions for the open boundary conditions error covariance modeling. A sketch of how the new option is activated is given below.
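     A sketch of how the new option would be activated in an application header file (the header name is hypothetical):

    Code: Select all

    /*
    ** Hypothetical application include file (e.g., ROMS/Include/my_app.h):
    ** activate 4DVar adjustments of the open boundary conditions.
    */
    #define ADJUST_BOUNDARY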
  • The error covariance input standard deviation and input/output normalization factor NetCDF files are each split into four files, for a total of eight files. This is done for convenience: we now need separate files for model, initial conditions, boundary conditions, and surface forcing. Notice that the 4DVar data assimilation input script template s4dvar.in now has:

    Code: Select all

    ! Input model, initial conditions, boundary conditions, and surface forcing
    ! standard deviation file names, [1:Ngrids].
    
           STDnameM == ocean_std_m.nc
           STDnameI == ocean_std_i.nc
           STDnameB == ocean_std_b.nc
           STDnameF == ocean_std_f.nc
    
    ! Input/output model, initial conditions, boundary conditions, and surface
    ! forcing error covariance normalization factors file name, [1:Ngrids].
    
           NRMnameM == ocean_nrm_m.nc
           NRMnameI == ocean_nrm_i.nc
           NRMnameB == ocean_nrm_b.nc
           NRMnameF == ocean_nrm_f.nc
     Notice that the model error covariance files are only needed in weak-constraint algorithms.
  • Similarly, we need to provide input parameters for the model error covariance spatial convolutions. Notice that s4dvar.in now has the following parameters:

    Code: Select all

    ! Switches (T/F) to create and write error covariance normalization
    ! factors for model, initial conditions, boundary conditions, and
    ! surface forcing. If TRUE, these factors are computed and written
    ! to the NRMname(1:4) NetCDF files. If FALSE, they are read from the
    ! NRMname(1:4) NetCDF files. The computation of these factors is very
    ! expensive and needs to be done only once for a particular application,
    ! provided that the grid, land/sea masking, and decorrelation scales remain
    ! the same. Notice that four values are needed (1=initial conditions,
    ! 2=model, 3=boundary conditions, 4=surface forcing) per each nested
    ! grid, [4,1:Ngrids].
    
            LdefNRM == F F F F                ! Create new normalization files
            LwrtNRM == F F F F                ! Compute and write normalization
    
    ! Switches to compute the correlation normalization coefficients for
    ! model error covariance.
    
     CnormM(isFsur) =  T                      ! 2D variable at RHO-points
     CnormM(isUbar) =  T                      ! 2D variable at U-points
     CnormM(isVbar) =  T                      ! 2D variable at V-points
     CnormM(isUvel) =  T                      ! 3D variable at U-points
     CnormM(isVvel) =  T                      ! 3D variable at V-points
     CnormM(isTvar) =  T T                    ! NT tracers
    
    ! Switches to compute the correlation normalization coefficients for
    ! initial conditions error covariance.
    
     CnormI(isFsur) =  T                      ! 2D variable at RHO-points
     CnormI(isUbar) =  T                      ! 2D variable at U-points
     CnormI(isVbar) =  T                      ! 2D variable at V-points
     CnormI(isUvel) =  T                      ! 3D variable at U-points
     CnormI(isVvel) =  T                      ! 3D variable at V-points
     CnormI(isTvar) =  T T                    ! NT tracers
    
    ! Switches to compute the correlation normalization coefficients for
    ! boundary conditions error covariance.
    
     CnormB(isFsur) =  T                      ! 2D variable at RHO-points
     CnormB(isUbar) =  T                      ! 2D variable at U-points
     CnormB(isVbar) =  T                      ! 2D variable at V-points
     CnormB(isUvel) =  T                      ! 3D variable at U-points
     CnormB(isVvel) =  T                      ! 3D variable at V-points
     CnormB(isTvar) =  T T                    ! NT tracers
    
     ! Switches to compute the correlation normalization coefficients for
     ! surface forcing error covariance.
    
     CnormF(isUstr) =  T                      ! surface U-momentum stress
     CnormF(isVstr) =  T                      ! surface V-momentum stress
     CnormF(isTsur) =  T T                    ! NT surface tracers flux
    
    ...
    
    ! Model error covariance: horizontal, isotropic decorrelation scales (m).
    ! These scales are only used in weak-constraint data assimilation.
    
    HdecayM(isFsur) ==  50.0d+3                               ! free-surface
    HdecayM(isUbar) ==  50.0d+3                               ! 2D U-momentum
    HdecayM(isVbar) ==  50.0d+3                               ! 2D V-momentum
    HdecayM(isUvel) ==  50.0d+3                               ! 3D U-momentum
    HdecayM(isVvel) ==  50.0d+3                               ! 3D V-momentum
    HdecayM(isTvar) ==  50.0d+3   50.0d+3                     ! 1:NT tracers
    
    ! Model error covariance: vertical, isotropic decorrelation scales (m).
    
    VdecayM(isUvel) == 100.0d0                                ! 3D U-momentum
    VdecayM(isVvel) == 100.0d0                                ! 3D V-momentum
    VdecayM(isTvar) == 100.0d0   100.0d0                      ! 1:NT tracers
    
    ! Initial conditions error covariance: horizontal, isotropic decorrelation
    ! scales (m).
    
    HdecayI(isFsur) == 100.0d+3                               ! free-surface
    HdecayI(isUbar) == 100.0d+3                               ! 2D U-momentum
    HdecayI(isVbar) == 100.0d+3                               ! 2D V-momentum
    HdecayI(isUvel) == 100.0d+3                               ! 3D U-momentum
    HdecayI(isVvel) == 100.0d+3                               ! 3D V-momentum
    HdecayI(isTvar) == 100.0d+3  100.0d+3                     ! 1:NT tracers
    
    ! Initial conditions error covariance: vertical, isotropic decorrelation
    ! scales (m).
    
    VdecayI(isUvel) == 100.0d0                                ! 3D U-momentum
    VdecayI(isVvel) == 100.0d0                                ! 3D V-momentum
    VdecayI(isTvar) == 100.0d0   100.0d0                      ! 1:NT tracers
    
    ! Boundary conditions error covariance: horizontal, isotropic decorrelation
    ! scales (m). A value is expected for each boundary edge in the following
    ! order:
    !                  1: west  2: south  3: east  4: north
    
    HdecayB(isFsur) == 100.0d+3 100.0d+3 100.0d+3 100.0d+3    ! free-surface
    HdecayB(isUbar) == 100.0d+3 100.0d+3 100.0d+3 100.0d+3    ! 2D U-momentum
    HdecayB(isVbar) == 100.0d+3 100.0d+3 100.0d+3 100.0d+3    ! 2D V-momentum
    HdecayB(isUvel) == 100.0d+3 100.0d+3 100.0d+3 100.0d+3    ! 3D U-momentum
    HdecayB(isVvel) == 100.0d+3 100.0d+3 100.0d+3 100.0d+3    ! 3D V-momentum
    HdecayB(isTvar) == 4*100.0d+3  4*100.0d+3                 ! 1:NT tracers
    
    ! Boundary conditions error covariance: vertical, isotropic decorrelation
    ! scales (m). A value is expected for each boundary edge in the following
    ! order:
    !                 1: west  2: south  3: east  4: north
    
    VdecayB(isUvel) == 100.0d0  100.0d0  100.0d0  100.0d0     ! 3D U-momentum
    VdecayB(isVvel) == 100.0d0  100.0d0  100.0d0  100.0d0     ! 3D V-momentum
    VdecayB(isTvar) == 4*100.0d0  4*100.0d0                   ! 1:NT tracers
    
    ! Surface forcing error covariance: horizontal, isotropic decorrelation
    ! scales (m).
    
    HdecayF(isUstr) == 100.0d+3                       ! surface U-momentum stress
    HdecayF(isVstr) == 100.0d+3                       ! surface V-momentum stress
    HdecayF(isTsur) == 100.0d+3  100.0d+3             ! 1:NT surface tracers flux
  • A new parameter, NOBC, is added to all the External/ocean_*.in input scripts to process the 4DVar open boundary condition adjustments. This parameter is similar to NSFF, which is used to adjust the surface forcing:

    Code: Select all

            NSFF == 720
            NOBC == 720
     NOBC is the number of time-steps between 4DVar adjustments of the open boundary fields. In strong constraint, it is possible to adjust the open boundaries at other time intervals in addition to the initial time. This parameter is used to store the appropriate number of open boundary records in the output history NetCDF files: 1+NTIMES/NOBC records. NOBC must be a factor of NTIMES or greater than NTIMES. If NOBC > NTIMES, only one record is stored in the NetCDF files and the adjustment is a single, constant correction. This parameter is only relevant in 4DVar when ADJUST_BOUNDARY is activated.
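     For example, with the hypothetical values NTIMES == 8640 and NOBC == 720, the output history NetCDF file holds 1+8640/720 = 13 open boundary adjustment records.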
