Opened 2 years ago

Closed 2 years ago

#910 closed upgrade (Done)

VERY IMPORTANT: ROMS Metadata Overhaul

Reported by: arango
Owned by:
Priority: major
Milestone: Release ROMS/TOMS 4.1
Component: Nonlinear
Version: 4.0
Keywords:
Cc:

Description (last modified by arango)

This update is a modernization of the ROMS metadata structure. The files varinfo.dat and coupling_*.dat are deprecated and replaced with YAML files varinfo.yaml and coupling_*.yaml, respectively. The YAML files are simple, easy to follow, elegant, portable, and expandable.

As a transition, the new metadata design still works with the old varinfo.dat. However, if you update to the latest version of ROMS, which now starts at 4.1, I strongly recommend updating your application roms.in to use a varinfo.yaml instead:

! Input variable information file name. This file needs to be processed
! first, so all information arrays can be initialized properly.

     VARNAME = ROMS/External/varinfo.yaml

This is not a significant problem since users rarely modify the varinfo metadata. However, notice that varinfo.yaml now has 641 variable entries. Each entry has the following YAML block construct:

convention: CF

metadata:

  - variable:       ocean_time                                       # Input/Output
    standard_name:  time
    long_name:      time since initialization
    units:          second                                           # [s]
    field:          time
    time:           ocean_time
    index_code:     idtime
    type:           nulvar
    add_offset:     0.0d0
    scale:          1.0d0

  - variable:       zeta                                             # Input/Output
    standard_name:  sea_surface_elevation_anomaly
    long_name:      free-surface
    units:          meter                                            # [m]
    field:          free-surface
    time:           ocean_time
    index_code:     idFsur
    type:           r2dvar
    add_offset:     0.0d0
    scale:          1.0d0

Notice that each entry now includes the standard_name and add_offset key/value pairs, although the add_offset information is not currently used. The field member is redesigned: it is now a key/value pair that the user can modify if desired.

The field attribute in output NetCDF files can be used for creating labels during post-processing for plotting, visualization, and so on. It can be in any language that can be represented by the 255 ASCII sets of characters (letters, symbols, and signs).
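As a rough illustration of how such an entry could be consumed outside ROMS, the Python sketch below parses one metadata entry using only the standard library. The helper names parse_entry and fortran_float are hypothetical and not part of ROMS. The sketch handles only flat "key: value" lines with optional trailing comments, and it assumes that values such as 0.0d0 are Fortran double-precision literals.

```python
import re

ENTRY = """\
  - variable:       zeta                                             # Input/Output
    standard_name:  sea_surface_elevation_anomaly
    long_name:      free-surface
    units:          meter                                            # [m]
    field:          free-surface
    time:           ocean_time
    index_code:     idFsur
    type:           r2dvar
    add_offset:     0.0d0
    scale:          1.0d0
"""

def fortran_float(text):
    """Convert a Fortran real literal such as 1.0d0 to a Python float."""
    return float(re.sub(r"[dD]", "e", text))

def parse_entry(block):
    """Parse the flat 'key: value' lines of one list entry into a dict of strings."""
    entry = {}
    for line in block.splitlines():
        line = line.split(" #", 1)[0].strip()   # drop any trailing comment
        if line.startswith("- "):               # leading hyphen marks the list member
            line = line[2:].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        entry[key.strip()] = value.strip()
    return entry

entry = parse_entry(ENTRY)
scale = fortran_float(entry["scale"])
add_offset = fortran_float(entry["add_offset"])
```

All values stay strings until the caller asks for a conversion, which mirrors the store-as-string approach described for the ROMS parser later in this ticket.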

Notice that an alias and anchor are added to the processing of surface shortwave radiation in varinfo.yaml:

#  The surface shortwave radiation flux is handled with a YAML "alias"
#  and "anchor" to facilitate instantaneous or daily-averaged values. If
#  daily-averaged values, you need to activate the DIURNAL_SRFLUX option
#  in ROMS to modulate the shortwave radiation by the local daily cycle
#  at each timestep.

#shortwave:     &SWRAD swrad_daily
shortwave:      &SWRAD swrad

metadata:

# ...

  - variable:       *SWRAD                                           # Input/Output
    standard_name:  net_downward_shortwave_flux_at_sea_water_surface
    long_name:      solar shortwave radiation flux
    units:          watt meter-2                                     # Input:  [Watt/m2]
    field:          shortwave radiation                              # [Celsius m/s]
    time:           srf_time                                         # Output: [Watt/m2]
    index_code:     idSrad
    type:           r2dvar
    add_offset:     0.0d0
    scale:          1.0d0

In the test repository, we have both varinfo.yaml and varinfo_daily.yaml to allow both modes of processing the surface shortwave radiation. In this case, the parser will substitute the alias *SWRAD with the value attached to the anchor &SWRAD, here swrad.
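The substitution can be pictured with a short Python sketch. This is not the ROMS Fortran implementation: the function resolve_aliases is hypothetical, it only handles scalar anchors of the form "key: &NAME value" and aliases of the form "key: *NAME", and it assumes that an anchor is defined before it is used.

```python
import re

TEXT = """\
#shortwave:     &SWRAD swrad_daily
shortwave:      &SWRAD swrad

metadata:

  - variable:       *SWRAD                                           # Input/Output
    units:          watt meter-2
"""

def resolve_aliases(text):
    """Replace each *NAME alias with the scalar value anchored by &NAME."""
    anchors = {}
    resolved = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):       # skip full-line comments
            continue
        body = line.split(" #", 1)[0].rstrip()  # drop any trailing comment
        found = re.search(r":\s+&(\w+)\s+(.*)$", body)
        if found:                               # "key: &NAME value" defines an anchor
            anchors[found.group(1)] = found.group(2)
            body = body[:found.start()] + ": " + found.group(2)
        else:                                   # "key: *NAME" refers to an anchor
            body = re.sub(r"\*(\w+)$", lambda m: anchors[m.group(1)], body)
        resolved.append(body)
    return resolved

lines = resolve_aliases(TEXT)
```

Switching to daily-averaged input then only requires moving the comment marker between the two shortwave lines; every *SWRAD reference picks up the new value.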

A new CPP option METADATA_REPORT can be used to debug the YAML metadata. It writes the processed YAML dictionary to standard output.

Many thanks to Zafer Defne (USGS) for patiently looking at the standard_name metadata and his suggestions for CF compliance.


The cell_methods attribute was added to output time-averaged fields to indicate that the data are a mean over the ocean_time coordinate.

  For example, in roms_avg.nc, we have:

        double phytoplankton(ocean_time, s_rho, eta_rho, xi_rho) ;
                phytoplankton:standard_name = "mole_concentration_of_phytoplankton_expressed_as_nitrogen_in_sea_water" ;
                phytoplankton:long_name = "phytoplankton concentration" ;
                phytoplankton:units = "millimole_nitrogen meter-3" ;
                phytoplankton:time = "ocean_time" ;
                phytoplankton:cell_methods = "ocean_time: mean" ;
                phytoplankton:grid = "grid" ;
                phytoplankton:location = "face" ;
                phytoplankton:coordinates = "x_rho y_rho s_rho ocean_time" ;
                phytoplankton:field = "phytoplankton" ;

  and in roms_dia.nc, we have:

        double temp_hadv(ocean_time, s_rho, eta_rho, xi_rho) ;
                temp_hadv:standard_name = "sea_water_potential_temperature_term_due_to_horizontal_advection" ;
                temp_hadv:long_name = "potential temperature, horizontal advection term" ;
                temp_hadv:units = "Celsius second-1" ;
                temp_hadv:time = "ocean_time" ;
                temp_hadv:cell_methods = "ocean_time: mean" ;
                temp_hadv:grid = "grid" ;
                temp_hadv:location = "face" ;
                temp_hadv:coordinates = "x_rho y_rho s_rho ocean_time" ;
                temp_hadv:field = "temp horizontal advection" ;

  and in roms_his.nc, where no averaging is applied, we have a point value:

        double LdetritusC(ocean_time, s_rho, eta_rho, xi_rho) ;
                LdetritusC:standard_name = "mole_concentration_of_large_detritus_expressed_as_carbon_in_sea_water" ;
                LdetritusC:long_name = "large carbon-detritus concentration" ;
                LdetritusC:units = "millimole_carbon meter-3" ;
                LdetritusC:time = "ocean_time" ;
                LdetritusC:cell_methods = "ocean_time: point" ;
                LdetritusC:grid = "grid" ;
                LdetritusC:location = "face" ;
                LdetritusC:coordinates = "x_rho y_rho s_rho ocean_time" ;
                LdetritusC:field = "large C-detritus" ;
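During post-processing, the cell_methods attribute can be used to tell averaged fields from instantaneous ones. The Python sketch below assumes the attribute string has already been read from the NetCDF file by whatever tool is in use. The function names are hypothetical, and the pattern covers only simple "coordinate: method" pairs like those shown above, not the parenthesized extensions that the CF conventions also allow.

```python
import re

def parse_cell_methods(value):
    """Split a cell_methods string into (coordinate, method) pairs."""
    return re.findall(r"(\w+):\s*(\w+)", value)

def is_time_mean(value, time_name="ocean_time"):
    """Return True if the field is a mean over the given time coordinate."""
    return (time_name, "mean") in parse_cell_methods(value)

avg_methods = parse_cell_methods("ocean_time: mean")
his_is_mean = is_time_mean("ocean_time: point")
```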


I wrote a standalone YAML parser module, yaml_parser.F, in ROMS. It is written in Fortran 2003 and includes a CLASS of type yaml_tree for parsing input YAML files.

Although several YAML parsers for Fortran exist, I coded a more straightforward and uncomplicated parser with substantial capabilities. It is a hybrid between standard and Object-Oriented Programming (OOP) principles but without the need for recursion, polymorphism, and containers (another library). Compare it, for example, with the following existing parsers:

I use FCKit extensively in the ROMS-JEDI interface. It has all the capabilities available for processing YAML files in C++ and Fortran. It was developed at ECMWF, but it is a third-party library. On the other hand, the Fortran-YAML parser is very complicated and has limited capabilities. Moreover, it is written in a strict OOP style, and the elegance of the OOP principles is somewhat obscured. Similarly, yaFyaml is written in uncompromising OOP style that is easy to follow, but it needs a container library. However, it has more options than Fortran-YAML.

The only constraint in my parser is that the YAML file is read twice for simplicity and to avoid containers. My container is a Fortran vector! The first read determines the indentation policy (the number of blanks per level) and the length of the collection vector list(:) of pair objects (CLASS yaml_pair). The first reading is quick. Overall, the parser is very fast and works in parallel. Every PET builds its own copy of the dictionary, which avoids the overhead of collective MPI calls.
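The two-pass idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a translation of yaml_parser.F: the function names are hypothetical, every non-comment line is treated as one pair, and a "#" is always taken as the start of a comment.

```python
TEXT = """\
# coupling metadata
ocean_model: ROMS
metadata:
  - short_name: Uwind
    scale: 1.0d0
"""

def significant(line):
    """Return the line without its trailing comment and trailing blanks."""
    return line.split("#", 1)[0].rstrip()

def first_pass(lines):
    """Count the pairs and detect the indentation step (number of blanks)."""
    count, step = 0, 0
    for raw in lines:
        line = significant(raw)
        if not line.strip():
            continue
        count += 1
        indent = len(line) - len(line.lstrip(" "))
        if indent and (step == 0 or indent < step):
            step = indent                 # smallest non-zero indentation seen
    return count, step

def second_pass(lines, count, step):
    """Fill a pre-sized list with (nesting level, key, value) tuples."""
    pairs = [None] * count                # fixed-length vector, no container library
    i = 0
    for raw in lines:
        line = significant(raw)
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        body = line.strip()
        if body.startswith("- "):
            indent += 2                   # hyphen and space count as indentation
            body = body[2:]
        key, _, value = body.partition(":")
        pairs[i] = (indent // step if step else 0, key.strip(), value.strip())
        i += 1
    return pairs

lines = TEXT.splitlines()
count, step = first_pass(lines)
pairs = second_pass(lines, count, step)
```

Knowing the pair count after the first pass is what lets the second pass write into a vector of fixed length instead of a growable container.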

Currently, it supports the following options:

  • Single or multiple line comments start with a hash #. A comment after a key/value pair is also allowed. All comments are skipped during processing.
  • It allows an unlimited nested structure (lists, mappings, hierarchies). Whitespace indentation is used to denote structure.
  • It has an unrestricted schema indentation. However, some schema validators recommend or impose two whitespace indentations.
  • A colon follows a key to denote a mapping value like:
    ocean_model: ROMS
    
  • It supports Aliases and Anchors.
    ATM_component:   &ATM WRF
    
    metadata:
    
      - standard_name:       surface_eastward_wind
        long_name:           surface eastward wind
        short_name:          Uwind
        data_variables:      [uwind, time]
        source_units:        m s-1
        destination_units:   m s-1
        source_grid:         cell_center
        destination_grid:    cell_center
        add_offset:          0.0d0
        scale:               1.0d0
        debug_write:         false
        connected_to:        *ATM                                   # u10
        regrid_method:       bilinear
        extrapolate_method:  none
    
  • It supports block lists: members are denoted by a leading hyphen and space, which is considered part of the indentation.
  • It supports a flow sequence: a vector list with values enclosed in square brackets and separated by a comma and a space, as in keyword: [val1, ..., valN].
  • Keyword values are processed and stored as strings but converted to a logical, integer, floating-point, or derived-type when appropriate during extraction. If particular derived-type values are needed, the caller can process such a structure outside the parser.
  • It removes unwanted control characters like tabs and separators (ASCII character code 0-31).
  • It is restricted to the English uppercase and lowercase alphabet but can be expanded to other characters (see yaml_ValueType routine).
  • Multiple or continuation lines are supported. So, for example, we can have:
    state variables: [sea_surface_height_anomaly,
                      barotropic_sea_water_x_velocity,
                      barotropic_sea_water_y_velocity,
                      sea_water_x_velocity,
                      sea_water_y_velocity,
                      sea_water_potential_temperature,
                      sea_water_practical_salinity]
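Three of the options listed above (continuation lines, flow sequences, and conversion of stored strings during extraction) can be sketched together in Python. The helper names are hypothetical and the logic is a simplified stand-in for what the Fortran parser does: brackets are assumed not to nest, and a value is converted only when it looks like a logical, an integer, or a real number (including the Fortran d exponent).

```python
import re

INT_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([edED][+-]?\d+)?")

def join_continuations(lines):
    """Merge the lines of an unclosed [ ... ] flow sequence into one line."""
    merged, pending = [], ""
    for line in lines:
        pending = (pending + " " + line.strip()) if pending else line.rstrip()
        if pending.count("[") > pending.count("]"):
            continue                      # sequence still open, keep reading
        merged.append(pending)
        pending = ""
    if pending:
        merged.append(pending)
    return merged

def parse_flow_sequence(line):
    """Return (key, list of string values) for 'key: [val1, ..., valN]'."""
    key, _, value = line.partition(":")
    inner = value.strip()[1:-1]           # drop the enclosing brackets
    return key.strip(), [item.strip() for item in inner.split(",") if item.strip()]

def convert(text):
    """Convert a stored string to bool, int, or float when it looks like one."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if INT_RE.fullmatch(text):
        return int(text)
    if REAL_RE.fullmatch(text):
        return float(text.lower().replace("d", "e"))
    return text                           # anything else stays a string

LINES = [
    "state variables: [sea_surface_height_anomaly,",
    "                  barotropic_sea_water_x_velocity,",
    "                  sea_water_practical_salinity]",
    "debug_write: false",
    "scale: 1.0d0",
]

merged = join_continuations(LINES)
key, values = parse_flow_sequence(merged[0])
```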
    

A new routine get_metadata.F shows various strategies for accessing the needed metadata from the YAML tree dictionary.

All the idealized and realistic applications in the test repository were updated to use metadata from YAML files.

Change History (1)

comment:1 by arango, 2 years ago

Description: modified (diff)
Resolution: Done
Status: new → closed