YAML Parser

Starting with svn revision 1902, released on March 1, 2022, the ROMS metadata is managed with a YAML file, and the plain-text file varinfo.dat is deprecated. YAML files are simple, easy to follow, elegant, portable, and expandable. ROMS can now process YAML files with its own parser module, yaml_parser.F, so there is no need for third-party YAML parsers.

The ROMS YAML parser source code can be found in ROMS/Utility. It is written in Fortran 2003 and includes a CLASS of type yaml_tree for parsing input YAML files.

Introduction

Although several YAML parsers for Fortran exist, ROMS includes its own simpler parser with substantial capabilities, yaml_parser.F. It is a hybrid between standard and Object-Oriented Programming (OOP) principles, but it does not need recursion, polymorphism, or containers (which would require another library).

The only constraint in the ROMS parser is that the YAML file is read twice, for simplicity and to avoid containers: the container is a plain Fortran vector. The first read determines the indentation policy (the number of blanks per indentation level) and the length of the collection vector list(:), whose elements are pair objects of the derived-type structure yaml_pair. The first read is quick. The yaml_pair structure is defined as:

TYPE, PUBLIC :: yaml_pair

  logical :: has_alias                            ! alias * token
  logical :: has_anchor                           ! anchor & token
  logical :: is_block                             ! block - list
  logical :: is_sequence                          ! sequence [] tokens
  logical :: is_logical                           ! logical value
  logical :: is_integer                           ! integer value
  logical :: is_real                              ! floating-point value
  logical :: is_string                            ! string value

  integer :: id                                   ! key/value ID
  integer :: parent_id                            ! parent ID
  integer :: left_padding                         ! indent level: 0,1,..

  character (len=:), allocatable :: line          ! YAML line
  character (len=:), allocatable :: key           ! YAML keyword:
  character (len=:), allocatable :: value         ! YAML value(s)
  character (len=:), allocatable :: anchor        ! anchor keyword

END TYPE yaml_pair
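
The two-pass strategy described above can be illustrated with a minimal, self-contained sketch. It is not the code in yaml_parser.F: the module two_pass_demo, the routines first_pass and second_pass, the helper is_pair_line, the fixed unit number, the 256-character line buffer, and the rule that the indentation step equals the smallest nonzero left padding are all assumptions made for illustration. The hyphen of a block list is not treated specially here.

```fortran
module two_pass_demo
  implicit none
  private
  public :: first_pass, second_pass

  integer, parameter :: iunit = 10      ! hypothetical fixed unit number

contains

  ! Returns .TRUE. when LINE is neither blank nor a full-line comment,
  ! and reports its number of leading blanks in PAD.
  logical function is_pair_line (line, pad)
    character (len=*), intent(in)  :: line
    integer,           intent(out) :: pad

    pad = 0
    is_pair_line = .false.
    if (len_trim(line) == 0) return
    pad = verify(line, ' ') - 1
    is_pair_line = line(pad+1:pad+1) /= '#'
  end function is_pair_line

  ! First pass: count the pair lines and infer the indentation step as
  ! the smallest nonzero left padding (an assumed rule).
  subroutine first_pass (filename, npairs, indent)
    character (len=*), intent(in)  :: filename
    integer,           intent(out) :: npairs, indent
    character (len=256) :: line
    integer :: ios, pad

    npairs = 0
    indent = 0
    open (unit=iunit, file=filename, status='old', action='read')
    do
      read (iunit, '(a)', iostat=ios) line
      if (ios /= 0) exit                          ! end of file
      if (.not. is_pair_line(line, pad)) cycle
      npairs = npairs + 1
      if (pad > 0 .and. (indent == 0 .or. pad < indent)) indent = pad
    end do
    close (iunit)
  end subroutine first_pass

  ! Second pass: the vector is allocated once, at its final length, and
  ! the left padding (in blanks) of every pair line is stored.
  subroutine second_pass (filename, npairs, padding)
    character (len=*),    intent(in)  :: filename
    integer,              intent(in)  :: npairs
    integer, allocatable, intent(out) :: padding(:)
    character (len=256) :: line
    integer :: ios, pad, n

    allocate ( padding(npairs) )
    n = 0
    open (unit=iunit, file=filename, status='old', action='read')
    do while (n < npairs)
      read (iunit, '(a)', iostat=ios) line
      if (ios /= 0) exit
      if (.not. is_pair_line(line, pad)) cycle
      n = n + 1
      padding(n) = pad
    end do
    close (iunit)
  end subroutine second_pass

end module two_pass_demo
```

A caller would invoke first_pass to obtain npairs and indent, then second_pass to fill a vector that never needs to grow. This is why a plain allocatable vector can stand in for a container.
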

The YAML file dictionary CLASS yaml_tree is defined as:

TYPE, PUBLIC :: yaml_tree

  integer :: Nbranches                            ! total number of branches
  integer :: Npairs                               ! total number of pairs
  integer :: indent                               ! blank indentation policy

  character (len=:), allocatable :: filename      ! YAML file name

  TYPE (yaml_pair), pointer :: list(:)            ! collection pairs

  CONTAINS                                        ! CLASS objects

  PROCEDURE :: create       => yaml_tree_create
  PROCEDURE :: destroy      => yaml_tree_destroy
  PROCEDURE :: dump         => yaml_tree_dump
  PROCEDURE :: extract      => yaml_tree_extract
  PROCEDURE :: fill         => yaml_tree_fill
  PROCEDURE :: fill_aliases => yaml_tree_fill_aliases
  PROCEDURE :: has          => yaml_tree_has
  PROCEDURE :: read_line    => yaml_tree_read_line

END TYPE yaml_tree

The yaml_tree object stores all the data contained in a specific YAML file. For example, in ROMS the input YAML metadata dictionary is created and initialized as follows:

USE yaml_parser_mod, ONLY : yaml_initialize, yaml_tree

logical :: Lreport                       ! verbose report switch
integer :: ErrorFlag                     ! processing error flag

TYPE (yaml_tree) :: self                 ! declare a dummy YAML object

IF (.not.ASSOCIATED(self%list)) THEN     ! process input YAML file
  Lreport = .TRUE.
  ErrorFlag = yaml_initialize (self, 'varinfo.yaml', Lreport)
END IF

Error management is omitted for clarity. The needed data is then extracted from the self object and loaded into the internal ROMS variables using the overloaded API yaml_get:

INTERFACE yaml_get

  MODULE PROCEDURE yaml_Get_i_struc        ! Gets integer structure
  MODULE PROCEDURE yaml_Get_l_struc        ! Gets logical structure
  MODULE PROCEDURE yaml_Get_r_struc        ! Gets real structure
  MODULE PROCEDURE yaml_Get_s_struc        ! Gets string structure

  MODULE PROCEDURE yaml_Get_ivar_0d        ! Gets integer value
  MODULE PROCEDURE yaml_Get_ivar_1d        ! Gets integer values
  MODULE PROCEDURE yaml_Get_lvar_0d        ! Gets logical value
  MODULE PROCEDURE yaml_Get_lvar_1d        ! Gets logical values
  MODULE PROCEDURE yaml_Get_rvar_0d        ! Gets real value
  MODULE PROCEDURE yaml_Get_rvar_1d        ! Gets real values
  MODULE PROCEDURE yaml_Get_svar_0d        ! Gets string value
  MODULE PROCEDURE yaml_Get_svar_1d        ! Gets string values

END INTERFACE yaml_get
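
The overloading behind yaml_get relies on a Fortran generic interface: the compiler selects the specific procedure from the type and rank of the actual arguments. The following minimal sketch shows only that mechanism. The names demo_get, get_integer, get_logical, and get_real, together with their argument lists and status convention, are hypothetical and are not taken from yaml_parser.F. The sketch converts a value string directly instead of searching a dictionary.

```fortran
module generic_get_demo
  implicit none
  private
  public :: demo_get, dp

  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical generic name: one call form, three specific procedures.
  interface demo_get
    module procedure get_integer
    module procedure get_logical
    module procedure get_real
  end interface demo_get

contains

  ! Converts STRING to an integer; STATUS is zero on success.
  function get_integer (string, value) result (status)
    character (len=*), intent(in)  :: string
    integer,           intent(out) :: value
    integer :: status

    value = 0
    read (string, *, iostat=status) value
  end function get_integer

  ! Converts the YAML booleans true/false; STATUS is zero on success.
  function get_logical (string, value) result (status)
    character (len=*), intent(in)  :: string
    logical,           intent(out) :: value
    integer :: status

    status = 0
    select case (trim(adjustl(string)))
      case ('true', 'True', 'TRUE')
        value = .true.
      case ('false', 'False', 'FALSE')
        value = .false.
      case default
        value = .false.
        status = 1
    end select
  end function get_logical

  ! Converts STRING (for example 1.0d0) to a double-precision real.
  function get_real (string, value) result (status)
    character (len=*), intent(in)  :: string
    real (dp),         intent(out) :: value
    integer :: status

    value = 0.0_dp
    read (string, *, iostat=status) value
  end function get_real

end module generic_get_demo
```

With this module, status = demo_get('1.0d0', r) resolves to get_real when r is declared real (dp), and to get_integer or get_logical when the second argument is an integer or a logical. The caller uses a single name for every value type, which is the convenience a generic interface such as yaml_get offers.
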

Overall, the parser is very fast and works in parallel. Each PET (Persistent Execution Thread) processes its own copy of the dictionary, which avoids the overhead of collective MPI calls.

Capabilities

Currently, it supports the following options:

  • Single-line or multi-line comments start with a hash (#). A comment after a key/value pair is also allowed. All comments are skipped during processing.
  • It supports unlimited nesting (lists, mappings, hierarchies). Whitespace indentation denotes structure.
  • The indentation width is unrestricted, although some schema validators recommend or impose an indentation of two blank spaces.
  • A colon follows a key to denote a mapping value like:
    ocean_model: ROMS
  • It supports Anchors and Aliases.
    ATM_component: &ATM WRF

    metadata:

      - standard_name: surface_eastward_wind
        long_name: surface eastward wind
        short_name: Uwind
        data_variables: [uwind, time]
        source_units: m s-1
        destination_units: m s-1
        source_grid: cell_center
        destination_grid: cell_center
        add_offset: 0.0d0
        scale: 1.0d0
        debug_write: false
        connected_to: *ATM # u10
        regrid_method: bilinear
        extrapolate_method: none
  • It supports block lists: members are denoted by a leading hyphen-and-space, which is considered part of the indentation.
  • It supports flow sequences: a vector list with values enclosed in square brackets and separated by a comma-and-space, like keyword: [val1, ..., valN].
  • Keyword values are processed and stored as strings, then converted to logical, integer, floating-point, or derived-type values when appropriate during extraction. If particular derived-type values are needed, the caller can process such a structure outside the parser.
  • It removes unwanted control characters like tabs and separators (ASCII character codes 0-31).
  • It is restricted to the English uppercase and lowercase alphabet but can be expanded to other characters (see the yaml_ValueType routine).
  • Multiple or continuation lines are supported. So, for example, we can have:
    state variables: [sea_surface_height_anomaly,
                      barotropic_sea_water_x_velocity,
                      barotropic_sea_water_y_velocity,
                      sea_water_x_velocity,
                      sea_water_y_velocity,
                      sea_water_potential_temperature,
                      sea_water_practical_salinity]
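
Three of the capabilities listed above (trailing comments, flow sequences, and continuation lines) can be sketched in a few lines of Fortran 2003. This is an illustration under stated assumptions, not the logic of yaml_parser.F: the module yaml_line_demo and the routines strip_comment, join_lines, and split_flow are hypothetical names. The sketch also assumes that a hash never appears inside a quoted string and that items are no longer than 64 characters.

```fortran
module yaml_line_demo
  implicit none
  private
  public :: strip_comment, join_lines, split_flow

contains

  ! Drops everything from the first hash onward, plus trailing blanks.
  function strip_comment (line) result (text)
    character (len=*), intent(in)  :: line
    character (len=:), allocatable :: text
    integer :: ihash

    ihash = index(line, '#')
    if (ihash > 0) then
      text = trim(line(1:ihash-1))
    else
      text = trim(line)
    end if
  end function strip_comment

  ! Concatenates continuation lines, without their comments and outer
  ! blanks, up to and including the line holding the closing bracket.
  function join_lines (lines) result (joined)
    character (len=*), intent(in)  :: lines(:)
    character (len=:), allocatable :: joined, piece
    integer :: i

    joined = ''
    do i = 1, size(lines)
      piece  = trim(adjustl(strip_comment(lines(i))))
      joined = joined // piece
      if (index(piece, ']') > 0) exit
    end do
  end function join_lines

  ! Splits the comma-separated text between [ and ] into ITEMS.
  ! ITEMS has zero size when no bracketed, non-blank list is found.
  subroutine split_flow (value, items)
    character (len=*),               intent(in)  :: value
    character (len=64), allocatable, intent(out) :: items(:)
    integer :: i, n, first, last, start

    first = index(value, '[')
    last  = index(value, ']', back=.true.)
    if (first == 0 .or. last <= first) then
      allocate ( items(0) )
      return
    end if
    if (len_trim(value(first+1:last-1)) == 0) then
      allocate ( items(0) )
      return
    end if

    n = 1                                  ! one item more than commas
    do i = first+1, last-1
      if (value(i:i) == ',') n = n + 1
    end do
    allocate ( items(n) )

    n = 0
    start = first + 1
    do i = first+1, last
      if (value(i:i) == ',' .or. i == last) then
        n = n + 1
        items(n) = adjustl(value(start:i-1))
        start = i + 1
      end if
    end do
  end subroutine split_flow

end module yaml_line_demo
```

For the line data_variables: [uwind, time], split_flow returns the two items uwind and time. A multi-line sequence such as the state variables example is first merged with join_lines and then split the same way. As in the first pass of the parser, the items are counted before the vector is allocated, so no growable container is needed.
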