4DVAR Observations Metadata Design

Discussion about tangent linear and adjoint models, variational data assimilation, and other related issues.



#1 Post by arango »

Hi All,

The observations are stored in a single NetCDF file. This file also contains information about the observation error, the latest two iterations of nonlinear and tangent linear model values at the observation locations, and the interpolation weights. The unlimited dimension, datum, is used to store each observation in space and time.

The NetCDF dimensions are:

Code: Select all

  record                     Number of saved iteration records
  survey                     Number of different time surveys
  weight                     Number of interpolation weights
  datum                      Observations counter, unlimited dimension
The record dimension is always equal to two. It is used to store the latest two iteration values of the model data at the observation locations; the descent algorithm uses these two records in its step-size (extrema) search at each iteration. The survey dimension indicates how many different survey times are available in the data set. The weight dimension is a function of the interpolation scheme; currently, only linear interpolation is supported, so it has a value of four in 2D configurations and eight in 3D configurations. The datum dimension is the unlimited dimension, and it is equal to the total number, in space and time, of observation values available in the data set. Since this file is processed sequentially, forward and backward, all the observation data must be sorted and stored in ascending time order.
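
As a rough illustration of this layout (not part of the official ROMS toolset), the file skeleton could be created with Python's netCDF4 package. The survey value below is a hypothetical placeholder; as noted later, you must know it in advance:

Code: Select all

  # Sketch: create the observation file skeleton with Python's netCDF4
  # package; the survey size used here (5) is a placeholder.
  from netCDF4 import Dataset

  nc = Dataset('roms_obs.nc', 'w', format='NETCDF3_CLASSIC')
  nc.createDimension('record', 2)      # latest two iteration records
  nc.createDimension('survey', 5)      # number of distinct survey times
  nc.createDimension('weight', 8)      # eight weights in 3D, four in 2D
  nc.createDimension('datum', None)    # unlimited observation counter
  nc.close()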

The NetCDF variables are:

Code: Select all

  spherical                    Grid type logical switch (T/F)
  Nobs(survey)                 Number of observations per time survey
  survey_time(survey)          Survey time (days)
  obs_type(datum)              State variable ID associated with observation
  obs_time(datum)              Time of observation (days)
  obs_lon(datum)               Longitude of observation (degrees_east)
  obs_lat(datum)               Latitude of observation (degrees_north)
  obs_depth(datum)             Depth of observation (meters or level)
  obs_Xgrid(datum)             X-grid observation location (nondimensional)
  obs_Ygrid(datum)             Y-grid observation location (nondimensional)
  obs_Zgrid(datum)             Z-grid observation location (nondimensional)
  obs_error(datum)             Observation error, assigned weight
  obs_value(datum)             Observation value
  NLmodel_value(record,datum)  Nonlinear model interpolated value
  TLmodel_value(record,datum)  Tangent linear model interpolated value
  Hmat(weight,datum)           Interpolation weights
The number of observations per survey, Nobs, is used to process the appropriate number of observations (and to compute the start and end datum indices) at a particular time in the model time trajectory; see the sketch after the list below. The survey_time variable holds the different survey times, and it is used in the model to activate the internal switches that process the observations available within a single time-step. Each observation has an integer identification flag, obs_type. This flag has multiple purposes; for example, it is used when computing the misfit cost function and the misfit adjoint forcing terms. Currently, its value is associated with the model state variable IDs, as follows:

Code: Select all

  obs_type = 1               Free-surface
  obs_type = 2               Vertically-integrated u-momentum component
  obs_type = 3               Vertically-integrated v-momentum component
  obs_type = 4               Total u-momentum component
  obs_type = 5               Total v-momentum component
  obs_type = 6               Potential temperature
  obs_type = 7               Salinity
  obs_type = ...             Other passive tracers, NAT+1:NT
Of course, there are more complicated data structures, like acoustic tomography. We will address other observation types later.
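
Since the data are stored in ascending time order, the start and end datum indices mentioned above follow from a running sum of Nobs. A minimal Python sketch, with example values and zero-based indices:

Code: Select all

  # Sketch: recover each survey's (start, end) datum range from Nobs,
  # exploiting the ascending time-order storage.  Values are examples.
  import numpy as np

  Nobs = np.array([120, 80, 95])     # observations per survey time
  ends = np.cumsum(Nobs)             # one past each survey's last datum
  starts = ends - Nobs               # each survey's first datum
  for n in range(Nobs.size):
      print('survey %d: datum %d to %d' % (n, starts[n], ends[n] - 1))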

The obs_lon and obs_lat values are not used in the model directly. However, they are used in the pre-processing software to compute the fractional (nondimensional) grid coordinates obs_Xgrid and obs_Ygrid. In orthogonal curvilinear applications, these fractional coordinates are computed via interpolation. The coordinate search for each observation datum is not trivial, and it would be very expensive to repeat inside the model during every descent iteration. The utilities for computing these fractional coordinates include the hindices, try_range, and inside routines, which are part of the model. The depth of an observation, obs_depth, can be specified as an actual depth (negative value) or a model level (positive values from 1 to N).
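
For a plain rectilinear grid the fractional coordinates reduce to simple scaling, which may help to fix ideas; the general curvilinear search performed by hindices is more involved. A hypothetical sketch of the rectilinear case:

Code: Select all

  # Sketch: fractional grid coordinates on a uniform rectilinear grid.
  # ROMS' hindices/try_range/inside routines handle the general
  # orthogonal curvilinear case; all values here are hypothetical.
  import numpy as np

  lon0, lat0 = -75.0, 35.0           # grid origin (example)
  dlon, dlat = 0.1, 0.1              # uniform spacing in degrees (example)

  def fractional_coords(obs_lon, obs_lat):
      obs_Xgrid = (np.asarray(obs_lon) - lon0) / dlon
      obs_Ygrid = (np.asarray(obs_lat) - lat0) / dlat
      return obs_Xgrid, obs_Ygrid

  x, y = fractional_coords([-74.63, -74.20], [35.47, 36.02])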

The interpolation weights matrix, Hmat(1:8,iobs), follows this grid-cell corner numbering:

Code: Select all

                                8____________7
                                /.          /| (i2,j2,k2)
                               / .         / |
                             5/___________/6 |
                              |  .        |  |
                              |  .        |  |         Grid Cell
                              | 4.........|..|3
                              | .         |  /
                              |.          | /
                   (i1,j1,k1) |___________|/
                              1           2
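For reference, the standard trilinear weights consistent with this corner numbering can be written as below, where p, q, and r are the fractional offsets of the observation within the cell. This is only a sketch, not the actual model code:

Code: Select all

  # Sketch: standard trilinear weights following the corner numbering in
  # the diagram; p, q, r are the fractional parts of obs_Xgrid,
  # obs_Ygrid, obs_Zgrid within the cell.
  def trilinear_weights(p, q, r):
      return [(1 - p) * (1 - q) * (1 - r),   # 1: (i1,j1,k1)
              p * (1 - q) * (1 - r),         # 2: (i2,j1,k1)
              p * q * (1 - r),               # 3: (i2,j2,k1)
              (1 - p) * q * (1 - r),         # 4: (i1,j2,k1)
              (1 - p) * (1 - q) * r,         # 5: (i1,j1,k2)
              p * (1 - q) * r,               # 6: (i2,j1,k2)
              p * q * r,                     # 7: (i2,j2,k2)
              (1 - p) * q * r]               # 8: (i1,j2,k2)

  # The eight weights sum to one; with r = 0 only the first four are
  # nonzero, which is the four-weight 2D case.
  assert abs(sum(trilinear_weights(0.3, 0.6, 0.25)) - 1.0) < 1e-12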
The other variables are straightforward. The data for NLmodel_value, TLmodel_value, and Hmat are written and re-written at each iteration. Because of this, it is highly recommended to keep a backup copy of the original observation NetCDF file in case you need to start all over.
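
One simple way to keep such a backup, for example with Python (file names are illustrative):

Code: Select all

  # Sketch: save a pristine copy before the first 4DVAR iteration
  # overwrites NLmodel_value, TLmodel_value, and Hmat.
  import shutil
  shutil.copyfile('roms_3dobs_geo.nc', 'roms_3dobs_geo_orig.nc')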

Notice that the metadata model is generic. Several CDL files are provided to generate the observations NetCDF file:

Code: Select all

  roms_2dobs.cdl             2D Cartesian applications
  roms_2dobs_geo.cdl         2D spherical applications

  roms_3dobs.cdl             3D Cartesian applications
  roms_3dobs_geo.cdl         3D spherical applications
Here, the designation 2D or 3D application is used in terms of the CPP switch SOLVE3D, not the dimensionality of the observations. In 2D applications only four interpolation weights are needed, whereas in 3D applications eight weights are needed. Therefore, the dimension value for weight is assigned accordingly.

The above CDL files can be downloaded from:

http://www.myroms.org/links/4dvar_cdl.tar.gz

Recall that a NetCDF file can be generated from a CDL file by simply typing:

ncgen -b roms_3dobs_geo.cdl

However, before doing so, edit the CDL file and provide the appropriate value for the survey dimension. This means that you need to know this value in advance. Then, any kind of software can be used to process the observation data and write it to the NetCDF file. I usually use Matlab and the NetCDF toolbox to read and write data very quickly with just a few commands. The NetCDF toolbox for Matlab can be found at

http://woodshole.er.usgs.gov/operations ... excdf.html

Finally, the observations need to be processed using the following rules:

1) You need to know ahead of time how many assimilation cycles, or survey times, occur within the data set; that is, how many different observation times are available. This is not a big problem because the value is known when processing the observations. It needs to be entered in the survey dimension of the CDL file.

2) Count the observations available per survey time and write the count in Nobs(survey). Also write the associated survey time in survey_time(survey). These values are extremely important inside the model for processing the appropriate data in each assimilation cycle.

3) The observations need to be sorted in ascending time order. It is also a good idea, though not required, to further sort the observations by their state variable ID (obs_type), so that similar data are processed together and sequentially.
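
Putting the three rules together, a hypothetical Python sketch of this pre-processing pass (it assumes the NetCDF file was already generated from the CDL with ncgen; the array values and file name are examples only):

Code: Select all

  # Sketch: apply rules 1-3 and write the results with Python's netCDF4
  # package.  The raw arrays below stand in for real observations.
  import numpy as np
  from netCDF4 import Dataset

  obs_time  = np.array([2.0, 1.0, 1.0, 2.0, 1.0])       # days (examples)
  obs_type  = np.array([6, 1, 6, 7, 1])                 # state variable IDs
  obs_value = np.array([18.2, 0.05, 17.9, 35.1, 0.04])  # examples

  # Rule 3: sort by time first, then by state variable ID.
  order = np.lexsort((obs_type, obs_time))
  obs_time, obs_type, obs_value = (obs_time[order], obs_type[order],
                                   obs_value[order])

  # Rule 1: the distinct observation times define the survey dimension,
  # which must match the value edited into the CDL file.
  survey_time, Nobs = np.unique(obs_time, return_counts=True)
  print('set survey = %d in the CDL file' % survey_time.size)

  # Rule 2: write Nobs and survey_time along with the sorted data.
  nc = Dataset('roms_3dobs_geo.nc', 'a')
  nc.variables['Nobs'][:] = Nobs
  nc.variables['survey_time'][:] = survey_time
  nc.variables['obs_time'][:] = obs_time
  nc.variables['obs_type'][:] = obs_type
  nc.variables['obs_value'][:] = obs_value
  nc.close()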

Good luck,

Hernan G. Arango
arango@imcs.rutgers.edu
