4. Design Details

4.1. Data Model Performance

There are two primary costs associated with strdata: reading data and spatially mapping data. Time interpolation is relatively cheap in the current implementation. Redundant operations are minimized as much as possible. Fill and mapping weights are generated at initialization and saved. The mapped input data at the upper and lower time bounds are saved between time steps, which reduces mapping costs when data is time interpolated more often than new data is read. If the input data timestep is relatively small (for example, hourly data as opposed to daily or monthly data), the cost of reading input data can be quite large. The cost of the data model can also vary significantly over the course of the run, for instance on the steps where new input data must be read and interpolated, although this variation is relatively predictable.

The present implementation doesn’t support changing the order of operations, for instance, time interpolating the data before spatially mapping it. Because the present computations are always linear, changing the order of operations would not fundamentally change the results. The present order of operations generally minimizes the mapping cost for typical data model use cases.
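To make the cost split concrete, the following Fortran sketch shows the cheap per-step work: computing linear weights for a model time that lies between the saved lower- and upper-bound times, then combining the already-mapped bound fields. The module and procedure names (tinterp_sketch, linear_weights, tinterp) are hypothetical and assume times expressed in seconds on a common axis; the actual strdata code supports several time interpolation algorithms and operates on MCT attribute vectors.

```fortran
! Sketch only, not the CIME API: linear time interpolation between saved
! lower- and upper-bound fields that have already been spatially mapped.
module tinterp_sketch
  implicit none
  private
  public :: r8, linear_weights, tinterp

  integer, parameter :: r8 = selected_real_kind(12)

contains

  ! Weights for model time t between bound times tLB and tUB
  ! (seconds on a common axis, assumed). wLB + wUB = 1.
  subroutine linear_weights(t, tLB, tUB, wLB, wUB)
    real(r8), intent(in)  :: t, tLB, tUB
    real(r8), intent(out) :: wLB, wUB
    if (tUB > tLB) then
      wUB = (t - tLB) / (tUB - tLB)
    else
      wUB = 0.0_r8          ! degenerate interval: use the lower bound
    end if
    wLB = 1.0_r8 - wUB
  end subroutine linear_weights

  ! Per-step work once the bounds are mapped: a weighted sum.
  pure function tinterp(fLB, fUB, wLB, wUB) result(f)
    real(r8), intent(in) :: fLB(:), fUB(:), wLB, wUB
    real(r8) :: f(size(fLB))
    f = wLB * fLB + wUB * fUB
  end function tinterp

end module tinterp_sketch
```

Because both the weighting and a regridding matrix are linear, applying a map M to tinterp(fLB, fUB, wLB, wUB) gives the same result, up to round-off, as interpolating matmul(M, fLB) and matmul(M, fUB). Mapping first simply lets the mapped bounds be reused on every step between reads.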

4.2. Data Model Limitations

The data models currently have several limitations in both options and usage. Spatial interpolation can only be performed from a two-dimensional latitude-longitude input grid. The target grid can be arbitrary, but the source grid must be describable by simple one-dimensional lists of longitudes and latitudes, although these need not be equally spaced.
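One way to state the source-grid requirement is that a 2-D longitude/latitude array pair must collapse to 1-D lists: longitude may vary only along the first index and latitude only along the second. The sketch below checks exactly that, within a tolerance. is_lonlat_separable is a hypothetical helper, not part of the data model code, and it assumes arrays dimensioned (nx, ny). Unequally spaced coordinates pass the check, consistent with the text.

```fortran
! Sketch: can a 2-D lon/lat source grid be described by 1-D coordinate
! lists (lon varies only with i, lat only with j)? Hypothetical helper.
module grid_check_sketch
  implicit none
  private
  public :: r8, is_lonlat_separable

  integer, parameter :: r8 = selected_real_kind(12)

contains

  logical function is_lonlat_separable(lon, lat, tol) result(ok)
    real(r8), intent(in) :: lon(:,:), lat(:,:)   ! dimensioned (nx, ny)
    real(r8), intent(in) :: tol
    integer :: i, j
    ok = .true.
    ! every row j must repeat the longitudes of row 1
    do j = 2, size(lon, 2)
      if (any(abs(lon(:, j) - lon(:, 1)) > tol)) then
        ok = .false.; return
      end if
    end do
    ! every column i must repeat the latitudes of column 1
    do i = 2, size(lat, 1)
      if (any(abs(lat(i, :) - lat(1, :)) > tol)) then
        ok = .false.; return
      end if
    end do
  end function is_lonlat_separable

end module grid_check_sketch
```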

4.3. IO Through Data Models

At present, data models can only read netcdf data, and IO is handled either through standard netcdf interfaces or through the PIO library using either netcdf or pnetcdf. If standard netcdf is used, global fields are read and then scattered one field at a time. If PIO is used, data is read either serially or in parallel, in chunks of approximately the global field size divided by the number of IO tasks. If pnetcdf is used through PIO, the pnetcdf library must be included when the model is built. The pnetcdf path and option are hardwired into the Macros.make file for the specific machine. To turn on pnetcdf in the build, make sure the Macros.make variables PNETCDF_PATH, INC_PNETCDF, and LIB_PNETCDF are set and that the PIO CONFIG_ARGS sets the PNETCDF_PATH argument.
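The chunked PIO read can be pictured as a simple block decomposition in which each IO task reads a contiguous piece of roughly nglobal/niotasks points. The sketch below assumes a 1-D flattening of the global field and gives the first mod(nglobal, niotasks) tasks one extra point. io_chunk is hypothetical, and PIO's real decomposition may differ.

```fortran
! Sketch: split a global field of nglobal points into contiguous chunks,
! one per IO task, each of roughly nglobal/niotasks points.
! Illustrative only; PIO's actual decomposition may differ.
module io_chunk_sketch
  implicit none
  private
  public :: io_chunk

contains

  ! iotask is 0-based; returns a 1-based start index and a point count.
  subroutine io_chunk(nglobal, niotasks, iotask, start, count)
    integer, intent(in)  :: nglobal, niotasks, iotask
    integer, intent(out) :: start, count
    integer :: base, extra
    base  = nglobal / niotasks        ! points every IO task gets
    extra = mod(nglobal, niotasks)    ! first 'extra' tasks get one more
    if (iotask < extra) then
      count = base + 1
      start = iotask * (base + 1) + 1
    else
      count = base
      start = extra * (base + 1) + (iotask - extra) * base + 1
    end if
  end subroutine io_chunk

end module io_chunk_sketch
```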

Beyond the option of selecting IO with PIO, several namelist variables are available to help optimize PIO IO performance. Those are TODO - list these. The total number of MPI tasks that can be used for IO is limited to the number of tasks used by the data model, although using fewer IO tasks often improves performance. In general, [io_root + (num_iotasks-1)*io_stride + 1] must not exceed the total number of data model tasks; because MPI ranks start at 0, this keeps the last IO task, io_root + (num_iotasks-1)*io_stride, within the data model's tasks. In practice, PIO seems to perform optimally somewhere between the extremes of 1 task and all tasks, and the best setting is highly machine and problem dependent.
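The task-count constraint can be checked before a run. This sketch assumes that io_root, io_stride, and num_iotasks describe evenly strided IO tasks and that ranks are 0-based MPI ranks, so the last IO task is io_root + (num_iotasks-1)*io_stride and must be at most ntasks-1. The module and function names are hypothetical.

```fortran
! Sketch: check that an evenly strided IO task layout fits inside the
! data model's tasks. Ranks are assumed 0-based. Hypothetical names.
module io_layout_sketch
  implicit none
  private
  public :: io_layout_ok, io_task_ranks

contains

  logical function io_layout_ok(io_root, io_stride, num_iotasks, ntasks)
    integer, intent(in) :: io_root, io_stride, num_iotasks, ntasks
    ! last IO rank + 1 must not exceed the number of data model tasks
    io_layout_ok = num_iotasks >= 1 .and. io_stride >= 1 .and. io_root >= 0 &
         .and. io_root + (num_iotasks - 1) * io_stride + 1 <= ntasks
  end function io_layout_ok

  ! The 0-based ranks that would perform IO under this layout.
  function io_task_ranks(io_root, io_stride, num_iotasks) result(ranks)
    integer, intent(in) :: io_root, io_stride, num_iotasks
    integer :: ranks(num_iotasks)
    integer :: k
    ranks = [(io_root + k * io_stride, k = 0, num_iotasks - 1)]
  end function io_task_ranks

end module io_layout_sketch
```

For example, with 16 data model tasks, io_root=0, io_stride=4, and num_iotasks=4, the IO ranks are 0, 4, 8, and 12 and the layout is valid; moving io_root to 4 would require rank 16, which does not exist.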

4.4. Restart Files

Restart files are generated automatically by the data models based on a flag sent from the driver. The restart files must follow the CIME naming convention, and an rpointer file is generated at the same time. An rpointer file is a restart pointer file that contains the name of the most recently created restart file. Normally, when restart files are read, their filenames are taken from the rpointer file. Optionally, namelist variables such as restfilm can specify the restart filenames instead; if those namelist variables are set, the rpointer file is ignored.
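The precedence rule between the namelist and the rpointer file is sketched below. It assumes that the rpointer file holds the restart filename on its first line. restart_filename is a hypothetical helper, and restfilm is passed in as a plain string rather than read from a namelist.

```fortran
! Sketch: choose the restart file name. A non-blank namelist value such
! as restfilm wins; otherwise the name is read from the rpointer file,
! assumed here to hold it on its first line. Hypothetical helper.
module restart_name_sketch
  implicit none
  private
  public :: restart_filename

contains

  function restart_filename(restfilm, rpointer) result(fname)
    character(len=*), intent(in) :: restfilm   ! namelist override (may be blank)
    character(len=*), intent(in) :: rpointer   ! path of the rpointer file
    character(len=256) :: fname
    integer :: unit, ios

    if (len_trim(restfilm) > 0) then
      fname = restfilm            ! namelist is set: rpointer is ignored
      return
    end if

    open(newunit=unit, file=rpointer, status='old', action='read', iostat=ios)
    if (ios /= 0) error stop 'restart_filename: cannot open rpointer file'
    read(unit, '(a)', iostat=ios) fname
    close(unit)
    if (ios /= 0) error stop 'restart_filename: cannot read rpointer file'
    fname = adjustl(fname)
  end function restart_filename

end module restart_name_sketch
```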

In most cases, no restart file is required for the data models to restart exactly. This is because there is no memory between timesteps in many of the data model science modes. If a restart file is required, it will be written automatically and then must be used to continue the previous run.

Separate stream restart files also exist, purely for performance reasons. A stream restart file contains information about the time axis of the input streams, which reduces the startup cost of reading the time-axis information from the input datasets. If a stream restart file is missing, the code will restart without it, but it may need to reread from the input data files the information that would otherwise have come from the stream restart file. This takes extra time but does not affect the results.
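The saving that a stream restart file provides can be sketched as a cache flag on each file's time axis. The derived type below mirrors the haveData, nt, date, and secs fields of shr_stream_fileType. ensure_time_axis and read_time_axis are hypothetical, and read_time_axis fills in a made-up 6-hourly axis instead of reading a netcdf file.

```fortran
! Sketch: skip the expensive time-axis read when the axis is already
! known (e.g. restored from a stream restart file). Hypothetical names;
! the type mirrors fields of shr_stream_fileType.
module taxis_cache_sketch
  implicit none
  private
  public :: file_taxis_t, ensure_time_axis, nreads

  type :: file_taxis_t
    character(len=256)   :: name = ' '
    logical              :: haveData = .false.   ! t-coord already known?
    integer              :: nt = 0               ! size of time dimension
    integer, allocatable :: date(:)              ! yyyymmdd
    integer, allocatable :: secs(:)              ! seconds elapsed on date
  end type file_taxis_t

  integer :: nreads = 0   ! counts the (expensive) file reads performed

contains

  subroutine ensure_time_axis(f)
    type(file_taxis_t), intent(inout) :: f
    if (f%haveData) return        ! already known: no file read needed
    call read_time_axis(f)
    f%haveData = .true.
  end subroutine ensure_time_axis

  ! Stand-in for a netcdf read: fills in a made-up 4-sample, 6-hourly axis.
  subroutine read_time_axis(f)
    type(file_taxis_t), intent(inout) :: f
    integer :: n
    nreads = nreads + 1
    f%nt   = 4
    f%date = [(20000101, n = 1, 4)]
    f%secs = [((n - 1) * 21600, n = 1, 4)]
  end subroutine read_time_axis

end module taxis_cache_sketch
```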

4.5. Data Structures

The data models all use three fundamental modules:

  • $CIMEROOT/src/utils/shr_dmodel_mod.F90

  • $CIMEROOT/src/utils/shr_stream_mod.F90

  • $CIMEROOT/src/utils/shr_strdata.F90

These modules contain three data structures that are leveraged by all of the data model code.

The most basic type, shr_stream_fileType, is defined in shr_stream_mod.F90 and specifies basic information about a single stream file.

type shr_stream_fileType
   character(SHR_KIND_CL) :: name = shr_stream_file_null     ! the file name
   logical                :: haveData = .false.              ! has t-coord data been read in?
   integer  (SHR_KIND_IN) :: nt = 0                          ! size of time dimension
   integer  (SHR_KIND_IN),allocatable :: date(:)             ! t-coord date: yyyymmdd
   integer  (SHR_KIND_IN),allocatable :: secs(:)             ! t-coord secs: elapsed on date
end type shr_stream_fileType
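As an illustration of how the date and secs arrays can be used, the sketch below finds the sample that forms the lower bound of the interval containing a model time (ymd, tod). It assumes the samples are sorted in time and uses a plain linear search. find_bounds is hypothetical; the stream code keeps search hints (the "useful for quicker searching" fields of shr_stream_streamType) to speed this up.

```fortran
! Sketch: find the pair of time samples in one stream file that brackets
! a model time (ymd, tod), using the date(:)/secs(:) layout of
! shr_stream_fileType. Hypothetical helper; samples assumed sorted.
module tbounds_sketch
  implicit none
  private
  public :: find_bounds

contains

  ! Returns nLB such that sample nLB <= (ymd,tod) < sample nLB+1,
  ! or 0 if the time lies outside [first sample, last sample).
  integer function find_bounds(date, secs, ymd, tod) result(nLB)
    integer, intent(in) :: date(:), secs(:)   ! yyyymmdd, seconds on date
    integer, intent(in) :: ymd, tod
    integer :: n
    nLB = 0
    do n = 1, size(date) - 1
      if (le(date(n), secs(n), ymd, tod) .and. &
          .not. le(date(n+1), secs(n+1), ymd, tod)) then
        nLB = n
        return
      end if
    end do
  contains
    ! (d1,s1) <= (d2,s2): yyyymmdd integers order like dates, so compare
    ! dates first and seconds only on a tie.
    logical function le(d1, s1, d2, s2)
      integer, intent(in) :: d1, s1, d2, s2
      le = d1 < d2 .or. (d1 == d2 .and. s1 <= s2)
    end function le
  end function find_bounds

end module tbounds_sketch
```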

The next type, shr_stream_streamType, encapsulates the information related to all files of a target stream. These are the files listed in the domainInfo and fieldInfo blocks of the target stream description file (see the overview of the Stream Description File).

type shr_stream_streamType
   !private                                    ! no public access to internal components
   !--- input data file names and data ---
   logical                   :: init           ! has stream been initialized?
   integer  (SHR_KIND_IN),pointer :: initarr(:) => null()! surrogate for init flag
   integer  (SHR_KIND_IN)    :: nFiles         ! number of data files
   character(SHR_KIND_CS)    :: dataSource     ! meta data identifying data source
   character(SHR_KIND_CL)    :: filePath       ! remote location of data files
   type(shr_stream_fileType), allocatable :: file(:) ! data specific to each file

   !--- specifies how model dates align with data dates ---
   integer(SHR_KIND_IN)      :: yearFirst      ! first year to use in t-axis (yyyymmdd)
   integer(SHR_KIND_IN)      :: yearLast       ! last  year to use in t-axis (yyyymmdd)
   integer(SHR_KIND_IN)      :: yearAlign      ! align yearFirst with this model year
   integer(SHR_KIND_IN)      :: offset         ! offset in seconds of stream data
   character(SHR_KIND_CS)    :: taxMode        ! cycling option for time axis

   !--- useful for quicker searching ---
   integer(SHR_KIND_IN) :: k_lvd,n_lvd         ! file/sample of least valid date
   logical              :: found_lvd           ! T <=> k_lvd,n_lvd have been set
   integer(SHR_KIND_IN) :: k_gvd,n_gvd         ! file/sample of greatest valid date
   logical              :: found_gvd           ! T <=> k_gvd,n_gvd have been set

   !---- for keeping files open
   logical                 :: fileopen         ! is current file open
   character(SHR_KIND_CL)  :: currfile         ! current filename
   type(file_desc_t)       :: currpioid        ! current pio file desc

   !--- stream data not used by stream module itself ---
   character(SHR_KIND_CXX):: fldListFile       ! field list: file's  field names
   character(SHR_KIND_CXX):: fldListModel      ! field list: model's field names
   character(SHR_KIND_CL) :: domFilePath       ! domain file: file path of domain file
   character(SHR_KIND_CL) :: domFileName       ! domain file: name
   character(SHR_KIND_CS) :: domTvarName       ! domain file: time-dim var name
   character(SHR_KIND_CS) :: domXvarName       ! domain file: x-dim var name
   character(SHR_KIND_CS) :: domYvarName       ! domain file: y-dim var name
   character(SHR_KIND_CS) :: domZvarName       ! domain file: z-dim var name
   character(SHR_KIND_CS) :: domAreaName       ! domain file: area  var name
   character(SHR_KIND_CS) :: domMaskName       ! domain file: mask  var name

   character(SHR_KIND_CS) :: tInterpAlgo       ! Algorithm to use for time interpolation
   character(SHR_KIND_CL) :: calendar          ! stream calendar
end type shr_stream_streamType
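The yearFirst, yearLast, and yearAlign fields describe how model dates align with data dates. Under the assumption that a cycling time axis maps model year yearAlign onto yearFirst and repeats every yearLast - yearFirst + 1 years, the mapping reduces to a modulo operation, as in the hypothetical data_year function below. The exact rules in shr_stream_mod.F90, including the offset field and the other taxMode options, are not reproduced here.

```fortran
! Sketch (assumed behavior, for illustration only): map a model year onto
! a data year for a cycling time axis. Model year yearAlign maps to
! yearFirst, and data years repeat every yearLast-yearFirst+1 years.
module year_align_sketch
  implicit none
  private
  public :: data_year

contains

  pure integer function data_year(modelYear, yearFirst, yearLast, yearAlign)
    integer, intent(in) :: modelYear, yearFirst, yearLast, yearAlign
    integer :: nYears
    nYears = yearLast - yearFirst + 1
    ! modulo (unlike mod) is non-negative for positive nYears, so model
    ! years before yearAlign also land in [yearFirst, yearLast]
    data_year = yearFirst + modulo(modelYear - yearAlign, nYears)
  end function data_year

end module year_align_sketch
```

For example, with yearFirst=2000, yearLast=2004, and yearAlign=1, model years 1 and 6 both map to 2000 and model year 0 maps to 2004.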

Finally, shr_strdata_type is the heart of the CIME data model implementation and contains information for all the streams that are active for the target data model. The first part of shr_strdata_type is filled in from the values read from the namelist group (see the stream data namelist section).

type shr_strdata_type
  ! --- set by input namelist ---
  character(CL)  :: dataMode          ! flags physics options wrt input data
  character(CL)  :: domainFile        ! file   containing domain info
  character(CL)  :: streams (nStrMax) ! stream description file names
  character(CL)  :: taxMode (nStrMax) ! time axis cycling mode
  real(R8)       :: dtlimit (nStrMax) ! dt max/min limit
  character(CL)  :: vectors (nVecMax) ! define vectors to vector map
  character(CL)  :: fillalgo(nStrMax) ! fill algorithm
  character(CL)  :: fillmask(nStrMax) ! fill mask
  character(CL)  :: fillread(nStrMax) ! fill mapping file to read
  character(CL)  :: fillwrit(nStrMax) ! fill mapping file to write
  character(CL)  :: mapalgo (nStrMax) ! scalar map algorithm
  character(CL)  :: mapmask (nStrMax) ! scalar map mask
  character(CL)  :: mapread (nStrMax) ! regrid mapping file to read
  character(CL)  :: mapwrit (nStrMax) ! regrid mapping file to write
  character(CL)  :: tintalgo(nStrMax) ! time interpolation algorithm
  integer(IN)    :: io_type           ! io type, currently pnetcdf or netcdf

  !--- data required by cosz t-interp method, ---
  real(R8)     :: eccen   ! orbital eccentricity
  real(R8)     :: mvelpp  ! moving vernal equinox long
  real(R8)     :: lambm0  ! mean long of perihelion at vernal equinox (radians)
  real(R8)     :: obliqr  ! obliquity in radians
  integer(IN)  :: modeldt ! data model dt in seconds (set to the coupling frequency)

  ! --- data model grid, public ---
  integer(IN)     :: nxg          ! data model grid lon size
  integer(IN)     :: nyg          ! data model grid lat size
  integer(IN)     :: nzg          ! data model grid vertical size
  integer(IN)     :: lsize        ! data model grid local size
  type(mct_gsmap) :: gsmap        ! data model grid global seg map
  type(mct_ggrid) :: grid         ! data model grid ggrid
  type(mct_avect) :: avs(nStrMax) ! data model stream attribute vectors

  ! --- stream specific arrays, stream grid ---
  type(shr_stream_streamType)    :: stream(nStrMax)
  type(iosystem_desc_t), pointer :: pio_subsystem => null()
  type(io_desc_t)    :: pio_iodesc(nStrMax)
  integer(IN)        :: nstreams          ! actual number of streams
  integer(IN)        :: strnxg(nStrMax)   ! stream grid lon sizes
  integer(IN)        :: strnyg(nStrMax)   ! stream grid lat sizes
  integer(IN)        :: strnzg(nStrMax)   ! stream grid vertical sizes
  logical            :: dofill(nStrMax)   ! true if stream grid is different from data model grid
  logical            :: domaps(nStrMax)   ! true if stream grid is different from data model grid
  integer(IN)        :: lsizeR(nStrMax)   ! stream local size of gsmapR on processor
  type(mct_gsmap)    :: gsmapR(nStrMax)   ! stream global seg map
  type(mct_rearr)    :: rearrR(nStrMax)   ! rearranger
  type(mct_ggrid)    :: gridR(nStrMax)    ! local stream grid on processor
  type(mct_avect)    :: avRLB(nStrMax)    ! Read attrvect, lower bound
  type(mct_avect)    :: avRUB(nStrMax)    ! Read attrvect, upper bound
  type(mct_avect)    :: avFUB(nStrMax)    ! Final attrvect, upper bound
  type(mct_avect)    :: avFLB(nStrMax)    ! Final attrvect, lower bound
  type(mct_avect)    :: avCoszen(nStrMax) ! data associated with coszen time interp
  type(mct_sMatP)    :: sMatPf(nStrMax)   ! sparse matrix map for fill on stream grid
  type(mct_sMatP)    :: sMatPs(nStrMax)   ! sparse matrix map for mapping from stream to data model grid
  integer(IN)        :: ymdLB(nStrMax)    ! lower bound date for stream (yyyymmdd)
  integer(IN)        :: todLB(nStrMax)    ! lower bound time of day for stream (seconds)
  integer(IN)        :: ymdUB(nStrMax)    ! upper bound date for stream (yyyymmdd)
  integer(IN)        :: todUB(nStrMax)    ! upper bound time of day for stream (seconds)
  real(R8)           :: dtmin(nStrMax)
  real(R8)           :: dtmax(nStrMax)

  ! --- internal ---
  integer(IN)        :: ymd  ,tod
  character(CL)      :: calendar          ! model calendar for ymd,tod
  integer(IN)        :: nvectors          ! number of vectors
  integer(IN)        :: ustrm (nVecMax)
  integer(IN)        :: vstrm (nVecMax)
  character(CL)      :: allocstring
end type shr_strdata_type
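The sMatPf and sMatPs fields hold the fill and mapping weights that are generated once at initialization. Applying such a map to a field is a sparse matrix-vector product; the sketch below shows it with plain serial coordinate (COO) storage. The coo_map_t type and apply_map function are hypothetical: the actual mct_sMatP type uses MCT's own storage and also handles the parallel data exchange, which is omitted here.

```fortran
! Sketch: applying a sparse mapping matrix (the role of sMatPs or sMatPf)
! to one field, using plain serial COO storage. Hypothetical names; the
! real mct_sMatP type also handles parallel data exchange.
module smat_sketch
  implicit none
  private
  public :: r8, coo_map_t, apply_map

  integer, parameter :: r8 = selected_real_kind(12)

  type :: coo_map_t
    integer :: nrows = 0                   ! destination (data model) points
    integer,  allocatable :: row(:)        ! destination index of each weight
    integer,  allocatable :: col(:)        ! source (stream grid) index
    real(r8), allocatable :: wgt(:)        ! mapping weight
  end type coo_map_t

contains

  ! dst(row(k)) += wgt(k) * src(col(k)) for every stored weight k
  function apply_map(m, src) result(dst)
    type(coo_map_t), intent(in) :: m
    real(r8),        intent(in) :: src(:)
    real(r8) :: dst(m%nrows)
    integer  :: k
    dst = 0.0_r8
    do k = 1, size(m%wgt)
      dst(m%row(k)) = dst(m%row(k)) + m%wgt(k) * src(m%col(k))
    end do
  end function apply_map

end module smat_sketch
```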