Timers

CIME includes a copy of the General Purpose Timing Library (GPTL) and timers are placed throughout the CIME driver. CIME-driven models typically also have GPTL timers in their code and very detailed timing information can be obtained.

To enable timers, set the xml variable CHECK_TIMING to TRUE with ./xmlchange CHECK_TIMING=true, this will produce files in $CASEROOT/timing.

Controlling timers

User customization of timers is done via the xml variables TIMER_LEVEL and TIMER_DETAIL.

Level

Timer level using the XML variable TIMER_LEVEL: This is the maximum code stack depth of enabled timers.

Detail

Timer detail using the XML variable TIMER_DETAIL: This is the maximum detail level of enabled timers.

This is an integer indicating maximum detail level to profile. This xml variable is used to set the namelist variable timing_detail_limit. This namelist variable is used by perf_mod (in $CIMEROOT/src/share/timing/perf_mod.F90) to turn timers off and on depending on calls to the routine t_adj_detailf. If in the code a statement appears like t_adj_detailf(+1), then the current timer detail level is incremented by 1 and compared to the time_detail_limit obtained from the namelist. If the limit is exceeded then the timer is turned off.

Find-grain control

Further control of timers is then done via modifications of the prof_inparm namelists in the file drv_in. This is done via keyword-value settings in user_nl_cpl. As an example, if you want to set the namelist variable profile_barriers to .true., add the following line in your $CASEROOT/user_nl_cpl:

profile_barriers = .true.

Timing data

Every model run produces three types of timing output that you can examine:

Model

The timing output can be found in $CASEROOT/timing/$model_timing.$CASE.$datestamp.

This is the most useful way to quickly determine timing summaries across components The following describes the most important parts of this timing file:

An example timing file of this type is:

---------------- TIMING PROFILE ---------------------
Case        : b.e20.BHIST.f09_g17.20thC.297_02
LID         : 9459679.chadmin1.180517-114852
Machine     : cheyenne
Caseroot    : /glade/p/cesmdata/cseg/runs/cesm2_0/b.e20.BHIST.f09_g17.20thC.297_02
Timeroot    : /glade/p/cesmdata/cseg/runs/cesm2_0/b.e20.BHIST.f09_g17.20thC.297_02/Tools
User        : hannay
Curr Date   : Thu May 17 12:42:27 2018
grid        : a%0.9x1.25_l%0.9x1.25_oi%gx1v7_r%r05_g%gland4_w%ww3a_m%gx1v7
compset     : HIST_CAM60_CLM50%BGC-CROP_CICE_POP2%ECO_MOSART_CISM2%NOEVOLVE_WW3_BGC%BDRD
run_type    : hybrid, continue_run = FALSE (inittype = TRUE)
stop_option : nyears, stop_n = 1
run_length  : 365 days (364.958333333 for ocean)

component       comp_pes    root_pe   tasks  x threads instances (stride)
---------        ------     -------   ------   ------  ---------  ------
cpl = cpl        3456        0        1152   x 3       1      (1     )
atm = cam        3456        0        1152   x 3       1      (1     )
lnd = clm        2592        0        864    x 3       1      (1     )
ice = cice       864         864      288    x 3       1      (1     )
ocn = pop        768         1152     256    x 3       1      (1     )
rof = mosart     2592        0        864    x 3       1      (1     )
glc = cism       3456        0        1152   x 3       1      (1     )
wav = ww         96          1408     32     x 3       1      (1     )
esp = sesp       1           0        1      x 1       1      (1     )

total pes active           : 12960
mpi tasks per node               : 36
pe count for cost estimate : 4320

Overall Metrics:
Model Cost:            3541.30   pe-hrs/simulated_year
Model Throughput:        29.28   simulated_years/day

Init Time   :     242.045 seconds
Run Time    :    2951.082 seconds        8.085 seconds/day
Final Time  :       0.008 seconds

Actual Ocn Init Wait Time     :     768.737 seconds
Estimated Ocn Init Run Time   :       0.248 seconds
Estimated Run Time Correction :       0.000 seconds
(This correction has been applied to the ocean and total run times)

Runs Time in total seconds, seconds/model-day, and model-years/wall-day
CPL Run Time represents time in CPL pes alone, not including time associated with data exchange with other components

TOT Run Time:    2951.082 seconds        8.085 seconds/mday        29.28 myears/wday
CPL Run Time:     248.696 seconds        0.681 seconds/mday       347.41 myears/wday
ATM Run Time:    2097.788 seconds        5.747 seconds/mday        41.19 myears/wday
LND Run Time:     545.991 seconds        1.496 seconds/mday       158.24 myears/wday
ICE Run Time:     389.173 seconds        1.066 seconds/mday       222.01 myears/wday
OCN Run Time:    2169.399 seconds        5.944 seconds/mday        39.83 myears/wday
ROF Run Time:      42.241 seconds        0.116 seconds/mday      2045.41 myears/wday
GLC Run Time:       1.049 seconds        0.003 seconds/mday     82364.16 myears/wday
WAV Run Time:     517.414 seconds        1.418 seconds/mday       166.98 myears/wday
ESP Run Time:       0.000 seconds        0.000 seconds/mday         0.00 myears/wday
CPL COMM Time:   2464.660 seconds        6.752 seconds/mday        35.06 myears/wday

---------------- DRIVER TIMING FLOWCHART ---------------------
..........................

TIMING PROFILE is the first section in the timing output. It summarizes general timing information for the run. The total run time and cost are given in several metrics to facilitate analysis and comparisons with other runs. These metrics includ pe-hrs per simulated year (cost), simulated years per wall day (thoughput), seconds, and seconds per model day. The total run time for each component and the time for initialization of the model also are provided. These times are the aggregate over the total run and do not take into account any temporal or processor load imbalances.

DRIVER TIMING FLOWCHART is the second section in the timing output. It provides timing information for the driver in sequential order and indicates which processors are involved in the cost. Finally, the timings for the coupler are broken out at the bottom of the timing output file.

Component

The timing output can be found in $CASEROOT/timing/$model_timing_stats.$date.

Provides an overall detailed timing summary for each component, including the minimum and maximum of all the model timers.

Coupler

The timing out put can be found in $RUNDIR/cpl.log.$datestamp or $DOUT_S_ROOT/logs/cpl.log.$datestamp.

Contains the run time for each model day during the run and is output during the run. You can search for tStamp in the cpl.log file to see the information, which is useful for tracking down temporal variability in cost due to inherent model variability or to hardware. The model daily cost generally is pretty constant unless I/O is written intermittently, such as at the end of the month.

Advice on setting your wallclock time

When you look at the $model_timing.$CASE.$datestamp file for “Model Throughput”, you will find output like this:
Overall Metrics:
Model Cost: 327.14 pe-hrs/simulated_year (scale= 0.50)
Model Throughput: 4.70 simulated_years/day

The model throughput is the estimated number of model years that you can run in a wallclock day. Based on this, you can maximize your queue limit and change $STOP_OPTION and $STOP_N.

For example, say a model’s throughput is 4.7 simulated_years/day, and the maximum runtime limit on your machine is 12 hours. 4.7 model years/24 hours * 12 hours = 2.35 years. On the massively parallel computers, there is always some variability in how long it will take a job to run. On some machines, you may need to leave as much as 20% buffer time in your run to guarantee that jobs finish reliably before the time limit. For that reason, set your model to run only one model year/job. In this example, set your wallclock at 12 hours and invoke xmlchange in CASEROOT as shown here:

./xmlchange STOP_OPTION=nyears
./xmlchange STOP_N=1
./xmlchange REST_OPTION=nyears
./xmlchange REST_N=1