= Performance Metrics
include::include-html.ad[]
== Related Links
include::includes/special.adoc[]
include::includes/external.adoc[]
== Table of Contents
* link:#intro[Introduction]
* link:#budget[Statistics Summary]
* link:#quality[Quality of Service]
'''''
[[intro]]
== Introduction
This page describes several statistics provided in the NTP specification
and reference implementation and how they determine the accuracy and
error measured during routine and exceptional operation. These
statistics provide the following information.
* Nominal estimate of the server clock time relative to the client clock
time. This is called _clock offset_ symbolized by the Greek letter θ.
* Roundtrip system and network delay measured by the on-wire protocol.
This is called _roundtrip delay_ symbolized by the Greek letter δ.
* Potential clock offset error due to the maximum uncorrected system
clock frequency error. This is called _dispersion_ symbolized by the
Greek letter ε.
* Expected error, consisting of the root mean square (RMS) nominal clock
offset sample differences in a sliding window of several samples. This
is called _jitter_ symbolized by the Greek letter φ.
Figure 1 shows how the various measured statistics are collected and
compiled to calibrate NTP performance.
image:pic/stats.gif[]
Figure 1. Statistics Budget
The data represented in boxes labeled Server are contained in fields of
the packet received from the server. The data represented in boxes labeled
Peer are computed by the on-wire protocol, as described below. The
algorithms of the box labeled Selection and Combining Algorithms process
the peer data to select a system peer. The System box represents summary
data inherited from the system peer. These data are available to
application programs and dependent downstream clients.
[[budget]]
== Statistics Summary
Each NTP synchronization source is characterized by the offset θ and
delay δ samples measured by the on-wire protocol, as described on the
link:warp.html[How NTP Works] page. In addition, the dispersion ε sample
is initialized with the sum of the source precision ρ~R~ and the client
precision ρ (not shown) as each source packet is received. The
dispersion increases at a rate of 15 μs/s after that. For this purpose,
the precision is equal to the latency to read the system clock. The
offset, delay and dispersion are called the sample statistics.
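The initialization and growth of a dispersion sample can be sketched as follows. This is a minimal illustration, not the reference implementation; the function name and the example precision values are hypothetical, and only the 15 μs/s rate comes from the text above.

```python
# Growth rate of dispersion between updates: 15 microseconds per second.
DISPERSION_RATE = 15e-6


def sample_dispersion(server_precision, client_precision, age_seconds):
    """Dispersion of one sample, age_seconds after its packet arrived.

    The sample starts at the sum of the two precisions and then grows
    linearly with age. All quantities are in seconds.
    """
    return server_precision + client_precision + DISPERSION_RATE * age_seconds


# Hypothetical precisions of 1 microsecond each, sample now 64 s old.
eps = sample_dispersion(1e-6, 1e-6, 64.0)
print(f"dispersion after 64 s: {eps * 1e6:.0f} us")
```

With these assumed values the growth term dominates the precision terms after only a few seconds.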
[NOTE]
In very fast networks where the client clock frequency is not
within 1 PPM or so of the server clock frequency, the roundtrip
delay may have small negative values. This is usually a temporary
condition when the client is first started. When using the roundtrip
delay in calculations, negative values are assumed zero.
In a window of eight (offset, delay, dispersion) samples, the algorithm
described on the link:filter.html[Clock Filter Algorithm] page selects
the sample with minimum delay, which generally represents the most
accurate offset statistic. The selected offset sample determines the
_peer offset_ and _peer delay_ statistics. The _peer dispersion_ is a
weighted average of the dispersion samples in the window. These
quantities are recalculated as each update is received from the source.
Between updates, both the sample dispersion and peer dispersion continue
to grow at the same rate, 15 μs/s. Finally, the _peer jitter_ φ is
determined as the RMS of the differences between the selected offset
sample and the other offset samples in the window. The peer statistics are
recorded by the +peerstats+ option of the
link:monopt.html#filegen[+filegen+] command. Peer variables are
displayed by the +rv+ command of the link:ntpq.html#peer[+ntpq+]
program.
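The way the peer statistics fall out of the filter window can be sketched as below. This is a simplified illustration under stated assumptions: the function name is hypothetical, and the halving weights used for the peer dispersion are an assumption, since the text above says only that it is a weighted average. See the Clock Filter Algorithm page for the actual procedure.

```python
import math


def clock_filter(samples):
    """Derive peer statistics from a window of samples.

    samples is a non-empty list of (offset, delay, dispersion) tuples in
    seconds, at most eight long.
    Returns (peer_offset, peer_delay, peer_dispersion, peer_jitter).
    """
    # Order by delay; negative delays are treated as zero, as in the note above.
    ordered = sorted(samples, key=lambda s: max(s[1], 0.0))
    offset, delay, _ = ordered[0]

    # Assumed weighting: each later sample in delay order counts half as much.
    dispersion = sum(s[2] / 2 ** (i + 1) for i, s in enumerate(ordered))

    # RMS of offset differences relative to the selected sample.
    others = ordered[1:]
    if others:
        jitter = math.sqrt(sum((offset - s[0]) ** 2 for s in others) / len(others))
    else:
        jitter = 0.0
    return offset, max(delay, 0.0), dispersion, jitter


window = [(0.010, 0.050, 0.001), (0.012, 0.020, 0.002), (0.008, 0.080, 0.004)]
print(clock_filter(window))
```

In this example the second sample has the minimum delay, so its offset and delay become the peer offset and peer delay.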
The clock filter algorithm continues to process updates in this way
until the source is no longer reachable. Reachability is determined by
an eight-bit shift register, which is shifted left by one bit as each
poll packet is sent, with 0 replacing the vacated rightmost bit. Each
time a valid update is received, the rightmost bit is set to 1. The
source is considered reachable if any bit is set to 1 in the register;
otherwise, it is considered unreachable. When a source becomes
unreachable, a dummy sample with "infinite" dispersion is inserted in
the filter window at each poll, thus displacing old samples. This causes
the peer dispersion to increase eventually to infinity.
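The reachability register described above can be sketched in a few lines. The class and method names are hypothetical and chosen only for illustration.

```python
class ReachRegister:
    """Eight-bit reachability shift register for one source."""

    def __init__(self):
        self.register = 0

    def poll_sent(self):
        # Shift left by one bit; the vacated rightmost bit becomes 0
        # and anything beyond eight bits is discarded.
        self.register = (self.register << 1) & 0xFF

    def update_received(self):
        # A valid update sets the rightmost bit.
        self.register |= 1

    @property
    def reachable(self):
        # Reachable if any of the eight bits is set.
        return self.register != 0


reach = ReachRegister()
reach.poll_sent()
reach.update_received()
for _ in range(8):
    reach.poll_sent()      # eight polls with no reply
print(reach.reachable)     # the single 1 bit has been shifted out
```

A source therefore stays reachable for up to eight polls after its last valid update.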
The composition of the source population and the system peer selection
is redetermined as each update from each source is received. The system
peer and system variables are determined as described on the
link:prefer.html[Mitigation Rules and the +prefer+ Keyword] page. The
system variables Θ, Δ, Ε and Φ are updated from the system peer
variables of the same name, and the system stratum is set to one greater
than the system peer stratum. The system statistics are recorded by the
+loopstats+ option of the link:monopt.html#filegen[+filegen+] command.
System variables are displayed by the +rv+ command of the
link:ntpq.html#system[+ntpq+] program.
Although it might seem counterintuitive, a cardinal rule in the
selection process is that once a sample has been selected by the clock
filter algorithm, older samples are no longer selectable. This applies
also to the clock select algorithm. Once the peer variables for a source
have been selected, older variables of the same or other sources are no
longer selectable. The reason for these rules is to limit the time delay
in the clock discipline algorithm. This is necessary to preserve the
optimum impulse response and thus the risetime and overshoot.
This means that not every sample can be used to update the peer
variables, and up to seven samples can be ignored between selected
samples. This fact has been carefully considered in the discipline
algorithm design with due consideration for feedback loop delay and
minimum sampling rate. In engineering terms, even if only one sample in
eight survives, the resulting sample rate is twice the Nyquist rate at
any time constant and poll interval.
[[quality]]
== Quality of Service
This section discusses how an NTP client determines the system
performance using a peer population including reference clocks and
remote servers. This is determined for each peer from two statistics,
_peer jitter_ and _root distance_. Peer jitter is determined from
various jitter components as described above. It represents the expected
error in determining the clock offset estimate. Root distance represents
the maximum error of the estimate due to all causes.
The root distance statistic is computed as one-half the _root delay_ to
the primary source of time (that is, the reference clock) plus the _root
dispersion_ of that source. The root variables are included in the NTP
packet header received from each source. At each update the root delay
is recomputed as the sum of the root delay in the packet plus the peer
delay, while the root dispersion is recomputed as the sum of the root
dispersion in the packet plus the peer dispersion.
[NOTE]
In order to avoid timing loops, the root distance is adjusted to
the maximum of the above computation and a _minimum threshold_. The
minimum threshold defaults to 1 ms, but can be changed according to
client preference using the +mindist+ option of the
link:miscopt.html#tos[+tos+] command.
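The root distance computation, including the minimum threshold, can be sketched as follows. This is a simplified illustration that ignores the minor contributions mentioned later on this page; the function and parameter names are hypothetical.

```python
MINDIST = 0.001  # default minimum threshold, 1 ms


def root_distance(pkt_root_delay, pkt_root_dispersion,
                  peer_delay, peer_dispersion, mindist=MINDIST):
    """Root distance in seconds for one source.

    The packet values come from the NTP header; the peer values come
    from the clock filter. A negative peer delay is treated as zero.
    """
    root_delay = pkt_root_delay + max(peer_delay, 0.0)
    root_dispersion = pkt_root_dispersion + peer_dispersion
    return max(root_delay / 2 + root_dispersion, mindist)


# Hypothetical values: 10 ms and 2 ms in the packet, 20 ms and 3 ms at the peer.
print(root_distance(0.010, 0.002, 0.020, 0.003))
```

With these assumed values the root delay is 30 ms and the root dispersion is 5 ms, giving a root distance of about 20 ms.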
A source is considered selectable only if its root distance is less than
the _select threshold_, which defaults to 1.5 s but can be changed according
to client preference using the +maxdist+ option of the
link:miscopt.html#tos[+tos+] command. When an upstream server loses all
sources, its root distance apparent to dependent clients continues to
increase. The clients are not aware of this condition and continue to
accept synchronization as long as the root distance is less than the
select threshold.
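As a rough worked example, the time a client would keep accepting such a server can be estimated from the select threshold. This sketch assumes the root distance grows only through dispersion at the 15 μs/s rate given earlier, which is a simplification; the function name is hypothetical.

```python
MAXDIST = 1.5            # default select threshold, in seconds
DISPERSION_RATE = 15e-6  # dispersion growth, seconds per second


def seconds_until_unselectable(current_distance, maxdist=MAXDIST,
                               rate=DISPERSION_RATE):
    """Time left before the root distance reaches the select threshold."""
    if current_distance >= maxdist:
        return 0.0
    return (maxdist - current_distance) / rate


# Hypothetical starting root distance of 20 ms.
remaining = seconds_until_unselectable(0.020)
print(f"{remaining / 3600:.1f} hours")
```

Under these assumptions a server that starts near zero distance remains selectable for roughly a day after losing its sources.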
The root distance statistic is used by the select, cluster and
mitigation algorithms. In this respect, it is sometimes called the
_synchronization distance_, often shortened simply to _distance_. The
root distance is also used in the following ways.
* Root distance defines the maximum error of the clock offset estimate
due to all causes as long as the source remains reachable.
* Root distance defines the upper and lower limits of the correctness
interval. This interval represents the maximum clock offset for each of
possibly several sources. The clock select algorithm computes the
intersection of the correctness intervals to determine the truechimers
from the selectable source population.
* Root distance is used by the clock cluster algorithm as a weight
factor when pruning outliers from the truechimer population.
* The (normalized) reciprocal of the root distance is used as a weight
factor by the combine algorithm when computing the system clock offset
and system jitter.
* Root distance is used by the mitigation algorithm to select the system
peer from among the cluster algorithm survivors.
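Two of the uses listed above, the correctness interval and the reciprocal weighting in the combine algorithm, can be sketched as below. This is an illustration only: the function names are hypothetical, and the real select and combine algorithms involve further steps not shown here.

```python
def correctness_interval(offset, distance):
    """Lower and upper limits of the correctness interval for one source."""
    return offset - distance, offset + distance


def combine_offset(survivors):
    """Weighted average of offsets from (offset, root_distance) pairs.

    Each weight is the reciprocal of the root distance, normalized by the
    sum of all weights. Root distances are assumed positive, which the
    minimum threshold guarantees.
    """
    weights = [1.0 / distance for _, distance in survivors]
    total = sum(weights)
    return sum(w * offset for w, (offset, _) in zip(weights, survivors)) / total


# Hypothetical survivors: 10 ms offset at 20 ms distance, 20 ms at 40 ms.
print(correctness_interval(0.010, 0.020))
print(combine_offset([(0.010, 0.020), (0.020, 0.040)]))
```

The source with the smaller root distance pulls the combined offset toward its own value, here about 13.3 ms rather than the unweighted mean of 15 ms.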
The root distance thus functions as a metric in the selection and
weighting of the various available sources. The strategy is to select
the system peer as the source with the minimum root distance and thus
the minimum maximum error. The reference implementation uses the
Bellman-Ford algorithm described in the literature, where the goal is to
minimize the root distance. The algorithm selects the _system peer_,
from which the system root delay and system root dispersion are
inherited.
The algorithms described on the link:prefer.html[Mitigation Rules and
the +prefer+ Keyword] page deliver several important statistics. The
_system offset_ and _system jitter_ are weighted averages computed by
the clock combine algorithm. System offset is best interpreted as the
maximum-likelihood estimate of the system clock offset, while system
jitter, also called estimated error, is best interpreted as the expected
error of this estimate. _System delay_ is the root delay inherited from
the system peer, while _system dispersion_ is the root dispersion plus
contributions due to jitter and the absolute value of the system offset.
The maximum system error, or _system distance_, is computed as one-half
the system delay plus the system dispersion. In order to simplify
discussion, certain minor contributions to the maximum error statistic
are ignored. If the precision time kernel support is available, both the
estimated error and maximum error are reported to user programs via the
+ntp_adjtime()+ kernel system call. See the link:kern.html[Kernel Model
for Precision Timekeeping] page for further information.
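The relationship between these system statistics can be sketched as follows. The exact form of the jitter contribution is not given on this page, so adding the system jitter directly is an assumption made for illustration; the function name is hypothetical.

```python
def system_statistics(root_delay, root_dispersion, system_jitter, system_offset):
    """Return (system_delay, system_dispersion, system_distance) in seconds."""
    # System delay is inherited from the system peer's root delay.
    system_delay = root_delay
    # Assumed form: root dispersion plus jitter plus the offset magnitude.
    system_dispersion = root_dispersion + system_jitter + abs(system_offset)
    # Maximum error: half the system delay plus the system dispersion.
    system_distance = system_delay / 2 + system_dispersion
    return system_delay, system_dispersion, system_distance


# Hypothetical values: 30 ms delay, 5 ms dispersion, 1 ms jitter, -2 ms offset.
print(system_statistics(0.030, 0.005, 0.001, -0.002))
```

With these assumed values the system dispersion is 8 ms and the system distance is 23 ms.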
'''''
include::includes/footer.adoc[]