Requested revision
Standard: | 802.1AS-2011 | Clause: | 11 |
Clause title: | MDPdelayReq State Machine (Figure 11-8) |
Rationale for revision
There exist numerous scenarios in which asCapable, once set TRUE, can
be set FALSE again immediately, with little or no delay.
A time-sensitive network utilizing the gPTP protocol of 802.1AS is likely
to be extremely sensitive to any spurious indication of asCapable being
FALSE in an otherwise operational environment. Consider an AVB network
utilizing MSRP per IEEE 802.1BA: the existing stream reservations would
be torn down if a system allowed asCapable to be set FALSE even
momentarily (as the port would then be an SRPdomainBoundary).
There are three prime causes of this "hair trigger" behavior.
Consider the WAITING_FOR_PDELAY_INTERVAL_TIMER state of the MDPdelayReq
state machine (Figure 11-8), specifically the IF-ELSE statement quoted
below:
if ((neighborPropDelay <= neighborPropDelayThresh) &&
(rcvdPdelayRespPtr->sourcePortIdentity.clockIdentity != thisClock) &&
neighborRateRatioValid)
asCapable = TRUE;
else
asCapable = FALSE;
Thus, if at any time during normal Pdelay message exchange an
implementation computes a neighborPropDelay that exceeds
neighborPropDelayThresh, asCapable is immediately set FALSE.
The computation of neighborPropDelay is not specified by the standard
(10.2.4.7; end of 11.1.2: "any scheme...is acceptable"), so an
implementation may or may not apply suitable averaging and error
handling to limit spurious values.
Similarly, the determination of neighborRateRatioValid is not specified
by the standard (11.2.15.1.11, 11.2.15.2.3: "Any scheme ... is
acceptable"), so an implementation may momentarily set this value to
FALSE.
Finally, a single received Pdelay_Resp whose
sourcePortIdentity.clockIdentity equals thisClock would cause asCapable
to be set FALSE.
Beyond these three immediate "hair triggers", there may be value in
greater hysteresis to prevent asCapable from being "quickly" reset to
FALSE once it has been set to TRUE.
Currently, when a Pdelay_Req is sent but a response is either not
received (lost) or received but invalid (still considered "lost"),
asCapable will eventually be set to FALSE.
Specifically, in the RESET state of Figure 11-8, asCapable is set to
FALSE as soon as lostResponses > allowedLostResponses.
allowedLostResponses is defined to be 3 (11.5.3). lostResponses is
initialized to zero by the state machine and incremented only after its
value is checked; hence, on the fourth lost or invalid response to a
Pdelay_Req, asCapable is set to FALSE.
Proposed text
These possible remedies are only this commenter's observations and
require further study, which is ongoing within the IEEE 802.1ASbt
working group. No strong recommendation is given; however, two
complementary possibilities are explored below.
Specific to the allowedLostResponses hysteresis issue:
The hysteresis could be increased by raising allowedLostResponses to a
value substantially greater than 3, but only once asCapable has been
TRUE for some time. This must be balanced against the need to rapidly
detect a link-local issue and fail over to an alternate network path
(if available).
A trivial approach may address this by defining a boolean
"asCapableIsStable" that is set to TRUE in
WAITING_FOR_PDELAY_INTERVAL_TIMER once asCapable has been evaluated TRUE
60 consecutive times. Once asCapableIsStable is TRUE, allowedLostResponses
could be increased from 3 to 10 (and restored to 3 when
asCapableIsStable is FALSE). This approach has the drawback of greatly
increasing the delay before asCapable is set FALSE in the face of a true
problem with Pdelay exchanges.
=======
Specific to the hair-trigger behavior in
WAITING_FOR_PDELAY_INTERVAL_TIMER: this state could be amended to
preclude asCapable being set FALSE immediately when one of the three
"immediate" hair triggers is encountered. This could be as simple as
adding a counter, "detectedFaults", with a limit "allowedFaults",
analogous to "lostResponses" and "allowedLostResponses".
For example, asCapable could be set to FALSE only after more than
allowedFaults consecutive faults are detected, by changing the else
branch in this state from just "asCapable = FALSE;" to a nested if/else
as follows:
if ((neighborPropDelay <= neighborPropDelayThresh) &&
(rcvdPdelayRespPtr->sourcePortIdentity.clockIdentity != thisClock) &&
neighborRateRatioValid)
{
asCapable = TRUE;
detectedFaults = 0;
}
else
{
if (detectedFaults <= allowedFaults)
detectedFaults += 1;
else
{
asCapable = FALSE;
detectedFaults = 0;
}
}
where detectedFaults is initialized to zero in INITIAL_SEND_PDELAY_REQ,
and allowedFaults is a newly defined constant, perhaps set to 3 for
consistency with the current allowedLostResponses. This solution could
also be adjusted so that the faults need not be consecutive; however,
if this adjustment were pursued, a window would have to be defined
(e.g., "3 faults in 60" would cause asCapable to be set FALSE).
======
A better approach may be to keep a rapid fault-detection mechanism that
sets asCapable FALSE when an issue is encountered, but to provide
stronger guidance on the computation of neighborPropDelay and
neighborRateRatioValid, perhaps as testable minimum requirements such as
mandatory neighborPropDelay outlier rejection and/or averaging over
specified windows.
Impact on existing networks
Any change that addresses the hair-trigger issues raised here should
only improve the stability of the overall time-sensitive network,
balanced against application needs for rapid start-up, rapid fault
detection, and fail-over.
Originator
Name: | Bob Noseworthy | Email: | ren@iol.unh.edu |
Affiliation: | University of New Hampshire's InterOperability Lab | ||
Submitted: | 2014-07-15 |