Clock and Data Recovery/Buffer Memory (Elastic Buffer)/Cascades of Buffers and CDRs, delays and tolerance

TRANSIT DELAY inside the elastic buffer
The elastic buffer's essential function is to add a delay to the information stream that, added to the jittered delay of the incoming stream, makes the total delay a fixed amount:
 * first the additional delay is calculated
 * and then it is added.

In general, the total delay is referred to a clock (the read clock) that is not necessarily synchronous with the incoming information stream. An example of an elastic buffer is shown in the following figure:
 * The read operation is performed by sampling the output of the register stage chosen by the read pointer (= multiplexer).
 * The multiplexer is driven by the up and down counting of the write clock pulses minus the read clock pulses.
 * The pointer jumps between adjacent stages every time one of the two clocks changes the counter, but, as long as the time difference between corresponding pulses of the two clocks is constant, the pointer is always placed on the same stage of the shift register when the reading takes place.
 * If the write clock is faster, sooner or later it gives one more increment to the counter, and the reading takes place in the next stage (positive justification).
 * If the read clock is faster, sooner or later it gives one more decrement to the counter, and the reading takes place in the previous stage (negative justification).
 * When the end of the register is exceeded, a slip takes place and the counter is corrected (for instance by resetting it to its mid point, or by slipping just one step to come back within range).
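The behaviour listed above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `ElasticBuffer`, the default depth of 8 stages and the choice of recentering to the mid point at every slip are illustrative, not taken from the figure.

```python
class ElasticBuffer:
    """Toy model of the shift-register elastic buffer described above."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stages = [None] * depth   # stage 0 always holds the newest bit
        self.count = 0                 # write pulses minus read pulses
        self.slips = 0

    def write(self, bit):
        """Write clock pulse: shift the register and count up."""
        self.stages = [bit] + self.stages[:-1]
        self.count += 1
        if self.count > self.depth:    # oldest unread bit fell off the end
            self._slip()               # overflow: unread bits are skipped

    def read(self):
        """Read clock pulse: sample the stage selected by the counter."""
        if self.count == 0:            # no unread bit is left
            self._slip()               # underflow: old bits are repeated
        bit = self.stages[self.count - 1]
        self.count -= 1
        return bit

    def _slip(self):
        """Correct the counter to its mid point (one of several options)."""
        self.slips += 1
        self.count = self.depth // 2


buf = ElasticBuffer(depth=8)
for bit in (1, 2, 3, 4):               # preload up to the mid point
    buf.write(bit)
out = []
for bit in (5, 6, 7, 8):               # equal clock rates: same stage is read
    buf.write(bit)
    out.append(buf.read())
print(out, buf.count, buf.slips)       # [1, 2, 3, 4] 4 0
```

In this sketch the counter value `count` is also the delay, in read clock periods, that the buffer is adding at any given moment.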

The following figure shows a particular application where the elastic buffer is used with synchronous clocks (= from the same clock domain) and implements the function of a phase aligner. The bit stream enters the "Elastic memory" with the transmission line clock obtained after several regenerations, is further delayed by the necessary amount, and leaves with the system clock. The bit stream is then again perfectly synchronous with the system timings.

It is easy to measure the total delay in periods of the system clock at the output of the elastic memory, but it includes the delay added by the elastic buffer. To compute the transmission delay exactly, the delay of the elastic buffer must be deducted. (In the system of the figure above the transmission delay would also include the transit time added by the 9 regenerators!)
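The deduction can be written as a one-line computation. The sketch below assumes that both delays are available as counts of system clock periods; the function name and the numbers in the example (a 2.048 MHz clock, 1000 periods of total delay, 4 periods spent in the buffer) are hypothetical.

```python
def transmission_delay_s(total_delay_periods, buffer_delay_periods, clock_hz):
    """Transmission delay in seconds, with the buffer delay deducted."""
    if buffer_delay_periods > total_delay_periods:
        raise ValueError("buffer delay cannot exceed the total delay")
    return (total_delay_periods - buffer_delay_periods) / clock_hz


# Hypothetical measurement: 1000 periods in total, 4 of them inside the buffer.
delay = transmission_delay_s(1000, 4, clock_hz=2.048e6)
print(f"{delay * 1e6:.3f} microseconds")
```

The result still contains the transit time of any regenerators along the path, which has to be known separately if only the line propagation time is wanted.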


 * It is important to understand that, inside the elastic buffer, the information of how much delay the buffer memory is adding in any given moment is available.


 * This point may be relevant when, for instance, it is necessary to measure the transmission line length by measuring the transit time. In that case the time spent in transit through the elastic buffer must be deducted.


 * In some cases the elastic buffer is written with a block of bits at every write operation, and emptied by the same number of bits at every read operation. In this case the frequencies of the write and read clocks are the corresponding submultiples of the incoming serial clock and of the output serial clock.
 * In those cases the write and read cycles handle an entire byte or an entire frame at a time. The elastic buffer is then a byte aligner or a frame aligner.
 * It should also be noted that an elastic buffer is sometimes built from several sections that operate on parallel bits (words), but still at the line pulse frequency. One example is the re-sync of truly parallel streams of bits received from a parallel interface; another is where each received pulse of a serial transmission is represented by more than one bit, for subsequent processing in a DSP.

OVERFLOW and UNDERFLOW (SLIPS)
Two clocks that differ slightly in frequency (= that wander) with respect to each other, or that jitter significantly with respect to one another, will from time to time accumulate more phase difference than the buffer can absorb. Then an abrupt correction becomes necessary.

The resulting correction makes the output serial flow (or byte flow) “slip” with respect to the incoming flow.

An interval of bits disappears or is repeated. This event, as well as the ensuing correction, is therefore called a slip.

When the reading takes place synchronously or before the corresponding writing cycle (reading clock faster than the writing clock), then an underflow occurs (with a repetition of some information elements).

If the reading clock is too slow with respect to the writing clock and is unable to read one element before the next arrives, then an overflow occurs (and some information elements are skipped = cancelled).

In case of over/underflow, the reading pointer is made to slip by the number of reading operations that corresponds to the time (phase) interval to be added to (or subtracted from) the output serial flow.

The type of correction differs depending on whether the elastic buffer is placed between different clock domains or inside the same clock domain (i.e. its clocks, writing and reading, belong to the same clock domain).


 * The second figure at the beginning of this page shows a case where two different clocks of the same clock domain (the system clock and its copy after many regenerations) are to be re-synchronized together.


 * This is not the most general case, because in other cases the two clocks may belong to different clock domains.

Elastic buffer slips

 * The buffer operates within one clock domain.

The clock used for transmission at the beginning of the long path is separated from the last regenerated clock by a fixed delay (transit times along the path) plus an a.c. jitter and wander.

There is no d.c. component of wander, because the frequencies of the two clocks are locked to the same master clock.

The writing clock into the elastic buffer (= the last regenerated clock of the long path) and its reading clock (something like the initial transmit clock, or a clock that has accumulated different delay, different wander and different jitter) are to be reconciled by the buffer.

The buffer accepts the writing clock as it is but, owing to its "elastic" characteristic, shifts the reading clock to make the total delay between the two clocks (transit time plus buffer delay) a fixed value (an integer multiple of a minimum "buffer tolerance").

In this case the elastic buffer depth shall conceptually be at least as large as the maximum peak-to-peak value of the wander/jitter, plus twice its justification step (justification step = the phase step taken when adjusting the buffer delay during normal operation).
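As a worked example of this sizing rule, the sketch below rounds the required depth up to a whole number of stages. The function name, the use of unit intervals (UI) as the unit and the sample figures are assumptions made for illustration only.

```python
import math


def min_depth_single_domain(pp_wander_jitter_ui, justification_step_ui=1.0):
    """Smallest whole number of stages satisfying the rule stated above:
    peak-to-peak wander/jitter plus twice the justification step."""
    return math.ceil(pp_wander_jitter_ui + 2 * justification_step_ui)


# Hypothetical case: 5.3 UI peak-to-peak wander/jitter, 1 UI justification step.
print(min_depth_single_domain(5.3, 1.0))   # 8
```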

Before the writing clock is properly locked to the incoming signal, the buffer is kept frozen at its middle point. When it starts operation and correct phase lock is achieved, it is often so well centered that it will operate indefinitely without the need of any slip.

But the lock-in phase may be troubled by irregular transients, or severe disturbances may affect the transmission later on.

The average delay added by the buffer may then be well distant from the average value of the wander/jitter.

If the wander/jitter then deviates significantly from its average value, the buffer may reach its nearer end. A slip then becomes necessary.
 * If the slip depth is less than the maximum wander/jitter peak, then the shifting of the reading clock does not reach far enough, and a later jitter peak may cause another slip.
 * An optimum centering of the buffer may then require more than one slip. It may look as if the elastic buffer simply saturates, in the sense that the reading clock pointer remains at the min or max value allowed, and that the reading clock itself overflows or underflows by small steps until the elastic buffer achieves a good "centering" of its normal range of elasticity around the position of "zero" jitter.
 * If the slip depth is larger than the maximum wander/jitter peak, a slip recenters the buffer enough to prevent future slips, as long as the wander/jitter stays within its maximum peak-to-peak range.

The first case of small depth offers less transit delay, but carries the risk of more slips. The second corrects efficiently at the first slip, but maintains a longer transit delay. In either case, the buffer depth may be further increased by considerations about the opportunity to execute a slip that preserves some framing characteristics of the payload, as the next subsection outlines.


 * The buffer operates between two clock domains.
 * The two clocks do not share a link with a common clock source (=they belong to different clock domains).

The buffer action is complicated by the fact that the two clocks exhibit a frequency difference, although small, that makes the phase difference grow slowly but without limit. (The frequency difference generates a d.c. wander in addition to the a.c. wander and jitter.)
 * Slips are inevitable, and the elastic buffer function is to make them rare enough, at the expense of the average transit delay (which is proportional to the minimum time between slips that has been chosen to set the buffer depth).
 * The minimum buffer depth must, in addition to meeting the requirement of the previous case (maximum peak-to-peak value of the wander/jitter, plus twice its slip depth), also allow for d.c. wander, which means adding twice the depth corresponding to the minimum time between slips generated by the worst case d.c. wander.
 * The elastic buffer, of depth equal to n clock cycles, is at the boundary between different clock domains of frequencies f1 and f1-Δf. It will be subject to periodic slips (under or overflows), with a period approximately given by:

$ \tfrac{\left ( \tfrac{buffer\_depth \ - \ 1}{2} \right ) }{\Delta f}$
 * The buffer depth in this second case is bigger than in the case of a buffer inside a single clock domain.
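The slip period formula and the sizing rule for the two-domain case can be tried out numerically. This is a minimal sketch: the function names are hypothetical, the depth is counted in clock cycles (UI), and the example assumes a 2.048 MHz clock with a 50 ppm offset (Δf = 102.4 Hz), figures that are not taken from the text.

```python
import math


def slip_period_s(buffer_depth, delta_f_hz):
    """Approximate time between slips: ((depth - 1) / 2) / delta_f."""
    return ((buffer_depth - 1) / 2) / delta_f_hz


def min_depth_two_domains(pp_wander_jitter_ui, slip_depth_ui,
                          delta_f_hz, min_slip_interval_s):
    """Depth for the two-domain case: peak-to-peak wander/jitter, plus twice
    the slip depth, plus twice the d.c. wander accumulated between slips."""
    dc_wander_ui = delta_f_hz * min_slip_interval_s
    return math.ceil(pp_wander_jitter_ui + 2 * slip_depth_ui + 2 * dc_wander_ui)


delta_f = 102.4                              # assumed: 50 ppm of 2.048 MHz
print(slip_period_s(9, delta_f))             # about 0.039 s for 9 stages
print(min_depth_two_domains(5.0, 1.0, delta_f, 0.125))   # 33
```

With these assumed numbers a 9-stage buffer slips roughly every 39 ms, while asking for at least 125 ms between slips pushes the depth to 33 stages, which illustrates why the two-domain buffer is the deeper one.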


 * Controlled slips (overflow or underflow execution with minimum inconvenience to the “payload”)
 * In some cases, when the incoming data flow is framed, if a slip is inevitable it may be convenient to slip (cancel or replicate) an entire frame, exactly from beginning to end (e.g. TDM frames of layer 1 synchronization).
 * Or, if the incoming data have a layer 2 packet structure, it may be convenient to drop or duplicate inter-frame idle characters (e.g. Ethernet inter-frame characters).
 * To slip an entire frame or character, when the reading clock gets close (e.g. within ±5% of the buffer size) to either end of the buffer and a slip appears inevitable within a short time, the decision to slip is taken, but it is put into effect a little later, to coincide with the start of the next frame. Obviously, the buffer must be at least 10% longer than one entire frame, but this minimum requirement is usually largely exceeded.
 * The cost is not significantly different whether the buffer memory is organised by bit or by byte. The saving in transit time when organising by bit with respect to organising by byte, is not significant either, as the average delay is expected to be several bytes anyway.
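The deferred execution of a controlled slip can be sketched as a small state machine. Everything here is an illustrative assumption: the class name `ControlledSlipper`, the representation of the buffer state as a single fill level, and the 5% margin taken from the example in the list above.

```python
class ControlledSlipper:
    """Decide a slip near either end of the buffer, execute it at frame start."""

    def __init__(self, depth, frame_len, margin=0.05):
        if depth < 1.1 * frame_len:
            raise ValueError("buffer must be at least 10% longer than a frame")
        self.depth = depth
        self.frame_len = frame_len
        self.margin = margin
        self.pending = 0          # +1: drop a frame, -1: repeat a frame

    def on_read(self, fill, at_frame_start):
        """Return the change of fill level to apply at this read (0 if none)."""
        if self.pending == 0:
            if fill >= self.depth * (1 - self.margin):
                self.pending = +1             # close to overflow
            elif fill <= self.depth * self.margin:
                self.pending = -1             # close to underflow
        if self.pending and at_frame_start:
            change = -self.pending * self.frame_len
            self.pending = 0
            return change                     # whole frame dropped or repeated
        return 0


slipper = ControlledSlipper(depth=64, frame_len=32)
print(slipper.on_read(fill=62, at_frame_start=False))   # 0: decided, deferred
print(slipper.on_read(fill=62, at_frame_start=True))    # -32: frame dropped
```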

Introduction
Tolerating input jitter and filtering out its high frequency part are conflicting requirements.

This is evident from the fact that a CDR model has one characteristic frequency that represents both:
 * the frequency beyond which the tolerance is reduced to its bare minimum and
 * the frequency beyond which the low pass rejection (of the same input jitter) becomes effective.

Increasing the characteristic frequency improves the tolerance but lets more jitter pass through, and vice versa.
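The trade-off can be made concrete with the simplest linear model. The sketch below assumes a 1st order loop with jitter transfer H(f) = 1 / (1 + j f/fc) and phase error transfer 1 - H(f), and it assumes that the loop tolerates sinusoidal jitter as long as the phase error stays below a fixed margin (here 0.5 UI). The function names and the margin are illustrative choices, not values from the text.

```python
import math


def jitter_transfer_db(f_hz, fc_hz):
    """Magnitude in dB of the assumed 1st order low-pass jitter transfer."""
    return -10 * math.log10(1 + (f_hz / fc_hz) ** 2)


def jitter_tolerance_ui(f_hz, fc_hz, margin_ui=0.5):
    """Tolerated sinusoidal jitter amplitude for the same loop (f_hz > 0)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    ratio = f_hz / fc_hz
    error_gain = ratio / math.sqrt(1 + ratio ** 2)   # |1 - H(f)|
    return margin_ui / error_gain


for fc in (1e5, 1e6):                 # two candidate characteristic frequencies
    print(fc,
          round(jitter_tolerance_ui(1e5, fc), 3),   # tolerance at 100 kHz
          round(jitter_transfer_db(1e6, fc), 2))    # transfer at 1 MHz
```

Raising fc from 100 kHz to 1 MHz in this model increases the tolerance at 100 kHz, but it also reduces the attenuation of 1 MHz jitter from about 20 dB to about 3 dB.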

Linear models are useful even beyond the boundary of linearity
Linear mathematical models are, strictly speaking, valid only within the domain of linear systems.

Their usefulness however extends beyond that, because the behavior of any system including non-linearities can be described with good approximation by linear models, as long as the signal has small excursions around a fixed working point.

This is often the case for CDRs, when they are in locked mode with reasonably low input jitter.

" In locked mode, a SEC (SDH equipment slave clocks) will generally mimic the behaviour of a 2nd order linear analogue phase locked loop.

This allows the use of the terms (equivalent) 3 dB bandwidth and (equivalent) damping factor, as they are used in analog PLL theory,

irrespective of the fact that in the implementation of a SEC, digital and/or non-linear techniques may be used. "

'' It should be remarked that the Recommendation implicitly refers to a 2nd order PLL of type 1, not of type 2. ''

'' This is in accordance with the relative merits and preferred applications of the 2nd order loop models presented in this book, because the SEC equipment implements a sophisticated slave CDR and possibly the regeneration of a transmission link. ''

See also, for another instance, where the linear model is used in computing jitter and wander accumulation in a chain of identical regenerators.

On this page linear models of PLLs will be used as a good approximation of real CDRs in locked mode, used in cascade, for their jitter transfer functions and characteristics, for their jitter tolerance functions and characteristics and for the accumulation of generated jitter.

The (approximate and simplified, useful in "normal" cases) rule is:
 * the jitter transfer characteristics (in dB) add for two stages in cascade;
 * the jitter tolerance characteristics of two stages in cascade are obtained by the AND of the two tolerance regions.
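The two rules can be applied point by point to sampled curves. The sketch below assumes that both stages are described on the same frequency grid, with transfer in dB and tolerance in UI; the function names and the sample values are hypothetical.

```python
def cascade_transfer_db(stage1_db, stage2_db):
    """Jitter transfer of the cascade: the dB values add at each frequency."""
    return [a + b for a, b in zip(stage1_db, stage2_db)]


def cascade_tolerance_ui(stage1_ui, stage2_ui):
    """Jitter tolerance of the cascade: AND of the two tolerance regions,
    i.e. the smaller tolerated amplitude at each frequency."""
    return [min(a, b) for a, b in zip(stage1_ui, stage2_ui)]


# Hypothetical curves sampled at the same three frequencies.
print(cascade_transfer_db([-0.5, -3.0, -20.0], [0.0, -1.0, -3.0]))
print(cascade_tolerance_ui([15.0, 1.5, 0.5], [4.0, 4.0, 4.0]))
```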

The jitter tolerance curves and masks look like one or more steps of a stairway
The following figure is a specification of jitter tolerance in the form of a mask, and is shaped pretty much like a stairway with treads and risers!

A very similar mask is present in ITU-T Recommendation G.823 Fig.13, and similar ones in other ITU-T Recommendations.

The ITU-T explains that staircase input jitter tolerance masks are the result of requiring compliance of the input jitter with several "single-step" jitter tolerance masks.

To better understand how the stairway masks come about, reference can be made to the jitter tolerance functions derived from the mathematical models:

[[File:Jitter tolerance phase aligners.png|thumb|center|800px|alt = Three magnitude functions of jitter tolerance for the three main loop models. For each the lateral eye opening is the same, as is the length of the elastic buffer (= phase adder).|Three magnitude functions of jitter tolerance for the three main loop models in the case of phase-aligner architectures. For each the lateral eye opening is the same, as is the length of the elastic buffer (= phase adder). The two important 2nd order PLLs are plotted as well, in the under-damped but significant condition of ζ = 0.5. The natural frequency is 500 kHz (ωn = 3.14e6 rad/sec) and is the same for each of the three loops (yellow marker).]]

Some general properties of the tolerance functions can be inferred from the figure above:
 * A flat tread corresponds to an elastic buffer. The constant y-axis value of the tread can be seen as the buffer margin (positive or negative, whichever is smaller) available to absorb the actual wander or jitter.
 * A sloping riser (with an asymptotic slope of 20 dB/dec or 40 dB/dec, for a PLL of type 1 or type 2 respectively) corresponds to a PLL that tracks the input wander or jitter with a phase error that decreases correspondingly with decreasing frequency.


 * The ITU-T generally specifies its masks with risers of 20 dB/dec, which coincides with the asymptotic slope of type 1 systems, whether 1st order or 2nd order. (A type 2 system, with its 40 dB/dec slope, fits a 20 dB/dec mask riser more easily.)


 * The corner frequency between a tread and the riser to its left is also the corner frequency of the jitter transfer curve of the corresponding PLL.
 * If a tread has a riser to its left, the buffer that it relates to has writing and reading clocks that are slaved to the same master.
 * If the stairway ends to the left with a riser, a single clock domain is involved, and all PLLs (that are suggested in the case of a tolerance mask or modeled in the case of a tolerance function) are slaved to the incoming signal.
 * If the stairway ends to its left with a tread, two clock domains are involved, and the wander (or jitter) between them may make the largest buffer (the upper tread) spill with slips. (Note that this is the case of the phase aligner mask requirements and mathematical models, because the tolerance masks or tolerance functions do not require both clocks to belong to the same clock domain. Nonetheless, their practical application and operation is meant to be within a single clock domain.)

ITU-T jitter tolerance masks covering wander and jitter have two, three or four treads. (The 4 treads masks apply only to asynchronous tributary channels that "tunnel" SDH islands of higher clock rates, the other masks have two or three treads). For more details, see

( Note that a tolerance mask does not require the actual implementation of compliant equipment to follow the considerations listed. It simply implies that an implementation following them is possible. )


 * The rightmost two treads (at higher frequencies, above 10 Hz) are related to the tolerance to jitter:
 * the rightmost is the tolerance (LEO) of the wideband CDRs in the regenerators along the path,
 * the next is the tolerance of the node de-jitterers (see below).
 * One or two steps to the left are in the wander range (below 10 Hz).
 * The (possible) first "plateau" to the left (= lowest frequency tread) takes into account all wander sources, in particular the clock wander if in "hold-over" (=free-running) mode of the SDH islands that are traversed by the tributary that the mask refers to.
 * The wander "plateau" (=tread) at higher frequencies takes into account the other causes (excluding the SDH islands clock wander, that does not contribute when plesiochronous channels are not involved) like:
 * bit stuffing to accommodate asynchronous multiplexing of signals;
 * byte stuffing (pointer justifications) when demapping the tributaries;
 * phase transients of the SDH Equipment Clock.

The jitter transfer and tolerance of cascaded CDRs
The CDR circuit met by the signal received at the end of a long segment of the physical line (be it twisted pair, coax or fiber, followed by the input stages of the receiver that perform amplification, equalization and filtering) takes care of regenerating the clock with a large jitter tolerance but, inevitably, with a correspondingly large jitter filtering bandwidth.


 * This is, for instance, the typical case for the CDR at the line input port of a regenerator equipment.


 * In a regenerator “equipment” there is much more than a single regenerator CDR circuit, and it is normally requested that the regenerator equipment "dejitterizes" all jitter above fC ≤ fp / 10000.


 * In a regenerator (CDR or "equipment"), the frequency range of interest (jitter transfer) around its cut-off frequency is about ±2 decades.


 * Both SONET and SDH have dual specifications for full and reduced jitter transfer/tolerance regenerator equipment, referred to respectively as Type A (normal) and Type B (reduced) regenerators. … Most present-day interest, however, is in the so-called type A regenerator equipment. It is the only one considered in recent Recommendations, where type B appears only in a reference to the cautions to use when mixing type A and type B and to how to solve the inter-networking implications, while the most recent Recommendations implicitly consider just the full-spec (i.e. type A) regenerator equipment.


 * A type A regenerator equipment for STM-64 and OT-2 (10 Gbps) shall respect a maximum jitter transfer bandwidth of 1 MHz (fC/fBIT ≤ 1/10000) and a maximum jitter peaking of 0.1 dB. This allows a chain of 50 regenerators with Q ≥ 30.

A chain of identical line regenerator equipment behaves much like a single regenerator, but with cumulated transfer function for the jitter entering the first regenerator, corresponding (if represented with dB magnitudes) to the sum of the individual transfer functions and with a jitter tolerance almost equal to the one of a single regenerator. (A more accurate simulation would account also for the jitter generated at each link of the chain, but would not produce drastically different conclusions.)
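A short computation shows what these figures imply for a chain. It is a sketch under simplifying assumptions: the nominal 10 Gbps rate quoted above is used as the bit frequency, the regenerators are identical so that their gain peaks fall at the same frequency and add in dB, and the function names are hypothetical.

```python
def max_corner_frequency_hz(bit_rate_hz, ratio=1 / 10000):
    """Largest jitter transfer corner frequency allowed by fC/fBIT <= ratio."""
    return bit_rate_hz * ratio


def chain_peaking_db(peaking_db_per_regenerator, n_regenerators):
    """Worst-case cumulated gain peaking of a chain of identical regenerators."""
    return peaking_db_per_regenerator * n_regenerators


def db_to_amplitude_ratio(db):
    """Convert a jitter gain in dB into an amplitude ratio."""
    return 10 ** (db / 20)


print(max_corner_frequency_hz(10e9))                # about 1 MHz
peak_db = chain_peaking_db(0.1, 50)                 # about 5 dB for 50 stages
print(peak_db, db_to_amplitude_ratio(peak_db))      # ratio close to 1.78
```

Even a peaking as small as 0.1 dB per equipment can therefore amplify jitter near the corner frequency by almost a factor of 1.8 after 50 regenerators, which is why the peaking limit is specified so tightly.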

The following figure instead represents the general case of the tolerance and transfer curves of two CDRs with the same clock frequency but different characteristic frequencies, following each other in a simple cascade.
 * The characteristic frequencies (transfer and tolerance low-pass corner) are different in the two CDRs. The corner frequency of the transfer curve and the corner frequency of the tolerance curve of a CDR coincide if the CDR follows a linear model. In a non-linear CDR they are typically different. The figure refers to CDR linear models.
 * The cascade tolerates the amount of sinusoidal jitter that is tolerated by the first and then by the second CDR (the tolerance region of the cascade is the AND of the two regions of the individual CDRs).
 * When the cascaded CDRs are not linear, the CDR with the larger tolerance is located before the CDR with tighter tolerance.
 * Beware of possible overshoots of the transfer curves (gain peakings), that may add if the characteristic frequencies are close.

The line regenerators might track even relatively high frequency jitter well, if their corner frequency were high enough.

But a high corner frequency is incompatible with the requirement of reduced jitter at the output of the node. (Rec. ITU-T G.8251 (09/2010), IV.4 Jitter generation of regenerators using parallel serial conversion; pages 56..57:)

" Regenerators using only one PLL, i.e., the clock recovery, may have requirements which could be contradictory. They have to perform some filtering and their bandwidth has to be large enough to fulfil the jitter tolerance requirement.

The jitter tolerance requires a bandwidth which has to be above the frequency where the first 1/f slope starts. This could lead to a relatively high jitter generation exceeding the maximum allowed value. Generally speaking it is not the purpose of the clock recovery to minimize jitter.

Main requirements for the clock recovery in order to optimize the bit error performance are:
 * keeping the sampling time for the data retiming flipflop independent of the clock frequency at the position of the optimum eye opening (e.g., by using an integrating control loop);
 * following the phase modulation of the incoming signal without deviating too much from the ideal sampling time (i.e., jitter tolerance);
 * generating a low intrinsic jitter in terms of peak-to-peak values which should not exceed a small portion of the usable eye opening.

This last bullet point clearly does not contain any requirement regarding the spectral distribution of the intrinsic jitter. Unlike the measurement of jitter using band-limiting filters, the jitter generated in the clock recovery has to be considered without any filtering because it describes the deviation of the ideal sampling time. In the case of very high bit rates it could be a problem having a clock recovery which is optimized for error performance while not taking care of the output jitter.

The concept to overcome this difficulty is the use of a serial parallel conversion where the incoming signal normally is converted into bytes. This so-called deserializer often uses a structure of 16 parallel bits. At this level, the frequency where the data processing can be done is reduced by a factor of 16. This allows the use of a phase-locked loop that performs a dejitterizer function with a reduced bandwidth. At the output of such a regenerator the only jitter is that of this PLL and the reasonably low jitter of the PLL performing the multiplexer function and multiplying the clock frequency by a factor of 16.

These concepts allow for higher values of low frequency intrinsic jitter because of the narrower bandwidth of the dejitterizer function. This function filters these phase noise components in regenerators to such a degree that the accumulation in chains, defined in the HRM of Appendix III, does not exceed the network limit. "

The equation modelling the VCO Power Spectral Density is a single zero first order function shown schematically in the following figure.

Figure IV.2-2: Schematic of VCO power spectral density (the figure is intended to be a log-log plot; the actual curve would be 3 dB above the breakpoint at frequency fb).

The frequency fb is given by: fb = f0/2Q, where f0 is the line rate (oscillator frequency) and Q is the quality factor ...

Cascaded CDRs for regeneration with phase alignment
The obvious case for using two cascaded CDRs is where the clock of a serial signal (that has traveled along a chain of serial links) meets a clock from the same master, and this second clock is affected by much less jitter (ideally zero jitter).

A first CDR is used to regenerate the signal with large jitter tolerance and, inevitably, with poor jitter filtering.

As a clock with reduced jitter is available at the node, an elastic buffer, exactly like the one presented at the beginning of the page, can be used to eliminate all the jitter that has been accumulated with respect to the read (i.e. local) clock.




 * In this case the delay line cannot compensate a d.c. wander, if this represents a drift beyond the delay line extremes.
 * This is shown by the limit of the combined tolerance at low frequencies, that is due to the elastic buffer.

An elastic buffer following a wideband slave CDR, like in the case just described, would:
 * filter ALL the jitter present in the incoming signal (meaning all the jitter of the incoming signal with respect to the read clock), but
 * be unable to track any d.c. wander (larger than Tol2 in the figure above) of the incoming signal with respect to the read clock.
 * The buffer is normally centered because it recenters after every slip outside its extremes.

This approach can be used when both clocks belong to a single clock domain; the application is the "phase aligner" defined earlier.

A read clock may reach the node using special protocols (PTP, Precision Time Protocol). The read clock would have accumulated almost no jitter with respect to the master clock of the clock domain and could be used to de-jitter the clock of CDR1.

Cascaded CDRs for regeneration with de-jitterizing
In other cases a clean clock for the reading may not be available, and the requirements for jitter transfer and tolerance may still be incompatible with the performance of a single CDR (typical case of a line regenerator equipment).

The regenerator equipment of modern day networks is made up of a cascade of two clock extractions, because a single CDR cannot satisfy both the jitter tolerance and the jitter transfer requirements at the same time.

The characteristic cut-off frequency of CDR2 is lower than that of CDR1, and sets the overall regenerator bandwidth. The resulting cascade of two PLLs is made of a first CDR of 2nd order and type 2, and a subsequent CR (i.e. Clock Recovery circuit, because the Data recovery has been already done by the first stage) of 2nd order and type 1.

If the circuit speed is not a concern, and if buffer memory is kept to a minimum, a possible structure to de-jitterize (i.e. a "de-jitterizer") is shown below. The second CDR is designed for a tighter transfer bandwidth and a larger high frequency tolerance, and is slave to the clock of the first:
 * 1) CDR2 tolerance at low frequencies is inferior to that of CDR1, but is in general more than adequate;
 * 2) CDR2 tolerance at high frequencies is not affected by the LEO tolerance of CDR1, because CDR2 samples clock pulses that have already been regenerated: its tolerance is limited by the range of its own phase comparator, which is widened with special circuitry;
 * 3) the range of the phase comparator is the limit to high frequency jitter tolerance in CDR2; in the case of the figure it has been widened by combining:
 * 4) a normal comparator (± 0.5 U.I.) at the line frequency, with
 * 5) the added information of the pointer, in accordance with:

$ combined\_error\_signal = P + \tfrac{1}{2} C $

where P is the pointer cell index (integer, negative for a delay shorter than half register, 0 in the centre cell, etc.) and C is the comparator output, ranging from -1 to +1.

The CDR2, very much as any other CDR that jitters but keeps its locking, keeps its local clock jittering around the mid point of the interval covered by its comparator (apart from the residual sampling error, minimum for a type 1 loop, zero for a type 2 loop).

As the composite phase comparator is made as wide as the elastic buffer, in steady state this brings the pointer to the centre cell of the shift register, and the local clock to sample from it after half a unit interval (U.I.).
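The composite comparator can be sketched directly from the expression combined error signal = P + 1/2 C. The function names and the 7-cell register in the example are hypothetical; the register is assumed to have an odd number of cells, so that the centre cell has index 0.

```python
def composite_phase_error_ui(pointer_index, comparator_out):
    """Combined error signal P + C/2, in unit intervals.

    pointer_index  : integer cell index P, 0 in the centre cell
    comparator_out : normal comparator output C, from -1 to +1
    """
    if not -1.0 <= comparator_out <= 1.0:
        raise ValueError("comparator output must lie between -1 and +1")
    return pointer_index + 0.5 * comparator_out


def composite_range_ui(n_cells):
    """Extremes of the combined error for an n-cell register (n odd)."""
    half = n_cells // 2                  # largest pointer index
    return (composite_phase_error_ui(-half, -1.0),
            composite_phase_error_ui(+half, +1.0))


print(composite_phase_error_ui(0, 0.0))   # 0.0: steady state, centre cell
print(composite_range_ui(7))              # (-3.5, 3.5): as wide as 7 cells
```

In this sketch a 7-cell register gives a comparator range of 7 U.I. in total, instead of the 1 U.I. of the normal comparator alone.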


 * The example shown above is a conceptual implementation of a de-jitterizer. Applications of de-jitterizers in geographical networks (based on chains of regenerator equipment) use a slightly different approach and structure for the elastic buffer implementation and for CDR2, but are based on the very same concepts.
 * CDR2 remains preferably of 2nd order and of type 1, primarily because its jitter transfer bandwidth (ωn2 in the figure below, which also defines the overall de-jitterizer bandwidth) is best controlled with this architecture (using an accurate VCO and a linear phase comparator).