All the audio traffic on an AVB network is synchronized to a global clock so that audio from multiple sources stays in time during playback and recording. The more audio traffic on a network, the more critical this becomes. Users familiar with traditional digital audio devices (ADAT, S/PDIF, etc.) will already know the idea of a global clocking device. PreSonus AVB devices have two clocks: a wordclock and a PTP clock.

All AVB devices on the network are synchronized to a common reference time using the IEEE 802.1AS Precision Time Protocol (PTP). Each stream packet carries a presentation time, expressed in this shared time base, that tells every device receiving the stream when to play the samples in that packet, so all devices stay aligned. An advantage of this design is that AVB networking supports multiple simultaneous sample rates and sample clock sources, which is important for applications where audio and video must be synchronized even though they travel along different paths at different sample rates.
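
A minimal Python sketch of how a presentation time lines up playback on different listeners. It assumes a shared gPTP time base in nanoseconds and a fixed 2 ms maximum transit time (the commonly used Class A figure); the function names are hypothetical, and real IEEE 1722 packets carry only the lower 32 bits of the timestamp.

```python
# Sketch of presentation-time alignment (simplified; names are hypothetical).
# Assumption: every device reads the same gPTP time in nanoseconds, and the
# talker stamps each packet with its capture time plus a fixed transit budget.

MAX_TRANSIT_NS = 2_000_000  # assumed 2 ms budget (the usual Class A figure)


def presentation_time(capture_ns: int, max_transit_ns: int = MAX_TRANSIT_NS) -> int:
    """Talker side: the gPTP time at which every listener should play this packet."""
    return capture_ns + max_transit_ns


def playback_delay(pres_ns: int, arrival_ns: int) -> int:
    """Listener side: how long to hold a packet before playing it.

    A negative result would mean the packet arrived too late to be presented on time.
    """
    return pres_ns - arrival_ns


# One packet captured at t = 10 ms reaches two listeners over different paths.
pres = presentation_time(10_000_000)
wait_a = playback_delay(pres, 10_300_000)  # arrived after 0.3 ms
wait_b = playback_delay(pres, 11_100_000)  # arrived after 1.1 ms
print(pres, wait_a, wait_b)  # 12000000 1700000 900000
```

Both listeners present the packet at the same gPTP time (12 ms) even though it reached them at different moments; each one simply waits a different amount.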

Wordclock

Analog audio travels through a cable as a continuous electrical waveform at a large fraction of the speed of light. Because of this, an audio signal traveling from one analog device to another arrives, for all practical purposes, instantaneously. Therefore, analog audio passing from one analog device to another does not need to be synchronized.

Transferring digital audio is a very different matter. Computers and other digital devices operate one step at a time; each step happens very quickly, but not instantaneously, and digital signals are not inherently in perfect time. Uncompressed digital audio plays at a fixed rate (the sampling frequency), but digital clocks are not perfect: their frequency can drift, and they almost always have at least some irregular timing error, known as jitter. Two devices that each follow their own clock are therefore highly unlikely to stay in agreement about precisely when a sample starts and ends. The result is usually an audible artifact, such as a pop or a glitch.
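
To put numbers on this drift, the short Python calculation below assumes two free-running 48 kHz clocks that differ by 50 parts per million. The 50 ppm figure is purely illustrative, not a specification of any device.

```python
# Worked example: how quickly two free-running 48 kHz clocks drift apart.
# The 50 ppm mismatch is an assumed, illustrative figure.

NOMINAL_HZ = 48_000


def slip_samples_per_second(nominal_hz: float, ppm_a: float, ppm_b: float) -> float:
    """Samples per second by which device B gains on (or loses to) device A."""
    rate_a = nominal_hz * (1 + ppm_a * 1e-6)
    rate_b = nominal_hz * (1 + ppm_b * 1e-6)
    return abs(rate_b - rate_a)


slip = slip_samples_per_second(NOMINAL_HZ, 0.0, 50.0)
seconds_per_slip = 1 / slip
print(f"{slip:.2f} samples/s, one whole sample every {seconds_per_slip:.3f} s")
# 2.40 samples/s, one whole sample every 0.417 s
```

After one minute the two devices disagree by about 144 samples, and every slipped sample has to be dropped or repeated somewhere, which is exactly where pops and glitches come from.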

To avoid this problem, all digital devices in communication with one another need to follow a single primary clock. That means the primary clock must send a signal that essentially says, “everyone start at this moment and follow me!”

Even if the primary clock’s timing is imperfect, all the secondary devices will follow the timing errors exactly and will stay in sync with each other, eliminating timing-related artifacts. In general, the better the primary clock, the better the resulting audio will sound, so whenever possible, use the best clock you have, or experiment with your rig to find the best result.

Whenever digital audio devices are synchronized, it is necessary to designate one device as the “primary” wordclock device to which all other digital devices are synced. Once you’ve determined which device is to be your primary clock, you will need to sync the remaining digital devices.

AVB itself does not designate the primary wordclock; it must be set with an AVB controller. Depending on the device, this is done manually by the user or managed automatically. For example, when a StudioLive rack mixer is set up as a stage box for a StudioLive console mixer, the rack mixer is automatically configured to sync to the console mixer's media clock.

Multiple unrelated wordclocks can co-exist on an AVB network. While there are a few ways to match wordclocks between a talker and listener, PreSonus AVB devices currently only support recovering the media clock by listening to the first AVB stream.
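
A minimal Python sketch of media clock recovery from the first stream, assuming each received packet can be reduced to a (presentation time, cumulative sample count) pair. The class and function names are hypothetical, and the simple two-point rate estimate stands in for the filtered clock recovery a real device would use.

```python
from dataclasses import dataclass


@dataclass
class StreamObservation:
    """Hypothetical record of a received stream packet: its presentation time
    in gPTP nanoseconds and the running count of samples delivered so far."""
    presentation_ns: int
    sample_count: int


def recover_rate_hz(observations: list[StreamObservation]) -> float:
    """Estimate the talker's media clock as samples elapsed per second of gPTP time.

    Only the first and last observations are used; a real device would smooth
    the estimate (for example with a PLL) to reject jitter.
    """
    first, last = observations[0], observations[-1]
    elapsed_ns = last.presentation_ns - first.presentation_ns
    if elapsed_ns <= 0:
        raise ValueError("observations must span a positive time interval")
    return (last.sample_count - first.sample_count) * 1e9 / elapsed_ns


def media_clock_rate(streams: list[list[StreamObservation]]) -> float:
    """Policy described above: recover the media clock from the first stream only."""
    if not streams:
        raise ValueError("no AVB stream available to recover a media clock from")
    return recover_rate_hz(streams[0])


# Stream 1: a talker whose 48 kHz clock runs slightly fast (48,001 samples per gPTP second).
stream_1 = [StreamObservation(0, 0),
            StreamObservation(500_000_000, 24_000),
            StreamObservation(1_000_000_000, 48_001)]
# Stream 2: an unrelated 44.1 kHz talker; it is ignored for clock recovery.
stream_2 = [StreamObservation(0, 0), StreamObservation(1_000_000_000, 44_100)]

rate = media_clock_rate([stream_1, stream_2])
print(rate)  # 48001.0
```

Because the rate is measured against gPTP time, the listener can follow a talker's slightly-off wordclock exactly, while an unrelated wordclock on another stream simply coexists on the network.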

Precision Time Protocol (PTP)

Precision Time Protocol (PTP) is used to synchronize clocks throughout a computer network. PTP can achieve clock accuracy in the sub-microsecond range, which makes it suitable for local area networks that require tight timing. Like wordclock, PTP uses a primary-secondary architecture for clock distribution. The protocol defines primary clock selection, link delay and network queuing (both measurement and compensation), and clock-rate matching and adjustment for Layer 2 network devices.

In this architecture, there are several different clock types:

  • Ordinary Clock. A device with a single network connection that is either the sync source (primary) or the sync destination (secondary).
  • Boundary Clock. A device with multiple network connections that can accurately synchronize one network segment to another. A primary clock is selected for each segment in the system, and all of them trace their time back to a single root timing reference generated by the grandmaster clock. The grandmaster sends synchronization information to all the clocks on its network segment, and the boundary clocks on that segment then pass accurate time on to the other network segments they connect to. Every AVB Talker is required to be capable of functioning as the grandmaster; however, any network node can be the grandmaster, as long as it can either source timing itself or derive it from a grandmaster-capable device.
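
The core arithmetic behind PTP's link delay measurement and offset compensation can be sketched with the classic four-timestamp exchange from IEEE 1588. This is an illustration under a symmetric-path assumption: IEEE 802.1AS measures link delay hop by hop with its own peer delay (Pdelay) messages, but the averaging idea is the same. Function names and numbers are illustrative.

```python
def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Two-way PTP time exchange, assuming equal path delay in both directions.

    t1: primary sends Sync            (read on the primary clock)
    t2: secondary receives Sync       (read on the secondary clock)
    t3: secondary sends Delay_Req     (read on the secondary clock)
    t4: primary receives Delay_Req    (read on the primary clock)
    Returns (secondary offset from primary, one-way path delay).
    """
    forward = t2 - t1   # path delay + offset
    backward = t4 - t3  # path delay - offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay


# Illustrative numbers in ns: true path delay 500, secondary clock 1,200 ahead.
offset, delay = offset_and_delay(t1=1_000, t2=2_700, t3=5_000, t4=4_300)
print(offset, delay)  # 1200.0 500.0
```

The secondary then corrects its clock by the computed offset. Any asymmetry between the two directions shows up directly as an offset error, which is one reason timing is measured per link rather than across the whole network.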

On an AVB network, PTP provides the timestamps that tell every listener when to play back the audio from a talker. In other words, the PTP clock is used to align audio samples from multiple sources in time. Each grandmaster-capable device advertises its clock in announce messages, and the best primary clock (the grandmaster) is then selected by comparing the available announce messages.
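
Selecting the best primary clock from announce messages is done with a best master clock algorithm. The Python sketch below shows the core comparison under simplifying assumptions: it ranks devices only by the advertised clock-quality fields, in the order IEEE 802.1AS compares them, and omits the path-related tie-breakers (such as steps removed) that the full algorithm also uses. The `name` field and all values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announce:
    """Simplified subset of the clock-quality fields carried in an announce message."""
    name: str                        # illustrative label, not a protocol field
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: int              # EUI-64 identity, used as the final tie-breaker

    def rank(self) -> tuple[int, ...]:
        # Compared field by field in this order; the lower value wins at each step.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2, self.clock_identity)


def best_primary(announces: list[Announce]) -> Announce:
    """Pick the grandmaster from the announce messages heard on the network."""
    return min(announces, key=Announce.rank)


# Illustrative values only.
heard = [
    Announce("device A", 248, 248, 0xFE, 0x4E5D, 248, 0x0A00000000000001),
    Announce("device B", 246, 248, 0xFE, 0x4E5D, 248, 0x0B00000000000002),
    Announce("device C", 246, 248, 0xFE, 0x4E5D, 248, 0x0B00000000000001),
]
print(best_primary(heard).name)  # device C
```

Device A loses on priority1; B and C tie on every quality field, so the lower clock identity decides. Because every device applies the same comparison to the same messages, they all agree on one grandmaster without a central coordinator.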

Multiple Stream Reservation Protocol (MSRP)

Multiple Stream Reservation Protocol (MSRP) is the IEEE standard used to reserve bandwidth for audio streams on an AVB network. It lets endpoints route data and reserve bandwidth automatically, eliminating the need for the user to manually configure Quality of Service (QoS) across network devices. Before an audio stream is sent out, MSRP checks the end-to-end bandwidth that is currently available. No more than 75% of the total bandwidth on an AVB switch port can be reserved for streams. If the bandwidth is available, it is locked down along the entire data path, from the Talker out to every assigned Listener, until it is released.
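
The 75% limit and the all-or-nothing reservation along the path can be illustrated with a small Python sketch. The classes, functions, and the 30 Mbps stream size are hypothetical; a real switch derives each stream's bandwidth from the parameters in the Talker's declaration.

```python
class PortReservations:
    """Hypothetical ledger of stream reservations on one AVB switch port."""

    MAX_RESERVABLE_FRACTION = 0.75  # share of the link that streams may reserve

    def __init__(self, link_mbps: float):
        self.limit_mbps = link_mbps * self.MAX_RESERVABLE_FRACTION
        self.reserved: dict[str, float] = {}

    def used_mbps(self) -> float:
        return sum(self.reserved.values())

    def fits(self, stream_id: str, mbps: float) -> bool:
        # A stream already reserved on this port needs no additional bandwidth.
        return stream_id in self.reserved or self.used_mbps() + mbps <= self.limit_mbps

    def release(self, stream_id: str) -> None:
        self.reserved.pop(stream_id, None)


def reserve_path(path: list[PortReservations], stream_id: str, mbps: float) -> bool:
    """Reserve on every port from Talker to Listener, or on none of them."""
    if not all(port.fits(stream_id, mbps) for port in path):
        return False
    for port in path:
        port.reserved[stream_id] = mbps
    return True


def release_path(path: list[PortReservations], stream_id: str) -> None:
    for port in path:
        port.release(stream_id)


# Two 100 Mbps hops; the 30 Mbps stream size is illustrative.
path = [PortReservations(100), PortReservations(100)]
results = [reserve_path(path, s, 30) for s in ("s1", "s2", "s3")]
print(results)  # [True, True, False]: 90 Mbps would exceed the 75 Mbps limit
release_path(path, "s1")
print(reserve_path(path, "s3", 30))  # True
```

Checking every hop before committing any of them mirrors the idea that bandwidth is locked down along the entire path or not at all.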

Bandwidth reservations are made based on talker and listener declarations on the switch ports. Talker declarations come in two forms:

  • Advertise Declaration. This declaration announces that a stream has no bandwidth or other network constraints on its designated path, so any intended Listener can create a reservation for QoS. Talker Advertise messages contain all the information necessary to make the reservation.
  • Failed Declaration. As its name indicates, this announces that a stream is not available to a Listener, either because the necessary bandwidth is not available or because of other limitations somewhere in the network path between the Talker and the intended Listener.

Listeners also make declarations within an AVB network. These consist of the following types:

  • Ready. One or more Listeners are requesting a stream and sufficient bandwidth and resources are available from the Talker along the network path to every intended Listener.
  • Ready Fail. In this instance, one or more Listeners are requesting a stream, but not all of them have sufficient bandwidth and resources on the path to the Talker. In other words, at least one Listener has encountered an obstacle and at least one Listener has not.
  • Fail. One or more Listeners have requested a stream, and none of them has sufficient bandwidth or resources in the network path to attach to the desired stream.
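
The three Listener declarations follow from combining what each requesting Listener reports. A minimal Python sketch of that combination, as a switch might perform it when Listeners on several ports ask for the same stream; the names are hypothetical, and the MSRP standard itself calls the last case "Asking Failed".

```python
from enum import Enum


class ListenerDeclaration(Enum):
    READY = "Ready"
    READY_FAIL = "Ready Fail"
    FAIL = "Fail"


def merge_listener_declarations(declarations: list[ListenerDeclaration]) -> ListenerDeclaration:
    """Combine the declarations from several Listeners into the one sent toward the Talker."""
    if not declarations:
        raise ValueError("no Listener has requested the stream")
    kinds = set(declarations)
    if kinds == {ListenerDeclaration.READY}:
        return ListenerDeclaration.READY       # every Listener can be served
    if kinds == {ListenerDeclaration.FAIL}:
        return ListenerDeclaration.FAIL        # no Listener can be served
    return ListenerDeclaration.READY_FAIL      # some can be served, some cannot


R, RF, F = ListenerDeclaration.READY, ListenerDeclaration.READY_FAIL, ListenerDeclaration.FAIL
print(merge_listener_declarations([R, R]).value)   # Ready
print(merge_listener_declarations([R, F]).value)   # Ready Fail
print(merge_listener_declarations([F, F]).value)   # Fail
```

An incoming Ready Fail already means "some yes, some no" further downstream, so any combination that includes it also merges to Ready Fail.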

End-to-end stream reservation is successful as soon as the Listener receives the Talker’s Advertise Declaration and the Talker receives the Listener’s Ready Declaration.

Figure: AVB stream reservation process.

AVB streams are forwarded only after a successful reservation is made. When the streams stop, the switch resources are released, so the process can begin again or the bandwidth can be used by other network traffic. In this way, the AVB standard ensures that reserved audio streams always receive the highest QoS, while all other traffic is handled at lower priority.
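
The reserve, forward, release cycle can be modeled per switch port in a few lines of Python. This is a simplified sketch with hypothetical names; it ignores timers, multiple streams, and how declarations propagate between switches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortStreamState:
    """Hypothetical per-port state for one stream on an AVB switch."""
    talker: Optional[str] = None     # "Advertise" or "Failed"
    listener: Optional[str] = None   # "Ready", "Ready Fail" or "Fail"
    stream_mbps: float = 0.0         # bandwidth the stream needs (illustrative)

    def can_forward(self) -> bool:
        # Frames leave this port only when an Advertise meets a Ready (or Ready Fail).
        return self.talker == "Advertise" and self.listener in ("Ready", "Ready Fail")

    def held_mbps(self) -> float:
        # Bandwidth is held only while the reservation is in place.
        return self.stream_mbps if self.can_forward() else 0.0

    def stop(self) -> float:
        """Withdraw both declarations; return the bandwidth freed for other traffic."""
        freed = self.held_mbps()
        self.talker = None
        self.listener = None
        return freed


port = PortStreamState(stream_mbps=12.5)
port.talker = "Advertise"
print(port.can_forward())   # False: no Listener has declared Ready yet
port.listener = "Ready"
print(port.can_forward())   # True
print(port.stop())          # 12.5
print(port.can_forward())   # False
```

Tying forwarding to the reservation state means a stream can never consume bandwidth it has not reserved, and stopping it immediately returns that bandwidth to other traffic.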