Most people who have used a digital mixer in the last ten years are familiar with incorporating networking technology into their audio application. Remote control over wireless LANs, proprietary audio-over-Ethernet protocols, and extensible audio networking platforms have all become relatively commonplace. As networking speed and reliability have increased, and the underlying technology has become more affordable, transporting audio over an Ethernet cable now offers dramatic savings of time and money, making it more attractive than ever.
While there are several protocols currently in use for audio networking, AVB has many unique benefits that have made it the protocol of choice for the latest generation of PreSonus® pro audio equipment. This article explains the basics of AVB networking, and much of the information here is relevant for other IEEE 1722.1-compliant AVB devices, in addition to supported PreSonus AVB products. PreSonus StudioLive® Series III console and rack mixers, NSB-series Stageboxes, and EarMix™ 16M Personal Monitor Mixers are fully compliant with the IEEE 1722.1 standard, which is the protocol for discovery, enumeration, connection management, and control of AVB devices, also known as AVDECC.
Note: Earlier generations of PreSonus AVB products (StudioLive RM-AI and RML-AI mixers, StudioLive CS18AI, and AI-series consoles equipped with the SL-AVB-MIX option card) are not 1722.1 AVDECC-compliant and can only be used with each other. These products are not compatible with IEEE 1722.1 devices like the StudioLive Series III mixers or other third-party AVB products that follow the 1722.1 AVDECC standard.
What is AVB?
AVB (Audio Video Bridging) is an extension to the Ethernet standard designed to provide guaranteed quality of service, which simply means that audio samples will reach their destinations on time. AVB allows you to create a single network for audio, video, and other data, such as control information, using an AVB-compatible switch. This allows you to mix normal network data and audio network data on the same network, making it easier to create both simple and complex networks. It has been adopted by numerous audio companies, and more companies are adding it all the time.
Audio-over-Ethernet has become increasingly attractive in pro audio applications, especially for distribution in large-scale systems, such as those used in sporting venues, concert halls, and educational institutions. The problem is that most solutions are proprietary, making these systems too expensive and too complex for most smaller applications. AVB is intended to change that by providing an open collection of IEEE standards for use by the pro audio market and its manufacturing community.
AVB networking offers several features that make it ideal for audio applications:
- Long, light cable runs. A single lightweight CAT5e or CAT6 cable can be run up to 100 meters (328 feet). This makes it easy to have audio I/O located in different rooms (or even different venues in the same building) and run multi-channel audio between them in real time.
- Low, predictable latency. AVB provides latency of no more than 2 ms when sending an audio stream point-to-point over up to seven “hops” (trips through switches or other devices).
- Scalable, with high channel counts. AVB’s bandwidth is sufficient to carry hundreds of real-time channels using a single Ethernet cable. This offers the future possibility of expanding your system with additional devices that contain different kinds of audio I/O, multiple controllers, and other useful functions.
- Integrated clock signal. In a digital audio system with multiple devices, having a master clock is critical to maintain audio fidelity. The AVB specification defines such a clock to be accurately distributed to all devices in the system.
How does AVB work?
On the simplest level, AVB works by reserving a portion of the available Ethernet bandwidth for its own traffic. Because packets of AVB data are sent regularly in allocated slots within the reserved bandwidth, there are no interruptions or interference, making AVB extremely reliable.
What makes AVB ideal for audio networking is that it splits network traffic into real-time traffic and everything else. All real-time traffic is transmitted on an 8 kHz pulse: every 125 μs, all real-time streams send their data. Anything that’s not real-time traffic is transmitted around that pulse, when there is no more real-time data ready to be sent. To make sure that there is enough bandwidth available for all prioritized real-time traffic, the Stream Reservation Protocol (SRP, IEEE 802.1Qat) is used.
Every AVB-compliant switch between a talker and a listener uses SRP to make sure sufficient bandwidth is available, making SRP a foundational building block of the AVB standard. Every switch and AVB device on the network must implement SRP and send real-time traffic at the 8 kHz pulse. If a device on the network does not follow this standard, real-time traffic could be delayed, causing jitter in the output.
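To make the 8 kHz pulse concrete, the arithmetic below works out how many audio samples per channel accumulate during one 125 μs interval at a few common sample rates. This is only a sketch of the timing relationship; how samples are actually packed into packets is defined by the transport standard and is not modeled here, and the function name is hypothetical.

```python
from fractions import Fraction

# One tick of the 8 kHz pulse described above (1 / 8000 s = 125 microseconds).
INTERVAL_US = 125

def samples_per_interval(sample_rate_hz: int) -> Fraction:
    """Samples per channel that accumulate during one 125 us interval."""
    return Fraction(sample_rate_hz * INTERVAL_US, 1_000_000)

for rate in (44_100, 48_000, 96_000):
    # Exact fractions avoid rounding noise for rates that do not divide evenly.
    print(rate, "Hz ->", samples_per_interval(rate), "samples per interval")
```

At 48 kHz exactly six samples per channel fall into each interval, and at 96 kHz twelve. A rate such as 44.1 kHz does not divide evenly, so the sample count would have to vary from one interval to the next.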
AVB Hardware Components
In an AVB network, every device to and from which audio is flowing must adhere to the AVB standard. These devices consist of the following types:
- AVB Talkers. These devices act as the source for an AVB stream, sending out audio onto the network.
- AVB Listeners. These devices are the destinations for the streams sent out by the talkers.
- AVB Switches. This is the network hub to which every talker and listener must be connected. At its most basic level, an AVB switch analyzes and prioritizes traffic on the network. It should be noted that just as there can be multiple talkers and listeners on the same AVB network, there can also be multiple AVB switches.
- AVB Controllers. A controller can be a talker, a listener, or neither. These devices handle routing, clock, and other settings for AVB devices using AVDECC.
The most important rule to keep in mind when setting up an AVB network is that the talker (device sending audio) and listener (device receiving audio) must be connected to an AVB-compatible switch.
All AVB devices on the network must share a virtual clock that defines when the AVB packet should be played. As previously mentioned, devices communicate on an AVB network as “talkers” and “listeners.” An AVB talker transmits one or more audio streams to the network. AVB listeners receive one or more of these streams from the network. It should be noted that an AVB device, like the StudioLive Series III mixers or NSB-series Stageboxes, can be both a talker and a listener. For example, a StudioLive 32 can simultaneously “talk” (send channels out to the network) and “listen” (receive channels from the network).
AVB devices stay in sync by selecting the best master PTP clock after the devices connect with one another. This ensures that every AVB device on the network will maintain precise timing, which is critical to audio quality.
The AVB switch guarantees that real-time audio data packets maintain their timing without losing information. AVB switches do this by allowing a maximum of 75% of each port’s bandwidth to be reserved for AVB traffic. Capping AVB traffic in this way leaves the remaining bandwidth for non-AVB data, so that traffic is not starved.
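The 75% rule can be pictured as a simple admission check on each switch port. The sketch below is an illustration only: the class name, the use of Mbps as the unit, and the stream sizes are all assumptions, and a real switch computes reservations from frame sizes and intervals rather than from a single bandwidth figure.

```python
AVB_SHARE = 0.75  # maximum share of a port that AVB traffic may reserve

class PortReservations:
    """Hypothetical per-port bookkeeping for reserved AVB bandwidth."""

    def __init__(self, link_mbps: float):
        self.limit_mbps = link_mbps * AVB_SHARE
        self.reserved_mbps = 0.0

    def try_reserve(self, stream_mbps: float) -> bool:
        """Admit the stream only if the port stays within the 75% cap."""
        if self.reserved_mbps + stream_mbps > self.limit_mbps:
            return False  # refusing keeps the rest of the port free for other data
        self.reserved_mbps += stream_mbps
        return True

# Illustrative numbers only: a 100 Mbps port and three made-up stream sizes.
port = PortReservations(link_mbps=100)
results = [port.try_reserve(50), port.try_reserve(30), port.try_reserve(25)]
print(results)
```

The second request is refused because it would push the port to 80 Mbps, above the 75 Mbps cap. The third fits exactly and is admitted.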
When an AVB network is configured, the talkers and listeners identify one another automatically.
Timing is Everything
All the audio traffic on an AVB network is synchronized using a global clock so that audio can be played and recorded while remaining in time from multiple sources. Obviously, the more audio traffic on a network, the more critical this becomes. For users familiar with traditional digital audio devices (ADAT, S/PDIF, etc.) the idea of a global clocking device will not seem unfamiliar. PreSonus AVB devices have two clocks: one word clock and one PTP clock.
All AVB devices on the network are synchronized to a common reference time using the IEEE 802.1AS Precision Time Protocol (PTP). Each stream packet includes a presentation time, which all the devices in the network use to align their playback. An advantage of this design is that AVB networking supports multiple simultaneous sample rates and sample clock sources, which is important for applications where audio and video need to be synchronized even though they travel along different paths with different sample rates.
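The role of the presentation time can be sketched as follows. The example assumes that the talker stamps each packet with its capture time plus a fixed transit budget of 2 ms (the latency figure quoted earlier), and that each listener simply holds the packet until the shared PTP time reaches that stamp. The function names and arrival times are hypothetical.

```python
MAX_TRANSIT_NS = 2_000_000  # assumed 2 ms budget, expressed in nanoseconds

def presentation_time(capture_ns: int) -> int:
    """Talker side: the shared-clock time at which the audio should be played."""
    return capture_ns + MAX_TRANSIT_NS

def hold_time(presentation_ns: int, arrival_ns: int) -> int:
    """Listener side: how long to buffer a packet before playing it."""
    return max(0, presentation_ns - arrival_ns)

capture = 1_000_000_000              # PTP time when the talker captured the audio
stamp = presentation_time(capture)

near_arrival = capture + 300_000     # listener one hop away (made-up delay)
far_arrival = capture + 1_200_000    # listener several hops away (made-up delay)

near_play = near_arrival + hold_time(stamp, near_arrival)
far_play = far_arrival + hold_time(stamp, far_arrival)
print(near_play == far_play == stamp)
```

Although the packet reaches the two listeners at different moments, both play it at the same instant on the shared clock, because the nearer listener buffers it for longer.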
Analog audio is transferred through a cable as a continuous electrical waveform at almost the speed of light. Because of this, audio signal traveling from one analog audio device to another arrives nearly instantaneously, for all practical purposes. Therefore, you don’t have to synchronize analog audio passing from one analog device to other analog devices.
Transferring digital audio is a very different matter. Computers and other digital devices operate one step at a time; each step happens very quickly, but not instantaneously, and digital signals are not inherently in perfect time. While uncompressed digital audio plays at a fixed rate (i.e., the sampling frequency), digital clocks are not perfect; their frequency can drift, and they almost always have at least some irregular errors, known as “jitter.” Therefore, two devices, each following its own clock, are highly unlikely to stay in agreement about precisely when a sample starts and ends. The result is usually an artifact, like a pop or a glitch in the audio.
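A rough calculation shows how quickly two free-running clocks disagree. The 50 ppm frequency offset used below is an assumed figure chosen for illustration, not a specification of any particular device, and the function name is hypothetical.

```python
from fractions import Fraction

def seconds_until_one_sample_slip(sample_rate_hz: int, offset_ppm: int) -> Fraction:
    """Time for two free-running clocks to drift one full sample apart.

    offset_ppm is the frequency difference between the two clocks in parts
    per million and must be non-zero.
    """
    samples_gained_per_second = Fraction(sample_rate_hz * offset_ppm, 1_000_000)
    return 1 / samples_gained_per_second

slip = seconds_until_one_sample_slip(48_000, 50)
print(float(slip), "seconds")
```

With these assumed numbers, the two devices are a full sample apart after roughly 0.42 seconds, which is why free-running devices cannot exchange audio for long without artifacts.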
To avoid this problem, all digital devices in communication with one another need to follow a single master clock. That means the master clock must send a signal that essentially says, “everyone start at this moment and follow me!”
Even if the master clock’s timing is imperfect, all slave devices will follow the timing errors exactly and will stay in sync with each other, eliminating timing-related artifacts. In general, the better the master clock, the better the resulting audio will sound, so whenever possible, use the best clock you have, or experiment with your rig to find the best result.
Whenever digital audio devices are synchronized, it is necessary to designate one device as the “master” word clock device to which all other digital devices are synced, or “slaved.” Once you’ve determined which device is to be your master clock, you will need to sync the remaining digital devices.
Designating a master word clock is not handled by AVB itself and must be set with an AVB controller. This is either done manually by the user or fixed by the device’s firmware. In the latter case (as when using a StudioLive Series III rackmount mixer as a stage box for a StudioLive Series III console mixer), this function is not exposed to the user and is set automatically.
Multiple unrelated word clocks can coexist on an AVB network. While there are a few ways to match word clocks between a talker and listener, PreSonus AVB devices only support recovering the word clock by listening to an AVB stream.
Precision Time Protocol (PTP)
Precision Time Protocol, or PTP, is used to synchronize clocks throughout a computer network. PTP is capable of achieving clock accuracy in the sub-microsecond range, making it suitable for local area networks that require tight timing. Similar to word clock, PTP uses a master-slave architecture for clock distribution. This protocol defines clock master, link delay, and network queuing (both measurement and compensation), as well as clock-rate matching and adjustments for Layer 2 network devices.
In this architecture, there are several different clock types:
- Ordinary Clock. This is a device with a single network connection that acts as either the sync source (master) or the sync destination (slave).
- Boundary Clock. This device has multiple network connections and can accurately synchronize one network segment to another. A master clock is selected for each segment in the system using a root timing reference generated by the grandmaster clock. The grandmaster sends synchronization information to all the clocks on its network segment. The boundary clocks on that segment then send accurate time to other network segments to which they’re connected. Every AVB talker is required to be capable of functioning as the grandmaster; however, any network node can be the grandmaster, as long as it can either source or derive timing from a grandmaster-capable device.
On an AVB network, PTP generates timestamps so that every listener knows when to play back the audio from the talker. In other words, the PTP clock is used to align audio samples from multiple sources in time. Each grandmaster-capable device broadcasts its clocking using announce messages. The best master clock is then selected from the available announce messages.
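Selecting the best master from a set of announce messages can be sketched as a field-by-field comparison in which the lowest value wins. The field order below follows the general PTP convention (priority1, clock class, clock accuracy, variance, priority2, then clock identity as a tie-breaker), but this is a simplified illustration rather than a full implementation, and the device values are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Announce:
    """Simplified announce message; all example values below are hypothetical."""
    clock_identity: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int

    def rank(self):
        # Compared left to right; the lowest tuple wins.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)

def best_master(announces):
    """Return the announce message of the clock that should become grandmaster."""
    return min(announces, key=Announce.rank)

mixer = Announce("00:00:00:00:00:00:00:01", 246, 248, 254, 17258, 248)
stagebox = Announce("00:00:00:00:00:00:00:02", 248, 248, 254, 17258, 248)
winner = best_master([stagebox, mixer])
print(winner.clock_identity)
```

Here the hypothetical mixer wins because its priority1 value is lower. Had every other field been equal, the clock identity would have broken the tie.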
Multiple Stream Reservation Protocol (MSRP)
Multiple Stream Reservation Protocol (MSRP) is the IEEE standard used to reserve stream bandwidth for audio on an AVB network. This allows endpoints to automatically route data and reserve bandwidth and eliminates the need for the user to manually configure Quality of Service (QoS) across network devices. MSRP looks at the end-to-end bandwidth that is currently available before an audio stream is sent out. It then reserves a maximum of 75% of the total bandwidth available on that AVB switch’s port. If the bandwidth is available, it is locked down along the entire data path, from the talker out to every assigned listener, until it is released.
Bandwidth reservations are made based on talker and listener declarations on the switch ports. Talker declarations come in several forms:
- Advertise Declaration. This declaration announces that a stream doesn’t have any bandwidth or network constraints on its designated path. This means that any destined listener can create a reservation for QoS. Talker advertise messages contain all the information necessary to make the reservation.
- Failed Declaration. As its name indicates, this announces that a stream is not available to a listener because the necessary bandwidth is not available or because of other limitations somewhere in the network path between the talker and the destined listener.
As mentioned earlier, listeners also make declarations within an AVB network. These consist of the following types:
- Ready. One or more listeners are requesting a stream, and sufficient bandwidth and resources are available from the talker along the network path to every intended listener.
- Ready Fail. In this instance, one or more listeners are requesting a stream but not all of them have sufficient bandwidth and resources on the path to the talker. In other words, at least one listener has encountered an obstacle, and at least one listener has not.
- Fail. When a Fail declaration is sent, one or more listeners have requested a stream, and none of them has sufficient bandwidth or resources in the network path to attach to the desired stream.
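The three listener declarations above amount to a summary of how many requesting listeners have a usable path to the talker. A minimal sketch of that summary, assuming each listener is reduced to a single true/false flag (a hypothetical simplification):

```python
from typing import Optional, Sequence

def merged_listener_declaration(paths_ok: Sequence[bool]) -> Optional[str]:
    """Summarize listener state for one stream.

    paths_ok holds one flag per requesting listener: True if that listener
    has sufficient bandwidth and resources on its path to the talker.
    """
    if not paths_ok:
        return None           # nobody has requested the stream
    if all(paths_ok):
        return "Ready"        # every listener can attach
    if any(paths_ok):
        return "Ready Fail"   # some can attach, at least one cannot
    return "Fail"             # no listener can attach

print(merged_listener_declaration([True, True]))
print(merged_listener_declaration([True, False]))
print(merged_listener_declaration([False, False]))
```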
End-to-end stream reservation is successful as soon as the listener receives the talker’s Advertise Declaration, and the talker receives the listener’s Ready Declaration. AVB streams are only forwarded once a successful reservation is made. After the streams stop, switch resources are then released and the process can begin again or make room for other network traffic. In this way, the AVB standard ensures that audio data streams always have the highest QoS and all other data is secondary or tertiary.
SRP works with the 802.1Qav Queuing and Forwarding Protocol (Qav) to ensure that once bandwidth is reserved for an AVB stream, it is locked down from end to end. Qav schedules time-sensitive streaming information to minimize latency. Together, SRP and Qav make sure that all reserved media streams are delivered on time.
In this way, the AVB network has some intelligence as to how much non-media traffic, as well as how many media packets, are on the system at any given time. This means that on an AVB network, the worst-case travel time is known throughout the entire system. Because of this, only a small amount of buffering is needed, lowering latency to 2 ms over 7 switch hops on a 100 Mbps Ethernet network. On gigabit networks, even lower latencies can be achieved.
Channels and Streams
AVB streams can be thought of as the pipeline that carries a predefined number of channels between two or more AVB devices. In a PreSonus StudioLive Series III mixer, for example, seven input streams and seven output streams are available, each carrying eight channels.
It should be noted that each stream can carry any combination of eight channels.
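The stream and channel counts above work out to 56 channels in each direction. The sketch below shows one way to locate a channel within that layout, assuming channels are numbered sequentially from 1 and packed into streams in order. That numbering is a hypothetical convenience for illustration, since each stream can carry any combination of eight channels.

```python
from typing import Tuple

STREAMS = 7
CHANNELS_PER_STREAM = 8
TOTAL_CHANNELS = STREAMS * CHANNELS_PER_STREAM  # 56 channels per direction

def locate(channel: int) -> Tuple[int, int]:
    """Map a 1-based channel number to (stream, slot), both 1-based."""
    if not 1 <= channel <= TOTAL_CHANNELS:
        raise ValueError(f"channel must be between 1 and {TOTAL_CHANNELS}")
    stream, slot = divmod(channel - 1, CHANNELS_PER_STREAM)
    return stream + 1, slot + 1

print(TOTAL_CHANNELS)
print(locate(1), locate(9), locate(56))
```

Under this assumed numbering, channel 9 is the first slot of the second stream and channel 56 is the last slot of the seventh.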