The Truth About Digital Audio Latency
By Wesley Elianna Smith
In the audio world, “latency” is another word for “delay.” It’s the time it takes for the sound from the front-of-house speakers at an outdoor festival to reach you on your picnic blanket. Or the time it takes for your finger to strike a piano key, for the key to move the hammer, for the hammer to strike the string, and for the sound to reach your ear.
Your brain is wired so that it doesn’t notice if sounds are delayed 3 to 10 milliseconds (ms). Studies have shown that sound reflections in an acoustic space must be delayed by 20 to 30 ms before your brain will perceive them as separate. However, by around 12 to 15 ms (depending on the listener), you will start to “feel” the effects of a delayed signal. It is this amount of delay that we must battle constantly when recording and monitoring digitally.
When Good Latency Goes Bad
Roundtrip latency in digital-audio applications is the amount of time it takes for a signal (such as a singing voice or a face-melting guitar solo) to get from an analog input on an audio interface, through the analog-to-digital converters, into a DAW, back to the interface, and through the digital-to-analog converters to the analog outputs. Any significant amount of latency can negatively impact the performer’s ability to play along to a click track or beat — making it sound like they’re performing in an echoing tunnel (unless they have a way to monitor themselves outside of the DAW application, such as a digital mixer or one of our AudioBox™ VSL-series interfaces).
What’s Producing the Delay: a Rogue’s Gallery
In practical terms, the amount of roundtrip latency you experience is determined by your audio interface’s A/D and D/A converters, its internal device buffer, its driver buffer, and the buffer setting you have selected in your digital audio workstation software (Mac®) or Control Panel (Windows®).
Converters. Analog-to-digital converters in your interface transform an analog signal from a microphone or instrument into digital bits and bytes. This is a ferociously complex process and takes a little more than half a millisecond on average. On the other end of a long chain we’re about to describe are the digital-to-analog converters that change the digital stream back into electrical impulses you can hear through a monitor speaker or headphones. Add another half-millisecond or so.
Buffers. A buffer is a region of memory storage used to temporarily hold data while it is being moved from one place to another. There are four of these in the digital signal chain:
- USB Bus Clock Front Buffer
- ASIO (Driver) Input Buffer
- ASIO (Driver) Output Buffer
- USB Bus Clock Back Buffer
Each buffer contributes to the total delay present between the time you play that hot guitar solo and the time you hear it back in your headphones.
Fast Drivers and Slow Drivers
The biggest variable that contributes to how long this process will take is driver performance.
In computing, a driver is a computer program allowing higher-level computer programs to interact with a hardware device. For example, a printer requires a driver to interact with your computer. A driver typically communicates with the device through the computer bus or communications subsystem to which the hardware connects. Drivers are hardware-dependent and operating-system-specific.
One of the primary goals for engineers who design audio-interface drivers is to provide the best latency performance without sacrificing system stability.
Imagine that you’re playing an old, run-down piano and that there is a catch in the hammer action—so great a catch, in fact, that when you strike a key, it takes three times longer than normal for the hammer to strike the string. While you may still be able to play your favorite Chopin etude or Professor Longhair solo, the “feel” will be wrong because you’ll have to compensate for the delayed hammer-strikes.
You will have a similar problem if the buffer-size setting is too large when you overdub a part while monitoring through your DAW.
Take a Couple Buffers and Call Us in the Morning
A buffer is designed to buy time for the processor; with the slack the buffer provides, the processor can handle more tasks. When the buffer size is too large, it’s delaying the data—adding latency—more than is necessary for good computer performance.
But if the buffer size is too small, the processor has to work faster to keep up, making it more vulnerable to overload, so your computer-recording environment becomes less stable.
Consider this scenario: You’re playing your favorite virtual instrument, trying to add one more pad part to a nearly finished song. All 42 tracks are playing back, and all of them use plug-ins. And then it happens: Your audio starts to distort, or you start hearing pops and clicks, or, worse, your DAW crashes because your CPU is overloaded. The 64-sample buffer size you have set, in conjunction with the amount of processing that your song requires, overtaxes your computer.
If you increase the buffer size, you can probably make the crashes go away. But it’s not that simple.
The more that you increase the buffer size — for example, up to 128 samples — the more you notice the latency when trying to play that last part. Singing or playing an instrument with the feel you want becomes extremely difficult because you have essentially the same problem as with that rickety piano’s delayed hammer-strikes. What you play and what you hear back in your headphones or monitor speakers get further and further apart in time. Latency is in the way. And you’re in that echo-y tunnel again.
Let’s look at our piano example again, this time with a fully functioning baby grand and not that antique piano in desperate need of repair. For simplicity’s sake, let’s pretend that there is no mechanical delay between the time your finger strikes the key and the hammer strikes the string. Sound travels at roughly 340 meters per second. This means that if you’re sitting one meter from the hammer, the sound will not reach your ears for just under 3 ms. So why does 3 ms not bother you a bit when you’re playing your grand piano, while a buffer setting of 2.9 ms (128 samples at 44.1 kHz) in your DAW makes it virtually impossible for you to monitor your guitar through your favorite guitar amp modeling plug-in?
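As a rough check on the piano arithmetic, the short Python sketch below converts a listening distance into an acoustic delay. The function name and the sample distances are illustrative assumptions; only the 340 m/s figure comes from the text.

```python
# Speed of sound used in the article (an approximation for air at room temperature).
SPEED_OF_SOUND_M_PER_S = 340.0


def acoustic_delay_ms(distance_m: float) -> float:
    """Return the time, in milliseconds, for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    # Hypothetical listening distances: piano bench, small room, festival lawn.
    for distance in (1.0, 3.0, 30.0):
        print(f"{distance:>5.1f} m -> {acoustic_delay_ms(distance):.2f} ms")
```

At one meter the result is about 2.94 ms, which is essentially the same delay as the 128-sample buffer mentioned above.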
As mentioned earlier, roundtrip latency is the amount of time it takes for a signal (such as a guitar solo) to get from the analog input on an audio interface, through the A/D converters, into a DAW, back to the interface, and through the D/A converters to the analog outputs. But you can control only one part of this chain: the input latency, meaning the time it takes for an input signal such as your guitar solo to make it to your DAW.
This is where driver performance enters the picture. There are two layers to any isochronous driver (used for both FireWire and USB interfaces). The second layer provides the buffer to Core Audio and ASIO applications like PreSonus Studio One™ and other DAWs. This is the layer over which you have control.
To make matters worse, you usually are not given this buffer-size setting as a time-based number (e.g., 2.9 ms); rather, you get a list of sample-based numbers from which to choose (say, 128 samples). This makes delay conversion more complicated. And most musicians would rather memorize the lyrics to every Rush song than remember that 512 samples equates to approximately 11 to 12 ms at 44.1 kHz! (To calculate milliseconds from samples, divide the number of samples by the sample rate in kHz. For example, 512 samples / 44.1 kHz = 11.6 ms.)
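The samples-to-milliseconds conversion is easy to script. The sketch below is a minimal illustration in Python; the function name and the list of buffer sizes are assumptions chosen for the example, not settings taken from any particular driver.

```python
def samples_to_ms(samples: int, sample_rate_hz: float) -> float:
    """Convert a buffer size in samples to its duration in milliseconds."""
    return samples / sample_rate_hz * 1000.0


if __name__ == "__main__":
    # Buffer sizes commonly offered by DAWs (illustrative list), at 44.1 kHz.
    for size in (64, 128, 256, 512, 1024):
        print(f"{size:>5} samples = {samples_to_ms(size, 44_100):.1f} ms")
```

Note that the same sample count represents less time at higher sample rates: 128 samples is roughly 2.9 ms at 44.1 kHz but only about 1.3 ms at 96 kHz.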
The buffer size that you set in your DAW (Mac) or in your device’s Control Panel (Windows) determines both the input and the output buffer. If you set the buffer size to 128 samples, the input buffer and the output buffer will each be 128 samples. At best, then, the latency is twice the amount you set. However, the best case isn’t always possible due to the way audio data is transferred by the driver.
For example, if you set your ASIO buffer size to 128 samples, the output latency could be as high as 256 samples. In that case, the two buffers combine to make the roundtrip latency 384 samples. This means that the 2.9 ms of latency you set for your 44.1 kHz recording has become 8.7 ms.
The analog-to-digital and digital-to-analog converters in an audio interface also have latency, as do their buffers. This latency can range from 0.2 to 1.5 ms, depending on the quality of the converters. An increase of 1 ms of latency isn’t going to affect the quality of anyone’s performance. However, it does add to the total roundtrip latency. For our 128-sample example setting, adding 0.5 ms for each converter brings the roundtrip latency to 9.7 ms. But 9.7 ms is still below the realm of human perception, and it shouldn’t affect your performance.
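The driver-buffer and converter arithmetic can be reproduced with a few lines of Python. This is a sketch under the article's example assumptions (128-sample input buffer, output buffer of 128 or 256 samples, 0.5 ms per converter); the function names are hypothetical.

```python
def samples_to_ms(samples: int, sample_rate_hz: float) -> float:
    """Convert a buffer size in samples to milliseconds."""
    return samples / sample_rate_hz * 1000.0


def driver_roundtrip_ms(
    input_samples: int,
    output_samples: int,
    sample_rate_hz: float,
    converter_ms_each: float = 0.5,  # assumed per-converter latency from the example
) -> float:
    """Input buffer + output buffer + A/D and D/A converter latency, in ms."""
    buffer_ms = samples_to_ms(input_samples + output_samples, sample_rate_hz)
    return buffer_ms + 2 * converter_ms_each


if __name__ == "__main__":
    best_case = driver_roundtrip_ms(128, 128, 44_100)   # output buffer equals input
    worse_case = driver_roundtrip_ms(128, 256, 44_100)  # output buffer doubled
    print(f"best case:  {best_case:.1f} ms")
    print(f"worse case: {worse_case:.1f} ms")
```

With the doubled output buffer the result is about 9.7 ms, matching the figure above; the best case, where both buffers stay at 128 samples, works out to about 6.8 ms.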
So Where Does the Extra Delay Really Come From?
The culprit is that first mysterious audio-driver layer that no one ever discusses. This lowest layer has no relationship to audio samples or sample rate. In the case of USB, it is a timer called the USB Bus clock. (There is a similar clock for FireWire processes but we will only discuss the USB Bus clock here.)
The USB Bus clock is based on a one-millisecond timer. At an interval of this timer, an interrupt occurs, triggering the audio processing. The problem that most audio manufacturers face is that without providing control over the lower-layer buffer, users cannot tune the driver to the computer as tightly as they would like. The reason for not exposing this layer is simple: The user could set this buffer too low and crash the driver—a lot.
To get around this, most manufacturers fix this buffer at approximately 6 milliseconds. Depending on the audio driver, this could be 6 ms input latency and 6 ms output latency. But like the ASIO buffer discussed earlier, even if these buffer sizes are set to the same value, the resulting output latency can differ from the input latency.
For our example, let’s keep things simple and say that latency is 6 ms in both directions. Our mystery is solved: With most audio interfaces, there is at least 12 ms of roundtrip latency built into the driver before the signal ever reaches your DAW, in addition to the 9.7 ms latency we calculated earlier.
Thus, you set 2.9 ms of delay in your DAW and end up with 21.7 ms of roundtrip latency. (All of the numbers in our examples are based on averages. However, some manufacturers are able to optimize driver performance to minimize these technical limitations.)
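To see how the pieces add up, the following Python sketch totals the example figures used in this article. The class and field names are hypothetical, and the default values are simply the averages quoted above, not measurements of any specific interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Roundtrip latency components, using the article's example averages."""

    usb_in_ms: float = 6.0          # fixed USB Bus clock buffer, input side
    usb_out_ms: float = 6.0         # fixed USB Bus clock buffer, output side
    asio_in_samples: int = 128      # buffer size chosen in the DAW
    asio_out_samples: int = 256     # output buffer may be larger than the setting
    converter_ms_each: float = 0.5  # assumed A/D and D/A latency
    sample_rate_hz: float = 44_100.0

    def asio_ms(self) -> float:
        total_samples = self.asio_in_samples + self.asio_out_samples
        return total_samples / self.sample_rate_hz * 1000.0

    def total_ms(self) -> float:
        return (
            self.usb_in_ms
            + self.usb_out_ms
            + self.asio_ms()
            + 2 * self.converter_ms_each
        )


if __name__ == "__main__":
    budget = LatencyBudget()
    print(f"USB Bus clock buffers: {budget.usb_in_ms + budget.usb_out_ms:.1f} ms")
    print(f"ASIO buffers:          {budget.asio_ms():.1f} ms")
    print(f"Converters:            {2 * budget.converter_ms_each:.1f} ms")
    print(f"Total roundtrip:       {budget.total_ms():.1f} ms")
```

Changing any single field shows how much that component contributes; in this example the fixed USB Bus clock buffers account for more than half of the total.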
Overcoming the Problem
Many audio-interface manufacturers have solved the problem of monitoring latency through a DAW by providing zero-latency monitoring solutions onboard their interfaces.
One of the earliest solutions was the simple analog Mixer knob on the front panel of the PreSonus FirePod. This allowed users to blend the FirePod’s analog (pre-converter) input signal with the stereo playback stream from the computer. This basic monitoring solution is still available on such interfaces as the PreSonus AudioBox USB, AudioBox 22VSL, and AudioBox 44VSL. Another solution, used in the PreSonus FireStudio™ family and many others, is to include an onboard DSP mixer that is managed using a software control panel.
While both of these solutions resolve the problem of latency while monitoring, they provide a flat user experience by giving control only over basic mix functions like volume, panning, solo, and mute.
Anyone who has ever recorded using one of our StudioLive™ mixers (anyone who has ever tracked with any mixer, for that matter) knows how important it is to be able to record a track while hearing effects (as well as compression and equalization). For example, if reverb on a vocal is going to be part of the final mix, it’s almost impossible to record the vocal “dry” — phrasing and timing are totally different when you can’t hear the duration and decay of the reverb.
The developers at PreSonus were intrigued by the idea that they could conceivably provide the user with some level of control over the USB Bus clock buffer and perhaps offer another way of monitoring outside the DAW (while adding effects and reverb). After much experimentation, they discovered that most modern computers can easily and stably perform at a much lower USB Bus clock buffer than previously thought. On average, a 2 to 4 ms USB Bus clock buffer offers both excellent performance and stability. On a powerful computer like a fully loaded Mac Pro, they’ve been able to lower this buffer to the lowest USB Bus clock setting possible: 1 ms.
Given these discoveries, not giving the user control over the USB Bus clock buffer and telling them that the only latency controls available are the ASIO and Core Audio buffer sizes seems at best duplicitous, and at worst a failure to provide customers with the best latency performance a modern computer can provide.
This is where AudioBox VSL-series interfaces enter the picture. This new series of interfaces takes advantage of these technological discoveries and provides users with the ultimate monitor-mixing experience, without including expensive onboard DSP and the proportional cost increase to customers.
Tracking with Reverb and Effects... without Being in a Tunnel
The Virtual StudioLive software that comes with our AudioBox 22VSL, 44VSL and 1818VSL interfaces looks like — and performs like — the Fat Channel on our StudioLive 16.0.2 mixer.
You get compression, limiting, 3-band semi-parametric EQ, noise gate, and high-pass filter. We’ve even included 50 channel presets from the 16.0.2 just to get you started. Plus you get an assortment of 32-bit reverbs and delay, each with customizable parameters.
Optimizing AudioBox VSL Software
AudioBox VSL monitoring software runs between the USB Bus clock buffer and the ASIO/Core Audio buffer on your computer, so it is only subject to the latency from the USB Bus clock buffer.
Unlike many manufacturers, PreSonus did not fix this buffer at 6 ms; rather, AudioBox VSL offers a choice of three buffer sizes. To reduce the confusion of presenting the user with two types of buffer settings, these USB Bus clock buffer settings are labeled “Performance Mode.”
This setting is available from the Setup tab in AudioBox VSL, and it directly affects the amount of latency you will hear in monitor mixes from AudioBox VSL software.
At the Fast setting, AudioBox VSL runs at a USB Bus clock buffer setting of 2 ms, while Normal sets the buffer to 4 ms, and Safe sets it to 8 ms. So when you set your AudioBox VSL to run at the Fast USB Bus clock buffer setting, roundtrip latency will be approximately 3.5 ms, including the time it takes for the A/D and D/A converters to change analog audio to 1s and 0s and back to analog again.
To optimize these buffer settings for your particular computer:
- Begin by creating a monitor mix in AudioBox VSL and setting the Performance mode to Fast.
- Listen carefully for pops and clicks and other audio artifacts at a variety of sample rates.
- Now load the AudioBox VSL with compressors, EQs, reverbs, and delays.
- If you hear audio artifacts, raise the Performance mode to Normal. On most machines, Normal will provide the best performance with the most stability. If you have an older machine with a slower processor and a modest amount of RAM, you may need to raise this setting to Safe. Keep in mind, however, that even at 9 ms, AudioBox VSL is running at a lower latency than monitoring through most DAWs at the best ASIO/Core Audio buffer setting, and the best buffer setting will not work on a slower computer anyway.
- Once you have Performance mode tuned, the next latency component of the driver to tune is the ASIO buffer size (Windows) or Core Audio buffer size (Mac). This time, load a large session into your DAW and experiment with the buffer settings. Again, you are listening for pops and clicks and other audio artifacts.
If your DAW includes a CPU-performance meter (as Studio One does), you can use this to help you find the best buffer setting for your computer.
No matter how you set your ASIO/Core Audio buffer size, the monitoring latency in VSL is not affected. So you can set this buffer fairly high and lower it only when you are playing virtual instruments. Keep in mind that it’s still important to determine the lowest threshold at which your DAW can still perform stably.
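The tuning procedure described above amounts to trying settings from lowest to highest latency and keeping the first one that plays cleanly. The Python sketch below illustrates that logic only; `lowest_stable` and the `is_stable` callback are hypothetical stand-ins for your own listening test, and the DAW buffer sizes are an assumed list, not values read from any driver.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Performance mode buffer sizes in ms, as given in the article.
PERFORMANCE_MODES_MS = {"Fast": 2, "Normal": 4, "Safe": 8}

# Assumed list of DAW buffer sizes in samples, ordered lowest latency first.
DAW_BUFFER_SIZES = (64, 128, 256, 512, 1024)


def lowest_stable(
    settings: Sequence[T], is_stable: Callable[[T], bool]
) -> Optional[T]:
    """Return the first setting (lowest latency first) that passes the check.

    is_stable stands in for a human listening for pops, clicks, and dropouts.
    Returns None if no setting passes.
    """
    for setting in settings:
        if is_stable(setting):
            return setting
    return None


if __name__ == "__main__":
    # Pretend machine: clean at a 4 ms USB buffer and at 256 samples or more.
    mode = lowest_stable(
        list(PERFORMANCE_MODES_MS), lambda m: PERFORMANCE_MODES_MS[m] >= 4
    )
    buffer_size = lowest_stable(DAW_BUFFER_SIZES, lambda s: s >= 256)
    print(f"Performance mode: {mode}, DAW buffer: {buffer_size} samples")
```

In practice the check is done by ear (or with a CPU-performance meter) under a realistic load, first for the Performance mode and then for the DAW buffer size.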