Digital Audio Basics: Sample Rate and Bit Depth
Although discussions of digital audio conversion have filled several books, a fundamental understanding of two terms is particularly important to correctly using your computer-based recording system: sample rate and bit depth.
The conversion process is complex, and there are multiple ways to accomplish it. But no worries: We’re just going to discuss sample rate and bit depth at a basic level, as applied to linear pulse-code modulation (PCM), one of the most common conversion technologies.
At the most basic level, computers operate one step at a time by turning a succession of switches on or off at very high speed. Since computers “think” in discrete steps, in order to convert analog audio signals to the digital domain, it’s necessary to describe the continuous analog waveform mathematically as a succession of discrete amplitude values.
In an analog-to-digital converter, this is accomplished by capturing, at a fixed rate, a rapid series of short “snapshots” (samples) of a specified size. Each audio sample contains data that provides the information necessary to accurately reproduce the original analog waveform. Things like dynamic range, frequency content, and so on are all contained within this datastream. The instantaneous amplitude level in each sample is given the value of the nearest measuring increment, a process called quantization. By reproducing these values and playing them back in the same order and at the same rate at which they were captured, a digital-to-analog converter produces what is, in theory, a practically identical copy of the original waveform.
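The quantization step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular converter works: it assumes amplitudes normalized to the range -1.0 to 1.0 and signed integer sample codes, and the function names quantize and dequantize are hypothetical.

```python
def quantize(sample: float, bit_depth: int) -> int:
    """Map an amplitude in [-1.0, 1.0] to the nearest signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1      # e.g. 32767 for 16-bit
    min_code = -(2 ** (bit_depth - 1))       # e.g. -32768 for 16-bit
    code = round(sample * max_code)          # snap to the nearest increment
    return max(min_code, min(max_code, code))  # clip out-of-range input


def dequantize(code: int, bit_depth: int) -> float:
    """Convert an integer code back to an amplitude (assumed scaling)."""
    return code / (2 ** (bit_depth - 1) - 1)


if __name__ == "__main__":
    original = 0.3
    for bits in (8, 16, 24):
        restored = dequantize(quantize(original, bits), bits)
        print(bits, "bits: error =", abs(restored - original))
```

Running it shows the round-trip error shrinking as the bit depth grows, which is the "nearest measuring increment" effect in action.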
The rate of capture and playback is called the sample rate. The sample size—more accurately, the number of bits used to describe each sample—is called the bit depth or word length. The number of bits transmitted per second is the bit rate. Let’s take a look at this as it applies to digital audio.
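As a rough sketch of how these three terms relate for uncompressed PCM, the bit rate is the sample rate multiplied by the bit depth. The channel count below is an assumption added for illustration (stereo audio carries two channels), and pcm_bit_rate is a hypothetical helper name.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bits per second for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


if __name__ == "__main__":
    # 16-bit, 44.1 kHz, two channels (assumed stereo)
    print(pcm_bit_rate(44_100, 16, channels=2))
    # 24-bit, 96 kHz, two channels (assumed stereo)
    print(pcm_bit_rate(96_000, 24, channels=2))
```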
Digging A Bit Deeper
The on/off status of each switch in a computer is represented as 1 or 0, a system known as binary. Thus, a string of binary digits (bits) is used to describe anything a computer does, including manipulating and displaying text, images, and audio. Computers can manage entire strings of these bits at a time; a group of 8 bits is known as a byte; one or more bytes compose a digital word. Sixteen bits (two bytes) means that there are 16 digits in a word, each of them a 1 or 0; 24 bits (three bytes) means that there are 24 binary digits per word; and so on.
The number of bits in a word determines how precise the values are. Working with a higher bit depth is like measuring with a ruler that has finer increments: you get a more precise measurement. When the values are in finer increments, the converter doesn’t have to quantize as much to get to the nearest measuring increment.
Thus, a higher bit depth enables the system to accurately record and reproduce more subtle fluctuations in the waveform (see Fig. 1). The higher the bit depth, the more data will be captured to more accurately re-create the sound. If the bit depth is too low, information will be lost, and the reproduced sample will be degraded. For perspective, each sample recorded at 16-bit resolution can contain any one of 65,536 unique values (2^16). With 24-bit resolution, you get 16,777,216 unique values (2^24), a huge difference!
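The counts quoted above follow directly from the word length: an n-bit word can hold 2 to the power n distinct values. A quick check in Python, with quantization_levels as a hypothetical helper name:

```python
def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values an n-bit sample word can represent."""
    return 2 ** bit_depth


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits}-bit: {quantization_levels(bits):,} values")
```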
The most important practical effect of bit depth is that it determines the dynamic range of the signal. In theory, 24-bit digital audio has a maximum dynamic range of 144 dB, compared to 96 dB for 16-bit, but today’s digital audio converter technology cannot come close to that upper limit. As of this writing, the 24-bit converters in StudioLive™ (including StudioLive AI-series) digital mixers and the FireStudio™ Mobile interface offer a dynamic range of 118 dB, which is close to the best dynamic range attainable with current technology.
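The theoretical figures above come from the ratio between the largest value a word can hold and its smallest step, expressed in decibels: 20 × log10(2^n), or roughly 6 dB per bit. The sketch below uses that common simplified formula; it ignores dither and real-world converter noise, and theoretical_dynamic_range_db is a hypothetical helper name.

```python
import math


def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Ideal dynamic range of n-bit linear PCM, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```

The results round to the 96 dB and 144 dB figures quoted for 16-bit and 24-bit audio.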
The Going Rate
As noted earlier, in the digital conversion process, the converters record and play samples at specified sample rates. The Nyquist-Shannon sampling theorem states that in order to accurately reconstruct a signal of a specified bandwidth (that is, a definable frequency range, such as 20 Hz to 20 kHz), the sampling frequency must be greater than twice the highest frequency of the signal being sampled. If lower sampling rates are used, the original signal’s information may not be completely recoverable from the sampled signal (see Fig. 2).
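The sampling criterion stated above reduces to a simple comparison. The following sketch checks whether a given sample rate is high enough for a given bandwidth; the function name satisfies_nyquist is hypothetical, and the 20 kHz bandwidth is the example range mentioned in the text.

```python
def satisfies_nyquist(sample_rate_hz: float, highest_freq_hz: float) -> bool:
    """True if the sample rate exceeds twice the highest signal frequency."""
    return sample_rate_hz > 2 * highest_freq_hz


if __name__ == "__main__":
    bandwidth_hz = 20_000  # upper limit of the 20 Hz to 20 kHz example range
    for rate in (32_000, 44_100, 48_000, 96_000):
        print(rate, satisfies_nyquist(rate, bandwidth_hz))
```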
If the sampling frequency is too low, aliasing distortion can result. Aliasing is a major concern when using analog-to-digital conversion. Improper sampling of the analog signal will cause high-frequency components of the signal to be aliased with genuine lower-frequency components. If this happens, the digital-to-analog conversion will create an incorrectly reconstructed signal.
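To make aliasing concrete, the sketch below estimates where a pure tone would appear after sampling with no anti-aliasing filter in front of the converter. It uses the standard folding relation, in which a tone lands at its distance from the nearest integer multiple of the sample rate. The function name alias_frequency is hypothetical, and the 30 kHz tone is an illustrative value that is not from the original text.

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sampled tone, assuming no anti-aliasing filter."""
    nearest_multiple = round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple * sample_rate_hz)


if __name__ == "__main__":
    # A tone below half the sample rate is left where it is.
    print(alias_frequency(10_000, 44_100))
    # A tone above half the sample rate folds back into the audible band.
    print(alias_frequency(30_000, 44_100))
```

In the second case the 30 kHz tone shows up at 14.1 kHz, indistinguishable from a genuine 14.1 kHz component in the recording.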
In addition, higher sampling rates enable you to record very high frequencies above the normal range of human hearing. While inaudible by themselves, these ultrasonic frequencies can interact, creating intermodulation distortion (such as beating) that produces audible frequency content. Many engineers believe this content imparts subtle psychoacoustic effects.
For a variety of reasons, then, many recording engineers rely on sampling rates of 88.2, 96, and even 192 kHz to ensure extremely accurate recordings that capture every detail.
Which rate you choose depends at least in part on the product you need to deliver. For example, audio CDs and MP3s are delivered at 44.1 kHz, so sampling at 88.2 kHz makes the converter’s calculations relatively simple. Digital broadcast uses 48 kHz, so a 96 kHz sampling rate is an obvious choice. That said, some engineers believe that today’s sample-rate conversion is good enough that it’s not necessary to choose a rate based on keeping the math simple. For these engineers, the higher rate is generally considered better.
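The "simple math" argument above is about the ratio between the recording rate and the delivery rate. A short sketch with Python's standard fractions module shows which pairings reduce to a whole-number ratio; the helper name conversion_ratio is hypothetical.

```python
from fractions import Fraction


def conversion_ratio(source_rate_hz: int, target_rate_hz: int) -> Fraction:
    """Ratio of source to target sample rate, reduced to lowest terms."""
    return Fraction(source_rate_hz, target_rate_hz)


if __name__ == "__main__":
    pairs = [(88_200, 44_100), (96_000, 48_000), (96_000, 44_100)]
    for source, target in pairs:
        print(f"{source} -> {target}: ratio {conversion_ratio(source, target)}")
```

Recording at 88.2 kHz for a 44.1 kHz delivery, or at 96 kHz for 48 kHz, gives an exact 2:1 ratio, while 96 kHz to 44.1 kHz reduces only to 320:147.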
The High-Resolution Frontier
Finally, one often encounters the term “high-resolution audio,” but it is rarely defined. That’s because there is no agreed-upon definition. For many years, “resolution” referred to bit depth, but in recent years, the term has been used more broadly to refer to both sample rate and bit depth. And “high resolution,” in particular, is a relative term. When 8-bit audio was in common use, 16-bit was “high resolution.” Today, 24-bit, 96 kHz audio is considered “high resolution.” In the future, it might be 32-bit, 192 kHz and beyond.