Introduction to audio measurements and terms

Christopher Scott
Bowling Green, KY

Author's note:  This is a bit of a work in progress, and is considerably lacking in continuity and polish. I switch between coverage for broadcast audio purposes and amateur radio. In time I hope to clarify it. Please forgive me.
Having worked as a radio broadcast engineer for nearly 40 years, I've invested a lot of time fine-tuning AM and FM radio stations to be loud and clear.  Conducting measurements which quantify this performance has played a big part.  It is amusing to see the colorful descriptions so often used by "audiophiles" when describing certain qualities. Too often these same folks, despite their exuberance, are completely unfamiliar with basic audio terms and quantifiers, the measurement of which forms the basis of comparisons.  There are some who will argue that sonic quality can best be judged with ears alone; indeed in the case of transducers such as microphones, headphones, and loudspeakers, some experts agree.  In the electronics realm however, analyzing and comparing input and output waveforms of the device under test remains the accepted method.
There seems to be some interest in certain amateur quarters about applying broadcast style audio treatments to SSB (and AM) transmission for communications, in the hope of achieving really "good audio." Although this is a subjective and moving target, it can in fact be quantified to a large degree.  Once the quality "bottlenecks" are identified, they can often be improved.  This note hopes to break down some of the mystery involved, and to investigate some of these bottlenecks.

Despite great strides in audio quality by digital techniques, audio is, in fact, analog.  At the source as well as the final transducer this will always be true.  For the last 75 years there have been essentially three measurements which quantify audio quality, and they remain critically important.  These are:

  1. Frequency response,
  2. Distortion (waveform linearity), and,
  3. Signal to noise ratio or dynamic range.
The frequency response of an audio system is perhaps what is most noticed by people when judging quality.  It is simply the bandwidth, usually described by upper and lower frequency limits, often at the point where the response has reduced to half power, or -3 dB.  "CD" quality is often cited as the benchmark of excellent quality, and its frequency response is approximately, with practical equipment, 20 Hz to 20 kHz at -3 dB.  Analog FM broadcast audio, when properly adjusted, is arguably excellent quality.  For many years the FCC required annual audio proof-of-performance testing, and specified the limits at 50 Hz to 15 kHz.  In my experience, critical listeners using blind test methods can just barely perceive the difference between this and "CD" grade frequency response.  Figure 1 shows a frequency response graph of a popular dynamic microphone - far from "flat" response.
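The "half power, or -3 dB" equivalence can be checked with a few lines of arithmetic. This is only a sketch; the function names are made up for illustration and are not from any measurement package.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

def voltage_ratio_to_db(ratio):
    """Convert a voltage (amplitude) ratio to decibels."""
    return 20.0 * math.log10(ratio)

# Half power expressed in dB: very close to -3 dB.
half_power_db = power_ratio_to_db(0.5)

# Half power corresponds to a voltage ratio of 1/sqrt(2), about 0.707,
# which gives the same figure when the 20*log10 voltage form is used.
half_power_voltage_db = voltage_ratio_to_db(math.sqrt(0.5))

print(round(half_power_db, 2))          # -3.01
print(round(half_power_voltage_db, 2))  # -3.01
```

So a response quoted as "20 Hz to 20 kHz at -3 dB" means the output voltage at those band edges has fallen to roughly 71% of its mid-band value.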

AM broadcast frequency response is generally considered 50 Hz to 8 kHz, with the upper limit artificially imposed by audio filtering designed to limit occupied bandwidth.  This is actually not bad sounding frequency response for music, but very few receivers allow the higher frequencies, e.g. above 3-4 kHz, to pass due to difficulties with nighttime propagated heterodynes and other interference.  The resulting 50 Hz to 3.5 kHz is pretty good for speech, but comparatively poor for music.  SSB is, depending upon the filters involved, 300 Hz to 2700 Hz, which is about the bare minimum for intelligible speech.  Newer rigs allow for broader transmit filtering, and I've found that 100 to 3000 Hz works very well.

Distortion can take many forms, and the definition has many variations. But the most basic measurement is Total Harmonic Distortion (THD), which is measured in percent.  Due to the nature of the usual test methods, which null out the test tone and measure the residual harmonic energy, the measurement is more precisely THD+N, or THD plus residual noise. Distortion can best be thought of as waveform non-linearities.  This may be best understood by looking at the classic example - a case where an audio stage is overdriven to the point where it is said to "clip" the waveform.  Imagine a sinewave where the extreme top and bottom are flattened - the audio stage was incapable of faithfully following the excursion beyond its power supply rails, and reached the "clip point".  This process generates odd harmonic energy, chiefly third harmonic, where there was none before.  Where the sinewave was formerly a clean, pure tone at its fundamental frequency only, now with third harmonic energy added, it sounds gritty, or has a bit of an edge.  The extent of these undesirable artifacts is determined by how much of the sine wave was clipped.
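The clipping example can be tried numerically. The sketch below is an illustration only, not a substitute for a distortion analyzer: it assumes an ideal noiseless stage, a hypothetical clip point at 80% of the sine peak, and it sums harmonics 2 through 9 rather than nulling the fundamental as a real THD+N meter does. All names are invented for the example.

```python
import math

def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic, from a single-bin DFT over exactly one cycle."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def thd_percent(samples, max_harmonic=9):
    """Ratio of combined harmonic amplitude (2nd..max) to the fundamental, in percent."""
    fundamental = harmonic_amplitude(samples, 1)
    harmonics = [harmonic_amplitude(samples, k) for k in range(2, max_harmonic + 1)]
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

N = 1024          # samples in one cycle of the test tone
CLIP = 0.8        # hypothetical "rail": the stage cannot swing past 80% of peak

sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
clipped = [max(-CLIP, min(CLIP, s)) for s in sine]   # flatten top and bottom

thd_clean = thd_percent(sine)        # essentially zero
thd_clipped = thd_percent(clipped)   # several percent
second = harmonic_amplitude(clipped, 2)   # near zero: the clipping is symmetrical
third = harmonic_amplitude(clipped, 3)    # clearly present

print(f"clean THD   = {thd_clean:.6f} %")
print(f"clipped THD = {thd_clipped:.2f} %")
print(f"2nd harmonic = {second:.6f}, 3rd harmonic = {third:.4f}")
```

Lowering CLIP clips more of the waveform and raises the THD figure, which matches the statement that the artifacts depend on how much of the sine wave was clipped.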

Another important distortion test is IM, or intermodulation distortion.  Two pure and distinct tones are injected into the device under test ("DUT"), with the analyzer measuring the levels of the sum, difference and other product frequencies.  In a perfectly linear amplifier, these tones will not mix and produce sum and difference products. On the other hand, most are familiar with the heterodyning process, where two frequencies are deliberately mixed. Intermodulation is the same process, occurring where it is not wanted.
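A two-tone test can be sketched in the same way. Everything here is assumed for illustration: the 700 Hz and 1900 Hz tone pair, the 8 kHz sample rate, and a "DUT" modeled as a simple transfer curve with a small squared term standing in for the non-linearity.

```python
import math

FS = 8000                 # sample rate in Hz (assumed for this sketch)
N = 800                   # 0.1 second of samples, so every 10 Hz is an exact DFT bin
F1, F2 = 700.0, 1900.0    # hypothetical two-tone test frequencies

def tone_level(samples, freq):
    """Amplitude of the component at freq, from a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * i / FS) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / FS) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)

def linear_dut(x):
    """Perfectly linear amplifier: gain only."""
    return 2.0 * x

def nonlinear_dut(x):
    """Amplifier with a small second-order non-linearity (assumed model)."""
    return x + 0.1 * x * x

two_tone = [0.5 * math.sin(2 * math.pi * F1 * i / FS) +
            0.5 * math.sin(2 * math.pi * F2 * i / FS) for i in range(N)]

linear_out = [linear_dut(x) for x in two_tone]
nonlinear_out = [nonlinear_dut(x) for x in two_tone]

diff_linear = tone_level(linear_out, F2 - F1)        # 1200 Hz: nothing there
diff_nonlinear = tone_level(nonlinear_out, F2 - F1)  # 1200 Hz: difference product
sum_nonlinear = tone_level(nonlinear_out, F1 + F2)   # 2600 Hz: sum product

print(f"linear DUT, 1200 Hz level:     {diff_linear:.6f}")
print(f"non-linear DUT, 1200 Hz level: {diff_nonlinear:.6f}")
print(f"non-linear DUT, 2600 Hz level: {sum_nonlinear:.6f}")
```

The linear stage shows no energy at 1200 Hz or 2600 Hz, while the non-linear one produces both, which is the mixing action described above.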

Dynamic range is closely related to signal-to-noise ratio, which is the third independent quality factor.  Simply put, it represents the ratio of the loudest sound to the noise floor.  This noise floor may be residual white noise internally generated in the electronics, hum, or other undesirable background noise.  Traditionally an operating level is established with a constant tone corresponding to "0 VU."  With actual program material, brief peaks would actually exceed this point by 10-20 dB, so headroom above this point is required in order to avoid clipping.  Usually a maximum level is reached in the electronics where a specified amount of harmonic distortion is measured - usually 1-3%.  This level then defines the clip point.  For amateur SSB on the HF bands, all this becomes simplified, because of the amount of static and noise we work through. Typically, if we have a 25 dB signal to noise ratio, it sounds good. With essentially all amplitude modulation methods, including SSB, the easiest way to increase signal to noise ratio is to use more transmit power. Switching from 100 watts to 1000 watts increases s/n by 10 dB. Note that an S unit is about 6 dB on most radios. A more directive antenna system can help in the same way and produces a similar benefit on receive.
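The transmit power figures work out as follows. This is a minimal sketch; the function name is made up, and the 6 dB per S unit value is the nominal figure quoted above, which real receivers only approximate.

```python
import math

def power_gain_db(p_old_watts, p_new_watts):
    """Change in level, in dB, when transmit power goes from p_old to p_new."""
    return 10.0 * math.log10(p_new_watts / p_old_watts)

DB_PER_S_UNIT = 6.0   # nominal value; actual S meters vary

gain_db = power_gain_db(100.0, 1000.0)   # 100 W to 1000 W
s_units = gain_db / DB_PER_S_UNIT        # how far the far-end S meter should move

print(f"{gain_db:.1f} dB, or about {s_units:.1f} S units")   # 10.0 dB, or about 1.7 S units
```

Ten times the power buys less than two S units, which is worth keeping in mind before investing in an amplifier.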

Back to dynamic range. Quantifying the dynamic range involves noting the absolute output level of the clip point (or operating level), removing the test tone while keeping the input normally terminated, and measuring the residual voltage, typically using a "weighting" filter which better emulates the sensitivity of the human ear.  The voltage ratio, represented in dB, between the clip point and the noise floor defines the total dynamic range that is possible.  The signal-to-noise ratio is the same measurement between the operating level and the noise floor.  Therefore, dynamic range equals signal-to-noise ratio plus headroom.  In recent years, with peak meters replacing VU meters, and with everything referenced to the absolute digital clip point in dBFS - dB relative to full scale - the two terms have blurred together.  What's important is to use the same definition when comparing specifications - is headroom included in the numbers, or is the spec really total available dynamic range?
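The relationship between the three figures can be shown with made-up readings. The voltages below are hypothetical round numbers chosen for easy arithmetic, not measurements of any real device.

```python
import math

def level_difference_db(v_upper, v_lower):
    """Difference between two voltages, expressed in dB."""
    return 20.0 * math.log10(v_upper / v_lower)

# Hypothetical readings, all in volts RMS at the output of the device under test.
clip_level_v = 10.0        # level where distortion reaches the chosen limit
operating_level_v = 1.0    # the "0 VU" reference tone
noise_floor_v = 0.0001     # residual with the tone removed, input terminated

headroom_db = level_difference_db(clip_level_v, operating_level_v)
snr_db = level_difference_db(operating_level_v, noise_floor_v)
dynamic_range_db = level_difference_db(clip_level_v, noise_floor_v)

print(f"headroom      = {headroom_db:.1f} dB")
print(f"s/n           = {snr_db:.1f} dB")
print(f"dynamic range = {dynamic_range_db:.1f} dB")
```

With these numbers the s/n and the headroom add up to the dynamic range, so a specification sheet quoting only one figure should say which of the two it means.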

Quality grades.

People have different tastes in music and tonal quality.  Some believe that sound reproduction is best at 20% distortion as long as it's loud.  I disagree. The basic goal of the audio engineer is to faithfully reproduce the original sound.  Beyond this, certain "enhancements" can be added based upon taste, but the benchmark for quality comparison must always be the degree of faithful reproduction achieved.  The greater the artifacts produced in the reproduction process, the poorer the quality.  If the signal-to-noise (s/n) is infinite, the distortion is 0% and the frequency response is flat from 20 Hz to 20 kHz, humans will perceive it to be exactly like the original live performance.  In the real world however, this is rarely achieved.  Agreement between experts about what constitutes "high fidelity" is equally rare. It is important to understand that frequency response, distortion, and noise measurements are largely independent of each other.  A recording which is wonderful in two categories can still be terrible quality because of poor performance in the third. It is important to be clear about which element we are discussing when the subject is "quality." Rather than attempt a definition of "high fidelity," I shall describe some examples of real-world audio, and provide some typical quality numbers.

Despite what some LP enthusiasts claim, true "CD" audio is excellent quality.  From a good consumer-grade machine, I've measured about 93 dB dynamic range (96 dB is the theoretical maximum with 16 bit words describing each sample - at 6 dB, a doubling of voltage, for each bit).  The frequency response is typically 20 Hz to 20 kHz within 0.3 dB - essentially perfect, and the THD - total harmonic distortion measured at 1000 Hz - is less than 0.05%.  This is not to say that every CD and player will achieve this, but the medium is capable of it.
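The 96 dB figure follows directly from the "6 dB per bit" rule. A short sketch, using the simple doubling-per-bit reasoning given above (the function name is invented for the example):

```python
import math

def theoretical_dynamic_range_db(bits):
    """Ratio of full scale to one quantizing step, in dB, for a given word length."""
    return 20.0 * math.log10(2 ** bits)

db_per_bit = 20.0 * math.log10(2)            # each bit doubles the voltage range
cd_range_db = theoretical_dynamic_range_db(16)

print(f"{db_per_bit:.2f} dB per bit")        # 6.02 dB per bit
print(f"{cd_range_db:.1f} dB for 16 bits")   # 96.3 dB for 16 bits
```

Against that limit, a measured 93 dB from a consumer player is within a few dB of what the 16 bit format allows.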

The next quality grade we shall explore is that associated with computer sound card recording and playback.  With 16 bits, it can be true CD quality as previously described.  But this is only achieved with high quality sound cards, usually in the $200+ price class.  Sound cards are in fact the major limiting factor.  Even though all else is the same, including bit depth and sample rate, many consumer style cards are very poor performers. I've seen $150 cards with published specifications of 93 dB s/n (they really mean dynamic range) actually measure 65 dB on playback and 55 dB on record.  When the manufacturer was contacted, it seemed their marketing department had "calculated" the performance specifications.  Frequency response and THD were also worse than published.

On the other hand, there are excellent sound cards available which really do measure close to the 16 bit theoretical limit.  Digital Audio Labs and Audioscience both appear to have audio engineers on staff who've actually measured their product.  These cards and other high quality units do achieve true CD quality.

Sample rate describes how many waveform quantifications per second are done to reconstruct the audio.  These range from 11 to 48 kHz. It's important to understand that (when using good sound cards) this quality determinant affects primarily just frequency response - signal to noise and distortion are largely unaffected, being mostly determined by the number of word bits.  The Nyquist rate, the minimum usable sample rate, is simply double the maximum audio frequency desired.  For example, 20 kHz frequency response is achieved with a sample rate of 40 kHz.  In the real world however, imperfect filters are required, and some additional safety margin on the Nyquist rate is needed - about 10%.  So the CD process uses 44.1 kHz as the sample rate.
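The sampling arithmetic above can be written out as a small sketch. The function names and the default 10% margin are taken from the rule of thumb in the text, not from any standard.

```python
def min_sample_rate_hz(max_audio_hz, margin=0.10):
    """Double the highest audio frequency, plus a safety margin for real filters."""
    return 2.0 * max_audio_hz * (1.0 + margin)

def max_audio_hz(sample_rate_hz):
    """Highest audio frequency a given sample rate can represent, ideally."""
    return sample_rate_hz / 2.0

ideal_rate = min_sample_rate_hz(20000.0, margin=0.0)   # 40 kHz with perfect filters
practical_rate = min_sample_rate_hz(20000.0)           # about 44 kHz with 10% margin

CD_SAMPLE_RATE_HZ = 44100.0
cd_limit_hz = max_audio_hz(CD_SAMPLE_RATE_HZ)          # 22.05 kHz
low_rate_limit_hz = max_audio_hz(11025.0)              # a typical "11 kHz" rate

print(f"ideal: {ideal_rate:.0f} Hz, practical: {practical_rate:.0f} Hz")
print(f"CD upper limit: {cd_limit_hz:.0f} Hz")
print(f"11.025 kHz sampling upper limit: {low_rate_limit_hz:.1f} Hz")
```

The last line shows why the lowest sample rates sound dull: at about 11 kHz sampling, nothing above roughly 5.5 kHz can be reproduced.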

The "Presence Band" describes the region of treble frequencies that add clarity to speech. This ranges from approximately 2,000 Hz to 6,000 Hz. In the context of an SSB bandwidth-restricted voice channel, the usable portion is 2,000 to 3,000 Hz. Most voice communications can benefit from some EQ boost in this area - but too much boost can sound unnatural - sometimes referred to as "thin" or "tinny".
There are now abundant lossy compression algorithms (as distinguished from dynamic compression, which increases average loudness) such as MPEG Layer 2 and 3, AAC, etc., which trade many things for reduced audio file size or required data throughput.  Most people find the resulting audio quality acceptably good when these data reductions are done in moderation.  This is however a very murky area, where intermodulation distortion is increased and, at least dynamically, frequency response is often decreased.  In addition, some background sounds which are believed to be masked by louder sounds may completely disappear.  From a purist's viewpoint, these degradations are awful, but the trade-offs are nevertheless made to allow use of certain limited (digital) bandwidth systems.  In the author's opinion, one cannot speak of high fidelity in the same sentence as lossy compression. In addition, a whole new set of distortion types appears, many of which are very hard to measure, so the sound quality becomes totally subjective, which leads to ambiguity.
