Loudspeaker Phase Accuracy and Musical Timing



Originally published in Audio Ideas Guide, Winter 1997.



A speaker designer looks at a too-often ignored aspect of loudspeaker design.

By Roy Johnson, loudspeaker designer, Green Mountain Audio, Inc.


By time and tone do we perceive music. Over time, tones develop and change to create and resolve the tensions of music. Changes across time carry the information of music. We rely on timing to unravel the acoustic world around us, subconsciously and continuously. A complex wave is a unique structure formed from many individual elements. When their relative timing is changed, so is the information we decode.

Now for the bad news: All loudspeakers distort the timing between high and low sounds, regardless of how they operate. They differ by how much delay and where in the frequency range it is imposed. Many of the impressions that speakers make, such as 'boxy,' 'shrill,' 'spitty,' or 'forward,' are caused by the unique time delay that each imparts into various portions of the musical spectrum.

Bell Labs first analyzed this phenomenon in the late 1920s when designing and installing sound systems for movie theaters. Their conclusion? Time-delay distortion was indeed audible and should be minimized as much as possible. Yet, designers often add time delays as unavoidable by-products of engineering decisions. Many of them believe that 'small' timing errors are not really audible.

However, if you experience a speaker without these added timing problems, you'll quickly come to understand how sensitive we are to timing information. Some obvious examples would be the micro-moment occupied by the pause between two phrases, or the subtle shaping of a note as it begins and ends. Each lies at the heart of the emotion of music. Since we all hear and appreciate these very tiny events, how can we say that distorting their timing by a factor of 10 or more will not change the sound? Here's what we (and others) have found over the years:

  • Time-delay, or phase distortion, hides the small things that help to define reality. We have a nearly subconscious response to the sounds that mouths make while opening and closing, the precise timing between the hands on a string instrument and between all four limbs on the piano, organ, or drum kit. Timing is inherent in the interplay of breath control and fingering on a wind instrument.

  • As events are smeared in time, it is the echoes of these sounds which appear first -- the smaller sounds that delineate the size of an instrument or location of the artist.

  • Time delays move things around in space -- from front to rear and back again. Instruments and voices 'slide' back and forth, often faster than they could ever move in real life, if they should have moved at all.

  • Phase distortion causes the timbre of an instrument to change -- the unique voice or texture created by its harmonic structure.

  • The attack and decay of a note become fractured. Voices lisp, strings irritate, basses become 'boxy,' speakers can be difficult to position 'just right.'

The causes and cures for time-delay distortion are well known. Perhaps the designer does not understand how this distortion can overwhelm subtle inflections. After all, these are the nuances that distinguish the best musicians, and how does one know if they are even there in the recording?

While these subtleties are as important to music as brushstrokes are to a painting, we ignore the shortcomings in loudspeakers to enjoy the music, just as we can be happy with a photographic substitute for a painting. However, a speaker that minimizes phase distortion is simply more musical, dynamic, and transparent -- it is more realistic.




Some background

Over the last 70 years, improvements to loudspeakers have come from two independent lines of research: mechanics and psycho-acoustics. Physicists and electrical engineers first made large gains in performance when they transformed a speaker's mechanical characteristics into its electric-circuit equivalents, an area where the math is better developed. A solution was then translated back into mechanical terms to create a new suspension or type of enclosure, or to modify the shape of a cone or magnetic field.

Even now, as computers simulate complex systems, the accuracy of any model is limited by the quality of the translations to and from the electrical world. Any translation must now include how we perceive sound because the ear is so much more sensitive than test equipment. And perception can be very difficult to quantify.

I decided many years ago that if Bell Labs was right and time-delays were audible, then a speaker built without timing problems would clearly show what would happen as they were added to the design, one by one. It was interesting to verify that psycho-acousticians had it right all along -- that we use timing differently in every frequency range.

The advantage of my method was that I was able to use the complex sounds of music to verify the effects, instead of using simple clicks and tone bursts. A complex sound wave is a combination of steady and transient tones of different frequencies and durations, which start at different times, combine at different times and loudnesses, and originate from different directions.

These are many differences to keep track of, and so we decide what's important about 'one' sound in order to identify and then follow it along. We mentally subtract any differences as time passes, looking for continuity and familiarity. Thus, timing is a vital part of knowing whether or not a noise 'belongs' to the original in some way. When a sound's constituent parts are shifted in time, the phase between the wave's components changes. Zero phase shift is just that -- no misalignment. A shift of 360 degrees would be one wavelength's worth. Two wavelengths would be 720 degrees.
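The degrees-to-time relationship above is simple arithmetic; here is a minimal Python sketch (mine, not from the article) that converts a phase shift at a given frequency into the time delay it represents:

```python
def phase_to_delay(degrees, freq_hz):
    """Convert a phase shift (degrees) at a frequency (Hz) into seconds.

    One full cycle (360 degrees) at frequency f lasts 1/f seconds, so the
    delay is the fraction of a cycle times that period."""
    return (degrees / 360.0) / freq_hz

# 360 degrees at 1,000 Hz is one full wavelength: 1 millisecond.
print(phase_to_delay(360, 1000))   # 0.001
# 90 degrees at 100 Hz is a quarter of a 10 ms cycle: 2.5 milliseconds.
print(phase_to_delay(90, 100))     # 0.0025
```

The same shift in degrees therefore costs ten times more time one decade lower in frequency -- which is why bass timing errors are so large in absolute terms.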

As mentioned earlier, phase shift in speakers is blamed for all sorts of other problems -- for a speaker's preference for a certain amplifier, certain wire, certain room position, and even certain recording techniques. Comments such as "I found the presentation analytic, etched, forward, present, in-your-face, highly detailed, driving, fast, explosive, recessed, hollow, laid-back, shallow, sweet, slow, fat, bloated, hard to position in the room" are really about what phase shift sounds like at different frequencies and in different amounts. In fact, you can see the spatial nature in each of these comments. To the ear, time is distance, or time is dynamics or time is texture. It all depends on where and how much.

The blame for poor sound is often ascribed to everything else in the system but to a speaker with poor timing, because it's forgotten that distortions don't just add together in a system, they multiply: Distortion distorts distortion as well as the original signal. A tweeter made aggressive from a severe phase shift in a speaker's crossover circuit can hide the clarity of a good amplifier. It can also multiply the 'grainy' sound from an inferior one.




The causes of phase distortion

Time delay is the natural consequence of making something vibrate, whether it's electric fields or material objects. In speakers, only three things can cause time delays:

  • The moving elements (the drivers -- woofers, midranges, tweeters);

  • Their distances through the air to the listener; and

  • The crossover circuit.

Let's go over the cause of motion-based time delays first. Different drivers (round, square, flat) have an inherent amount of phase shift, related only to each one's natural resonant frequency. One analogy is a weight hanging from a spring. If you move the other end of the spring up and down very slowly, the spring does not stretch and the weight follows your motion exactly. The phase shift between your applied force and the weight's motion is zero. The moving system is in a 'minimum-phase' mode. If you move more rapidly, the spring starts to stretch and contract -- and the weight no longer follows your driving force. It moves with a different phase.



Figure 1: Phase of weight's motion to the driving force


Figure 1 shows how this simple system changes phase with frequency. Zero degrees (no difference in motion) occurs at zero Hertz (Hz = wave cycles per second), where there is either no motion or only a steady, one-directional movement. If you move fast enough to reach F1, the natural resonant frequency of the mass-spring system, the weight will be moving up and down a quarter-cycle behind your motion, or -90 degrees out of phase. At high frequencies, (F2 and up), the weight settles down to -180 degrees out of phase with the driving force, or one-half cycle behind. In other words, when you stop moving, the mass will begin to stop only one-half cycle later.
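The curve in Figure 1 follows the standard driven mass-spring formula; a small sketch of it is below. The damping ratio `zeta` is an illustrative assumption (the article doesn't specify one), but the endpoints do not depend on it:

```python
import math

def oscillator_phase_deg(f, f0, zeta=0.5):
    """Phase (degrees) of a driven mass-spring's displacement relative to
    the driving force. zeta, the damping ratio, is an assumed value."""
    r = f / f0
    return -math.degrees(math.atan2(2 * zeta * r, 1 - r * r))

F1 = 100.0  # natural resonant frequency, Hz (illustrative)
for f in (1.0, 100.0, 10000.0):
    print(f, round(oscillator_phase_deg(f, F1), 1))
# near 0 degrees well below F1, exactly -90 at F1, near -180 far above
```

Changing `zeta` only changes how abruptly the curve moves from 0 to -180 degrees, not where it starts, crosses -90, or ends.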



Figure 2: Typical sealed-box woofer phase vs. frequency


Figure 2 describes how a driver (such as a woofer) responds to an amplifier's signal -- somewhat differently from the mass-and-spring analogy in Figure 1. At high frequencies, the phase shift is zero instead of -180 degrees, and it reaches -90 degrees at F1, the resonance of the enclosed driver (woofer, midrange, or tweeter alike). The lower the musical note, the longer it's delayed. Below resonance, the phase lag levels off at -90 degrees, where the mass-and-spring system went to zero. This is because the mass-spring situation is reversed.




With a speaker, the force (from the amplifier) moves the mass (the cone) against the force of the spring (the suspension and the air in the box), instead of moving the spring first. The lower the tone, the longer the speaker takes to respond -- which is a phase lag. This lag can be represented as a negative number, as in Figures 1 and 2, or as a positive value to indicate that more time has passed. The choice is a mathematical convenience -- test reports usually show a lag as a positive value. Thus, the phase of a speaker's motion differs from low to high, simply because it has mass and some kind of compliant suspension. Even a 'mass-less,' ionized-air tweeter has thermal mass and elasticity from hot ions transferring their energy.

Musical loudspeakers have often measured poorly because, in musically-unimportant ways, they really did perform poorly -- and also because some musically-unimportant parameters were being measured. It's possible to make a loudspeaker that's musical and also measures well. It's possible to shop for them, too. In either case, the ears must be the final judge. To find flaws by ear, the music used must present instruments and voices spanning their complete ranges and tonalities -- clarinet to flute, tuba to cornet, violin to cello, soprano to baritone, bass to banjo, from Mozart to Primus. This will quickly reveal the problems -- and you need to play all of these selections if you'd rather find flaws in the showroom instead of your home two years later.

Recordings with little artificial echo or dynamic processing can be used to discover if a speaker's phase shift is at fault for a certain distortion, instead of the amplifier or wire. How to tell which recordings are made in such a straight-forward manner? Read the reviews, support your independent record label, and trust your judgment. Of course, it's best if you have knowledge of how musical instruments and voices sound live -- and what can happen to them in the studio. Consider reading some recording magazines such as Mix and Recording to learn the latest studio technology.

The moving cone or diaphragm is not the only source of phase error. A speaker's crossover circuit can add even more. Capacitors and inductors create a time delay between the voltage and current of any signal flowing through them. The more parts in the circuit, the greater the time delay.

At this point, it must be said that many engineers mis-state phase shift. Instead of calling it 'time delay,' they might say something like, "the low-frequency components are somewhat out-of-phase with the highs, thus these waves will tend to appear inverted from the high frequency tones." This is true, but only when you look at an ongoing series of smooth sine waves, which is not music. Music is impulsive and intermittent, so phase becomes an issue for notes starting and stopping, or harmonics combining, or creating a driving rhythm.

Ask a musician what's the smallest timing lag in the beat he or she can perceive and some will say a portion of a 128th note. These are quite brief intervals, indeed. What that mythical engineer should have said is that, at the beginning of a musical note, the lows were actually delayed for a moment, and so of course, later on, the low-frequency wave would be moving 'up' on the 'scope while a high-frequency wave was already going downwards.

While this is certainly an 'out-of-phase' condition, the time delay is an important musical value, because the beginning and ending of a note contain important information about an instrument's location, the recorded acoustics, and type of instrument or number of voices it represents. Any delay among a complex wave's components means that they're not synchronized as the musician intended.





Figure 3: Equivalent driver distance-offset for two-way speaker


In 1971, Richard Heyser ("Determination of Loudspeaker Arrival Times, Part III," Audio Engineering Society Journal) suggested that one way to view the time-delay distortion in a loudspeaker is by envisioning a series of small loudspeakers, each handling a different frequency range. They would be located farther and farther away from the ear as we go down the scale, as shown in Figure 3.

This offset-distance represents the time delay at each frequency. With only a little more imagination, it follows that the sound stage would be stretched back and forth as well.
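Heyser's picture converts directly: a time delay at some frequency is equivalent to setting that frequency's 'virtual speaker' farther from the ear by the delay times the speed of sound. A one-line sketch (my illustration, using the 13,500 inches-per-second figure the article uses later):

```python
SPEED_OF_SOUND_IPS = 13500  # speed of sound, inches per second (approximate)

def delay_to_offset_inches(delay_s):
    """Heyser's picture: delaying one frequency range by delay_s seconds is
    equivalent to moving that range's 'virtual speaker' this much farther away."""
    return delay_s * SPEED_OF_SOUND_IPS

# A 1 ms delay on the lows places their virtual source 13.5" behind the rest.
print(delay_to_offset_inches(0.001))   # 13.5
```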

How can phase shift be avoided in a speaker? Some methods are simple and some rely heavily on electronic correction. A designer can choose to:

  • Operate each driver well above -- or well below -- its natural resonance, where phase shift is nearly constant;

  • Mount the drivers in the cabinet so that each is the correct distance from the listener;

  • Use a first-order crossover (more later);

  • Use a single, full-range driver (no crossover);

  • Use a co-axial driver (tweeter placed inside the apex of a woofer cone) with a first-order crossover;

  • Employ additional capacitors and inductors in the crossover circuit as corrective elements; or

  • Correct any problems with a 'black box' before the power amplifier, using either digital signal processing or analog circuits.

Each has its problems, but the last four present especially difficult challenges. Making a full-range driver is still not practical, and co-axial drivers have some rather serious distortions. The sixth method can be hard on an amplifier, not to mention the distortion that would be added from the extra circuit elements. The final method's 'black box' can never fully correct problems, nor ever be without its own colorations or distortions.

This leaves the first three methods as viable design criteria -- methods used only in a few brands. The first method has two versions: Drivers with naturally low resonant frequencies need very compliant suspensions (woofers and tweeters alike) if they use lightweight cones. Lighter cones and thinner suspensions are difficult to make uniformly.

On the other hand, drivers of naturally high resonance mean that a panel speaker's diaphragm would need to be stretched to unbelievable tensions in order to push its resonance up to ultrasonic frequencies. For woofers, Ed Long and Associates have an interesting design that drives a subwoofer from the sub-sonics on up to an intentionally high box-resonance frequency of 70-80Hz, at which point it's crossed over. A small box is the objective, and it requires equalization (EQ), heavy power application, a crossover with its -3dB point no higher than the box resonance, and a long-stroke, high-power woofer. This means bi-amping with a line-level crossover is necessary. In the world of high-end audio, that's all okay. But the EQ and crossover both inject time delay distortions into the system.

The second method, aligning the drivers at the 'correct' distance to the listener, depends upon the actual drivers used, the listener's position, and any circuit-induced phase shifts.

Finally, the third method, using a first-order crossover, is too often dismissed, out-of-hand. To quote Siegfried Linkwitz ("Active Crossover Networks for Noncoincident Drivers," Audio Engineering Society Journal, Jan/Feb 1976), the father of the fourth-order, high-phase-shift crossover, "This (first-order circuit) is not a very practical filter because of its slow cutoff behavior of 6dB per octave." What he means is that the drivers must be well-behaved far beyond their crossover points to be used with a first-order circuit, because this circuit allows the drivers to overlap across a wide range. To be used with a first-order crossover, only the best drivers need apply.




When designing for phase accuracy, a great deal of the initial work can be done via measurement. It soon becomes apparent that small anomalies must be tracked down and eliminated by ear, because the finest test equipment can't indicate what those 'blips' on the meter really mean to the ear. Test equipment will also be limited by the microphone used, as microphones have their own phase, tonal, and dynamic flaws. Even lab-calibration microphones can't resolve the same level of detail as the ear.

The designer has to choose between creating a perfect-looking waveform on the 'scope (which means the speaker design includes some unknown microphone-correction factors), or realistic reproduction to the ear. The consumer has to choose between speakers which 'measure well' versus ones that are musical. Both are tough choices to make, especially when measurements are trusted more than the ears.

However, a speaker that's musical on all types of music has good measurements. Speaker design lies at the end of a long chain of events with many variables. To find a coloration (distortion) that needs fixing, a designer must know that it couldn't have come from the musical instrument, microphone technique, recording console, reproducing system, or his own room. A designer can reduce the length of the reproduction chain by using his own master tapes -- something not possible for most consumers.

At the beginning, it was said that one effect of phase distortion is that it'll make an instrument or voice jump back and forth in space. Another effect is to change the instrument's timbre (TAM-ber) -- in other words, its texture. Which of these effects occurs depends on where the phase distortion is located on the frequency scale, and how much phase distortion is injected. If you're to hear either effect, you first need a reference point of no movement and/or no timbre-shift. The best place to 'zero-in' by ear will usually be in the midrange, in a frequency range away from any crossover points. There, phase shift is usually at a minimum and the recorded echo can be heard most clearly. Instruments stand still and any shifts up the scale don't result in any large change in timbre. As the sound moves out of this reference range, changes in location and/or timbre begin to emerge in designs with phase shift.

Perhaps the best way to judge what is 'correct' is to remember that music is not just about tonality. It involves time, and your ear is extremely sensitive to changes over time, from microseconds to years. The auditory process constantly compares 'what is what,' 'what is with what,' 'what came before,' and anticipates 'what comes next.' A trained ear knows what to expect. Change the expectation and change the experience. Without expectation, responses either cease (as with elevator music) or become an experience of free association (which Disney exploited in Fantasia).




Experiences guide expectation for greater appreciation of how the next note is played. When listening for defects, try to visualize where the instrument is located in space (from front to rear), and remember what its timbre should be, regardless of whether it's being played loudly or softly. Because our psycho-acoustic reaction to phase shift isn't the same across the musical scale, we each have different interpretations of its effects.

In the bass, the wavelengths are 5', 10', and 32' long, for example. These waves last for so long in the room that we can't easily 'wrap our minds' around them to create a mental picture of their source, and so we hear low-frequency phase problems as 'room problems.' A speaker which gets the timing wrong in the low bass can sound like it has room-placement problems, because the very lowest frequencies have been delayed for so long that they sound like they couldn't have possibly come from the speakers. Those 5', 10', and 32' waves arrive 1', 2', and 8' behind the speaker's midrange -- which defines the distance to the walls. So we assume it must be our room -- around us, behind the speakers, and everywhere but the speakers themselves -- that's causing the discontinuity!

We then move the speakers around to see if we can 'free them' from the room's influence -- without much success. Moving a speaker an inch or two doesn't really matter to low-frequency waves that are 5 to 10' long. What often happens is that this movement has disrupted an uncomfortable blend between room reflections and the already time-delayed direct sound.

To get an idea of what belongs to the speaker and what is the room's contribution, listen to only one speaker (but not in mono) using your sense of time as well as tone. You'll hear the speaker delaying the low-frequency sound, even when you walk right up to it -- too close for it to be a room's contribution.

Time delay in the midrange can change the spatial location of an instrument, to the point at which it sounds like the engineer jerked the microphone smoothly or even violently. At higher frequencies, it may make a voice spit or lisp, or create an irritating edge or grit on voice and strings. All of these effects are readily explainable once we know how far a time delay will move an image and/or change the timbre.




A little math, a little circuit description, a lot of temporal analysis

At a frequency f, an image moves by a distance equal to (degrees of phase error at f / 360) x (13,500 inches per second / f). If you look at phase shift numbers from the test reports, you'll be surprised at how far a speaker can stretch an image into space.
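That formula is easy to try out. A minimal Python sketch (mine, using the article's 13,500 inches-per-second speed of sound):

```python
def image_shift_inches(phase_error_deg, freq_hz, c_ips=13500):
    """Image shift: the fraction of a cycle in error, times the wavelength
    (speed of sound divided by frequency)."""
    return (phase_error_deg / 360.0) * (c_ips / freq_hz)

# 360 degrees of error at 3,000 Hz is one full 4.5" wavelength:
print(image_shift_inches(360, 3000))   # 4.5
# 75 degrees at the same frequency is just under an inch:
print(image_shift_inches(75, 3000))    # about 0.94 inches
```

Note how the same phase error grows as a distance when it occurs lower in frequency: 360 degrees at 300 Hz is a 45-inch stretch.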

What also matters is how 'fast' this distortion 'comes on' with a change in frequency. In some speakers, images will be stretched back and forth whenever the sound passes through a range with abrupt phase shifts -- inducing a kind of rapid spatial 'flutter.' The largest source of phase shift in a loudspeaker is usually found in the crossover circuit, classified by its 'order.'

The simplest circuit is called a first-order filter: A capacitor is placed in series with the tweeter and an inductor in series with the woofer -- there is only one 'reactive' circuit element between a driver and the amplifier. The rate of roll-off for the first-order circuit is 6dB per octave, or 20dB per decade. (A decade would span 5,000Hz down to 500Hz, for example, and a 20dB decrease will seem only 1/4 as loud to the ear.)

Higher-order circuits (2nd, 3rd, and 4th) respectively place two, three, and four elements in the path for a more rapid roll-off (12, 18, and 24dB per octave) of the driver's response.
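The 6dB-per-octave-per-order rule can be checked numerically. This sketch assumes the textbook Butterworth magnitude response (my choice of filter family, for simplicity; the asymptotic slope is the same for any alignment of a given order):

```python
import math

def butterworth_lp_db(r, order):
    """Magnitude (dB) of an nth-order Butterworth low-pass at r = f / fc."""
    return -10 * math.log10(1 + r ** (2 * order))

# Well above the crossover, each octave costs about 6 dB per filter order:
for order in (1, 2, 3, 4):
    per_octave = butterworth_lp_db(32, order) - butterworth_lp_db(16, order)
    print(order, round(per_octave, 1))   # about -6, -12, -18, -24 dB
```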

Each circuit has a very different time delay, and the graphs here show how this affects the amplitude response and square wave performance. However, although these are 'standard' graphs for this type of data, they only indirectly show what it is that we'll hear.

This is where the analysis of the graphs will become somewhat complicated -- not in a technical sense, but in how we relate them to what we hear. This is seldom, if ever, discussed. It really can be a lot to follow. I'll try to keep things as simple as possible.

At the end, I hope you'll have a clear understanding of how far an image will shift, or how much the timbre might change, just by looking at a speaker's design and a few of its technical specifications. Some key concepts to understand before we begin -- in each graph:

  • Vtweeter and Vwoofer are the voltages (signals) to the tweeter and woofer.

  • Tweeter and Woofer label their frequency responses (in amplitude or loudness).

  • Phi Tw + W is the phase angle of the system (the amount of phase shift relative to the crossover point).

  • Tw + W is the total output (SPL) for continuous sine waves -- the total frequency response with the phases of the woofer and tweeter taken into account.

  • 'Normalized frequency' represents a multiple of crossover frequency (0.1, 0.5, 1, 2, 4 times, etc.). Each division marked is an octave wide.



Figure 4; a, b, c, top to bottom: Typical circuit design for a first-order crossover.


Figure 4a shows the circuit layout for a first-order crossover -- one capacitor, one inductor. A first-order crossover creates a +45 degree shift for the tweeter and a -45 degree shift for the woofer. That's a difference of 90 degrees, which happens to be constant across the entire spectrum -- which is the same as no relative shift -- or zero, which is why phase shift is not shown in Figure 4b.

Between its drivers, then, a first-order crossover has no relative phase shift and no time-delay problems. Note how the total square wave response at the bottom of Figure 4c is 'perfect,' indicating harmonics and fundamentals can remain aligned at the listening position.
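The 'perfect' summation can be verified with a few lines of complex arithmetic, using the idealized first-order transfer functions (an idealization -- real drivers add their own phase, as discussed earlier):

```python
def first_order_sum(f, fc):
    """Sum of ideal first-order low-pass and high-pass sections,
    with s = j * f / fc (normalized frequency)."""
    s = 1j * (f / fc)
    return 1 / (1 + s) + s / (1 + s)

# The two halves recombine to 1 -- flat amplitude, zero phase -- at every
# frequency, which is why the square wave survives intact:
for f in (100, 3000, 12000):
    print(f, abs(first_order_sum(f, 3000)))   # each is 1.0 (within rounding)
```

Algebraically the sum is (1 + s)/(1 + s) = 1, so the result holds at any crossover frequency, not just the ones sampled here.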

From here on, it gets a little complex as we analyze higher-order circuits.















Figure 5; a, b, c: A typical circuit for a second-order crossover.


Figure 5a shows a 2nd-order crossover circuit. Figure 5b shows how a 2nd-order filter apparently adds 90-degrees of shift per driver (180-degrees total difference) at the crossover point. At the frequency extremes, this phase difference returns to zero, as each part of the filter (capacitor or inductor) gradually behaves more like a resistor (their reactances go to zero).

Contrary to popular belief, the 180-degree shift doesn't mean that the tweeter is out of phase with the woofer. It merely leads the woofer output by 180-degrees, or half of the wavelength of the crossover frequency. More about this in a moment.

Figure 5b also shows that the amplitude response (derived from steady-state tones) will have an infinite 'suck-out' in Tw + W. At first glance, this cancellation does, indeed, seem exactly like that caused by a tweeter with reversed polarity. What is really true is that the time delay is causing cancellation on steady tones, because the time delay at the crossover frequency is half of a wave cycle, 1/(2f) seconds.
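The half-cycle cancellation can be reproduced numerically. This sketch assumes the Linkwitz-Riley form of a second-order crossover (my choice; other second-order alignments also null at the crossover point):

```python
def lr2_sum(f, fc):
    """Second-order (Linkwitz-Riley, assumed) crossover sum with both
    drivers in normal polarity: lowpass 1/(1+s)^2 plus highpass s^2/(1+s)^2."""
    s = 1j * (f / fc)
    return (1 + s * s) / ((1 + s) * (1 + s))

# At the crossover the outputs sit half a cycle apart, so steady tones cancel:
print(abs(lr2_sum(3000, 3000)))   # 0.0
# An octave below, most of the output is back (about 0.6 of full level):
print(abs(lr2_sum(1500, 3000)))
```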

However, before we show how some designers attempt to correct this 'cancellation,' an important point must be made about what else this graph really shows. If you look at the dotted line, on the right side, for the tweeter's phase response in Figure 5b, the tweeter seems to be leading the woofer (by a positive number of degrees). On the left side, the woofer's curve is lagging (negative), at the crossover point. For the tweeter's low end to arrive 'sooner,' everything else must be delayed.

Thus, the low range of the tweeter arrives first, followed by its high range and the woofer's middle range. The upper range of the woofer arrives last. It's the placement of the 'zero points' at the frequency extremes that causes confusion, but only because these graphs are typically used to illustrate what happens on steady tones. Our difficulty comes when we re-interpret them as time delay charts. If we re-drew them to represent what we actually hear arrive from the speaker first, the 'zero points' will be placed elsewhere.

Without re-drawing anything, let's look at Figure 5b again, where this system lets the lowest note of the tweeter arrive first. But this isn't at the crossover point. It's below it, perhaps one octave below the crossover point, where the tweeter is 12dB softer than the woofer. There, the tweeter contributes about one dB to the overall sound pressure level. Even so, a one dB contribution is rather noticeable, so the choice of zero point should be placed perhaps even lower down the scale.

To 'fix' the problems of the system, a designer will often reverse the tweeter's wires to invert its polarity. Although the steady-state results shown in Figure 6a look much smoother, in reality this is because the graph for the tweeter's phase has had 180 degrees subtracted.

This seemingly minor point forms the basis for the many claims of 'phase-coherent' performance, which, at best, is a half-truth. The graph now shows 'a smooth rate of phase-angle change at the crossover point.' But it isn't smooth from the perspective of time passing by. In fact, regardless of how smooth this curve appears, the woofer and tweeter still have the same sequence of arrival as before.





Figure 6, a and b, top to bottom: As in Figure 5 but with the tweeter's phase reversed.


Reversing the tweeter polarity only means that the tweeter is moving inward on the initial pulse while the woofer moves outward. This is evident in the square wave response in Figure 6b: The tweeter is pulling in while the woofer is pushing out. At the highest frequencies, the tweeter still arrives as before (in Figure 5), and now its absolute polarity will still be backwards.

These graphs are the major source of the confusion about the audibility of phase distortion, because they don't show how changing the tweeter's polarity will actually sound, except on steady tones. Note that in Figure 6a, the combined amplitude response of the woofer and tweeter now exhibits a 3dB rise for steady-state tones. This can be flattened by tuning the 'Q' of the crossover (slight changes in the values of capacitors and inductors). Unfortunately, this results in large changes in the speaker's impedance, which indicate that extra energy is being stored and released by the crossover circuit. This can stress the amplifier and can also change the tonal balance. The imposed 'notch' also takes transient energy away from an impulsive signal, so the speaker sounds mellower in this range.
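The 3dB figure can be reproduced with the textbook second-order Butterworth responses (assumed here for illustration; this is the alignment that produces exactly the rise described):

```python
import math

def bw2_sum_inverted(f, fc):
    """Second-order Butterworth (assumed) crossover with the tweeter's
    polarity reversed: lowpass minus highpass, with s = j * f / fc."""
    s = 1j * (f / fc)
    d = 1 + math.sqrt(2) * s + s * s
    return (1 - s * s) / d

# The steady-state 'fix' leaves a rise of about 3 dB at the crossover point:
bump_db = 20 * math.log10(abs(bw2_sum_inverted(3000, 3000)))
print(round(bump_db, 2))   # about 3.01
```

The amplitude now sums smoothly, but the underlying arrival sequence of tweeter and woofer is unchanged -- which is the article's point.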

If a woofer and tweeter are connected with a 3rd-order filter, even more phase shift is injected. Reversing the tweeter's polarity will never yield a smooth phase curve. Third-order crossovers separate the drivers by 270 degrees, or 3/4ths of a wavelength, at the crossover frequency. A 4th-order filter has a 360-degree phase difference at the crossover frequency. At 3,000Hz, that full wavelength corresponds to about 4.5" of distance offset.

If a speaker uses a 4th-order crossover at 3,000Hz (common for a 1" tweeter), what will we actually hear from this system?

  • Down at 1,500Hz, the tweeter isn't really contributing. Thus, the image is formed by the woofer. However, the woofer moves back by one inch because the circuit imposes 75 degrees of shift at 1,500Hz. But the dimensional shift is always relative to that 3,000Hz wavelength reference point, where (75 degrees/360) x (13,500ips/3000Hz) = 1 inch.

  • At 2,000Hz, the tweeter output is now audible (12dB softer than the woofer). We hear the image from the woofer move back by a little more, to 1.1" (90-degrees of 3,000Hz equals ¼ wavelength). Yet the tweeter is actually forward by 3/4ths of a wavelength of 3,000Hz, or 3.4". As a soprano's harmonics move through this region, we'll hear the image begin to split by 4.5" total. The image will begin to diffuse from front-to-rear, and harmonic structures will begin to fuzz over -- the sound will become grainy.

  • At 3,000Hz, the woofer and tweeter contribute equally. However, the woofer image has moved backwards by 2.25", for the same total split of 4.5". At this point, the image is very confusing and textures are at maximum graininess. The voice is split into two parts -- 'esses' and 'tees' from the tweeter arrive first, stretched out from the sound of the throat as delivered by the woofer -- an unnatural occurrence. The voice hisses and spits, and strings are edgy*. There'll be comments such as "the transients are etched..." "...detail thrust at the listener..." "...this speaker seems very fast..." "...analytical..." and "...very sensitive to electronics...." Of course, these comments would be expected because the tweeter arrived first!

    * Not all higher-order designs sound like this, because there are ways to disguise the problem. Again, if the designer tunes the "Q" of the circuit (computer-optimized, or by ear), or otherwise misaligns the actual crossover point, the graininess will be reduced, along with transient response. The spatial distortion remains, however, and even a solo flute can be heard to wander.

  • At 5,100Hz, the woofer is now 12dB softer than the tweeter -- just audible. It's now 3.4" behind the tweeter's highest frequencies, and the tweeter is still 1.1" ahead -- a 4.5" total split.

  • At 12,000Hz, two octaves above the crossover point, the woofer output is non-existent, but the tweeter output will still arrive 30 degrees ahead (0.4 inches) of its very top end at 24,000Hz, where phase error returns to zero. This will disturb the timing of a wooden stick striking a bell, blurring its image. A musician striking the bell .4" late is definitely behind the beat as well.
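The walk-through above can be tallied in one place. A minimal sketch, taking the 75- and 90-degree woofer shifts at 1,500 and 2,000Hz from the text (they belong to this particular 4th-order example, not to every such filter):

```python
SPEED = 13_500   # inches per second, as used throughout the article
REF = 4.5        # one wavelength at the 3,000Hz crossover, in inches

def inches(deg):
    # Degrees of shift, referenced to the crossover wavelength.
    return deg / 360 * REF

# (frequency, woofer lag in degrees, tweeter lead in degrees)
cases = [(1500, 75, 0), (2000, 90, 270), (3000, 180, 180)]
for f, woofer_deg, tweeter_deg in cases:
    lag, lead = inches(woofer_deg), inches(tweeter_deg)
    print(f"{f}Hz: woofer back {lag:.2f}\", tweeter forward "
          f"{lead:.2f}\", split {lag + lead:.2f}\"")
```

At 2,000Hz and above, the split settles at the full 4.5" described in the text.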




This analysis can also be used for a three-way loudspeaker. Insert a crossover point of 300Hz, which is not uncommon for a woofer-to-midrange transition. Because we are 10 times lower down the scale, the split in the image becomes 10 times greater, or 45". This can lead to a hollow-sounding image, with no real sense of depth. Now, of course, we should choose a new reference 'zero' somewhere between the woofer and tweeter -- say, at the midrange's 1,000Hz. But here we won't; let's just carry on as we've been doing for a moment longer.

1. At 75Hz, two octaves below 300Hz, the woofer is lagging by only 4". Except we're forgetting that a woofer in a sealed box is already starting to roll off at 75Hz, at 12dB per octave (from a 40Hz box resonance). This means the phase shift caused by the mechanical resonance is adding nearly 1/8-wave of 40Hz distance-offset, or about 40 more inches. So, down at 75Hz, the woofer is more than 44" behind the midrange!

2. We could be dealing with the phase shift of a ported speaker -- in that case, you could just double the distance-offset numbers from the last paragraph for the woofer phase lag, because we're told that the port is out of phase -- except that you'd be wrong. We must consider how the air actually moves. A positive pulse to the woofer creates a negative pulse from the port just D/13,500 seconds later, where 'D' is the internal distance in inches from the woofer to the port's outlet. Because the woofer and port communicate through the compliance of the air, the positive woofer pressure will be small (not worth measuring), while the port's negative pressure is high. This back wave is certainly 180 degrees out of phase, but we're hearing an inverted waveform -- whose time delay is only the woofer-to-port distance, plus any extra distance from the port to your ear.

3. A tweeter is also likely to be rolling off at 12dB/octave above 24,000Hz (or even worse, ringing ultrasonically, which means phase is changing very rapidly with frequency). So its effect on the phase shift, way down at 2,400Hz, has to be taken into account by the same means.
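Items 1 and 2 use the same degrees-to-inches conversion, plus a simple distance-over-speed delay for the port. A sketch (the 20-inch woofer-to-port distance is invented for illustration; it is not from the article):

```python
SPEED = 13_500  # inches per second, as used throughout the article

def offset_inches(degrees, freq_hz):
    return degrees / 360 * SPEED / freq_hz

# Item 1: roughly 1/8-wave (45 degrees) of shift, referenced to the
# 40Hz box resonance, comes to about 42 more inches of woofer lag.
print(round(offset_inches(45, 40)), "inches")

# Item 2: the port's delay is just travel time over the internal
# path D.  D = 20" is a hypothetical woofer-to-port distance.
D = 20.0
print(round(D / SPEED * 1000, 2), "ms")
```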

Working in the time domain is no fun, that's for sure. But all of this would be useless unless we can link the numbers to their effects on the sound. Here are some situations we've all heard. Let's see if things add up.


Perceived rhythm

This can be affected by several feet of delay from the woofer, an absolutely plausible situation from the example described above. Take the beat from a tom-tom and compare it to the high-frequency 'crack' from the drumstick. If the rhythm is 88 beats per minute, four beats per measure, then one quarter note will occur every 60/88 = 0.68 seconds. Five feet of woofer offset is a time delay of about five milliseconds, or 0.005 second -- about 0.7 percent behind each beat. Certainly, the woofer 'sounds slow' compared to the earlier arrival of the stick.
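As a quick check on that arithmetic (assuming the article's 13,500ips speed of sound):

```python
SPEED = 13_500            # inches per second
beat = 60 / 88            # seconds per quarter note at 88 bpm
delay = 5 * 12 / SPEED    # five feet of woofer offset, in seconds

print(f"{delay * 1000:.1f} ms behind")            # 4.4 ms
print(f"{delay / beat * 100:.2f}% of one beat")   # 0.65%
```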


Grainy sound

If we use the 3,000Hz crossover point of a 4th-order circuit design as an example, then one 3,000Hz cycle takes 1/3000 second (0.33 millisecond). The crossover separates the woofer and tweeter by one full cycle, or 1/3 of one millisecond. Now, a 128th note occurs every 0.68/32 seconds, or about 0.021 seconds (21 milliseconds). That is roughly 64 times longer than the 0.33-millisecond crossover delay, which is far too short to be heard as a rhythm difference. However, if a given midrange wave (such as voice) has harmonics up in that 3,000Hz range, a time delay will shift the harmonics relative to the fundamental. One way to envision how this looks on the overall shape of the complex wave is that a tiny spike appears on the side of the main wave, where it used to be 'buried' in the wave's peak. The result? We hear grain. If the amplifier has a tendency toward grainy distortion as well, this speaker will multiply the problem. Conversely, if the speaker doesn't have this type of distortion, it may not readily expose an amplifier's problem.
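The same arithmetic in a few lines, taking a 128th note as 1/32 of a quarter note:

```python
cycle = 1 / 3000          # one 3,000Hz cycle: the full-cycle crossover delay
beat = 60 / 88            # seconds per quarter note at 88 bpm
note_128 = beat / 32      # a 128th note is 1/32 of a quarter note

print(f"crossover delay: {cycle * 1000:.2f} ms")   # 0.33 ms
print(f"128th note: {note_128 * 1000:.1f} ms")     # 21.3 ms
print(f"ratio: {note_128 / cycle:.0f}x longer")    # 64x
```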




No more math -- just design choices

First-order crossovers avoid most of these timing problems, which is why I use them exclusively in our loudspeakers, and why the most favored speakers of the past either used them as well or used no crossover at all.

Yet, this is not to claim that these circuits are perfect. They cause problems, which we've found to be manageable for home systems, especially in light of hearing what reduced phase error does for the music. One problem is the extraordinary driver requirements discussed earlier. Another is that tone-balance shifts are more apparent from standing to sitting. Yet, on extended audition, this latter effect is not as objectionable as the large phase shifts from a higher-order crossover. Perhaps tone-balance errors are easier to mentally adjust to than the timing errors of higher-order circuits (which produce tone-balance shifts upon standing, too).

We mentioned earlier that the ear is sensitive to how 'quickly' the phase shifts happen as the musical scale is traversed. Steeper filters of higher-order crossovers 'come onto' their phase shifts more rapidly than do the lower-order ones. To hear the effects, listen for changes in the sound stage location around the crossover points. If the crossover region falls across an acoustic guitar, a different microphone may seem to have been set closer to the upper strings. But as the guitarist plays lower down, you can hear that it's only one microphone. Ask yourself if the instruments sound 'whole.' If an instrument's harmonic structure travels through a crossover point and the texture (timbre) changes, you're hearing phase shift. Are textures consistent with a string section sitting in one place, or does the microphone move in closer on certain notes? Is there a real sense of space around all instruments, or only around a select few in a narrow frequency range far away from crossover points or mechanical resonances -- such as instruments of 'simple' harmonic structure like the French horn?

If you roll your own speakers, try building a two-way (just one channel) with a first-order crossover, starting with all the voice coil-to-cone junctions at the same distance from the ear. Move them back and forth and you'll hear the phase shift. It'll come and go, heard as tone cancellations or additions, and the image will also collapse from front-to-rear in the crossover region. Next, substitute a second-order circuit and reverse the tweeter wires (or not). In a simple two-way, you'll hear a pervasive graininess to the sound for two octaves below and above the crossover frequency. Imaging will become vague. You'll be able to point out the tweeter's location.

As you move to third-order crossovers and higher, the effect of instruments 'tearing apart' in space will increase because the phase shift is greater. Graininess decreases because the drivers have a narrower region of overlap for less interference. Raising the order of the crossover to infinity would prevent driver interference -- reducing grain to a narrow range -- but the location of a single instrument in space would 'jump' instantaneously about, as the crossover frequency was traversed. Transients having frequency components on each side of the crossover point would be torn apart. What before sounded like grain in a second-order crossover system now results in a particular note having a whistle-like quality.




By the way, all these time-delay numbers discussed above are valid only when you sit on axis with the drivers, at equal distances from the ear. What if you move away from there? Or what if the woofer is around the side of, or under, the enclosure? Throw in those time-delays and distance-offsets as well. The tone balance may be correct for steady-state tones, even with these other offsets, but the image of an instrument existing at a single point in space will be further blurred...and more realism lost.

What if you stand up, away from the woofer, which is down near the floor? The speaker will become brighter -- not because the woofer has to project the sound farther (that difference is only about 0.1dB), but because we hear the tweeter arrive even sooner. Precedence in the high frequencies influences our perception of distance, and thus the location for the upper midrange and high frequencies. If upper frequencies arrive sooner, they sound closer -- hence louder. The lower harmonics have yet to arrive to balance them out.
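The standing-versus-sitting arrival times are easy to work out from the geometry. In the sketch below, every dimension (the 10-foot listening distance, driver heights, ear heights) is invented for illustration; only the 13,500ips speed of sound comes from the article:

```python
import math

SPEED = 13_500  # inches per second

def dist(horiz, ear_h, driver_h):
    # Straight-line distance from ear to driver, in inches.
    return math.hypot(horiz, ear_h - driver_h)

HORIZ = 120.0                      # 10-foot listening distance (hypothetical)
WOOFER_H, TWEETER_H = 12.0, 36.0   # driver heights off the floor (hypothetical)

for posture, ear in (("seated", 38.0), ("standing", 66.0)):
    lead = (dist(HORIZ, ear, WOOFER_H) - dist(HORIZ, ear, TWEETER_H)) / SPEED
    print(f"{posture}: tweeter leads by {lead * 1e6:.0f} microseconds")
```

In this geometry, standing nearly triples the tweeter's head start over the woofer -- the 'brighter on standing' effect described above.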

Other designs use multiple drivers -- two woofers or midranges, for example. In these designs, the sound stage emanates from a larger area, which in itself is an interesting effect. Yet, the multiple drivers bleed around the top of the listener's head. The lateral image can be very unstable with small head movements.

These speakers often are placed widely apart to enlarge the head's acoustic shadowing of the opposite ear. If a multiple-driver speaker uses a high-order crossover, it still suffers from the soundstage distortions discussed earlier. If it uses a first-order circuit, then phase distortion from the double drivers increases when you stand, because one midrange driver is far from the ear and the other midrange driver is closer, creating a double image. We can all appreciate hearing stereo clearly, even from another room -- and if the sound was not phase coherent to begin with, the echo bouncing around the house will be even less so.

By the way, why can we enjoy music on a clock radio? After all, its single, cheap speaker has lots of distortion: a severe -180-degree phase shift at its middle-range cone-breakup frequency (the highs emerge a half-wave sooner), and another -90-degree shift at its low-frequency resonance. However, the phase shifts of a crossover or multiple drivers are not present, and thus cannot multiply the existing phase distortion of the driver. So we hear depth of image from that cheap speaker. This applies to the latest miniature speakers, too. Several applications don't inject crossover-circuit phase shifts because they don't have crossovers: most headphones, full-range electrostatic speakers, and the lowly AM car radio, for example.



Time delays disturb the music, and we can hear it. At low frequencies, the delay can be equivalent to many feet of distance. Speakers using anything more than a first-order crossover between a woofer and midrange will delay the woofer's output so much that bass instruments can sound as if they were recorded in a different acoustic, leading to comments like, "The woofer's speed didn't mate well with the upper drivers..." or "The woofer sounded sluggish...." A midrange's output can be delayed so far behind that of the tweeter that the 'spit' of a voice is exaggerated by the lack of the throat's contribution, which has yet to arrive.

The next time you listen, ask yourself if you're hearing time-arrival problems instead of tonal distortions. You'll discover that the speakers which preserve the timing between a sound's individual components are the most musical. Bear in mind that higher-order crossovers (certainly 'high-tech' ammo for marketing campaigns) also permit the use of much less-expensive drivers because each receives less out-of-band energy. Admitting that a design has time-domain distortions is not very effective for marketing and promotion.




Future thoughts

'Technology in the service of marketing' has led our industry astray over the past 20 years -- not only with crossover-circuit design, but also with cone materials (titanium laminates, Kevlar weaves, and polypropylene with mineral additives, just to name a few).

Contrary to advertising claims of stiffness and 'high internal damping,' most of these cones are actually designed to operate in a constant state of 'cone breakup' and not as perfect pistons. Measurements sell products, of course, and the frequency response of speakers has seldom been a straight line -- a joke among amplifier designers.

Controlled cone break-up can be used to produce a smooth frequency-response curve at the price of reduced clarity and dynamics. What looks smooth in the frequency domain turns out to be pretty ragged in the time domain.

The cone stiffness described in advertising is based on what is called the Young's modulus of the cone material. But this modulus describes only a material's resistance to simple elongation (stretching). Speaker cones don't stretch -- they bend and internally deform.

It is the bulk modulus that describes how the material deforms in three dimensions. 'High internal damping' in a cone is a description of how energy is lost to friction as the molecules and fibers of the cone try to slide past one another. Yet, if the cone were perfectly rigid in all dimensions, 'high internal damping' would be a meaningless phrase, because nothing inside the cone would be in relative motion.

An example of a cone intentionally designed with controlled break-up is the woofer with concentric rings molded into it. They're not there to stiffen the cone; they lie in the wrong direction for that. These rings are actually 'hinges' that let different parts of the cone move independently. The theory is that as frequency increases, a smaller portion of the cone moves, which means the voice coil's driving force sees less mass. So the output steadily rises with frequency, because the same force is moving less mass. The rising response of a musical-instrument or public-address woofer or midrange, where midrange efficiency is also valued, is a prime example.
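Under the usual piston approximation, on-axis output is proportional to cone acceleration, a = F/m, so driving less mass with the same force raises the level. A minimal sketch of that mass argument, not a measurement of any particular driver:

```python
import math

# On-axis pressure from a piston is proportional to cone acceleration,
# a = F/m: the same voice-coil force on less moving mass means more output.
def level_gain_db(mass_ratio):
    # mass_ratio = (reduced moving mass) / (original moving mass)
    return 20 * math.log10(1 / mass_ratio)

# If break-up lets only half the cone's mass follow the voice coil,
# the level rises by about 6dB.
print(round(level_gain_db(0.5), 1))   # 6.0
```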

Also, the 'smaller-diameter' cone will have greater dispersion at higher frequencies. But at what price? While it's an admirable goal to have a large woofer go higher with wider dispersion and greater efficiency, we're asking the cone material to be a perfect hinge. Not possible! Some unwanted part of the cone always moves, and with a phase lag as well. This appears as sudden changes in the driver's impedance curve.

If you ever see the frequency response of a woofer suddenly 'jump' in loudness at a certain frequency, and then level off again, well before any crossover point is reached, you can bet you're seeing either the first cone break-up frequency or a strong reflection (a 'whiplash rebound') from a mismatched suspension material. And if you look carefully, you'll see the matching jog in the impedance curve at the same frequency. If you don't see it, the impedance measurements are not clear enough.

Impedance, phase, amplitude, and dispersion are all related. Problems in one always show up in another.