A Critique of Pure Audition
Malcolm Slaney
Interval Research Corporation
1801 Page Mill Road; Building C; Palo Alto, CA 94304 USA
This chapter will be published in the book
Computational Auditory Scene Analysis,
Dave Rosenthal and Hiroshi Okuno, editors,
Erlbaum, 1998.
Postscript and
PDF reprints of this chapter are available
(but I recommend the whole book; it is good).
Abstract
All sound-separation systems based on perception assume a bottom-up or
Marr-like view of the world. Sound is processed by a cochlear model,
passed to an analysis system, grouped into objects, and then passed to
higher-level processing systems.
The information flow is strictly bottom up, with no information flowing
down from higher-level expectations. Is this approach correct?
In this chapter, I first summarize existing bottom-up perceptual models.
Then, I examine evidence for top-down processing, describing many of the
auditory and visual effects that indicate top-down information flow.
I hope that this chapter generates discussion about the role of
top-down processing, whether this information should be included in
sound-separation models, and how we can build testable architectures.
Demonstrations
Several of the stimuli described in this chapter are available for
your viewing pleasure. Audio examples are AIFF files, while the movies are
in QuickTime format.
- Figure 3.3
- Alternating white and black dots that create an illusion. Subjects see
one uniform motion---either motion up and down, or left and right---and
never a combination of the two directions.
A QuickTime movie (77k) is available.
(Source: Adapted with permission from Churchland et al., 1994).
- Figure 3.5
- An auditory illusion proposed by Peter Ladefoged.
"What Vowel is This"
has been translated into a sequence of HTML pages by Malcolm Slaney.
- Figure 3.6
- A sine-wave speech (40k AIFF)
example by Richard Remez and the
original (natural) speech (40k AIFF).
- Figure 3.6
- Miriam Makeba's Click Song (1.5M AIFF)
illustrates how clicks are perceived differently in speech and in music,
at least by the author's native English ears.
- Figure 3.7
- Three experiments demonstrating illusory motion.
The first movie (66k QuickTime)
appears to be three dots moving to the right, with the middle dot occluded
by the square.
In the second movie (61k QuickTime),
the outer dots are removed and there
is no longer a sense of motion.
Finally, in the third movie (862k QuickTime),
tones alternate
in the left and right speakers and the illusion of motion (and occlusion)
returns.
(Source: Adapted with permission from Churchland et al., 1994).
- Figure 3.8
- The McGurk effect (188k QuickTime).
Listen to how the sound changes as you open and
close your eyes while this movie is playing.
(Source: Movie courtesy of Michael Cohen, University of California, Santa
Cruz.)