This page shows my auditory modeling work, my signal processing work, some of my software tools, and pointers to other work.
Note: My tomography book is now online, and all of the chapters can be downloaded.
There is now a new version of the Auditory Toolbox. It contains Matlab functions that implement many different kinds of auditory models. The toolbox includes code for Lyon's passive long-wave model, Patterson's gammatone filterbank, Meddis' hair cell model, Seneff's auditory model, correlograms, and several common representations from the speech-recognition world (including MFCC, LPC and spectrograms). This code has been tested on Macintosh, Windows, and Unix machines using Matlab 5.2.
Note: This toolbox was originally published as Apple Computer Technical Report #45. The old technical report (PDF and Postscript) and old code (Unix TAR and Macintosh BinHex) are available for historical reasons. |
Auditory Toolbox (Version 2.0) |
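The toolbox's Matlab code is not reproduced here. The following sketch illustrates one of the models it includes, the gammatone filter, by sampling the textbook impulse response g(t) = t^(n-1) exp(-2πbt) cos(2πft). It assumes filter order n = 4, the Glasberg and Moore ERB formula, and a bandwidth of b = 1.019·ERB. The function names `erb` and `gammatone_ir` are hypothetical.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore 1990 formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    """Sampled gammatone impulse response, scaled to unit peak magnitude.

    g(t) = t**(order - 1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), b = 1.019 * ERB(fc).
    """
    b = 1.019 * erb(fc_hz)
    ir = []
    for k in range(int(round(duration_s * fs_hz))):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


# Example: a 1 kHz channel sampled at 16 kHz (800 samples for 50 ms).
channel = gammatone_ir(1000.0, 16000.0)
```

A filterbank would repeat this for center frequencies spaced along the ERB scale and then convolve each response with the input. A more efficient implementation is the subject of the Gammatone Math notebook listed below.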
My primary scientific goal is to understand how our brains perceive sound. My role in this research area is that of a modeler: I build models that explain neurophysiological and psychoacoustic data. I hope these models will help other researchers understand the mechanisms involved and will lead to better experiments. My latest work in this area, titled "Connecting Correlograms to Neurophysiology and Psychoacoustics," was presented at the XIth International Symposium on Hearing in Grantham, England, 1-6 August 1997. Two correlograms are shown on the left: one computed using autocorrelation and the other computed using AIM. |
Abstract (and soon demos) |
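A correlogram stacks the short-time autocorrelations of the outputs of the cochlear channels. The sketch below covers only that autocorrelation step. It assumes the channels have already been produced by some cochlear model, for example as half-wave-rectified filterbank outputs. The AIM-based correlogram is not modeled, and all function names are hypothetical.

```python
import math


def frame_autocorrelation(x, start, window, max_lag):
    """Unnormalized autocorrelation of x[start:start + window] for lags 0..max_lag."""
    acf = []
    for lag in range(max_lag + 1):
        total = 0.0
        for n in range(start, start + window):
            if n + lag < len(x):
                total += x[n] * x[n + lag]
        acf.append(total)
    return acf


def correlogram_frame(channels, start, window, max_lag):
    """One correlogram frame: one autocorrelation row per cochlear channel."""
    return [frame_autocorrelation(ch, start, window, max_lag) for ch in channels]


def summary_correlogram(frame):
    """Sum the rows across channels; peaks mark periodicities shared by many channels."""
    return [sum(row[lag] for row in frame) for lag in range(len(frame[0]))]


# Example: two half-wave-rectified "channels" with periods of 8 and 4 samples.
signal_a = [max(0.0, math.cos(2 * math.pi * n / 8)) for n in range(200)]
signal_b = [max(0.0, math.cos(2 * math.pi * n / 4)) for n in range(200)]
frame = correlogram_frame([signal_a, signal_b], start=0, window=64, max_lag=12)
summary = summary_correlogram(frame)
```

Both channels repeat every 8 samples, so for lags above zero the summary row peaks at lag 8.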
The information in most auditory models flows exclusively bottom-up, yet there is increasing evidence that a great deluge of information flows down from the cortex. I wrote a paper called "A Critique of Pure Audition" for the 1995 Computational Auditory Scene Analysis workshop. A greatly refined version was published by Erlbaum in 1998 in the book Computational Auditory Scene Analysis. The figure at the left shows the spectrogram of sine-wave speech. |
Book chapter (153k pdf) |
I have written several papers describing how to convert auditory representations into sounds. I have built models of the cochlea and of central auditory processing, which I hope will both explain auditory processing and allow us to build auditory sound-separation tools. These papers describe how to convert sounds into cochleagrams and correlograms, and then convert these representations back into sounds. Unlike the printed versions of this work, the web page includes audio examples. It also describes better spectrogram inversion techniques, how to invert Lyon's passive cochlear model, and how to invert correlograms. This material was first presented in the Proceedings of the ATR Workshop on "A Biological Framework for Speech Perception and Production," published in September 1994. A more refined version of this paper was an invited talk at the 1994 NIPS conference. The image on the left shows the spectrogram of one channel of cochlear output, which is one step in the correlogram inversion process. |
ATR (Kyoto) Workshop Web Reprint with Sound Examples |
Pattern Playback is the term Frank Cooper used to describe his successful efforts to paint spectrograms on plastic and then convert them into sound. I surveyed Pattern Playback techniques, from Frank Cooper's efforts to my own work on auditory model inversion, in a paper published at the 1995 IEEE International Conference on Systems, Man, and Cybernetics. The paper is titled "Pattern Playback from 1950 to 1995". The image at the left shows a portion of one of Cooper's spectrograms. |
Web Version |
| "Auditory Model Inversion for Sound Separation" is the first paper to describe correlogram inversion techniques. We also discuss improved methods for inverting spectrograms and a cochlear model designed by Richard F. Lyon. This paper was published at ICASSP '94. | Postscript (1.5M) |
| "A Perceptual Pitch Detector" is a paper that describes a model of human pitch perception. It is similar to work done by Meddis and Hewitt and published in JASA, but this paper has more real-world examples. This paper was published at ICASSP '90. | Postscript (3M) |
| "On the importance of time" is an invited chapter by Dick Lyon and myself in the book Visual Representations of Speech Signals (edited by Martin Cooke, Steve Beet and Malcolm Crawford, John Wiley & Sons). This tutorial explains why we think time-domain processing is important when modeling the cochlea and higher-level processing. | Postscript |
| "Lyon's Cochlear Model" is a Mathematica notebook that describes an implementation of a simple (but efficient) cochlear model designed by Richard F. Lyon. It is also known as Apple Technical Report #13. | Mathematica Notebook (1.2M) |
| A software package called MacEar implements the latest version of Lyon's Cochlear Model. MacEar is written in very portable C for Unix and Macintosh computers. This link points to the last published version (2.2). (Note the README file included has old program results. The names of the output files have changed and there are a couple of extra channels being output. I'm sorry for the confusion.) | Unix Shell Archive with Sources |
| Gammatone Math is a Mathematica notebook that describes a new, more efficient implementation of the gammatone filters that are often used to implement critical band models. It is also known as Apple Technical Report #35. | Mathematica Notebook (327k) |
| Apple Hearing Demo Reel was published as Apple Technical Report #25. It includes more than one hour of correlogram videos, including a large fraction of the ASA Auditory Demonstration CD. I have a limited number of NTSC copies left. Send email to malcolm@interval.com to request a copy. | HTML Video Guide |
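"A Perceptual Pitch Detector" describes a model of pitch perception, and that model is not reproduced here. As a much simpler stand-in, the sketch below picks the strongest peak of a plain waveform autocorrelation within an assumed 60-500 Hz search range. It leaves out the cochlear stage entirely. The function name and the range limits are assumptions.

```python
import math


def autocorrelation_pitch(x, fs_hz, fmin_hz=60.0, fmax_hz=500.0):
    """Estimate pitch in Hz as fs / lag of the largest autocorrelation value
    among the lags that correspond to fmax_hz..fmin_hz.

    The sum is left unnormalized, so longer lags (fewer overlapping samples)
    score slightly lower. This mildly favors the true period T over the
    subharmonic lags 2T, 3T, ...
    """
    min_lag = max(1, int(fs_hz / fmax_hz))
    max_lag = min(len(x) - 1, int(fs_hz / fmin_hz))
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs_hz / best_lag


# Example: a 200 Hz fundamental plus a 400 Hz harmonic, sampled at 8 kHz.
fs = 8000.0
tone = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
        for n in range(2000)]
pitch = autocorrelation_pitch(tone, fs)
```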
Chris Bregler, Michele Covell, and I developed a technique we call Video Rewrite to automatically synthesize video of talking heads. This technology is cool because we use a purely data-driven approach (concatenative triphone video synthesis) to create new video of a person speaking. Given new audio, we concatenate the best sequence of lip images and morph them into a background sequence. We can automatically create sequences like the Kennedy and Johnson scenes in the movie "Forrest Gump." |
Original SIGGRAPH '97 Paper (with examples) |
| Web Page |
| I was able to help Michele Covell do some neat work on time-compression of audio. Lots of people know how to compress a speech utterance by a constant amount. But if you want to do better, which parts of the speech signal can be compressed the most? This paper describes a good technique and shows how to test listeners' comprehension of the compressed speech. | Conference Paper
Technical Report with Audio Samples |
Eric Scheirer and I worked on a system for discriminating between speech and music in an audio signal. This paper describes a large number of features, how they can be combined in a statistical framework, and the resulting performance on signals taken from radio stations. Our results are better than anybody else's. (That comparison is not necessarily valid, since there are no common testing databases. We did work hard to make our test set representative.) This paper was published at the 1997 ICASSP in Munich. The image on the left shows clouds of our data. |
Web Page |
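The paper's features and statistical framework are not reproduced here. The sketch below computes two simple frame-level measurements of the kind such a discriminator can combine: a zero-crossing rate and the fraction of low-energy frames. The frame sizes, the 0.5 threshold, and the function names are assumptions.

```python
import math


def frames(x, size, hop):
    """Split x into frames of `size` samples, advancing by `hop` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def low_energy_fraction(x, size, hop, threshold=0.5):
    """Fraction of frames whose RMS is below `threshold` times the mean frame RMS.

    Speech, with its pauses between syllables, tends to score higher than music.
    """
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames(x, size, hop)]
    mean_rms = sum(rms) / len(rms)
    return sum(1 for r in rms if r < threshold * mean_rms) / len(rms)
```

A classifier would then compare such features, and perhaps their variation over a second or so, between labeled speech and music examples.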
Our work on morphing between two sounds was presented in a paper at the 1996 ICASSP. This work is novel because it extends previous audio morphing work to include inharmonic sounds. The paper uses results from Auditory Scene Analysis to represent, match, warp, and then interpolate between two sounds. The image on the left shows the smooth spectrogram, one of the two independent representations used when morphing audio signals. |
Web Page |
| I wrote an article describing my experiences writing "intelligent" signal processing documents. My Mathematica notebook "Lyon's Cochlear Model" was the first large document written with Mathematica. While I don't use Mathematica as much as I used to, I still believe that intelligent documents are a good way to publish scientific results. These ideas were also published in the book "Knowledge Based Signal Processing" (Prentice Hall). | KBSP Book Chapter in Adobe PDF (3M) |
I have written Matlab m-functions that read and write QuickTime movies. The WriteQTMovie code is more general than previous solutions for creating movies in Matlab: it runs on any platform that Matlab runs on, and it lets you add sound to the movie. The ReadQTMovie code reads and parses JPEG-compressed movies. |
Matlab Source Code |
Chris Bregler and I coded an implementation of an image processing technique known as snakes. Two m-files implement a type of dynamic contour following that is popular in computer vision. First proposed by Kass, Witkin and Terzopoulos in 1987, snakes are a variational technique for finding the contour that best aligns with an image. The basic routine, snake.m, aligns a sequence of points along a contour to the maximum of an array or image. Given an image, a set of starting points, and limits on the search space, it returns a new set of points that better align with the image. The second m-file is a demonstration script. Using your own array of image data, or a built-in default, it displays a demo window where you can click to indicate points and see the snake program in action. |
Matlab Source Code |
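snake.m itself is not reproduced here. The sketch below illustrates the same idea with a dynamic-programming variant in the style of Amini, Weymouth and Jain. It is simplified so that point i sits in column i and can move only vertically, within ±search rows. The energy it maximizes is the image value minus a squared-difference smoothness penalty weighted by alpha. That energy, and the function name `dp_snake`, are assumptions, not necessarily what snake.m uses.

```python
def dp_snake(image, start_rows, search, alpha=1.0):
    """Dynamic-programming snake on a 2-D array (list of rows).

    Point i lives in column i, starts at row start_rows[i], and may move up to
    `search` rows. Returns the rows that maximize
        sum_i image[r_i][i] - alpha * sum_i (r_i - r_{i-1})**2.
    """
    n_rows = len(image)
    # Candidate rows for each point, clipped to the image.
    cands = [[r for r in range(s - search, s + search + 1) if 0 <= r < n_rows]
             for s in start_rows]
    # score[i][k]: best energy for points 0..i with point i at cands[i][k].
    score = [[image[r][0] for r in cands[0]]]
    back = [[None] * len(cands[0])]
    for i in range(1, len(start_rows)):
        row_scores, row_back = [], []
        for r in cands[i]:
            # Best predecessor position, trading image value against smoothness.
            best_k = max(range(len(cands[i - 1])),
                         key=lambda k: score[i - 1][k] - alpha * (r - cands[i - 1][k]) ** 2)
            row_scores.append(score[i - 1][best_k]
                              - alpha * (r - cands[i - 1][best_k]) ** 2
                              + image[r][i])
            row_back.append(best_k)
        score.append(row_scores)
        back.append(row_back)
    # Trace back from the best position of the last point.
    k = max(range(len(cands[-1])), key=lambda j: score[-1][j])
    rows = []
    for i in range(len(start_rows) - 1, -1, -1):
        rows.append(cands[i][k])
        k = back[i][k]
    return rows[::-1]


# Example: a bright horizontal ridge at row 6 in a 10 x 5 image.
ridge = [[10.0 if r == 6 else 0.0 for c in range(5)] for r in range(10)]
aligned = dp_snake(ridge, [4] * 5, search=3)
```

A larger alpha keeps the contour smooth. A smaller one lets individual points jump to isolated bright pixels.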
Dick de Ridder and his colleagues wrote a nice description of a Support Vector Classifier and provided some code to demonstrate how it works. I added a Graphical User Interface (GUI) so I could play with all the options and put lots of data through it. With the GUI, you select points with the mouse. After you choose the kind of distance metric you want, you get several plots showing the results. The links at the right show a number of points separated by a fourth-order polynomial. |
Image showing GUI
Image showing points and support |
| Michele Covell and I wrote some Matlab code to compute multi-dimensional scaling (MDS). MDS reconstructs an estimate of the positions of points given only the distances between them. These routines handle both metric data (where you know the distances) and non-metric data (where you know only the order of the distances). | Technical report containing the code (no documentation). |
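The routines in the technical report are not reproduced here. For the metric case, one standard method is classical (Torgerson) scaling: double-center the squared distances and keep the leading eigenvectors. The sketch below does this with plain power iteration so that it needs no linear-algebra library. It does not handle the non-metric case, and the function name, iteration count, and seed are arbitrary choices.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical metric MDS from a full, symmetric distance matrix.

    Forms B = -1/2 * J D^2 J (double-centered squared distances), then extracts
    the top `dims` eigenvectors of B by power iteration with deflation.
    Returns one coordinate list per point.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equal to column means (symmetric D)
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining eigenvalues are numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue belonging to v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Example: recover 2-D positions (up to rotation and reflection) from distances.
points = [(0, 0), (4, 0), (4, 2), (0, 2), (1, 1)]
distances = [[math.dist(p, q) for q in points] for p in points]
recovered = classical_mds(distances)
```

The recovered coordinates differ from the originals by a rotation, reflection, and translation, but their pairwise distances should match the input.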
| The SoundAndImage toolbox is a collection of Matlab tools that make it easier to work with sounds and images. On the Macintosh, tools are provided to record and play back sounds through the sound system, and to copy images to and from the scrapbook. For both Macintosh and Unix systems, routines are provided to read and write many common sound formats (including AIFF). Only 68k MEX files are included, so users on other machines will need to recompile the software. This toolbox is published as Apple Computer Technical Report #61. | Postscript Documentation (153k) |
| Filter Design is a Mathematica notebook that describes (and implements) many IIR filter design techniques. It was published as Apple Technical Report #34. | Mathematica Notebook (556k) |
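The notebook's derivations are not reproduced here. As one small example of a classical IIR design technique, the sketch below designs a first-order Butterworth lowpass filter with the bilinear transform, using frequency prewarping, and then checks its response. The function names are hypothetical.

```python
import cmath
import math


def butter1_lowpass(fc_hz, fs_hz):
    """First-order Butterworth lowpass via the prewarped bilinear transform.

    Returns (b, a) with a[0] == 1 for y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1].
    """
    k = math.tan(math.pi * fc_hz / fs_hz)  # prewarped analog cutoff
    b = [k / (1.0 + k), k / (1.0 + k)]
    a = [1.0, (k - 1.0) / (k + 1.0)]
    return b, a


def freq_response(b, a, f_hz, fs_hz):
    """Complex response B(z)/A(z) evaluated at z = exp(j*2*pi*f/fs)."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    num = sum(c * z_inv ** i for i, c in enumerate(b))
    den = sum(c * z_inv ** i for i, c in enumerate(a))
    return num / den


def iir_filter(b, a, x):
    """Direct-form I difference equation; assumes a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


# Example: 1 kHz cutoff at an 8 kHz sample rate.
b, a = butter1_lowpass(1000.0, 8000.0)
```

Because of the prewarping, the gain is exactly 1/√2 (-3 dB) at the requested cutoff, and the DC gain is 1.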
I created a HyperCard stack to make it easier for people with a Macintosh and a CD-ROM drive to interact with the Acoustical Society of America's Auditory Demonstrations CD. This CD is a wonderful collection of auditory effects and principles. The ASA Demo HyperCard stack includes the text and figures from the book and lets you browse the audio CD. |
Macintosh Archive |
I wrote a program for the Macintosh 660/AV and 840/AV computers that uses the DSP (AT&T DSP3210) to monitor audio levels. VUMeters runs on any Macintosh with the AT&T DSP chip. Source and binaries are included. |
Macintosh Archive |
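VUMeters ran on the Macintosh's DSP, and its code is not shown here. The underlying measurement can be sketched in a few lines: the RMS and peak level of a block of samples, in dB relative to full scale. The sketch assumes that 1.0 is full scale and clamps silence at -120 dB. A real VU meter also applies a specified averaging time (ballistics), which this sketch omits. The function names are hypothetical.

```python
import math


def rms_dbfs(block, floor_db=-120.0):
    """RMS level of a block of samples in dB, with 1.0 as full scale."""
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return floor_db if rms == 0.0 else max(20.0 * math.log10(rms), floor_db)


def peak_dbfs(block, floor_db=-120.0):
    """Peak absolute sample level in dB, with 1.0 as full scale."""
    peak = max(abs(s) for s in block)
    return floor_db if peak == 0.0 else max(20.0 * math.log10(peak), floor_db)


# Example: ten cycles of a full-scale 1 kHz sine sampled at 48 kHz.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
```

A full-scale sine peaks at 0 dB, but its RMS level is about 3 dB lower.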
| Macintosh Archive |
In a past life, I worked on medical imaging. A book on tomographic imaging (cross-sectional x-ray imaging) was published by IEEE Press: Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging (New York: IEEE Press, c1988). The software used to generate many of the tomographic images in this book is available. The parallel-beam reconstruction on the left was generated with the commands
gen n=100 k=100 if=lib.d.s |
Tomographic Software (Unix TAR format) |
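The package's own programs are not reproduced here. The sketch below shows the core of parallel-beam filtered backprojection. Each projection is convolved with the standard band-limited ramp kernel, which at detector spacing τ is h(0) = 1/(4τ²), h(n) = 0 for even n, and h(n) = -1/(nπτ)² for odd n. The filtered projections are then smeared back across the image. The sketch assumes τ = 1, unit pixel spacing, projection angles spanning [0, π), and nearest-bin lookup. The function names are hypothetical.

```python
import math


def ramp_kernel(n_half):
    """Band-limited ramp filter for n = -n_half..n_half with detector spacing 1:
    h(0) = 1/4, h(n) = 0 for even n, h(n) = -1/(n*pi)**2 for odd n."""
    h = []
    for n in range(-n_half, n_half + 1):
        if n == 0:
            h.append(0.25)
        elif n % 2 == 0:
            h.append(0.0)
        else:
            h.append(-1.0 / (n * math.pi) ** 2)
    return h


def filter_projection(proj):
    """Linear (not circular) convolution of one projection with the ramp kernel."""
    m = len(proj)
    h = ramp_kernel(m - 1)  # covers every offset between two detector bins
    return [sum(proj[k] * h[(m - 1) + i - k] for k in range(m)) for i in range(m)]


def backproject(filtered, angles_rad, size):
    """Smear each filtered projection across a size x size grid (nearest bin)."""
    n_bins = len(filtered[0])
    center = (n_bins - 1) / 2.0
    half = (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for q, theta in zip(filtered, angles_rad):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = half - row
            for col in range(size):
                x = col - half
                # Detector bin for t = x cos(theta) + y sin(theta).
                j = int(round(x * c + y * s + center))
                if 0 <= j < n_bins:
                    image[row][col] += q[j]
    scale = math.pi / len(angles_rad)  # angular step for angles spanning [0, pi)
    return [[v * scale for v in r] for r in image]


# Example: a single bright point at the center of rotation.
n_bins, n_angles = 21, 36
point_proj = [1.0 if j == n_bins // 2 else 0.0 for j in range(n_bins)]
angles = [math.pi * k / n_angles for k in range(n_angles)]
filtered = [filter_projection(point_proj) for _ in angles]
recon = backproject(filtered, angles, size=n_bins)
```

Practical reconstructions use projections measured from a real object, and usually interpolate linearly between detector bins rather than taking the nearest one.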
| Code to implement the diffraction tomography algorithms in my PhD Thesis is also available. | Compressed Unix TAR File |
| Carl Crawford, Mani Azimi and I wrote a simple Unix plotting package called qplot. Both two-dimensional and 3d-surface plots are supported. | Compressed Unix TAR File |
| Now obsolete code to implement a DITroff previewer under SunView is available. This program was called suntroff and is an ancestor of the X Window System Troff previewer. It was written while I was an employee of Schlumberger Palo Alto Research. All files are compressed Unix TAR files. | Source |
Malcolm Slaney
The best way to reach me is to send email.
Interval Research, Inc.
1801 Page Mill Road, Building C
Palo Alto, CA 94304
(650) 842-6143
(650) 565-7944 (FAX)
This page last updated on March 1, 2000.