Malcolm Slaney

Publications and Pointers

Several of my technical reports and papers are available on the net for downloading. The following is a brief list. I have a personal web page for the fun stuff.

This page shows my auditory modeling work, my signal processing work, some of my software tools, and pointers to other work.

Note! My tomography book is now online. Download all the chapters. See it here.

Auditory Modeling

Interval Publications

Autocorrelation and AIM correlograms There is now a new version of the Auditory Toolbox. It contains Matlab functions to implement many different kinds of auditory models. The toolbox includes code for Lyon's passive longwave model, Patterson's gammatone filterbank, Meddis' hair cell model, Seneff's auditory model, correlograms and several common representations from the speech-recognition world (including MFCC, LPC and spectrograms). This code has been tested on Macintosh, Windows, and Unix machines using Matlab 5.2.
Note: This toolbox was originally published as Apple Computer Technical Report #45. The old technical report (PDF and Postscript) and old code (Unix TAR and Macintosh BinHex) are available for historical reasons.
Auditory Toolbox
(Version 2.0)


My primary scientific goal is to understand how our brains perceive sound. My role in this research area is that of a modeler: I build models that explain the neurophysiological and psychoacoustic data. I hope these models will help other researchers understand the mechanisms involved and result in better experiments. My latest work in this area is titled "Connecting Correlograms to Neurophysiology and Psychoacoustics" and was presented at the XIth International Symposium on Hearing in Grantham, England, from 1-6 August 1997. Two correlograms, one computed using autocorrelation and the other computed using AIM, are shown on the left. Abstract (and soon demos)

Sine Wave Speech Spectrogram The information in most auditory models flows exclusively bottom-up, yet there is increasing evidence that a great deluge of information is flowing down from the cortex. A paper I wrote for the 1995 Computational Auditory Scene Analysis workshop is called "A Critique of Pure Audition". This paper has been greatly refined and was published in 1998 by Erlbaum in the book Computational Auditory Scene Analysis. The figure at the left shows the spectrogram of sine-wave speech. Book chapter (153k pdf)
Book chapter (620k postscript)

Audio/video examples

Original paper

Spectrogram of cochlear channel output I have written several papers describing how to convert auditory representations into sounds. I have built models of the cochlea and central auditory processing, which I hope both explain auditory processing and will allow us to build auditory sound separation tools. These papers describe the process of converting sounds into cochleagrams and correlograms, and then converting these representations back into sounds. Unlike the printed versions of this work, the web page includes audio file examples. It includes better spectrogram inversion techniques, a description of how to invert Lyon's passive cochlear model, and a description of correlogram inversion. This material was first presented as part of the Proceedings of the ATR Workshop on "A Biological Framework for Speech Perception and Production" published in September 1994. A more refined version of this paper was an invited talk at the 1994 NIPS conference. The image on the left shows the spectrogram of one channel of cochlear output; one step in the correlogram inversion process. ATR (Kyoto) Workshop Web Reprint with Sound Examples
NIPS Conference Paper (Postscript)

A portion of Frank Cooper's plastic spectrogram Pattern Playback is the term used by Frank Cooper to describe his successful efforts to paint spectrograms on plastic and then convert them into sound. I wrote about Pattern Playback techniques, from Frank Cooper's efforts to my own efforts with auditory model inversion, in a paper published at the 1995 IEEE International Conference on Systems, Man, and Cybernetics. My paper is titled "Pattern Playback from 1950 to 1995". The image at the left shows a portion of one of Cooper's spectrograms. Web Version
Postscript (1.8M)

Adobe PDF (227k)

Apple Publications

The following are publications from my time at Apple. The Mathematica notebooks are designed to be self-documenting, and in each case the Postscript and PDF files are also available. Those files that are Matlab toolboxes include source and documentation. All these files are available with the gracious permission of Apple.

"Auditory Model Inversion for Sound Separation" is the first paper to describe correlogram inversion techniques. We also discuss improved methods for inverting spectrograms and a cochlear model designed by Richard F. Lyon. This paper was published at ICASSP '94. Postscript (1.5M)
Adobe PDF (243k)

Online patent

"A Perceptual Pitch Detector" is a paper that describes a model of human pitch perception. It is similar to work done by Meddis and Hewitt and published in JASA, but this paper has more real-world examples. This paper was published at ICASSP '90. Postscript (3M)
Adobe PDF (315k)
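
As a rough illustration of the autocorrelation idea behind many pitch models, the following Python sketch estimates the pitch of a signal from the lag of its largest autocorrelation peak. It is not the model described in the paper, which works on auditory-model output; the function name, the search range of 50 to 500 Hz, and the test tone are all assumptions made for this example.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation.

    Hypothetical helper for illustration only; fmin and fmax bound the search.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# A 200 Hz sine sampled at 8 kHz has a period of 40 samples.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(1600)]
estimate = autocorrelation_pitch(tone, sample_rate)
```

Because the sum is unnormalized, shorter lags include more terms and are slightly favored, which here helps pick the fundamental period rather than one of its multiples.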

"On the importance of time" is an invited chapter by Dick Lyon and myself in the book Visual Representations of Speech Signals (edited by Martin Cooke, Steve Beet and Malcolm Crawford, John Wiley & Sons). This tutorial describes the reason that we think time-domain processing is important when modeling the cochlea and higher-level processing. Postscript
Adobe PDF

"Lyon's Cochlear Model" is a Mathematica notebook that describes an implementation of a simple (but efficient) cochlear model designed by Richard F. Lyon. It is also known as Apple Technical Report #13. Mathematica Notebook (1.2M)
Postscript (2.2M)

Adobe PDF (628k)

A software package called MacEar implements the latest version of Lyon's Cochlear Model. MacEar is written in very portable C for Unix and Macintosh computers. This link points to the last published version (2.2). (Note the README file included has old program results. The names of the output files have changed and there are a couple of extra channels being output. I'm sorry for the confusion.) Unix Shell Archive with Sources

Gammatone Math is a Mathematica notebook that describes a new more efficient implementation of the Gammatone filters that are often used to implement critical band models. It is also known as Apple Technical Report #35. Mathematica Notebook (327k)
Postscript (677k)

Adobe PDF (184k)
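
For readers unfamiliar with gammatone filters, the Python sketch below samples the standard gammatone impulse response, t^(n-1) exp(-2 pi b t) cos(2 pi f t), directly in the time domain. This is only the textbook definition, not the efficient implementation the notebook describes. The function names are hypothetical, and the bandwidth uses the commonly cited Glasberg and Moore ERB approximation with a 1.019 scale factor, which is an assumption of this example.

```python
import math

def erb_bandwidth(center_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def gammatone_impulse_response(center_hz, sample_rate, duration_s=0.025, order=4):
    """Sample g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*center_hz*t)."""
    b = 1.019 * erb_bandwidth(center_hz)  # bandwidth parameter, assumed scaling
    count = int(round(duration_s * sample_rate))
    response = []
    for n in range(count):
        t = n / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    return response

# One channel centered at 1 kHz, sampled at 16 kHz (unnormalized gain).
ir = gammatone_impulse_response(1000.0, 16000)
```

Convolving a signal with one such response per center frequency gives a simple, if slow, filterbank.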

Apple Hearing Demo Reel was published as Apple Technical Report #25. It includes more than one hour of correlogram videos, including a large fraction of the ASA Auditory Demonstration CD. I have a limited number of NTSC copies left. Send email to malcolm@interval.com to request a copy. HTML Video Guide
PDF Video Guide (116k)

Postscript Video Guide (195k)

Signal Processing

Interval Publications

Chris Bregler, Michele Covell, and I developed a technique we call Video Rewrite to automatically synthesize video of talking heads. This technology is cool because we use a purely data driven approach (concatenative triphone video synthesis) to create new video of a person speaking. Given new audio, we concatenate the best sequence of lip images and morph them into a background sequence. We can automatically create sequences like the Kennedy and Johnson scenes in the movie "Forrest Gump."

Original SIGGRAPH '97 Paper (with examples)

Audio Visual Speech Perception Workshop

Video Demonstration

We studied how adults convey affective messages to infants using prosody. We did not attempt to recognize the words, let alone to distill more nebulous concepts such as satire or irony. We analyzed speech with low-level acoustic features and discriminated approval, attentional bids, and prohibitions from adults speaking to their infants. We built automatic classifiers to create a system, Baby Ears, that performs the task that comes so naturally to infants. The image on the left shows one of the decision surfaces which classifies approval, attention and prohibition utterances on the basis of their pitch. Web Page
Postscript (189k)

Adobe PDF (42k)

I was able to help Michele Covell do some neat work on time-compression of audio. Lots of people know how to compress a speech utterance by a constant amount. But if you want to do better, which parts of the speech signal can be compressed the most? This paper describes a good technique and shows how to test the resulting comprehension. Conference Paper
Technical Report with Audio Samples

Two gaussian clouds of data Eric Scheirer and I worked on a system for discriminating between speech and music in an audio signal. This paper describes a large number of features, how they can be combined into a statistical framework, and the resulting performance on discriminating signals found on radio stations. The results are better than anybody else's results. (That comparison is not necessarily valid since there are no common testing databases. We did work hard to make our test set representative.) This paper was published at the 1997 ICASSP in Munich. The image on the left shows clouds of our data. Web Page
Postscript (349k)

Adobe PDF (263k)

Smooth spectrogram used in morphing Work we've done to morph between two sounds is described in a paper at the 1996 ICASSP. This work is new because it extends previous audio morphing work to include inharmonic sounds. This paper uses results from Auditory Scene Analysis to represent, match, warp, and then interpolate between two sounds. The image on the left shows the smooth spectrogram, one of two independent representations used when morphing audio signals. Web Page
Postscript (3M)

Adobe PDF (237k)

Patent

Apple Publications

I wrote an article describing my experiences writing "intelligent" signal processing documents. My Mathematica notebook "Lyon's Cochlear Model" was the first large document written with Mathematica. While I don't use Mathematica as much as I used to, I still believe that intelligent documents are a good way to publish scientific results. These ideas were also published in a book titled "Knowledge Based Signal Processing" that was published by Prentice Hall.

KBSP Book Chapter in Adobe PDF (3M)

IEEE Signal Processing Article in Adobe PDF (2M)

Software Publications

Interval Publications

I have written Matlab m-functions that read and write QuickTime movies. The WriteQTMovie code is more general than previous solutions for creating movies in Matlab. It runs on any platform that Matlab runs on. It also lets you add sound to the movie. The ReadQTMovie code reads and parses JPEG-compressed movies.

Matlab Source Code

Chris Bregler and I coded an implementation of an image processing technique known as snakes. There are two m-files that implement a type of dynamic contour following popular in computer vision. First proposed by Kass, Witkin and Terzopoulos in 1987, snakes are a variational technique to find the best contour that aligns with an image. The basic routine, snake.m, aligns a sequence of points along a contour to the maximum of an array or image. Provide it with an image, a set of starting points, limits on the search space and it returns a new set of points that better align with the image. The second m-file is a demonstration script. Using your own array of image data, or a built-in default, a demo window is displayed where you can click to indicate points and see the snake program in action.

Matlab Source Code

Matlab Demonstration Source
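
To give a feel for what a contour-following routine does, here is a small Python sketch of a greedy search: each point looks within a limited window for a pixel that is brighter while staying close to the midpoint of its neighbors. This is a simplified stand-in, not the variational formulation of Kass, Witkin and Terzopoulos and not a port of snake.m. The function name, the `radius` and `alpha` parameters, and the toy image are all assumptions for illustration.

```python
def greedy_snake(image, points, radius=1, alpha=0.1, max_passes=50):
    """Move (row, col) points toward large image values, keeping the contour smooth.

    Hypothetical sketch: radius limits the search space, alpha weights smoothness.
    """
    rows, cols = len(image), len(image[0])
    pts = [tuple(p) for p in points]

    def energy(index, r, c):
        cost = -image[r][c]  # external term: prefer large image values
        if 0 < index < len(pts) - 1:
            # Internal term: stay near the midpoint of the two neighbors.
            mid_r = (pts[index - 1][0] + pts[index + 1][0]) / 2.0
            mid_c = (pts[index - 1][1] + pts[index + 1][1]) / 2.0
            cost += alpha * ((r - mid_r) ** 2 + (c - mid_c) ** 2)
        return cost

    for _ in range(max_passes):
        moved = False
        for i, (r0, c0) in enumerate(pts):
            best, best_cost = (r0, c0), energy(i, r0, c0)
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    r, c = r0 + dr, c0 + dc
                    if 0 <= r < rows and 0 <= c < cols:
                        cost = energy(i, r, c)
                        if cost < best_cost:
                            best, best_cost = (r, c), cost
            if best != (r0, c0):
                pts[i] = best
                moved = True
        if not moved:
            break  # no point improved, so the contour has settled
    return pts

# Toy image: a bright horizontal ridge on row 5 with dimmer rows on either side.
image = [[1.0 if r == 5 else 0.5 if r in (4, 6) else 0.0 for _ in range(10)]
         for r in range(10)]
result = greedy_snake(image, [(3, 2), (3, 4), (3, 6), (3, 8)])
```

In this toy case the points start two rows above the ridge and climb onto it in two passes. Ties between equally bright pixels are broken by scan order, so the column positions can drift.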

Dick de Ridder and his colleagues wrote a nice description of a Support Vector Classifier and provided some code to demonstrate how it works. I added a Graphical User Interface (GUI) so I could play with all the options and put lots of data through it.
With the GUI, you select points with the mouse. After you tell it what kind of distance metric you want, you get several plots showing the results. The links at the right show a number of points separated by a fourth order polynomial.
Image showing GUI
Image showing points and support

Image showing distance to hyperplane

Get all the code

Michele Covell and I wrote some Matlab code to compute multi-dimensional scaling (MDS). MDS allows you to reconstruct an estimate of the position of points, given just relative distance data. These routines handle both metric data (where you know the distances) and non-metric data (where you just know the order of the distances). Technical report containing the code (no documentation).
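
The metric case can be sketched in a few lines of Python using the classical (Torgerson) approach: square the distances, double-center them to get inner products, and take the leading eigenvectors. This is a minimal illustration under those assumptions, not the code from the technical report, and it does not cover the non-metric case. The function name and the power-iteration eigen-solver are choices made for this example.

```python
import math
import random

def classical_mds(distances, dims=2, iterations=200, seed=0):
    """Recover coordinates (up to rotation and reflection) from a symmetric
    matrix of pairwise distances. Hypothetical helper for illustration."""
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration finds the current largest eigenvector of b.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                         for i in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Four corners of a 4-by-1 rectangle, described only by their distances.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(distances)
```

The recovered coordinates reproduce the input distances, but the layout may be rotated or mirrored relative to the original points, since distances alone cannot fix orientation.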

Apple Publications

The SoundAndImage toolbox is a collection of Matlab tools to make it easier to work with sounds and images. On the Macintosh, tools are provided to record and play back sounds through the sound system, and to copy images to and from the scrapbook. For both Macintosh and Unix systems, routines are provided to read and write many common sound formats (including AIFF). Only 68k MEX files are included. Users on other machines will need to recompile the software. This toolbox is published as Apple Computer Technical Report #61.

Postscript Documentation (153k)

Adobe PDF Documentation (20k)

Macintosh Archive

Filter Design is a Mathematica notebook that describes (and implements) many IIR filter design techniques. It was published as Apple Technical Report #34. Mathematica Notebook (556k)
Postscript (1M)

Adobe PDF (212k)

image of CDROM box I created a Hypercard stack to make it easier for people with a Macintosh and CDROM drive to interact with the Acoustical Society of America's Auditory Demonstrations CD. This CD is a wonderful collection of auditory effects and principles. The ASA Demo Hypercard stack includes the text and figures from the book and lets you browse the Audio CD. Macintosh Archive

VUMeters display I wrote a program for the Macintosh 660/AV and 840/AV computers that uses the DSP (AT&T3210) to monitor audio levels. VUMeters runs on any Macintosh with the AT&T DSP chip. Source and binaries are included. Macintosh Archive

Icon for TCPPlay Bill Stafford and I wrote TCPPlay to allow us to play sounds from a Unix machine over the network to the Macintosh on our desks. This archive includes Macintosh and Unix source code and the Macintosh application. There are other network audio solutions, but this works well on the Macintosh. Macintosh Archive

Previous Publications

In a past life, I worked on medical imaging. A book on tomographic imaging (cross-sectional x-ray imaging) was published by IEEE Press: Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging, (New York : IEEE Press, c1988). The software used to generate many of the tomographic images in this book is available. The parallel beam reconstruction on the left was generated with the commands

gen n=100 k=100 if=lib.d.s
filt n=100 k=100
back n=100 k=100
disn min=1.0 max=1.05

Tomographic Software (Unix TAR format)

The book is now online. Download the PDF.

Code to implement the diffraction tomography algorithms in my PhD Thesis is also available.

Compressed Unix TAR File

Carl Crawford, Mani Azimi and I wrote a simple Unix plotting package called qplot. Both two-dimensional and 3d-surface plots are supported.

Compressed Unix TAR File

Now obsolete code to implement a DITroff previewer under SunView is available. This program was called suntroff and is an ancestor of the X Window System Troff previewer. It was written while I was an employee of Schlumberger Palo Alto Research. All files are compressed Unix TAR files.

Source

LaserWriter fonts

Complete package

Other Research Pointers

I organize the Stanford CCRMA Hearing Seminar. Just about any topic related to auditory perception is considered fair game at the seminar. An archive of seminar announcements can be found at Stanford (organized as a table) or at UCSC as a chronological listing of email announcements. Send email to hearing-seminar-request@ccrma.stanford.edu if you would like to be added to the mailing list.

For More Information

I can be reached at

Malcolm Slaney
Interval Research, Inc.
1801 Page Mill Road, Building C
Palo Alto, CA 94304
(650) 842-6143
(650) 565-7944 (FAX)

The best way to reach me is to send email.

This page last updated on March 1, 2000.

Malcolm Slaney ( malcolm@interval.com)