Much of my professional and personal life is currently documented elsewhere.
I'm now working on sensemaking: the science of how people make sense of the world around them.
Understanding Large Document Collections (HICSS 2005)
Being literate with large document collections (HICSS 2006)
I'm working with a talented Stanford student, Hiroko Terasawa, to understand and characterize how people perceive timbre. We hope to build a model of timbre that explains human timbre perception (and by extension speech) as well as the three-color model explains color vision.
First experiment (ICAD 2005)
I work with a bunch of very smart people at the Telluride Neuromorphic Engineering Workshop. I work with the audio group, and in 2003 two of the students did some very nice work on audio classification using novel approaches. Nima Mesgarani used cortical spectro-temporal response fields and a tensor SVD to get the best performance to date on a speech-music discrimination task; a sketch of the tensor-SVD idea follows the link below. Sourabh Ravindran took a more conventional approach, but optimized it to run at low power levels in a sensor network.
Low-Power Sensor Approach
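Here is a minimal, hypothetical sketch of the tensor-SVD idea, not Nima's actual pipeline: treat the cortical features as a multiway array (say rate x scale x time), take an SVD of each unfolding, and keep a few singular vectors per mode to compress the tensor before classification. All names, axes, and sizes below are illustrative assumptions.

    # Hypothetical tensor-SVD (HOSVD-style) feature compression sketch;
    # the rate x scale x time layout and all sizes are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 5, 100))  # toy cortical tensor: rate x scale x time

    def mode_basis(tensor, mode, k):
        """Top-k left singular vectors of the mode-n unfolding."""
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        return u[:, :k]

    U_rate = mode_basis(X, 0, 3)   # compress the rate axis 8 -> 3
    U_scale = mode_basis(X, 1, 2)  # compress the scale axis 5 -> 2

    # Project onto the per-mode bases; each time frame becomes a 3 x 2
    # summary that a simple speech-vs-music classifier could consume.
    core = np.einsum('ia,jb,ijt->abt', U_rate, U_scale, X)
    print(core.shape)  # (3, 2, 100)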
I started working on user modeling. How do users make sense of their world? This paper, presented at the User Modeling 2003 meeting, talks about how to figure out the different tasks that users perform as they go about their work. It applies the expectation-maximization (EM) algorithm to segment and cluster time-oriented text data; a sketch of the clustering idea follows the link below.
Multitasking Users Paper
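As a rough illustration of the EM idea, not the paper's actual method, this sketch clusters short time-ordered text snippets with a Gaussian mixture model, which scikit-learn fits via EM; the snippets and dimensions are invented.

    # Illustrative only: cluster time-ordered text snippets with a Gaussian
    # mixture fit by EM; contiguous runs of one label suggest task segments.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.mixture import GaussianMixture

    snippets = [
        "reply to budget email", "update budget spreadsheet",
        "email finance about budget", "read conference reviews",
        "draft rebuttal to reviews", "submit rebuttal",
    ]

    # Represent each snippet as a low-dimensional text vector.
    tfidf = TfidfVectorizer().fit_transform(snippets)
    vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

    # EM fits the mixture; each component is a candidate "task."
    gmm = GaussianMixture(n_components=2, random_state=0).fit(vecs)
    print(gmm.predict(vecs))  # e.g. [0 0 0 1 1 1]: two tasks in time order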
I wrote a book chapter describing our work on text tools for video mining. This chapter summarizes the work on semantic-audio retrieval and multimedia segmentation, both described below.
The Semantics of Media book chapter
My first attempt, using a winner-take-all approach, was published at ICASSP. A better approach, based on something I called "mixture of probability experts," will be published at ICME.
ICASSP SAR Paper
Our first attempts were aimed at pure text segmentation. A shorter ICASSP paper and a longer paper for the SIAM Text Mining Workshop were accepted for publication.
SIAM Text Mining Workshop Paper
We later extended these ideas to audio and video signals. We presented the first of these ideas, and some information on the temporal correlations present in these different dimensions, at the ICCV Event 2001 workshop. The most complete version of our work was presented at the ACM Multimedia conference.
ICCV Event 2001 Paper
The BabyEars project is a research effort to understand how people communicate emotional messages with speech. Computers, to date, are good at recognizing words, but they ignore the emotional content of a speech signal. Babies, on the other hand, learn the emotional content of the speech they hear before they understand the words. We wanted to bridge these two worlds by building statistical classifiers that understand the emotional messages in speech. We want machines to recognize the emotional messages in a speech signal as well as dogs do.
This project studied the emotional content of speech signals, using speech spoken by parents to their infants. We looked at approval, attention, and prohibition messages, and compared the properties of these infant-directed speech signals to "normal" speech between adults. Our work ignored the words and concentrated on the prosodic features of the speech: the acoustic pitch, timing, and loudness cues that we vary as we speak.
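For concreteness, here is a hypothetical sketch of extracting those pitch, loudness, and timing cues with librosa; it is not the BabyEars feature set, and the file name and pause threshold are made up.

    # Hypothetical prosodic-feature sketch (not the BabyEars code): pitch,
    # loudness, and timing cues summarized per utterance.
    import numpy as np
    import librosa

    y, sr = librosa.load("utterance.wav")  # made-up file name

    f0 = librosa.yin(y, fmin=60, fmax=500, sr=sr)  # frame-level pitch contour
    rms = librosa.feature.rms(y=y)[0]              # frame-level loudness proxy

    duration = len(y) / sr                         # timing: overall length
    pauses = np.mean(rms < 0.1 * rms.max())        # timing: low-energy fraction

    features = [np.mean(f0), np.std(f0), np.mean(rms), duration, pauses]
    print(features)  # summary statistics a simple classifier could use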
For more information about this project and its key findings, see the following links.
Living on Earth Radio Show
New Scientist Article
Speech Communication Journal
Send email to email@example.com requesting an electronic reprint of the BabyEars journal article.
This is joint work with Gerald McRoberts at Haskins Laboratories.
Michele Covell and I did some neat work, called FastMPEG, on an algorithm to time-compress audio files that have been bit-compressed (à la MPEG) without first decompressing them. As far as we know, nobody else has done this much processing on compressed MPEG audio signals.
ICASSP Paper and audio examples
I created a system that measures the synchronization between a speech signal and a talking face. This work, FaceSync, was published at NIPS 2000.
NIPS Paper
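A minimal sketch of one way to score audio-video synchrony with canonical correlation analysis, in the spirit of FaceSync's linear approach; the random features below are synthetic stand-ins, not the paper's face and audio representations.

    # Illustrative audio/video synchrony score via canonical correlation;
    # the features are synthetic stand-ins, not FaceSync's representations.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    frames = 200
    audio = rng.standard_normal((frames, 10))  # per-frame audio features
    mixing = rng.standard_normal((4, 6))
    video = audio[:, :4] @ mixing + 0.1 * rng.standard_normal((frames, 6))

    cca = CCA(n_components=1).fit(audio, video)
    a, v = cca.transform(audio, video)
    score = np.corrcoef(a[:, 0], v[:, 0])[0, 1]  # near 1 when streams align
    print(f"sync score: {score:.2f}")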
IEEE published a book I wrote with my thesis advisor, Principles of Computerized Tomographic Imaging. We sold several thousand copies through IEEE, and it was in print for 14 years. IEEE decided not to reprint it. Even better, SIAM decided to include it in their book series Classics in Applied Mathematics. It is now back in print.
Online copy of the book
Steve Greenberg (ICSI at Berkeley) and I organized a NATO Advanced Study Institute on Computational Models of Hearing. We are currently putting the finishing touches on a book that summarizes the work of our "students."
Order your copy now
Last update: June 22, 2005. Send email to me at firstname.lastname@example.org.
My address is Malcolm Slaney