I. Introduction to the physiology of speech and hearing

Reading: Rabiner & Juang, Chapter 1 and Sections 2.1-2.2


Speech production physiology

Figure: The human vocal system.
After [J. L. Flanagan, Speech Analysis and Perception, Springer-Verlag, Berlin, 2nd edition, 1965].

Figure: Schematic of the functional components of the vocal system.
After [J. L. Flanagan, Speech Analysis and Perception, Springer-Verlag, Berlin, 2nd edition, 1965].

The process of producing speech sounds:

Speech differs from breathing in that, at some point along the airway, the air is set into rapid motion or vibration.

Two principal components of speech production

  1. Excitation - create a sound by setting the air in rapid motion
  2. Vocal tract - "shape" the sound

A. Excitation: three principal forms

  1. Phonation: vibration of vocal cords

    The vocal cords consist of ligament and muscle and are adjustable under muscle control. The surrounding cartilages of the larynx provide support. The opening between the vocal cords, through which air passes from the trachea into the pharynx, is called the glottis.

    There are two modes of operation of the vocal cords:

    1. Vibrating
      • cords tense, pressed together - no air flows
      • air pressure from the lungs forces them open
      • air rushing through the narrow opening lowers the local pressure (Bernoulli effect) --> cords close
      • the cycle repeats

      The result is a quasi-periodic release of air into the pharynx. The fundamental frequency of the vocal cord opening/closing cycle becomes the fundamental frequency (informally, the "pitch") of the resulting sound.

      The tenser the vocal cords

      -- the higher the pitch
      -- the shorter the period

      Typical frequency of the vocal cord open/close cycle:

      male: 128 Hz
      female: 256 Hz

    2. Non-vibrating
      • vocal cords open
      • air flows from trachea to pharynx without interruption
      • not an excitation since the air isn't set into rapid motion

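  2. Frication: Turbulent air flow through a narrow constriction in the vocal tract

    e.g., /f/ as in "fan": constriction between the lower lip and upper teeth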

  3. Plosive: Closure at some point in the vocal tract, followed by a release of air

    e.g., /p/ as in "pot": closure at lips

    • can be combined with vocal cord vibration (phonation plus plosive)
      e.g., /b/ as in "boy": closure at the lips as in /p/, combined with phonation
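A common idealization in source-filter models (not stated in these notes) represents each excitation form by a simple discrete-time signal: phonation as an impulse train with one pulse per period T = 1/F0, frication as random noise, and a plosive as silence during the closure followed by a short burst at the release. The sketch below assumes a 10 kHz sampling rate, and the function names are hypothetical. It also converts the typical F0 values listed above into periods, showing that a higher pitch means a shorter period.

```python
import numpy as np

FS = 10_000  # assumed sampling rate (Hz), chosen only for illustration

def period_ms(f0_hz: float) -> float:
    """Period T = 1 / F0 of the vocal cord open/close cycle, in milliseconds."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    return 1000.0 / f0_hz

def voiced_excitation(f0_hz: float, duration_s: float, fs: int = FS) -> np.ndarray:
    """Phonation idealized as a unit impulse train, one pulse per period (rounded to whole samples)."""
    x = np.zeros(int(round(duration_s * fs)))
    x[:: int(round(fs / f0_hz))] = 1.0
    return x

def fricative_excitation(duration_s: float, fs: int = FS, seed: int = 0) -> np.ndarray:
    """Frication idealized as white Gaussian noise: turbulence has no periodic structure."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(int(round(duration_s * fs)))

def plosive_excitation(closure_s: float, burst_s: float, fs: int = FS, seed: int = 0) -> np.ndarray:
    """Plosive idealized as silence during the closure, then a short noise burst at the release."""
    rng = np.random.default_rng(seed)
    closure = np.zeros(int(round(closure_s * fs)))
    burst = rng.standard_normal(int(round(burst_s * fs)))
    return np.concatenate([closure, burst])

for label, f0 in [("male", 128.0), ("female", 256.0)]:
    print(f"{label}: F0 = {f0:g} Hz -> T = {period_ms(f0):g} ms")
# male: F0 = 128 Hz -> T = 7.8125 ms
# female: F0 = 256 Hz -> T = 3.90625 ms

x_voiced = voiced_excitation(128.0, 0.05)  # 50 ms of phonation at F0 = 128 Hz
print(int(x_voiced.sum()))                 # 7 glottal pulses
```

Doubling F0 halves T, which is the "tenser cords, higher pitch, shorter period" relation in numerical form.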


A model of speech production

Relation between excitation and the vocal tract

Simple model:

Linear system with input x(t), system function H(f), and output s(t)
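For a linear time-invariant system, the output is the convolution of the input with the system's impulse response h(t) (a symbol not used elsewhere in these notes). In the frequency domain the convolution becomes a product, with the system function H(f) being the Fourier transform of h(t):

```latex
s(t) = (x * h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau
\qquad \Longleftrightarrow \qquad
S(f) = X(f)\, H(f)
```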


Two principal components of speech production (continued)

B. Vocal tract:

Reference: Rabiner & Schafer 3.2, 3.3

Simple model: a uniform tube, closed at the glottis end and open at the mouth end.
This shape approximates the vocal tract for the neutral vowel /ə/, as in the "a" of "ado" (called "schwa").

Modes of vibration: resonances

A uniform tube 17.5 cm long (roughly the length of an adult male vocal tract) has its first three resonances at approximately 500, 1500, and 2500 Hz.
These are close to the resonances that characterize the schwa sound.
In speech, resonant frequencies are called formant frequencies.
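For a lossless uniform tube closed at one end and open at the other, resonances occur where the tube length is an odd multiple of a quarter wavelength, giving F_n = (2n - 1) c / (4L). A minimal sketch, assuming a speed of sound c = 35,000 cm/s and a tube length L = 17.5 cm (neither value is given in these notes); `tube_resonances` is a hypothetical helper name.

```python
# Quarter-wavelength resonances of a uniform lossless tube,
# closed at the glottis end and open at the mouth end.
# Assumed values: speed of sound c = 35000 cm/s, tube length L = 17.5 cm.

def tube_resonances(length_cm: float, c_cm_per_s: float = 35000.0, count: int = 3) -> list[float]:
    """Return the first `count` resonant frequencies in Hz: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm) for n in range(1, count + 1)]

print(tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
print(tube_resonances(15.0))  # shorter tube -> every resonance shifts upward
```

Because F_n scales as 1/L, a shorter vocal tract (for example, a child's) raises all of the formants proportionally.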

It can be shown that the vocal tract, modeled as a uniform tube, can be represented by an all-pole transfer function: the resonances (formants) correspond to poles, and there are no zeros.
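In discrete time, an all-pole model is commonly built by placing one complex-conjugate pole pair per formant: the pole angle θ = 2πF/fs sets the resonant frequency, and the pole radius r = e^(-πB/fs) sets the bandwidth B. The sketch below assumes fs = 10 kHz and a 100 Hz bandwidth for every formant (hypothetical values), and the helper names are hypothetical as well. Evaluating 1/|A(e^{jω})| shows peaks at the formant frequencies.

```python
import numpy as np

def all_pole_from_formants(formants_hz, bandwidths_hz, fs):
    """Denominator coefficients a[0..2K] of H(z) = 1 / A(z), one conjugate pole pair per formant."""
    a = np.array([1.0])
    for f, b in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * b / fs)      # pole radius sets the bandwidth
        theta = 2 * np.pi * f / fs       # pole angle sets the resonant frequency
        # Second-order section (1 - r e^{j theta} z^-1)(1 - r e^{-j theta} z^-1)
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def magnitude_response(a, freqs_hz, fs):
    """|H(e^{jw})| = 1 / |A(e^{jw})| evaluated at the given frequencies."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ a  # A(e^{jw}) = sum_k a[k] e^{-jwk}
    return 1.0 / np.abs(A)

fs = 10_000
a = all_pole_from_formants([500, 1500, 2500], [100, 100, 100], fs)
probe = [500, 1000, 1500, 2000, 2500]
for f, m in zip(probe, magnitude_response(a, probe, fs)):
    print(f"{f:5d} Hz: {20 * np.log10(m):6.1f} dB")
```

The response is high at 500, 1500, and 2500 Hz and falls between them, which is the resonance structure described above.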

In terms of this linear time-invariant (LTI) model, the excitation is the input x(t), the vocal tract is the system with frequency response H(f), and the speech signal is the output s(t).

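The shape (envelope) of the resulting speech spectrum is given by |H(f)|. In a simplified model for phonated sounds, the glottal pulses in x(t) form an impulse train with interval T between pulses. The spectrum of such a train is itself a train of lines spaced 1/T apart (the fundamental frequency and its harmonics), so the speech spectrum consists of lines at multiples of 1/T whose amplitudes follow the envelope |H(f)|.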
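The impulse-train argument can be checked numerically: drive an all-pole filter with formants at 500, 1500, and 2500 Hz with an impulse train of period T, then take the FFT of the output. This is a minimal sketch assuming fs = 10 kHz, F0 = 1/T = 100 Hz, and 100 Hz formant bandwidths; the helper names are hypothetical, and the filter recursion is written out explicitly rather than calling a signal-processing library.

```python
import numpy as np

FS = 10_000   # assumed sampling rate (Hz)
F0 = 100.0    # assumed pitch: T = 1/F0 = 10 ms between glottal pulses

def all_pole_denominator(formants_hz, bandwidth_hz, fs):
    """A(z) as a product of second-order sections, one conjugate pole pair per formant."""
    a = np.array([1.0])
    for f in formants_hz:
        r = np.exp(-np.pi * bandwidth_hz / fs)
        theta = 2 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def all_pole_filter(a, x):
    """y[n] = x[n] - sum_{k>=1} a[k] y[n-k], i.e. Y(z) = X(z) / A(z) with a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Excitation: one unit pulse every FS / F0 samples, 1 s of signal.
x = np.zeros(FS)
x[:: int(FS / F0)] = 1.0

# Vocal tract: all-pole filter with formants at the uniform-tube values.
a = all_pole_denominator([500, 1500, 2500], 100.0, FS)
s = all_pole_filter(a, x)

# Spectrum of the output: energy concentrated at multiples of F0 = 1/T.
spectrum = np.abs(np.fft.rfft(s))
freqs = np.fft.rfftfreq(len(s), d=1.0 / FS)
print(freqs[np.argmax(spectrum)])  # strongest harmonic, a multiple of F0
```

Because the signal is exactly 1 s long, FFT bins fall on whole hertz, so nearly all of the energy lands in bins at multiples of 100 Hz, with heights that follow |H(f)|.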

In reality, the vocal system differs from the uniform tube model in several respects:

Model breakdowns:

Nevertheless, speech analysis systems typically model all speech sounds with an all-pole filter, as if the tube model held for every sound.


Auditory system: hearing and perception

Reading: Rabiner & Juang section 3.5

Figure: Schematic of the human ear (not to scale).
After [J. L. Flanagan, Speech Analysis and Perception, Springer-Verlag, Berlin, 2nd edition, 1965].

Outer ear: collects sound waves

Middle ear: vibrations of the tympanic membrane (eardrum) are transmitted through the ossicles to the fluid of the inner ear

Inner ear:

Figure: Schematic of the organ of Corti, in inner ear.
After [J. L. Flanagan, Speech Analysis and Perception, Springer-Verlag, Berlin, 2nd edition, 1965].

cochlea: transforms mechanical vibrations into nerve impulses
nerve impulses --> brain

What information do the nerve impulses carry? This is not well understood.

Processing in the inner ear: "Place Theory"

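Each place along the basilar membrane responds most strongly to a particular frequency: high frequencies near the base of the cochlea, low frequencies near the apex.
Movement of the basilar membrane bends the hairs (stereocilia) on the hair cells (about 3,500 inner hair cells in each cochlea; about 30,000 auditory nerve fibers).
This produces electrical discharges in the nerve fibers, translating the mechanical signal into an electrical one.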

End of notes on Physiology of speech production and hearing.


