EE649 SPEECH PROCESSING BY COMPUTER Spring 2002

Project #4B: Small Vocabulary Continuous Speech Recognition System

Assigned: Thursday March 21 Due: Friday April 26

Individual project

You are to implement a recognizer for simple voice dialing applications using the Hidden Markov Model Toolkit (HTK) V3.1. This recognizer will be designed to recognize continuously spoken digit strings and a limited set of names. Although the recognizer is built for a small-vocabulary continuous speech recognition task, the design is general-purpose and would be useful for a range of applications.

Toolkit Installation: HTK 3.1 is available at http://htk.eng.cam.ac.uk/. Version 3.1 is the current release and is available for free download, but you must first agree to the license and register for a username and password. Note that HTK is only available as a source distribution. To build HTK 3.1 you must have a working ANSI C compiler and associated tools installed on your system. We suggest that you download HTK and install it in your ECN account. After downloading and uncompressing, follow the instructions in the "README" file to compile and install HTK 3.1.

HTKBook: A detailed handbook for HTK users is available at:

http://htk.eng.cam.ac.uk/prot-docs/HTKBook/htkbook.html, which can be accessed after you register. You can follow the steps in "A Tutorial Example of Using HTK" in the HTK Book to work on this project, constructing a recognizer whose HMMs are continuous-density Gaussian-mixture tied-state triphones, with clustering performed using decision trees.

 

Vocabulary: The goal of the system to be built here is to provide a voice-operated interface for phone dialing. Thus, the recognizer must handle digit strings and also personal name lists.
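
As an illustration, a task grammar of the kind used in the HTK Book tutorial could be generated with a short Python sketch like the one below. The command words (DIAL, PHONE, CALL), the SENT-START and SENT-END markers, and the example names are assumptions borrowed from the tutorial's style, not the vocabulary of this project, and the function name build_grammar is hypothetical. The resulting text is meant to be saved to a file and compiled into a word network with HParse.

```python
def build_grammar(digits, names):
    """Return grammar text in the HParse notation for a voice-dialing task.

    digits: list of digit words, e.g. ["ONE", "TWO"]
    names:  list of name entries, each a string of one or more words
    The command words below follow the HTK Book tutorial and are an
    assumption; replace them with whatever the task actually requires.
    """
    lines = [
        # A variable holding the alternatives for a single digit.
        "$digit = " + " | ".join(digits) + ";",
        # A variable holding the alternatives for a personal name.
        "$name = " + " | ".join(names) + ";",
        # <...> means one or more repetitions, | separates alternatives.
        "( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name ) SENT-END )",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder vocabulary, for illustration only.
    example_digits = ["ONE", "TWO", "THREE"]
    example_names = ["ALICE SMITH", "BOB JONES"]
    print(build_grammar(example_digits, example_names), end="")
```

Keeping the vocabulary in Python lists makes it easy to regenerate the grammar and the word list consistently if the name list changes.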

Training Data: You will have access to training data recorded using the HTK tool HSLab, as described in "Step 3 - Recording the Data" in the tutorial. The training data consist of 100 sentences spoken by one male speaker. The wav files and their transcriptions are provided for developing the recognizer.

Testing Data: Your recognizer will be tested on speech utterances spoken by the same speaker. The test set sentences were generated by the tool HSGen. You will not have access to this data.

Goal and Documentation:

  1. Your goal is to construct an HMM recognizer using HTK 3.1. Your documentation should include a detailed, step-by-step description of your procedure for building this recognizer, the parameter adjustments you considered, and your reasons for choosing the parameters you did. You may extend the project with discussions and experiments on speaker adaptation.
  2. You should turn in:

     a. A written description of your procedure for constructing the recognizer, including a quantitative summary of your recognizer’s performance on the data files provided.
     b. The files necessary to test your HMM recognizer, including the "macros" and "hmmdefs" of your final HMM, the dictionary and word network files, and the tiedlist file.
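
For the quantitative summary, the usual word-level figures are percent correct, (N - D - S) / N x 100, and accuracy, (N - D - S - I) / N x 100, where N is the number of reference words and D, S, and I are the deletions, substitutions, and insertions. These are the figures reported by the HTK tool HResults. The Python sketch below computes them from reference and recognized word lists. It is a simplified illustration that aligns with unit edit costs; HResults applies its own alignment penalties, so its counts can differ when an alignment is ambiguous. The function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one sentence.

    Uses a minimum-edit-distance alignment with unit costs (an assumption
    made for simplicity; it is not the penalty scheme used by HResults).
    """
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (total_errors, S, D, I) for ref[:i] against hyp[:j].
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)      # everything deleted
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)      # everything inserted
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)             # match
            else:
                diag = (e + 1, s + 1, d, n)     # substitution
            e, s, d, n = table[i - 1][j]
            dele = (e + 1, s, d + 1, n)         # deletion
            e, s, d, n = table[i][j - 1]
            ins = (e + 1, s, d, n + 1)          # insertion
            table[i][j] = min(diag, dele, ins)
    _, s, d, n = table[-1][-1]
    return len(ref) - s - d, s, d, n


def summarize(pairs):
    """Accumulate counts over (ref_words, hyp_words) pairs and score them."""
    N = H = S = D = I = 0
    for ref, hyp in pairs:
        h, s, d, i = align_counts(ref, hyp)
        N += len(ref)
        H, S, D, I = H + h, S + s, D + d, I + i
    return {
        "N": N, "H": H, "S": S, "D": D, "I": I,
        "percent_correct": 100.0 * H / N,
        "accuracy": 100.0 * (H - I) / N,
    }
```

Accuracy is the stricter figure because it also penalizes insertions, so it is worth reporting both numbers.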

 

You must work by yourself on this project.

 


Project #4B: Data and Program Specifications

Training Files:

Each data file is the wave file for one sentence spoken by one adult male speaker. All of the data were recorded using a high-quality microphone in a quiet room. There is some silence (no fixed amount) at the beginning and end of each file.

The data files are available via anonymous ftp in directory

/var/spool/ftp/pub/ee649/Data/p4B/train_wav

File names run from train0001.wav to train0100.wav.
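
Because the file names follow a fixed pattern, the script file that HCopy uses for feature extraction (one "source target" pair per line, as in the tutorial) can be generated rather than typed. The Python sketch below is one way to do this. The directory names train_wav and train_mfc and the .mfc extension are assumptions for illustration; adjust them to match where you store the data.

```python
def training_wav_names(first=1, last=100):
    """Return the wave file names train0001.wav ... train0100.wav."""
    return [f"train{i:04d}.wav" for i in range(first, last + 1)]


def hcopy_script(wav_dir="train_wav", feat_dir="train_mfc"):
    """Return script-file text with one 'source target' pair per line.

    wav_dir and feat_dir are hypothetical local directories.
    """
    lines = []
    for name in training_wav_names():
        stem = name[: -len(".wav")]          # e.g. "train0001"
        lines.append(f"{wav_dir}/{name} {feat_dir}/{stem}.mfc")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the script file to the current directory.
    with open("codetr.scp", "w") as handle:
        handle.write(hcopy_script())
```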

 

 

The transcriptions for the training speech files are available via anonymous ftp in the file

/var/spool/ftp/pub/ee649/Data/p4B/train.txt

Each line contains the name for one training file and its corresponding transcription.
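
The HTK tutorial works from a word-level master label file (MLF), so the transcription file usually needs to be converted. The Python sketch below shows one possible conversion. It assumes each line has the form "file name, whitespace, words", with or without a .wav extension on the name; check the actual layout of train.txt before relying on it. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def transcriptions_to_mlf(lines):
    """Convert 'name word word ...' lines into word-level MLF text.

    Assumes the first whitespace-separated field is the file name and
    the remaining fields are the words of the sentence.
    """
    out = ["#!MLF!#"]                       # MLF header line
    for line in lines:
        fields = line.split()
        if not fields:
            continue                        # skip blank lines
        stem = PurePosixPath(fields[0]).stem
        out.append(f'"*/{stem}.lab"')       # pattern matching the label file
        out.extend(fields[1:])              # one word per line
        out.append(".")                     # end-of-entry marker
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("train.txt") as src, open("words.mlf", "w") as dst:
        dst.write(transcriptions_to_mlf(src))
```

The output name words.mlf follows the tutorial's naming and is only a suggestion.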

The EE649 Projects web page contains a link to the ftp site.