Digital Speech Processing

Course Description

The course covers the fundamentals of digital speech processing and its applications in communications and multimedia. Topics include: digital speech modeling and parametric models; speech analysis and parameter estimation for the vocal tract model and the excitation model; the most important speech models and their properties; speech coding and its applications; automatic speech and speaker recognition, and language detection; speech feature vectors and cepstral analysis; statistical models for speech recognition (hidden Markov models and Gaussian mixture models) and the procedures for training them; acoustic and lexical models; diphone- and triphone-based speech synthesis; speech normalization and modification; and example systems for speech coding, recognition, and synthesis.

General Competencies

Fundamental knowledge of digital speech processing, speech modeling and analysis, and speech recognition and synthesis. Experience with digital speech processing applications in communication and multimedia systems.

Learning Outcomes

  1. recognize the significance of digital speech processing and its applications
  2. describe speech production mechanism and corresponding physical models
  3. list various methods for modeling the speech signal in the continuous and discrete time domains
  4. apply linear prediction methods to model the speech signal
  5. employ homomorphic speech processing to estimate the excitation and vocal tract models
  6. develop simple algorithms for speech processing using Matlab
  7. analyze how quantization of the model coefficients affects model accuracy
  8. apply methods for vowel recognition and speaker identification

Forms of Teaching

Lectures

Lectures are organized in two terms. The first term consists of 7 weeks of lectures followed by the midterm exam. The second term consists of another 6 weeks of lectures followed by the final exam. The weekly lecture load is 2 hours over the 15 weeks of the semester.

Exams

The course can be passed through continuous assessment based on the midterm and final exams. The final exam consists of a written part and an oral part. Alternatively, students can pass the course by taking the end-of-semester exam, which also has a written and an oral part.

Laboratory Work

During the semester, two laboratory exercises are organized in accordance with the week-by-week plan. They prepare students for their individual work.

Consultations

Consultations are held during lecture weeks, immediately after the lectures.

Programming Exercises

The total workload for students' individual work is 90 hours, which students spend on programming exercises and exam preparation. The homework for each of the two terms is a report on the individual work done on the programming exercises; this report also includes the report on the laboratory exercises. For their individual work, students study the corresponding chapters of the course book and lecture notes cited in the week-by-week plan, carry out the required exercises, and prepare a report for each chapter.

Grading Method

                          Continuous Assessment          Exam
Type                      Threshold   Percent of Grade   Threshold   Percent of Grade
Laboratory Exercises      0 %         10 %               0 %         10 %
Homework                  0 %         20 %               0 %         20 %
Midterm Exam: Written     0 %         30 %               0 %         -
Final Exam: Written       0 %         30 %               -           -
Final Exam: Oral          -           10 %               -           -
Exam: Written             -           -                  50 %        60 %
Exam: Oral                -           -                  -           10 %
Comment:

Laboratory exercises and homework (programming exercises) are assessed jointly, based on the submitted reports of individual work for the first and second terms. Students may take the oral part of the final exam only if they have earned at least 50% of the total points from the midterm exam and the written part of the final exam.

Week by Week Schedule

  1. Lectures (L): Introduction to digital speech processing and its applications, Automatic speech, speaker and language recognition, Basic principles of speech synthesis, Text-to-Speech, Computer dialog systems with applications in virtual reality; Individual work (I): Chap: Survey of digital speech processing applications, Chap: Fundamentals of speech production, Chap: Phonetics and Linguistics.
  2. Lectures (L): Fundamentals of speech production, Physical model of speech production; Individual work (I): Chap 1: Recording of speech signals using sound cards.
  3. Lectures (L): Acoustic model of vocal tract, Excitation signal of the vocal tract; Individual work (I): Chap 2: Analysis of speech signals in time domain.
  4. Laboratory (LB): Chap 4: Analysis of speech formant structure; Individual work (I): Chap 3: Spectral analysis of speech signals and spectrograms.
  5. Lectures (L): Connected tube model of the vocal tract, Time-discrete vocal tract model; Individual work (I): Chap 5: Automatic classification of vowels based on their formant structure.
  6. Lectures (L): Linear prediction and its application for speech modeling; Individual work (I): Chap 6: Automatic speaker classification based on formant structure.
  7. Lectures (L): Autocorrelation method for LPC model estimation; Individual work (I): Chap 7: Linear prediction methods.
  8. Midterm exam
  9. Laboratory (LB): Chap 8: Autocorrelation method for speech predictor estimation; Individual work (I): Chap 9: Levinson-Durbin algorithm; prediction gain analysis.
  10. Lectures (L): Covariance method for LPC model estimation, Parametric representations for short-time speech spectral envelope modeling; Individual work (I): Chap 10: Covariance method for speech predictor estimation.
  11. Lectures (L): Homomorphic speech processing; Individual work (I): Chap 11: Quantization effects of LPC predictor coefficients.
  12. Lectures (L): Applications of homomorphic processing on speech signal; Individual work (I): Chap 12: Homomorphic analysis of speech signal.
  13. Lectures (L): Introduction to automatic speech recognition (ASR), Speech analysis for ASR; Individual work (I): Chap 13: Voicing and pitch estimation.
  14. Lectures (L): Feature vectors; Statistical models and classification methods for ASR; Individual work (I): Chap 14: Example of a vocoder.
  15. Final exam

Study Programmes

University graduate
Electronic and Computer Engineering (profile)
Recommended elective courses (3rd semester)
Information Processing (profile)
Specialization course (1st semester) (3rd semester)

Literature

Rabiner, L., Juang, B.-H. (1993), Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, New Jersey
Kleijn, W. B., Paliwal, K. K. (1995), Speech Coding and Synthesis, Elsevier
Kondoz, A. M. (1994), Digital Speech: Coding for Low Bit Rate Communication Systems, John Wiley & Sons
Petrinović, D. (2002), Digitalna obrada govora [Digital Speech Processing], Zavodska skripta [departmental lecture notes], FER, ZESOI
Petrinović, D. (2010), Uvod u digitalnu obradbu govora korištenjem Matlaba [Introduction to Digital Speech Processing Using Matlab], FER, Udžbenici Sveučilišta u Zagrebu [University of Zagreb textbooks]
Petrinović, D. (2003), Laboratorijske vježbe iz digitalne obrade govora [Laboratory Exercises in Digital Speech Processing], Zavodska skripta [departmental lecture notes], FER, ZESOI

Laboratory exercises