Digital Speech Processing


Data is displayed for academic year: 2023./2024.

Lecturers

Prof. Davor Petrinović

Course Description

The course covers the fundamentals of digital speech processing and its applications in communications and multimedia. Digital speech modeling and parametric models. Speech analysis: parameter estimation for the vocal tract model and the excitation model. The most important speech models and their properties. Speech coding and its applications. Automatic speech and speaker recognition, language detection. Speech feature vectors, cepstral analysis. Statistical models for speech recognition: hidden Markov models, Gaussian mixture models, and training procedures for statistical models. Acoustic and lexical models. Speech synthesis based on diphones and triphones. Speech normalization and modification. Examples of systems for speech coding, recognition, and synthesis.

Study Programmes

University graduate

[FER3-EN] Control Systems and Robotics - profile

Elective course (3rd semester)

Elective courses (1st semester)

[FER3-EN] Electrical Power Engineering - profile

Elective courses (3rd semester)

Learning Outcomes

recognize the significance of digital speech processing and its applications
describe the speech production mechanism and the corresponding physical models
compare various methods for modeling the speech signal in the continuous-time and discrete-time domains
apply linear prediction methods to model the speech signal
employ homomorphic speech processing to estimate the excitation and the vocal tract model
develop simple speech processing algorithms in Matlab
analyze how quantization of the model coefficients affects model accuracy
apply methods for recognizing vowels and speaker identity

Forms of Teaching

Lectures

Lectures are organized in two terms. The first term consists of 7 weeks of lectures and the midterm exam. The second term consists of another 6 weeks of lectures and the final exam. The weekly lecture workload is 2 hours, over a total of 15 weeks in the semester.

Independent assignments

The total course workload for individual student work amounts to 90 hours, which students use for the Program exercises and for exam preparation. The homework for each of the two terms is a Report on the individual work on the Program exercises. This Report also includes the report on the Laboratory exercises. For the individual work, students have to study the corresponding chapters of the Course-book and Lecture notes cited in the week-by-week plan, perform the required exercises, and prepare a report for each chapter.

Laboratory

During the semester, laboratory exercises are organized in accordance with the week-by-week plan. These exercises prepare students for individual work.

Grading Method

                          Continuous Assessment            Exam
Type                      Threshold   Percent of Grade     Threshold   Percent of Grade
Laboratory Exercises      0 %         10 %                 0 %         10 %
Homework                  0 %         20 %                 0 %         20 %
Mid Term Exam: Written    0 %         30 %                 0 %         -
Final Exam: Written       0 %         30 %                 -           -
Final Exam: Oral          -           10 %                 -           -
Exam: Written             -           -                    50 %        60 %
Exam: Oral                -           -                    -           10 %

Comment:

Laboratory exercises and Homework (Program exercises) are assessed jointly, based on the submitted Reports of individual work for the first and second terms. Students may take the oral part of the final exam only if they have earned at least 50% of the total points from the midterm exam and the written part of the final exam.

Week by Week Schedule

Lectures (L): Introduction to digital speech processing and its applications, Automatic speech, speaker and language recognition, Basic principles of speech synthesis, Text-to-Speech, Computer dialog systems with applications in virtual reality; Lab.exc. (E): Chap.: Survey of digital speech processing applications, Chap.: Fundamentals of speech production, Chap.: Phonetics and Linguistics.
Lectures (L): Fundamentals of speech production, Physical model of production; Lab.exc. (E): Chap. 1: Recording of speech signals using sound cards.
Lectures (L): Acoustic model of vocal tract; Lab.exc. (E): Chap. 2: Analysis of speech signals in time domain.
Lectures (L): Excitation signal of the vocal tract; Lab.exc. (E): Chap. 3: Spectral analysis of speech signals and spectrograms and Chap. 4: Analysis of speech formant structure.
Lectures (L): Connected tube model of the vocal tract, Discrete-time vocal tract model; Lab.exc. (E): Chap. 5: Automatic classification of vowels based on their formant structure.
Lectures (L): Linear prediction and its application for speech modeling; Lab.exc. (E): Chap. 6: Automatic speaker classification based on formant structure.
Lectures (L): Autocorrelation method for LPC model estimation; Lab.exc. (E): Chap. 7: Linear prediction methods.
Midterm exam
Lectures (L): Properties of autocorrelation based LPC model; Lab.exc. (E): Chap. 8: Autocorrelation method for speech predictor estimation; and Chap. 9: Levinson-Durbin algorithm; prediction gain analysis.
Lectures (L): Covariance method for LPC model estimation, Parametric representations for short-time speech spectral envelope modeling; Lab.exc. (E): Chap. 10: Covariance method for speech predictor estimation.
Lectures (L): Homomorphic speech processing; Lab.exc. (E): Chap. 11: Quantization effects of LPC predictor coefficients.
Lectures (L): Applications of homomorphic processing on speech signal; Lab.exc. (E): Chap. 12: Homomorphic analysis of speech signal.
Lectures (L): Introduction to automatic speech recognition (ASR), Speech analysis for ASR; Lab.exc. (E): Chap. 13: Voicing and pitch estimation.
Lectures (L): Feature vectors; Statistical models and classification methods for ASR; Lab.exc. (E): Chap. 14: Example of the Vocoder.
Final exam
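
The autocorrelation method and the Levinson-Durbin algorithm named in the schedule above (Chap. 8 and Chap. 9) can be illustrated with a minimal sketch. It is written in Python rather than the Matlab used in the course, and the function names `autocorrelation` and `levinson_durbin` are illustrative assumptions, not taken from the course materials. The predictor convention assumed here is x_hat[n] = a[1]*x[n-1] + ... + a[p]*x[n-p].

```python
def autocorrelation(frame, order):
    """Short-time autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, refl, err): predictor coefficients a[1..p] as a list,
    the reflection coefficients, and the final prediction error energy.
    """
    if r[0] <= 0:
        raise ValueError("frame has no energy; the predictor is undefined")
    a = [0.0] * (order + 1)  # a[0] is unused padding to keep 1-based indices
    refl = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation not yet explained by the order-(i-1) predictor
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)  # error energy shrinks with each added order
    return a[1:], refl, err


# Example: autocorrelation of a first-order process with coefficient 0.5
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
prediction_gain = 1.0 / err  # r[0] / final error energy
```

For this example the second reflection coefficient is zero, so raising the order from 1 to 2 brings no further prediction gain, which is the kind of observation the prediction gain analysis in Chap. 9 refers to.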

Literature

Petrinović, D. (2003.), Laboratorijske vježbe iz digitalne obrade govora, FER, ZESOI

John R. Deller, Jr., John H. L. Hansen, John G. Proakis (2000.), Discrete-Time Processing of Speech Signals, Wiley-IEEE Press

Panos E. Papamichalis (1987.), Practical Approaches to Speech Coding, Prentice Hall

A. M. Kondoz (2005.), Digital Speech, John Wiley & Sons

Petrinović, D. (2010.), Uvod u digitalnu obradbu govora korištenjem Matlaba, FER, Udžbenici sveučilišta u Zagrebu

Petrinović, D. (2010.), Digitalna obrada govora, Zavodska skripta, FER, ZESOI

Lawrence R. Rabiner, Biing-Hwang Juang (1993.), Fundamentals of Speech Recognition, Prentice Hall

W. Bastiaan Kleijn, Kuldip K. Paliwal (1995.), Speech Coding and Synthesis, Elsevier Science Limited

L. R. Rabiner, R. W. Schafer (1978.), Digital Processing of Speech Signals, Prentice-Hall

E. Keller (1994.), Fundamentals of Speech Synthesis and Speech Recognition, Wiley-Blackwell

Sadaoki Furui (1991.), Advances in Speech Signal Processing, CRC Press

For students

General

ID 223034

Winter semester

5 ECTS

L3 English Level

L2 e-Learning

30 Lectures

0 Seminar

0 Exercises

13 Laboratory exercises

0 Project laboratory

0 Physical education exercises

Grading System

88 Excellent

75 Very Good

62 Good

50 Sufficient
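
As a rough illustration of how the continuous-assessment weights from the Grading Method table combine with the thresholds listed here, the following Python sketch computes a total and maps it to a grade. It assumes that the listed numbers are minimum total percentages for each grade and that component scores are given as fractions between 0 and 1. The function names and the "Fail" label are hypothetical and do not appear on the course page.

```python
# Continuous-assessment weights (percent of grade) from the Grading Method table
WEIGHTS = {
    "laboratory": 10,
    "homework": 20,
    "midterm_written": 30,
    "final_written": 30,
    "final_oral": 10,
}

# Grading System values, read here as minimum total percent (assumption)
THRESHOLDS = [(88, "Excellent"), (75, "Very Good"), (62, "Good"), (50, "Sufficient")]


def may_take_oral(midterm, final_written):
    """Rule from the Comment: at least 50% of the combined points from the
    midterm and the written part of the final exam (inputs are fractions)."""
    earned = (midterm * WEIGHTS["midterm_written"]
              + final_written * WEIGHTS["final_written"])
    available = WEIGHTS["midterm_written"] + WEIGHTS["final_written"]
    return earned >= 0.5 * available


def total_percent(scores):
    """Weighted sum of component scores; missing components count as zero."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


def grade(total):
    """Map a total percentage to the first threshold it reaches."""
    for minimum, label in THRESHOLDS:
        if total >= minimum:
            return label
    return "Fail"  # hypothetical label; the page names no grade below 50


example = {
    "laboratory": 1.0,
    "homework": 0.75,
    "midterm_written": 0.5,
    "final_written": 0.75,
    "final_oral": 0.5,
}
example_total = total_percent(example)  # 10 + 15 + 15 + 22.5 + 5
example_grade = grade(example_total)
```

In this example the student reaches 67.5 percent in total, which falls between the thresholds for Good (62) and Very Good (75).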

Similar Courses

Lehrstuhl für Sprachverarbeitung und Mustererkennung, RWTH Aachen

Praktikum Digitale Sprach- und Bildverarbeitung, TU München

Automatic Speech Recognition, MIT

Introduction to Speech and Image Processing, UCLA

CE-DSP10 Audio processing, IEEE & ACM Computing Curricula

Speech and Language Processing - Module 4F11, Cambridge

EQ2320 Speech Signal Processing, Royal Institute of Technology Stockholm
