Natural Language Processing
This course focuses on basic natural language processing methods and models from an engineering perspective, with an emphasis on corpus-based methods and algorithms. Students will be able to understand and participate in the development of methods and tools applied in text-to-speech synthesis, speech recognition, machine translation, and similar systems.
- Identify, from a computational perspective, the complexity of NLP problems
- Evaluate open-source NLP tools
- Analyze text and speech corpora
- Support speech synthesis projects
- Support speech recognition projects
- Support machine translation projects
Forms of Teaching
2 hours per week during the winter semester
Exams
Continuous evaluation during the semester; a written interim exam; a written final exam; a final oral exam.
Consultations
On the student's demand
Seminars
| Type | Threshold | Percent of Grade | Threshold | Percent of Grade |
|---|---|---|---|---|
| Class participation | 0 % | 10 % | 0 % | 10 % |
| Seminar/Project | 0 % | 60 % | 0 % | 60 % |
| Mid Term Exam: Written | 0 % | 15 % | 0 % | |
| Final Exam: Written | 0 % | 15 % | | |
| Exam: Written | | | 0 % | 30 % |
Week by Week Schedule
- Natural language processing (NLP) as an engineering discipline. Its linguistic essentials and mathematical foundations.
- Dictionary organization and types of dictionaries. Annotated and unannotated corpora, the problem of reusability, and an overview of markup schemes and tag sets.
- The role of lexical acquisition, supervised and unsupervised learning, evaluation measures.
- Statistical estimators and language models. Bayesian-based disambiguation and dictionary-based disambiguation vs. unsupervised disambiguation.
- Probabilistic models of pronunciation and spelling. Spelling error correction: minimum edit distance, Bayesian method for spelling correction.
- Hidden Markov Models (HMMs): fundamental questions, properties and variants. Implementation of HMMs, initialization of parameter values.
- Part-of-speech (POS) tagging: information sources. Applying HMMs to POS tagging, tagging accuracy.
- Introduction to syntax: context-free grammars (CFGs). Some features of probabilistic context-free grammars (PCFGs). Training a PCFG.
- Other syntax models (phrase structure grammars and dependency grammars) and their relations to CFGs.
- Parsing with CFGs. Lexicalized and probabilistic parsing. Parsing models vs. language models.
- NLP in speech synthesis and speech recognition. Overview of methods and tools.
- Language preprocessing in speech synthesis: morphological and syntactic analysis, prosody generation.
- Language and acoustic modeling for speech recognition: n-gram language models, probability estimation, and evaluation.
- Parallel corpora as a source for machine translation. Text alignment methods and tools.
- Technical evaluation of machine translation and translation tools. Overview of existing systems and use.
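The spelling-correction week in the schedule above centers on minimum edit distance. As an illustrative sketch (not part of the course materials), the standard dynamic-programming formulation with unit costs for insertion, deletion, and substitution can be written as:

```python
def min_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via dynamic programming:
    unit costs for insertion, deletion, and substitution."""
    m, n = len(source), len(target)
    # dp[i][j] = edit distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of source[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of target[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[m][n]
```

With unit substitution cost, `min_edit_distance("kitten", "sitting")` gives 3; some formulations charge 2 for substitution, which changes the values but not the algorithm.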
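The HMM and POS-tagging weeks build on decoding the most likely tag sequence for a word sequence. A minimal sketch of the Viterbi algorithm; the parameter names (`start_p`, `trans_p`, `emit_p`) and the toy model in the usage note are illustrative assumptions, not course artifacts:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state (tag) sequence for an observation (word) sequence."""
    # v[t][s] = probability of the best path ending in state s at time t
    v = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        v.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: v[t - 1][p] * trans_p[p][s])
            v[t][s] = (v[t - 1][best_prev] * trans_p[best_prev][s]
                       * emit_p[s].get(observations[t], 0.0))
            back[t][s] = best_prev
    # backtrace from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

For example, with a two-tag toy model (`N`/`V`) and hand-picked probabilities, the decoder returns a tag per word; real taggers estimate these parameters from an annotated corpus, as discussed in the POS-tagging week.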
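The speech-recognition week mentions n-gram language models and probability estimation. A minimal sketch of a bigram model with add-one (Laplace) smoothing; the function names and the `<s>`/`</s>` boundary markers are illustrative assumptions:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Collect unigram and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])           # contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)

def bigram_prob(w_prev, w, unigrams, bigrams, v):
    """Add-one (Laplace) smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + v)
```

Add-one smoothing is the simplest estimator covered under "probability estimation"; practical recognizers use stronger schemes (e.g. Kneser-Ney), but the counting machinery is the same.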