Natural Language Processing

Course Description

Theoretical foundations of natural language processing (NLP). Data resources: dictionaries and corpora, markup schemes and tag sets. Learning from corpora: lexical acquisition, word sense disambiguation, language models. Grammars: Hidden Markov Models (HMMs), Context-Free Grammars (CFGs), and others. Grammar model implementation in part-of-speech tagging and parsing. NLP preprocessing in speech synthesis, NLP postprocessing in speech recognition. Methods and tools for machine translation.

General Competencies

This course focuses on basic natural language processing methods and models from an engineering perspective, with an emphasis on corpus-based methods and algorithms. Students will be able to understand and participate in the development of methods and tools applied in text-to-speech synthesis, speech recognition, machine translation, and similar systems.

Learning Outcomes

  1. Identify, from a computational perspective, the complexity of NLP problems
  2. Evaluate open-source NLP tools
  3. Analyze text and speech corpora
  4. Support speech synthesis projects
  5. Support speech recognition projects
  6. Support machine translation projects

Forms of Teaching

Lectures

2 hours per week during the winter semester

Exams

continuous evaluation during the semester; written interim exam; written final exam; final oral exam.

Consultations

at the student's request

Seminars

seminar exercises

Other

Project exercises

Grading Method

Type                       Continuous Assessment            Exam
                           Threshold   Percent of Grade     Threshold   Percent of Grade
Class participation        0 %         10 %                 0 %         10 %
Seminar/Project            0 %         60 %                 0 %         60 %
Mid Term Exam: Written     0 %         15 %                 0 %
Final Exam: Written        0 %         15 %
Exam: Written                                               0 %         30 %

Week by Week Schedule

  1. Natural language processing (NLP) as an engineering discipline. Its linguistic essentials and mathematical foundations.
  2. Dictionary organization, types of dictionaries. Annotated and unannotated corpora, the problem of reusability, overview of markup schemes and tag sets.
  3. The role of lexical acquisition, supervised and unsupervised learning, evaluation measures.
  4. Statistical estimators and language models. Bayesian-based disambiguation and dictionary-based disambiguation vs. unsupervised disambiguation.
  5. Probabilistic models of pronunciation and spelling. Spelling error correction: minimum edit distance, Bayesian method for spelling correction (a minimal edit-distance sketch follows this schedule).
  6. Hidden Markov Models (HMMs): fundamental questions, properties and variants. Implementation of HMMs, initialization of parameter values.
  7. Part-of-speech (POS) tagging: information sources. Applying HMMs to POS tagging, tagging accuracy (a toy Viterbi tagging sketch follows this schedule).
  8. Introduction to syntax: context-free grammars (CFGs). Some features of probabilistic context-free grammars (PCFGs). Training a PCFG.
  9. Other syntax models (phrase structure grammars and dependency grammars) and their relations to CFGs.
  10. Parsing with CFGs. Lexicalized and probabilistic parsing. Parsing models vs. language models (a small CKY recognizer sketch follows this schedule).
  11. NLP in speech synthesis and speech recognition. Overview of methods and tools.
  12. Language preprocessing in speech synthesis: morphological and syntactic analysis, prosody generation.
  13. Language and acoustic modeling for speech recognition: n-gram language models, probability estimation, and evaluation (a bigram model sketch follows this schedule).
  14. Parallel corpora as a source for machine translation. Text alignment methods and tools.
  15. Technical evaluation of machine translation and translation tools. Overview of existing systems and use.
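
Week 5 mentions minimum edit distance for spelling error correction. The following is a minimal dynamic-programming sketch of that idea, assuming unit costs for insertion, deletion, and substitution; the example word pair is illustrative only, not taken from the course materials.

    def min_edit_distance(source: str, target: str) -> int:
        """Levenshtein distance with unit insertion/deletion/substitution costs."""
        n, m = len(source), len(target)
        # dp[i][j] = cheapest way to turn source[:i] into target[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i                      # delete all of source[:i]
        for j in range(1, m + 1):
            dp[0][j] = j                      # insert all of target[:j]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if source[i - 1] == target[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + sub)  # substitution or match
        return dp[n][m]

    print(min_edit_distance("intention", "execution"))  # 5 under unit costs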
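
Weeks 6 and 7 apply HMMs to POS tagging. The sketch below shows Viterbi decoding over a hand-built toy model; the tag set, probabilities, and example sentence are invented for illustration and are not part of the course materials.

    import math

    # Toy HMM with hand-set parameters (purely illustrative).
    states = ["DET", "NOUN", "VERB"]
    start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
    trans_p = {
        "DET":  {"DET": 0.01, "NOUN": 0.89, "VERB": 0.10},
        "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
        "VERB": {"DET": 0.40, "NOUN": 0.40, "VERB": 0.20},
    }
    emit_p = {
        "DET":  {"the": 0.7, "a": 0.3},
        "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
        "VERB": {"walk": 0.6, "barks": 0.4},
    }

    def viterbi(words):
        """Return the most probable tag sequence for the words (log-space Viterbi)."""
        # V[t][s] = (best log-probability of a path ending in state s at time t, backpointer)
        V = [{s: (math.log(start_p[s] * emit_p[s].get(words[0], 1e-12)), None) for s in states}]
        for t in range(1, len(words)):
            V.append({})
            for s in states:
                prev, lp = max(((p, V[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                               key=lambda x: x[1])
                V[t][s] = (lp + math.log(emit_p[s].get(words[t], 1e-12)), prev)
        # Backtrace from the best final state.
        last = max(states, key=lambda s: V[-1][s][0])
        tags = [last]
        for t in range(len(words) - 1, 0, -1):
            last = V[t][last][1]
            tags.append(last)
        return list(reversed(tags))

    print(viterbi(["the", "dog", "walk"]))  # ['DET', 'NOUN', 'VERB']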
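
Week 10 covers parsing with context-free grammars; CKY is one standard chart-parsing algorithm for grammars in Chomsky normal form. The sketch below is a minimal CKY recognizer over a toy grammar; the rules, lexicon, and sentences are invented for illustration.

    # Toy CFG in Chomsky normal form (invented for illustration).
    binary_rules = {
        ("NP", "VP"): "S",
        ("DET", "N"): "NP",
        ("V", "NP"): "VP",
    }
    lexicon = {
        "the": {"DET"}, "a": {"DET"},
        "dog": {"N"}, "cat": {"N"},
        "chased": {"V"}, "saw": {"V"},
    }

    def cky_recognize(words):
        """Return True if the toy grammar derives the word sequence from S."""
        n = len(words)
        # chart[i][j] = set of nonterminals that derive words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(lexicon.get(w, set()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):                 # split point
                    for left in chart[i][k]:
                        for right in chart[k][j]:
                            parent = binary_rules.get((left, right))
                            if parent:
                                chart[i][j].add(parent)
        return "S" in chart[0][n]

    print(cky_recognize("the dog chased a cat".split()))  # True
    print(cky_recognize("dog the chased".split()))        # False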
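
Week 13 deals with n-gram language models and probability estimation. The sketch below estimates an add-one (Laplace) smoothed bigram model from a tiny invented corpus; both the corpus and the choice of smoothing are assumptions made for illustration.

    import math
    from collections import Counter

    # Tiny invented corpus; <s> and </s> mark sentence boundaries.
    corpus = [
        "<s> i want chinese food </s>",
        "<s> i want english food </s>",
        "<s> i eat chinese food </s>",
    ]

    sentences = [line.split() for line in corpus]
    tokens = [w for sent in sentences for w in sent]
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    V = len(set(tokens))   # vocabulary size, used by add-one smoothing

    def p_laplace(prev, word):
        """Add-one smoothed bigram probability P(word | prev)."""
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

    def sentence_logprob(sentence):
        """Log probability of a space-separated sentence under the bigram model."""
        words = sentence.split()
        return sum(math.log(p_laplace(a, b)) for a, b in zip(words, words[1:]))

    print(round(p_laplace("i", "want"), 3))                       # 3/11 = 0.273
    print(round(sentence_logprob("<s> i eat english food </s>"), 3))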

Study Programmes

University graduate
Computer Engineering (profile)
Recommended elective courses (3. semester)
Computer Science (profile)
Recommended elective courses (3. semester)
Software Engineering and Information Systems (profile)
Recommended elective courses (3. semester)
Telecommunication and Informatics (profile)
Recommended elective courses (3. semester)

Literature

Christopher D. Manning, Hinrich Schütze (1999), Foundations of Statistical Natural Language Processing, MIT Press
Shrikanth Narayanan, Abeer Alwan (2004), Text to Speech Synthesis: New Paradigms and Advances, Prentice Hall PTR
Ruslan Mitkov (ed.) (2005), The Oxford Handbook of Computational Linguistics, Oxford University Press, USA
Daniel Jurafsky, James H. Martin (2008), Speech and Language Processing (2nd edition), Prentice Hall
Vladimir Cherkassky, Yunqian Ma (2011), Introduction to Predictive Learning, Springer

General

ID 34477
  Winter semester
4 ECTS
L2 English Level
L1 e-Learning
30 Lectures
0 Exercises
0 Laboratory exercises
0 Project laboratory

Grading System

80 Excellent
70 Very Good
60 Good
50 Acceptable