Text Analysis and Retrieval

Course Description

Most human knowledge is stored in unstructured textual form. Given the vast and rapidly growing amount of text data available, text analysis and retrieval systems have become an indispensable part of modern ICT infrastructure. Such systems address users' diverse information needs and enable the extraction of information from large volumes of unstructured data. Because of the complexity and ambiguity of natural language, text analysis is a non-trivial task that relies on natural language processing, computational linguistics, and machine learning. This course provides a systematic overview of both traditional and advanced methods for text analysis and retrieval. The first part of the course covers the fundamentals of information retrieval and the natural language processing techniques relevant to text analysis. The second part covers applications in text analysis and retrieval, with an emphasis on methods based on statistical natural language processing and machine learning.

Learning Outcomes

  1. Summarize the application areas, trends, and challenges in text analysis and retrieval
  2. Describe the fundamental techniques of text analysis and retrieval
  3. Use linguistic preprocessing tools
  4. Design and implement a text analysis/retrieval system
  5. Apply machine learning algorithms to text analysis tasks
  6. Evaluate a text analysis/retrieval system
  7. Organize and formulate a system description paper
  8. Describe, review, analyze, and criticize the main text analysis methods present in scientific papers

Forms of Teaching

Lectures

Seminars and workshops

Independent assignments

Laboratory

Other

Week by Week Schedule

  1. Document classification and tagging, Document clustering, Information needs, relevance, evaluation, effectiveness, Applications in information retrieval and text mining
  2. Computational morphology, Part of speech tagging, Deterministic and stochastic grammars, constituency and dependency grammars (CFG, PCFG), Parsing algorithms (CYK, Chart), lexicalized parsing, dependency parsing, Language models, smoothing, evaluation
  3. Text preprocessing (stemming, phrases, stop lists), Information retrieval models (vector space, probabilistic, Boolean), Information needs, relevance, evaluation, effectiveness, Advanced information retrieval techniques (semantic search, faceted search), Web search (PageRank and HITS)
  4. Markov and hidden Markov models, Conditional random fields, Confusion matrix-based performance measures (accuracy, precision, recall, sensitivity, F-score), Multiclass performance measures, Assessing inter-annotator agreement (Cohen's kappa, Fleiss' kappa)
  5. Latent semantic document models (LSI, LDA), Computational semantics (formal semantics, semantic role labeling), Distributional semantic models
  6. Neural natural language processing, Deep recurrent neural networks: RNN, bidirectional RNN, deep RNN, long short-term memory, sequence modelling, applications
  7. Applications in information retrieval and text mining
  8. Not held
  9. Text information extraction (named entities, keyphrases, relations, etc.), Event detection and tracking
  10. Question answering
  11. Document summarization, multidocument summarization, Textual similarity, paraphrase, and entailment
  12. Textual similarity, paraphrase, and entailment
  13. Sentiment analysis and opinion mining
  14. Authorship analysis and author profiling, Project
  15. Not held

Study Programmes

University graduate
Audio Technologies and Electroacoustics (profile)
Free Elective Courses (2. semester)
Communication and Space Technologies (profile)
Free Elective Courses (2. semester)
Computational Modelling in Engineering (profile)
Free Elective Courses (2. semester)
Computer Engineering (profile)
Free Elective Courses (2. semester)
Computer Science (profile)
Elective Courses of the Profile (2. semester)
Control Systems and Robotics (profile)
Free Elective Courses (2. semester)
Data Science (profile)
Elective Courses of the Profile (2. semester)
Electrical Power Engineering (profile)
Free Elective Courses (2. semester)
Electric Machines, Drives and Automation (profile)
Free Elective Courses (2. semester)
Electronic and Computer Engineering (profile)
Free Elective Courses (2. semester)
Electronics (profile)
Free Elective Courses (2. semester)
Information and Communication Engineering (profile)
Elective Courses of the Profile (2. semester)
Network Science (profile)
Free Elective Courses (2. semester)
Software Engineering and Information Systems (profile)
Core-elective Courses (2. semester)

Literature

Introduction to Information Retrieval
Foundations of Statistical Natural Language Processing
Speech and Language Processing
Neural Network Methods in Natural Language Processing

General

ID 222452
Summer semester
5 ECTS
L3 English Level
L1 e-Learning
30 Lectures
15 Laboratory exercises