Text Analysis and Retrieval

Data is displayed for the academic year 2023/2024.

Lecturers

Prof. Jan Šnajder

Laboratory exercises

Ana Barić, univ. mag. ing. comp.

Josip Jukić, mag. ing.

Course Description

Most human knowledge is stored in an unstructured, textual format. Due to the vast and rapidly growing amount of text data available, text analysis and retrieval systems have become an indispensable part of modern ICT infrastructure. Such systems address the diverse information needs of the users and enable the extraction of information from large volumes of unstructured data. Because of the complexity and ambiguity of natural language, text analysis is a non-trivial task that relies on natural language processing, computational linguistics, and machine learning. This course provides a systematic overview of traditional and advanced text analysis and retrieval methods. The first part of the course deals with the fundamentals of information retrieval and natural language processing techniques relevant to text analysis. The second part deals with applications in text analysis and retrieval, with an emphasis on methods based on machine learning and deep learning.

Study Programmes

University graduate

[FER3-EN] Data Science - profile

Recommended elective courses (2nd semester)

Learning Outcomes

Summarize the application areas, trends, and challenges in text analysis and retrieval
Describe the fundamental techniques of text analysis and retrieval
Use linguistic preprocessing tools
Design and implement a text analysis/retrieval system
Apply machine learning algorithms to text analysis tasks
Evaluate a text analysis/retrieval system
Organize and formulate a system description paper
Describe, review, analyze, and criticize the main text analysis methods present in scientific papers

Forms of Teaching

Lectures

Weekly two-hour lectures

Independent assignments

Team project focused on the development and evaluation of NLP models

Laboratory

Three programming assignments focused on implementing and testing NLP algorithms

Other

Team project presentation

Week by Week Schedule

Document classification and tagging; Document clustering; Information needs, relevance, evaluation, effectiveness; Applications in information retrieval and text mining
Computational morphology; Part-of-speech tagging; Deterministic and stochastic grammars, constituency and dependency grammars (CFG, PCFG); Parsing algorithms (CYK, chart), lexicalized parsing, dependency parsing; Language models, smoothing, evaluation
Text preprocessing (stemming, phrases, stop lists); Information retrieval models (vector space, probabilistic, Boolean); Information needs, relevance, evaluation, effectiveness; Advanced information retrieval techniques (semantic search, faceted search); Web search (PageRank and HITS)
Markov and hidden Markov models; Conditional random fields; Confusion matrix-based performance measures (accuracy, precision, recall, sensitivity, F-score); Multiclass performance measures; Assessing inter-annotator agreement (Cohen's kappa, Fleiss' kappa)
Latent semantic document models (LSI, LDA); Computational semantics (formal semantics, semantic role labeling); Distributional semantic models
Neural natural language processing; Deep recurrent neural networks: RNN, bidirectional RNN, deep RNN, long short-term memory, sequence modelling, applications
Applications in information retrieval and text mining
Not held
Text information extraction (named entities, keyphrases, relations, etc.); Event detection and tracking
Question answering systems
Document summarization, multidocument summarization; Textual similarity, paraphrase, and entailment
Textual similarity, paraphrase, and entailment
Sentiment analysis and opinion mining
Authorship analysis and author profiling
Not held; Project presentations
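
As an illustration of the evaluation topics named in the schedule (confusion matrix-based performance measures and Cohen's kappa), the following is a minimal sketch in plain Python. It is not course material: the function names `binary_metrics` and `cohen_kappa` and the toy label lists are assumptions made for this example only.

```python
from collections import Counter


def binary_metrics(gold, predicted, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no gold positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label distribution.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: gold labels versus system output.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(gold, predicted)
kappa = cohen_kappa(gold, predicted)
```

The same `cohen_kappa` function applies unchanged when the two lists come from two human annotators rather than from a gold standard and a system.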

Literature

Introduction to Information Retrieval

Foundations of Statistical Natural Language Processing

Speech and Language Processing

Neural Network Methods in Natural Language Processing

For students

General

ID 222925

Summer semester

5 ECTS

L3 English Level

L1 e-Learning

30 Lectures

0 Seminar

0 Exercises

15 Laboratory exercises

0 Project laboratory

0 Physical education exercises

Grading System

89 Excellent

76 Very Good

63 Good

50 Sufficient
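
The thresholds above can be read as a simple lookup. The sketch below assumes that each number is the minimum score (out of 100) required for the grade next to it; the page does not state this explicitly, and it does not name the outcome below 50, so the hypothetical helper `grade_for` returns `None` in that case.

```python
# Thresholds copied from the Grading System list, highest first.
# Assumption: each value is the minimum score required for that grade.
GRADE_THRESHOLDS = [
    (89, "Excellent"),
    (76, "Very Good"),
    (63, "Good"),
    (50, "Sufficient"),
]


def grade_for(score):
    """Return the grade label for a score, or None below the lowest threshold."""
    for minimum, label in GRADE_THRESHOLDS:
        if score >= minimum:
            return label
    return None  # the page does not name the outcome for scores under 50
```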

Similar Courses

Information Retrieval and Search Engines, Katholieke Universiteit Leuven

Natural Language Processing, MIT
