Text Analysis and Retrieval
Course Description
General Competencies
Familiarity with the basic language processing tasks, document representation models, methods for document retrieval, classification, and clustering, as well as semantic search techniques. Familiarity with basic information extraction methods, text mining, and document visualization techniques. Familiarity with the evaluation of information retrieval systems. Understanding of the theoretical foundations of these methods as well as their limitations, advantages, and disadvantages. Familiarity with the tools and frameworks for language processing, text mining, and document retrieval. The ability to design, implement, and evaluate a simple full-text retrieval and analysis system. Familiarity with the applications, best practices, trends, and challenges in the field of text analysis and retrieval.
Learning Outcomes
- Summarize the application areas, trends, and challenges in text analysis and retrieval
- Describe the fundamental techniques of text analysis and retrieval
- Use linguistic preprocessing tools
- Design and implement a text analysis/retrieval system
- Apply machine learning algorithms to text analysis tasks
- Evaluate a text analysis/retrieval system
- Formulate and write a system description paper
- Describe, review, analyze, and critique the main text analysis methods presented in scientific papers
Forms of Teaching
Two hours of lectures per week for 13 weeks. Lectures include the presentation of the teaching material, discussions, and group work.
Exams: Continuous assessment consisting of a midterm exam, a final exam, one reading assignment, and one project assignment.
Seminars: 6-8 reading assignments.
Other Forms of Group and Self Study: One group project assignment.
Other: Additional study at home is required.
Grading Method
Type | Continuous Assessment: Threshold | Continuous Assessment: Percent of Grade | Exam: Threshold | Exam: Percent of Grade
---|---|---|---|---
Homeworks | 0 % | 25 % | 0 % | 0 %
Seminar/Project | 25 % | 50 % | 0 % | 50 %
Mid Term Exam: Written | 0 % | 25 % | 0 % |
Exam: Written | | | 50 % | 50 %
Week by Week Schedule
- Introduction: motivation and applications, examples of successful systems, literature overview, overview of the existing tools.
- Basics of natural language processing.
- Basics of information retrieval.
- Web search, advanced information retrieval, information retrieval evaluation.
- Machine learning for natural language processing.
- Text classification, clustering, and latent semantic models.
- Word embeddings and neural networks for natural language processing.
- Midterm exam.
- Information extraction and applications.
- Question answering systems.
- Semantic textual similarity, summarization, and simplification.
- Sentiment analysis.
- Authorship analysis.
- Extra topic. Summary and suggestions for further study.
- Final exam.