Analysis of massive data sets

Course Description

Introduction to mining of massive data sets. The MapReduce programming model. Finding similar items. Mining data streams. Link analysis. Finding frequent itemsets. Clustering of massive data sets. Recommendation systems. Mining social-network graphs. Advertising on the Web. Dimensionality reduction. Large-scale machine learning.

Learning Outcomes

  1. Recognize and understand why a certain problem belongs to the Big Data category
  2. Apply the MapReduce programming model to problems encountered in practice
  3. Design and evaluate a system for finding similar items in a massive data set
  4. Design and evaluate a system for finding frequent itemsets in a massive data set
  5. Design and evaluate a system for ranking nodes in a massive data set represented as a graph
  6. Design and evaluate a recommendation system
  7. Apply appropriate clustering algorithms to identify clusters in a massive data set
  8. Apply appropriate algorithms for processing data streams

Forms of Teaching


Lecturer-driven classroom presentations with live demonstrations of how to implement theoretical concepts in software


Two written exams

Laboratory Work

Several programming assignments covering the course topics. Students implement the assignments individually in a recommended language or tool and periodically demonstrate their progress to the teaching assistants.


Individual office hours with lecturers and assistants are organized at the student's request.

Grading Method

By decision of the Faculty Council, in the academic year 2019/2020 the midterm exams are cancelled and the points assigned to that component are transferred to the final exam, unless the teachers have reassigned the points and the grading components differently. See the news for each course for information on grading.
                          Continuous Assessment            Exam
Type                      Threshold   Percent of Grade     Threshold   Percent of Grade
Laboratory Exercises      50 %        35 %                 0 %         0 %
Attendance                0 %         5 %                  0 %         0 %
Mid Term Exam: Written    0 %         30 %                 0 %
Final Exam: Written       0 %         30 %
Exam: Written                                              50 %        100 %

Continuous Assessment: the combined score for Mid Term Exam: Written, Final Exam: Written, and lecture attendance and oral examination in the classroom must be at least 50 %.

Week by Week Schedule

  1. Introduction to Analysis of Massive Data Sets.
  2. The MapReduce Programming Model.
  3. Finding Similar Items in a Massive Data Set.
  4. Finding Frequent Itemsets in a Massive Data Set.
  5. Mining Data Streams.
  6. Computing NodeRank in a Massive Data Set Represented as a Graph.
  7. Detecting Communities in Social Network Graphs.
  8. Midterm Exam.
  9. Finding Clusters of Similar Entities in Massive Data Sets.
  10. Recommendation Systems.
  11. Advanced Topics in Recommendation Systems.
  12. Advertising on the Web.
  13. Dimensionality Reduction.
  14. Large-scale Machine Learning.
  15. Final Exam.

Study Programmes

University graduate
Computer Science (profile)
Specialization Course (2nd semester)
Software Engineering and Information Systems (profile)
Specialization Course (2nd semester)
Telecommunication and Informatics (profile)
Specialization Course (2nd semester)


Literature

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (2014), Mining of Massive Datasets, Cambridge University Press
Michael Manoochehri (2013), Data Just Right, Addison-Wesley
Jiawei Han, Jian Pei, Micheline Kamber (2011), Data Mining: Concepts and Techniques, Elsevier

ID: 147658
Semester: Summer
English Level: L2
e-Learning: L1
Lectures: 30
Exercises: 0
Laboratory exercises: 15
Project laboratory: 0

Grading System

88 Excellent
75 Very Good
63 Good
50 Acceptable