Analysis of massive data sets

Course Description

Introduction to mining of massive data sets. The MapReduce programming model. Finding similar items. Mining data streams. Link analysis. Finding frequent itemsets. Clustering of massive data sets. Recommendation systems. Mining social-networks graphs. Advertising on the Web. Dimensionality reduction. Large-scale machine learning.

Learning Outcomes

  1. Recognize and understand why a given problem belongs to the Big Data category
  2. Apply the MapReduce programming model to suitable problems in practice
  3. Design and evaluate a system for finding similar items in a massive data set
  4. Design and evaluate a system for finding frequent itemsets in a massive data set
  5. Design and evaluate a system for ranking nodes in a massive data set represented as a graph
  6. Design and evaluate a recommendation system
  7. Apply appropriate clustering algorithms to identify clusters in a massive data set
  8. Apply appropriate algorithms for processing data streams

Forms of Teaching

Lectures

Lecturer-driven classroom presentations with live demonstrations of how to implement theoretical concepts in software

Exams

Two written exams

Laboratory Work

Several programming assignments that cover the course topics. Students implement the assignments individually in a recommended language or tool and periodically demonstrate their progress to the teaching assistants.

Consultations

Individual office hours with lecturers and assistants are arranged at the student's request.

Grading Method

                          Continuous Assessment           Exam
Type                      Threshold   Percent of Grade    Threshold   Percent of Grade
Laboratory Exercises      50 %        35 %                0 %         0 %
Attendance                0 %         5 %                 0 %         0 %
Mid Term Exam: Written    0 %         30 %                0 %
Final Exam: Written       0 %         30 %
Exam: Written                                             50 %        100 %
Comment:

Continuous Assessment: the combined score from Mid Term Exam: Written, Final Exam: Written, and lecture attendance with oral examination in the classroom must be at least 50 %.

Week by Week Schedule

  1. Introduction to Analysis of Massive Data Sets.
  2. The MapReduce Programming Model.
  3. Finding Similar Items in a Massive Data Set.
  4. Finding Frequent Itemsets in a Massive Data Set.
  5. Mining Data Streams.
  6. Computing NodeRank in a Massive Data Set Represented as a Graph.
  7. Detecting Communities in Social Network Graphs.
  8. Midterm Exam.
  9. Finding Clusters of Similar Entities in Massive Data Sets.
  10. Recommendation Systems.
  11. Advanced topics in Recommendation Systems.
  12. Advertising on the Web.
  13. Dimensionality Reduction.
  14. Large-scale Machine Learning.
  15. Final Exam.

Study Programmes

University graduate
  Computer Science (profile)
    Specialization Course (2nd semester)
  Software Engineering and Information Systems (profile)
    Specialization Course (2nd semester)
  Telecommunication and Informatics (profile)
    Specialization Course (2nd semester)

Literature

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (2014), Mining of Massive Datasets, Cambridge University Press
Michael Manoochehri (2013), Data Just Right, Addison-Wesley
Jiawei Han, Jian Pei, Micheline Kamber (2011), Data Mining: Concepts and Techniques, Elsevier

ID 147658
Summer semester
4 ECTS
English Level: L2
e-Learning: L1
30 Lectures
0 Exercises
15 Laboratory exercises

Grading System

General

88 Excellent
75 Very Good
63 Good
50 Acceptable
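
As a rough illustration of how the continuous-assessment weights and the grade boundaries might combine, the following Python sketch computes a total score and maps it to a grade label. It rests on several assumptions that the course page does not state explicitly: that component scores are given as percentages of each component's maximum, that the 50 % laboratory threshold applies to the laboratory component alone, that the 50 % minimum in the comment applies to the midterm, final, and attendance components taken together, and that the grade boundaries are points out of 100. The function names and the "Fail" label are hypothetical.

```python
# Weights (percent of grade) from the Continuous Assessment columns.
WEIGHTS = {"lab": 35, "attendance": 5, "midterm": 30, "final": 30}

# Grade boundaries from the grading scale, assumed to be points out of 100.
SCALE = [(88, "Excellent"), (75, "Very Good"), (63, "Good"), (50, "Acceptable")]


def weighted_points(scores, components):
    """Sum the weighted points for the given components.

    scores maps a component name to the percentage (0-100) achieved in it.
    """
    return sum(WEIGHTS[name] * scores[name] for name in components) / 100


def grade(scores):
    # Assumption: the 50 % laboratory threshold applies to the lab score alone.
    if scores["lab"] < 50:
        return "Fail"  # hypothetical label, not taken from the course page

    # Assumption: midterm + final + attendance together must reach 50 %
    # of their combined weight (30 + 30 + 5 = 65 points).
    exam_components = ("midterm", "final", "attendance")
    exam_max = sum(WEIGHTS[name] for name in exam_components)
    if weighted_points(scores, exam_components) < 0.5 * exam_max:
        return "Fail"

    total = weighted_points(scores, WEIGHTS)
    for boundary, label in SCALE:
        if total >= boundary:
            return label
    return "Fail"


example = {"lab": 80, "attendance": 100, "midterm": 70, "final": 60}
print(weighted_points(example, WEIGHTS))  # 72.0
print(grade(example))                     # Good
```

In this example the student collects 28 + 5 + 21 + 18 = 72 points, which falls between the boundaries for Good (63) and Very Good (75).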