Analysis of Massive Datasets
Data is displayed for the academic year 2025/2026.
Course Description
An introduction to the analysis of large datasets. Finding similar entities. Data stream analysis. Link analysis in graph-structured data. Finding frequent itemsets. Finding groups (clustering) in large datasets. Recommendation systems. Social network graph analysis. Web advertising models. Dimensionality reduction. Scalable machine learning.
Prerequisites
Computer programming, algorithms and data structures, basic probability theory, basic linear algebra
Study Programmes
University graduate
[FER3-HR] Audio Technologies and Electroacoustics - profile
Elective Courses
(2nd semester)
[FER3-HR] Communication and Space Technologies - profile
Elective Courses
(2nd semester)
[FER3-HR] Computer Engineering - profile
Elective Courses
(2nd semester)
[FER3-HR] Computer Science - profile
Core-elective courses
(2nd semester)
[FER3-HR] Control Systems and Robotics - profile
Elective Courses
(2nd semester)
[FER3-HR] Data Science - profile
Elective Courses
(2nd semester)
Elective Courses of the Profile
(2nd semester)
[FER3-HR] Electrical Power Engineering - profile
Elective Courses
(2nd semester)
[FER3-HR] Electric Machines, Drives and Automation - profile
Elective Courses
(2nd semester)
[FER3-HR] Electronic and Computer Engineering - profile
Elective Courses
(2nd semester)
[FER3-HR] Electronics - profile
Elective Courses
(2nd semester)
[FER3-HR] Information and Communication Engineering - profile
Elective Courses
(2nd semester)
[FER3-HR] Network Science - profile
Elective Courses
(2nd semester)
[FER3-HR] Software Engineering and Information Systems - profile
Core-elective courses
(2nd semester)
[FER2-HR] Computer Science - profile
Specialization Course
(2nd semester)
Learning Outcomes
- identify and understand why a problem belongs to the Big Data category
- apply the MapReduce programming model when encountering certain types of problems
- design and evaluate a system for finding similar entities in a large data set
- design and evaluate a system for finding frequent itemsets in a large data set
- design and evaluate a node ranking system for a very large data set represented by a graph
- design and evaluate a recommendation system
- apply appropriate algorithms to find groups (clusters) in a large set of cases
- apply appropriate algorithms to process data streams
Forms of Teaching
Lectures
Lecturer-driven classroom presentations of theoretical concepts.
Exercises
Examples and problem solving during lectures.
Seminars
Software implementation of selected massive dataset analysis methods. Each student individually implements the given assignment in a recommended programming language or tool and submits the solution for automatic online evaluation.
Grading Method
| Type | Continuous Assessment: Threshold | Continuous Assessment: Percent of Grade | Exam: Threshold | Exam: Percent of Grade |
|---|---|---|---|---|
| Laboratory Exercises | 50 % | 30 % | 50 % | 0 % |
| Class participation | 0 % | 10 % | 0 % | 0 % |
| Mid Term Exam: Written | 50 % | 30 % | 0 % | |
| Final Exam: Written | 50 % | 30 % | | |
| Exam: Written | | | 50 % | 100 % |
| Exam: Oral | | | | 100 % |
Week by Week Schedule
- Locality-sensitive hashing (LSH), minhash and simhash algorithms
- Locality-sensitive hashing (LSH), minhash and simhash algorithms
- Graph mining
- Web search (PageRank and HITS)
- Data mining with MapReduce; feature selection (filter methods, subset selection, wrapper methods)
- Data stream mining
- Data stream mining
- Midterm exam
- Time series and sequences mining
- Collaborative filtering and recommender engines
- Clustering algorithms for large datasets (BFR, CURE)
- Sampling, filtering and estimating data stream moments
- Large-scale algorithms for mining frequent item sets (Apriori, PCY, SON)
- Detecting communities in large graphs (Girvan-Newman, Affiliation-Graph Model)
- Final exam
Literature
Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (2014), Mining of Massive Datasets, Cambridge University Press
Michael Manoochehri (2013), Data Just Right, Addison-Wesley
Jiawei Han, Jian Pei, Micheline Kamber (2011), Data Mining: Concepts and Techniques, Elsevier
General
ID 284075
Summer semester
5 ECTS
L1 e-Learning
45 Lectures
0 Seminar
0 Exercises
15 Laboratory exercises
0 Project laboratory
0 Physical education exercises
Grading System
88 Excellent
75 Very Good
63 Good
50 Sufficient
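As a rough illustration of how the continuous assessment weights and the grade boundaries above might combine, the following Python sketch computes a total and maps it to a grade label. It rests on several assumptions not stated on this page: that each component is scored as a percentage, that the boundaries 88/75/63/50 are inclusive lower bounds on the weighted total, and that missing any component threshold means failing. The function name, the dictionary keys, and the "Fail" label are hypothetical.

```python
# Percent of grade for each continuous assessment component (from the grading table).
WEIGHTS = {"laboratory": 30, "participation": 10, "midterm": 30, "final": 30}

# Minimum score (in percent) required on each component (Threshold column).
THRESHOLDS = {"laboratory": 50, "participation": 0, "midterm": 50, "final": 50}

# Grade boundaries from the Grading System list, assumed to be inclusive lower bounds.
GRADE_BANDS = [(88, "Excellent"), (75, "Very Good"), (63, "Good"), (50, "Sufficient")]


def continuous_assessment_grade(scores):
    """Return (weighted total in percent, grade label).

    `scores` maps each component name to a score between 0 and 100.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

    # Assumption: missing any single threshold fails the course regardless of the total.
    if any(scores[name] < THRESHOLDS[name] for name in THRESHOLDS):
        return total, "Fail"

    for lower_bound, label in GRADE_BANDS:
        if total >= lower_bound:
            return total, label
    return total, "Fail"


if __name__ == "__main__":
    example = {"laboratory": 80, "participation": 100, "midterm": 70, "final": 90}
    print(continuous_assessment_grade(example))
```

With the example scores, the weighted total is 0.30·80 + 0.10·100 + 0.30·70 + 0.30·90 = 82, which falls in the "Very Good" band under the assumptions above.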