Analysis of Massive Datasets

Course Description

An introduction to the analysis of large datasets. The MapReduce programming model. Finding similar items. Data stream analysis. Link analysis in data represented as graphs. Finding frequent itemsets. Clustering large datasets. Recommendation systems. Social network graph analysis. Web advertising models. Dimensionality reduction. Large-scale machine learning.

Learning Outcomes

  1. identify and understand why a problem belongs to the Big Data category
  2. apply the MapReduce programming model when encountering certain types of problems
  3. design and evaluate a system for finding similar entities in a large data set
  4. design and evaluate a system for finding frequent itemsets in a large data set
  5. design and evaluate a node ranking system for a very large data set represented by a graph
  6. design and evaluate a recommendation system
  7. apply appropriate algorithms to find clusters in a large set of data points
  8. apply appropriate algorithms to process data streams

Forms of Teaching

Lectures

Exercises

Laboratory

Week by Week Schedule

  1. Locality-sensitive hashing (LSH), minhash and simhash algorithms
  2. Locality-sensitive hashing (LSH), minhash and simhash algorithms
  3. Graph mining
  4. Web search (PageRank and HITS)
  5. Data mining with MapReduce, feature selection (filter methods, subset selection, wrapper methods)
  6. Data stream mining
  7. Data stream mining
  8. Midterm exam
  9. Time series and sequence mining
  10. Collaborative filtering and recommender engines
  11. Clustering algorithms for large datasets (BFR, CURE)
  12. Sampling, filtering and estimating data stream moments
  13. Large-scale algorithms for mining frequent item sets (Apriori, PCY, SON)
  14. Detecting communities in large graphs (Girvan-Newman, Affiliation-Graph Model)
  15. Final exam

Study Programmes

University graduate
Audio Technologies and Electroacoustics (profile)
Free Elective Courses (2. semester)
Communication and Space Technologies (profile)
Free Elective Courses (2. semester)
Computational Modelling in Engineering (profile)
Free Elective Courses (2. semester)
Computer Engineering (profile)
Free Elective Courses (2. semester)
Computer Science (profile)
Core-elective courses (2. semester) Specialization Course (2. semester)
Control Systems and Robotics (profile)
Free Elective Courses (2. semester)
Data Science (profile)
Elective Courses of the Profile (2. semester)
Electrical Power Engineering (profile)
Free Elective Courses (2. semester)
Electric Machines, Drives and Automation (profile)
Free Elective Courses (2. semester)
Electronic and Computer Engineering (profile)
Free Elective Courses (2. semester)
Electronics (profile)
Free Elective Courses (2. semester)
Information and Communication Engineering (profile)
Free Elective Courses (2. semester)
Network Science (profile)
Free Elective Courses (2. semester)
Software Engineering and Information Systems (profile)
Core-elective courses (2. semester) Specialization Course (2. semester)
Telecommunication and Informatics (profile)
Specialization Course (2. semester)

Literature

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (2014), Mining of Massive Datasets, Cambridge University Press
Michael Manoochehri (2013), Data Just Right, Addison-Wesley
Jiawei Han, Jian Pei, Micheline Kamber (2011), Data Mining: Concepts and Techniques, Elsevier

General

ID 222459
Summer semester
5 ECTS
L3 English Level
L1 e-Learning
45 Lectures
5 Laboratory exercises