Distributed Big Data Processing

Data is displayed for the academic year 2023/2024.

Course Description

Introduction to Big Data. Distributed storage of Big Data and distributed file systems. MapReduce programming model. MapReduce design patterns. Distributed processing of large textual collections. Efficient search in large textual collections. Analysis of links and large networks. Distributed storage of large collections of structured data. Distributed recommender systems. Distributed Big Data processing based on dataflow programming. Distributed data stream processing in real-time. Distributed machine learning. Distributed analysis of social networks.

Study Programmes

University graduate
[FER3-HR] Audio Technologies and Electroacoustics - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Communication and Space Technologies - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Computational Modelling in Engineering - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Computer Engineering - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Computer Science - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Control Systems and Robotics - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Data Science - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Electrical Power Engineering - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Electric Machines, Drives and Automation - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Electronic and Computer Engineering - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Electronics - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Information and Communication Engineering - profile
Elective Courses (1. semester) (3. semester)
[FER3-HR] Network Science - profile
Elective Courses (3. semester)
Elective Courses of the Profile (3. semester)
[FER3-HR] Software Engineering and Information Systems - profile
Elective Course of the profile (3. semester)
Elective Courses (3. semester)
[FER2-HR] Computer Engineering - profile
Specialization Course (3. semester)
[FER2-HR] Software Engineering and Information Systems - profile
Specialization Course (3. semester)
[FER2-HR] Telecommunication and Informatics - profile
Specialization Course (3. semester)

Learning Outcomes

  1. identify big data characteristics
  2. compare distributed algorithms for big data processing
  3. develop simple algorithms for distributed big data processing
  4. apply open source technologies for distributed big data processing and storage
  5. develop a distributed recommender system
  6. develop a distributed data stream processing system
  7. analyze big networks

Forms of Teaching

Lectures

The classes are organized in two blocks: the first block comprises 7 classes and a midterm exam, while the second comprises 6 classes and a final exam. In total, this makes 15 weeks with 2 hours per week.

Independent assignments

Students independently solve practical tasks in preparation for the laboratory exercises.

Laboratory

Students independently solve the practical tasks assigned in the laboratory exercises.

Week by Week Schedule

  1. Performance evaluation of distributed systems, Big data concepts, Failure and recovery
  2. External storage, physical organization, and drives, Distributed DBMS, Data replication and consistency models, Data management
  3. Time and space trade-offs in algorithms, Programming middleware for distributed systems, Performance evaluation of distributed systems, Dynamic analysis of distributed systems (parallelism, synchronisation, and simulation), Multiple simultaneous computations, Parallelism, communication, and coordination, Programming constructs for parallelism, Basic knowledge of parallel decomposition concepts, Core distributed algorithms, Parallel algorithmic patterns (divide-and-conquer, map and reduce, master-workers, others)
  4. Time and space trade-offs in algorithms, Strategies for choosing the appropriate data structure, Programming middleware for distributed systems, Performance evaluation of distributed systems, Dynamic analysis of distributed systems (parallelism, synchronisation, and simulation), Multiple simultaneous computations, Parallelism, communication, and coordination, Programming constructs for parallelism, Core distributed algorithms, Parallel algorithmic patterns (divide-and-conquer, map and reduce, master-workers, others)
  5. Information retrieval models (vector space, probabilistic, Boolean)
  6. Web search (PageRank and HITS)
  7. Strategies for choosing the appropriate data structure, External storage, physical organization, and drives, NoSQL databases
  8. Midterm exam
  9. Multiple simultaneous computations, Basic knowledge of parallel decomposition concepts, Core distributed algorithms
  10. Time series and sequences mining, Lazy evaluation and infinite streams, Transmission Control Protocol (TCP) server and client, Concurrency, Application protocols based on TCP, Hypertext Transfer Protocol (HTTP) and File Transfer Protocol (FTP), Simple HTTP server, Case studies focused on Java network programming and network programming in Python
  11. Time series and sequences mining, Lazy evaluation and infinite streams
  12. Programming middleware for distributed systems, Big data concepts, Data management
  13. Visit
  14. Clustering, Centrality, Degree distributions, Degree correlations, Community structure, Diameter, Structure of social network graphs, Social network analysis
  15. Final exam

Literature

Tom White (2015), Hadoop: The Definitive Guide, O'Reilly Media, Inc.
Donald Miner, Adam Shook (2012), MapReduce Design Patterns, O'Reilly Media, Inc.
Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia (2015), Learning Spark, O'Reilly Media, Inc.
Jimmy Lin, Chris Dyer (2010), Data-intensive Text Processing with MapReduce, Morgan & Claypool Publishers
Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (2014), Mining of Massive Datasets, Cambridge University Press
Michael Manoochehri (2013), Data Just Right, Addison-Wesley

For students

General

ID 222765
Winter semester
5 ECTS
L1 English Level
L1 e-Learning
30 Lectures
0 Seminar
0 Exercises
13 Laboratory exercises
0 Project laboratory
0 Physical education exercises

Grading System

Excellent
Very Good
Good
Sufficient