Distributed Big Data Processing
- identify big data characteristics
- compare distributed algorithms for big data processing
- develop simple algorithms for distributed big data processing
- apply open-source technologies for distributed big data processing and storage
- develop a distributed recommender system
- develop a distributed data stream processing system
- analyze big networks
Forms of Teaching
During lectures, theoretical aspects of the distributed storage and processing of big data will be explained and discussed using various examples and datasets.
Exams
Midterm exam (week 8) and final exam (week 15).
Laboratory Work
During laboratory exercises, students will solve several short practical assignments in Java using the required open-source technologies (Apache Hadoop, Apache Lucene, Apache Mahout and Apache Spark) and discuss their solutions.
|Type|Threshold|Percent of Grade|Threshold|Percent of Grade|
|Laboratory Exercises|0 %|40 %|0 %|40 %|
|Homework|0 %|10 %|0 %|10 %|
|Attendance|0 %|10 %|0 %|10 %|
|Mid Term Exam: Written|0 %|20 %|0 %| |
|Final Exam: Written|0 %|20 %| | |
|Exam: Written| | |50 %|40 %|
Week by Week Schedule
- Introduction to distributed big data processing.
- Distributed Big Data Storage. Distributed File Systems.
- MapReduce Programming Model.
- Basic Design Patterns in the MapReduce Programming Model.
- Advanced Design Patterns in the MapReduce Programming Model.
- Distributed Storage of Structured Big Data.
- Distributed Recommender Systems.
- Midterm exam.
- Midterm exam.
- Real-time Data Stream Processing.
- Real-time Data Stream Processing. (2)
- Efficient Search in Large Textual Collections.
- Efficient Search in Large Textual Collections. (2)
- Link and Large Network Analysis.
- Distributed Analysis of Social Networks.
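As a rough illustration of the map-reduce programming model named in the schedule, the three phases (map, shuffle, reduce) can be sketched in plain Java with a word count. This is a minimal single-process sketch, not course material and not the Apache Hadoop API: the class name `WordCountSketch` and the methods `map`, `shuffle`, `reduce` and `wordCount` are hypothetical names chosen for this example, and the code assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process sketch of the map-reduce model (no Hadoop involved).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values of one key into a single count.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=3, data=1, graphs=1, networks=1}
        System.out.println(wordCount(List.of("big data", "Big networks, big graphs")));
    }
}
```

In a distributed framework the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; here all three run sequentially in one JVM purely to show the data flow.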