Course Description

The course starts from the fundamentals of statistical programming, covering the standard programming elements: data types, packages and data structures, and the design of user-defined functions and objects. It then describes how to import data from different sources and prepare them for analysis: transforming and tidying data, handling missing values, deriving new variables from existing ones, and working with date/time and textual data. Students learn the basics of statistical and exploratory analysis of data sets. The concept of the grammar of graphics and ways of designing professional visualizations are discussed. Students learn to work with different types of distributions, to create basic simulations, and to implement selected machine learning methods. The programmatic approach to data mining is mastered: sampling, separation into training and test sets, and the creation and evaluation of predictive and descriptive models.

Learning Outcomes

  1. analyze small and large data sets in a meaningful and organized manner
  2. identify the nature of the data and the nature of its processing
  3. use the interactive programming approach to data analysis
  4. modify the raw data into a form suitable for analysis
  5. prepare complex functions and packages
  6. create professional visualizations of datasets
  7. apply machine learning methods in the programming environment
  8. apply the methodology of preparing reports

Forms of Teaching

Lectures

Lectures in the classroom with prepared digital workbooks

Laboratory

Solving digital workbooks, solving programming tasks

Grading Method

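| Type | Continuous Assessment: Threshold | Continuous Assessment: Percent of Grade | Exam: Threshold | Exam: Percent of Grade |
| --- | --- | --- | --- | --- |
| Laboratory Exercises | 10 % | 20 % | 10 % | 0 % |
| Homework | 5 % | 10 % | 5 % | 5 % |
| Class participation | 0 % | 10 % | 0 % | 0 % |
| Seminar/Project | 5 % | 20 % | 5 % | 20 % |
| Mid Term Exam: Written | 5 % | 15 % | | 0 % |
| Final Exam: Written | 10 % | 25 % | | |
| Exam: Written | | | 50 % | 75 % |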

Week by Week Schedule

  1. Basic syntax and semantics of higher-level languages. Variables and simple data types (e.g. numbers, characters, logical values). Expressions and assignments. The notion of missing values.
  2. Complex data structures: vectors, matrices and lists. The principle of vectorization and recycling. Index operator. Positional, logical and name-based referencing of elements in complex structures.
  3. Data frames as the main structure for storing datasets. Internal representation of data frames. Categorical variables.
  4. Program flow control commands - conditional execution and loops.
  5. Built-in functions. The notion of search path, lexical scope and environment. User-defined functions. Functional programming. Declarative alternatives to programming loops.
  6. Object-oriented programming in the context of a statistical programming and data analysis environment.
  7. Pipeline operator and code chaining. The notion of tidy data. Preparing data for analysis through rough data transformations and reshaping into a tidy format.
  8. Midterm exam
  9. Dates and timestamps. The notion of temporal data in the context of data analysis. Character strings and string processing. Regular expressions and text analysis.
  10. Methods for data management and exploratory analysis. Procedural equivalents of query language commands for retrieving relational data. Set operations. Missing value management.
  11. Basic elements of grammar of graphics. Data visualization. The notion of aesthetics and geometry in the context of visualization.
  12. Programming methods for descriptive and inferential statistics. Simulations.
  13. Selected machine learning methods - linear regression, kNN classification.
  14. Introduction to predictive modeling. Training and testing dataset splits. Cross-validation methods. Declarative approach to the development and evaluation of predictive models.
  15. Final exam

Study Programmes

University graduate
Audio Technologies and Electroacoustics (profile): Free Elective Courses (1st and 3rd semester)
Communication and Space Technologies (profile): Free Elective Courses (1st and 3rd semester)
Computational Modelling in Engineering (profile): Free Elective Courses (1st and 3rd semester)
Computer Engineering (profile): Free Elective Courses (1st and 3rd semester)
Computer Science (profile): Free Elective Courses (1st and 3rd semester)
Control Systems and Robotics (profile): Free Elective Courses (1st and 3rd semester)
Data Science (profile): Elective Courses of the Profile (3rd semester); Free Elective Courses (1st semester)
Electrical Power Engineering (profile): Free Elective Courses (1st and 3rd semester)
Electric Machines, Drives and Automation (profile): Free Elective Courses (1st and 3rd semester)
Electronic and Computer Engineering (profile): Free Elective Courses (1st and 3rd semester)
Electronics (profile): Free Elective Courses (1st and 3rd semester)
Information and Communication Engineering (profile): Free Elective Courses (1st and 3rd semester)
Network Science (profile): Elective Courses of the Profile (1st and 3rd semester)
Software Engineering and Information Systems (profile): Elective Course of the Profile (3rd semester); Elective Course of the Profile (1st semester)

Literature

Programirajmo u R-u
R for Data Science
OpenIntro Statistics
Introduction to Statistical Learning
Advanced R

General

Course ID: 222597
Semester: Winter
ECTS credits: 5
English level: L3
e-Learning level: L2
Lectures: 45 hours
Laboratory exercises: 15 hours

Grading System

Excellent: 87 % or more
Very Good: 75 % or more
Good: 62 % or more
Acceptable: 50 % or more
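
The thresholds above can be read as a simple lookup from a total score to a grade label. The following is a minimal sketch, not an official rule: the function name `grade_for` is hypothetical, the thresholds are assumed to be inclusive lower bounds on a 0-100 percent scale, and the label "Fail" for scores below 50 is an assumption, since the source does not name that grade.

```python
# Thresholds copied from the Grading System list; ordered from highest to lowest.
GRADE_THRESHOLDS = [
    (87, "Excellent"),
    (75, "Very Good"),
    (62, "Good"),
    (50, "Acceptable"),
]


def grade_for(score: float) -> str:
    """Return the grade label for a total score given in percent (0-100)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, label in GRADE_THRESHOLDS:
        # Assumption: each threshold is an inclusive lower bound.
        if score >= minimum:
            return label
    # Assumption: the source does not name the grade below 50 %.
    return "Fail"
```

For example, under these assumptions `grade_for(80)` falls in the "Very Good" band because 80 is at least 75 but below 87.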
