Statistical Programming Fundamentals

Data is displayed for academic year: 2023/2024


Course Description

The course starts with the fundamentals of statistical programming, describing standard programming elements: data types, packages and data structures, and the design of user-defined functions and objects. It then covers how to import data from different sources and prepare them for analysis: transforming and tidying data, managing missing values, deriving new variables from existing ones, and handling date/time and textual data. Students learn the basics of statistical and exploratory analysis of datasets. The concept of the grammar of graphics and ways of designing professional visualizations are discussed. Students learn to work with different types of probability distributions, to create basic simulations, and to implement selected machine learning methods. Finally, students master the programmatic approach to data mining: sampling, splitting data into training and test sets, and creating and evaluating predictive and descriptive models.

Study Programmes

University undergraduate
[FER3-EN] Computing - study
Elective courses (5th semester)
[FER3-EN] Electrical Engineering and Information Technology - study
Elective courses (5th semester)
University graduate
[FER3-EN] Control Systems and Robotics - profile
Elective courses (1st semester)
[FER3-EN] Data Science - profile
Elective courses (1st semester)
[FER3-EN] Electrical Power Engineering - profile
Elective courses (1st semester)

Learning Outcomes

  1. analyze small and large data sets in a meaningful and organized manner
  2. identify the nature of the data and the nature of its processing
  3. use the interactive programming approach to data analysis
  4. transform raw data into a form suitable for analysis
  5. prepare complex functions and packages
  6. create professional visualizations of datasets
  7. apply machine learning methods in the programming environment
  8. apply a methodology for preparing reports

Forms of Teaching

Lectures

Lectures in the classroom with prepared digital workbooks

Laboratory

Working through digital workbooks and solving programming tasks

Grading Method

                        Continuous Assessment        Exam
Type                    Threshold  Percent of Grade  Threshold  Percent of Grade
Laboratory Exercises    50 %       20 %              50 %       0 %
Homeworks               50 %       10 %              50 %       10 %
Class participation     0 %        10 %              0 %        0 %
Seminar/Project         25 %       20 %              25 %       20 %
Mid Term Exam: Written  5 %        15 %              -          0 %
Final Exam: Written     10 %       25 %              -          -
Exam: Written           -          -                 50 %       70 %

Week by Week Schedule

  1. Basic syntax and semantics of higher-level languages. Variables and simple data types (e.g. numbers, characters, logical values). Expressions and assignments. The notion of missing values.
  2. Complex data structures: vectors, matrices and lists. The principle of vectorization and recycling. The index operator. Positional, logical and name-based referencing of elements in complex structures.
  3. Data frames as the main structure for storing datasets. Internal representation of data frames. Categorical variables.
  4. Program flow control commands - conditional execution and loops.
  5. Built-in functions. The notion of search path, lexical scope and environment. User-defined functions. Functional programming. Declarative alternatives to programming loops.
  6. Object-oriented programming in the context of statistical programming and data analysis environment.
  7. Pipeline operator and code chaining. The notion of tidy data. Preparing data for analysis: basic data transformations and reshaping data into a tidy format.
  8. Midterm exam
  9. Dates and timestamps. The notion of temporal data in the context of data analysis. Character strings and string processing. Regular expressions and text analysis.
  10. Methods for data management and exploratory analysis. Procedural equivalents of language commands for retrieving relational data. Set operations. Missing value management.
  11. Basic elements of the grammar of graphics. Data visualization. The notions of aesthetics and geometry in the context of visualization.
  12. Programming methods for descriptive and inferential statistics. Simulations.
  13. Selected machine learning methods - linear regression, kNN classification.
  14. Introduction to predictive modeling. Training and testing dataset splits. Cross-validation methods. Declarative approach to the development and evaluation of predictive models.
  15. Final exam

Literature

Programirajmo u R-u
R for Data Science
OpenIntro Statistics
An Introduction to Statistical Learning
Advanced R

For students

General

ID 223076
Winter semester
5 ECTS
L3 English Level
L2 e-Learning
45 Lectures
0 Seminar
0 Exercises
15 Laboratory exercises
0 Project laboratory
0 Physical education exercises

Grading System

87 Excellent
75 Very Good
62 Good
50 Sufficient