Introduction to Data Science

Data is displayed for academic year: 2023/2024.

Course Description

This course introduces students to five key facets of data-based research in which data are obtained by observation: (1) data wrangling, cleaning, and sampling to obtain a suitable data set, (2) data management to facilitate efficient access to data, (3) exploratory data analysis to generate hypotheses and intuition, (4) prediction based on statistical methods such as regression and classification, and (5) communication of results through visualization, stories, and interpretable summaries.
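As a rough illustration of how facets (1), (3), (4), and (5) fit together, the following minimal sketch runs them on a tiny data set. The data, the variable names, and the choice of a one-variable least-squares fit are all assumptions made for this example; they are not taken from the course materials.

```python
from statistics import mean

# Synthetic observations, invented for illustration: (hours_studied, exam_score).
# Missing values are encoded as None, as they might be after scraping.
raw = [(1.0, 52.0), (2.0, 55.0), (None, 60.0), (3.0, 61.0), (4.0, None), (5.0, 70.0)]

# (1) Cleaning: keep only complete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# (3) Exploration: simple descriptive statistics.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar, y_bar = mean(xs), mean(ys)

# (4) Prediction: ordinary least squares fit of y = intercept + slope * x.
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar


def predict(x):
    """Predict the score for a given number of hours using the fitted line."""
    return intercept + slope * x


# (5) Communication: a one-line interpretable summary.
summary = f"Each extra hour is associated with about {slope:.1f} more points (n={len(clean)})."
```

In practice the same steps would typically be carried out with the tools named in the schedule, such as Pandas; the standard library is used here only to keep the sketch self-contained.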

Study Programmes

University graduate
[FER3-EN] Data Science - profile
Core-elective courses (1st semester)

Learning Outcomes

  1. Use Python and other tools to scrape, clean, and process data
  2. Use data management techniques to store data locally and in cloud infrastructures
  3. Use statistical methods and visualization to quickly explore data
  4. Apply statistics and computational analysis to make predictions based on data
  5. Describe the outcome of data analysis using descriptive statistics and visualizations
  6. Use cluster and cloud infrastructure to perform data-intensive computation

Forms of Teaching

Lectures

Lectures - theory

Exercises

Examples in Jupyter notebooks

Laboratory

Organized as a project - students work on a data science project

Grading Method

                     Continuous Assessment           Exam
Type                 Threshold   Percent of Grade    Threshold   Percent of Grade
Seminar/Project      25 %        40 %                25 %        40 %
Final Exam: Written  50 %        60 %
Comment:

The final exam is a data analysis task on a computer.
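The arithmetic implied by the continuous-assessment figures above can be sketched as follows. This is an illustration only: it assumes that each threshold is a minimum percentage of that component's own points and that missing any threshold means no passing total. The names `Component` and `total_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    threshold: float  # assumed: minimum percentage of the component's own points
    weight: float     # percent of the final grade


# Values from the continuous-assessment columns of the grading table.
CONTINUOUS_ASSESSMENT = (
    Component("Seminar/Project", threshold=25.0, weight=40.0),
    Component("Final Exam: Written", threshold=50.0, weight=60.0),
)


def total_score(scores, components=CONTINUOUS_ASSESSMENT):
    """Return the weighted total in percent, or None if any threshold is missed.

    `scores` maps a component name to the percentage achieved on that component.
    """
    total = 0.0
    for component in components:
        achieved = scores[component.name]
        if achieved < component.threshold:
            return None  # assumed: a missed threshold fails the whole assessment
        total += achieved * component.weight / 100.0
    return total
```

For example, `total_score({"Seminar/Project": 80, "Final Exam: Written": 70})` combines 40 % of 80 with 60 % of 70.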

Week by Week Schedule

  1. Course administration. Overview of the data science field. Supporting technologies for data science. Auditory exercises: introduction to Pandas.
  2. Data handling, where data are obtained by observation: data acquisition, data models, common dataset issues, data reshaping, data cleanup. Auditory exercises: data handling and feature engineering in Python. Project: studying suggested scientific papers.
  3. Data visualization: various graphs for dataset visualization, best practice for data visualization, visualization for special purposes, visualization tools. Auditory exercises: data visualization in Python. Project: studying interdisciplinary scientific papers, selecting one for replication of results.
  4. First view of data: descriptive and inferential statistics. Auditory exercises: descriptive statistics in Python. Project: consultation with the assistant regarding the selected scientific paper.
  5. Data annotations and metrics. Auditory exercises: data annotations and metrics. Project: work on replicating the results.
  6. Data acquisition through research: types of research studies and data acquisition methods. Project: work on replicating the results.
  7. Applied linear regression in descriptive data analysis. Data transformation. Linear regression assumptions. Auditory exercises: introduction to regression analysis. Project: work on replicating the results.
  8. --
  9. Applied supervised machine learning: classification and prediction. Auditory exercises: applied supervised machine learning in Python. Project: finishing work on replicating the results.
  10. Applied unsupervised machine learning: clustering. Auditory exercises: applied unsupervised machine learning in Python. Project: forming a team to improve the results of the scientific paper, consultations with the assistant.
  11. Introduction to deep learning (neural networks, loss, invariance and equivariance, convolutional neural networks, recurrent networks). Auditory exercises: deep learning in Python. Project: team work on improving the results.
  12. Text handling (text, feature vectors, bag of words, tokenisation, stop words, n-grams, TF-IDF, attention). Auditory exercises: working with text data in Python. Project: team work on improving the results.
  13. Handling graphs and networks (nodes and edges, directed and undirected graphs, centrality measures, graph convolutional networks). Auditory exercises: working with graph data in Python. Project: team work on improving the results.
  14. Project presentations.
  15. Final exam.

Literature

Jake VanderPlas (2016), Python Data Science Handbook, O'Reilly Media
Matt Harrison, Theodore Petrou (2020), Pandas 1.x Cookbook, Packt Publishing
Alice Zheng, Amanda Casari (2018), Feature Engineering for Machine Learning, O'Reilly Media
François Chollet (2021), Deep Learning with Python, Second Edition, Manning Publications

For students

General

ID 240721
  Winter semester
5 ECTS
L2 English Level
L1 e-Learning
45 Lectures
0 Seminar
15 Exercises
15 Laboratory exercises
0 Project laboratory
0 Physical education exercises

Grading System

88 Excellent
75 Very Good
63 Good
50 Sufficient
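
Read as minimum percentages, the grade bands above can be applied with a simple lookup. The sketch below assumes that each number is an inclusive lower bound and that anything under 50 does not pass; the label "Insufficient" and the function name `grade_label` are hypothetical and do not appear in the course description.

```python
# (minimum percentage, label) pairs from the Grading System list, highest first.
GRADE_BANDS = [(88, "Excellent"), (75, "Very Good"), (63, "Good"), (50, "Sufficient")]


def grade_label(percent, below_minimum="Insufficient"):
    """Return the label of the highest band whose minimum `percent` reaches."""
    for minimum, label in GRADE_BANDS:
        if percent >= minimum:  # assumed: bounds are inclusive
            return label
    return below_minimum  # assumed label for scores below the lowest band
```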