Computer Systems Reliability

Data is displayed for academic year: 2023/2024.

Lecturers

Assoc. Prof.

Vlado Sruk

Course Description

The widespread use of computer systems and our dependence on their services make them an unavoidable part of our lives, so it is of great importance to limit the damage caused by their failures to acceptable levels. This course studies concepts, methods and techniques for the design, implementation and analysis of reliability, availability and fault tolerance of computer hardware and software. The objective is to develop an understanding of the impairments to computer systems' dependability, of the means of improving systems and their dependability attributes, and of how to evaluate them. The emphasis is on the study of reliability and fault tolerance of computer systems, on developing the ability to apply basic principles to building improved real systems, and on introducing tools for the analysis and evaluation of system attributes.

Study Programmes

University graduate

[FER2-HR] Computer Engineering - profile

Theoretical Course (2nd semester)

General Competencies

Students will develop a systematic understanding of basic concepts, methods and techniques for the design, implementation and evaluation of reliability, availability and fault tolerance of hardware and software systems. They will gain an understanding of computer system failure models, fault detection, fault masking, fault recovery strategies and testing. Students will be able to apply different approaches to improving and evaluating reliability, availability and fault tolerance, to draw independent conclusions, and to apply fault-tolerant techniques to different problem areas. They will also become capable of expanding their theoretical and practical knowledge by studying new methodologies and critically evaluating them.

Learning Outcomes

Describe the principles and theory of computer hardware and software reliability.
Predict computer system faults.
Predict computer system dependability.
Apply probabilistic dependability analysis of fault-tolerant computer systems.
Apply software reliability techniques.
Design and evaluate system architectures for fault-tolerant computer systems.

Forms of Teaching

Lectures

This course will consist of three 45-minute lectures per week. Lectures will emphasize main concepts illustrated with examples, solutions and topic discussions.

Exams

There will be two exams: a midterm (20% of the final grade) and a final (45%). Homework assignments, short quizzes and class participation are graded as well.

Consultations

Consultations with the instructor are available at scheduled times and through the e-learning system.

Internship visits

Students will visit a computing centre and be introduced to specific approaches to implementing dependability.

Grading Method

Type	Continuous assessment: threshold	Continuous assessment: % of grade	Exam: threshold	Exam: % of grade
Homework	60 %	25 %	60 %	25 %
Quizzes	0 %	4 %	0 %	0 %
Class participation	0 %	6 %	0 %	0 %
Midterm exam: written	50 %	20 %	0 %	–
Final exam: written	50 %	45 %	–	–
Exam: written	–	–	50 %	55 %
Exam: oral	–	–	–	20 %
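The grading table can be read as a weighted sum over components, with a separate set of weights for the continuous-assessment path and the exam path. The sketch below is a minimal Python illustration under stated assumptions, not the course's official scoring procedure: each result is expressed as a fraction of that component's maximum points, every listed threshold must be reached to pass along a given path, and the empty threshold cell for the oral exam is treated as 0 %. The `Component` type, the component names and the `total_score` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    threshold: float  # minimum fraction of this component's points (0.0-1.0)
    weight: float     # share of the final grade, in percentage points


# Continuous-assessment path (weights and thresholds from the table).
CONTINUOUS = [
    Component("Homework", 0.60, 25),
    Component("Quizzes", 0.00, 4),
    Component("Class participation", 0.00, 6),
    Component("Midterm exam: written", 0.50, 20),
    Component("Final exam: written", 0.50, 45),
]

# Exam path; components weighted 0 % on this path are omitted.
# Assumption: the empty oral-exam threshold cell is treated as 0 %.
EXAM = [
    Component("Homework", 0.60, 25),
    Component("Exam: written", 0.50, 55),
    Component("Exam: oral", 0.00, 20),
]

# Sanity check: each path distributes exactly 100 % of the grade.
assert sum(c.weight for c in CONTINUOUS) == 100
assert sum(c.weight for c in EXAM) == 100


def total_score(components: list[Component], results: dict[str, float]) -> Optional[float]:
    """Return the weighted total on a 0-100 scale, or None if any threshold is missed.

    `results` maps a component name to the achieved fraction of its points
    (0.0-1.0); a missing entry counts as 0.
    """
    total = 0.0
    for c in components:
        fraction = results.get(c.name, 0.0)
        if fraction < c.threshold:
            return None  # threshold not met: this path cannot be passed
        total += fraction * c.weight
    return total


if __name__ == "__main__":
    results = {
        "Homework": 0.75,
        "Quizzes": 0.5,
        "Class participation": 1.0,
        "Midterm exam: written": 0.75,
        "Final exam: written": 0.5,
    }
    print(total_score(CONTINUOUS, results))  # 64.25
```

Returning `None` rather than a low score keeps "threshold missed" distinct from "passed with few points"; with homework at 50 %, for example, the continuous-assessment path returns `None` regardless of the other results.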

Week by Week Schedule

Introduction. Motivation for the course. Basic principles, examples and terminology. Dependability, Reliability, Availability definitions. Faults, Errors, and Failures.
Fault and Error Models. Failure process. Fault handling.
Digital system testing. Simulations. Design for Testability. Built-In Test, Built-In Self-Test.
Reliability Theory. Reliability Evaluation Methods. Failure rate, Mean Time to Failure (MTTF), Mean Time to Repair (MTTR). Combinatorial Modeling. Reliability Block Diagrams (RBD). Monte Carlo simulation.
Reliability, Availability, and Safety modeling using Markov models. Failure Mode and Effects Analysis.
Reliability improvement techniques. Fault tolerant design techniques. Hardware redundancy approaches.
Midterm exam
Repairable Systems. Standby Systems. Discussions.
Time redundancy. Detecting and tolerating transient and permanent faults. Information redundancy. Error Detecting and Correcting Codes.
Software Redundancy. Software Error Models. N-version programming, Recovery blocks.
Software failure models, prediction of software failure intensities, impact of software failures on system behaviour.
Fault-tolerance in distributed systems. Byzantine failure model.
High availability computer systems and services. Maintenance models.
Experimental analysis of systems reliability and availability. Design methodology. Discussions.
Final exam

Literature

D. P. Siewiorek, R. S. Swarz (1998), Reliable Computer Systems: Design and Evaluation, AK Peters, Ltd.

M. L. Shooman (2002), Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design, J. Wiley & Sons

H. Pham (2000), Software Reliability, Springer

M. Xie, J. S. Dai, K. L. Poh (2004), Computing System Reliability: Models and Analysis, Kluwer Academic

M. Rausand, A. Hoyland (2004), System Reliability Theory: Models, Statistical Methods, and Applications, J. Wiley & Sons

For students

General

ID 34505

Summer semester

5 ECTS

L2 English Level

L1 e-Learning

45 Lectures

0 Seminar

0 Exercises

0 Laboratory exercises

0 Project laboratory

0 Physical education exercises

Grading System

85 Excellent

72 Very Good

60 Good

50 Sufficient
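The grading-system list gives the minimum total score for each grade. A minimal Python sketch of the lookup follows, assuming the numbers are inclusive lower bounds on a 0-100 total; the `grade` function is hypothetical, and the label returned below 50 points is an assumption, since the list does not name a failing grade.

```python
# Minimum total score (0-100) for each grade, from highest to lowest,
# copied from the Grading System list. Bounds are assumed to be inclusive.
GRADE_BOUNDARIES = [
    (85, "Excellent"),
    (72, "Very Good"),
    (60, "Good"),
    (50, "Sufficient"),
]


def grade(score: float) -> str:
    """Return the grade label for a total score on a 0-100 scale."""
    for minimum, label in GRADE_BOUNDARIES:
        if score >= minimum:
            return label
    # Assumption: anything below the lowest bound is a failing grade;
    # the label "Insufficient" is illustrative, not taken from the source.
    return "Insufficient"


print(grade(64.25))  # Good
```

Scanning the boundaries from highest to lowest means the first bound the score reaches determines the grade, so no upper limits need to be stored.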

Similar Courses

Fault-Tolerant Computer Systems, Chalmers University

Fault-Tolerant Systems (lecture and exercises), TU Wien

Dependable Systems, TU Berlin
