Things handed out or covered in class:

(The Boilermaker shaded row indicates where we are in the current semester. The material after that row is from the last time I taught the class and is for your reference only. It will likely change somewhat this semester.)

Title

Start Date

Document

Notes

Reference Material

Course Information

Aug 13

html

 

 

Course FAQ

Aug 13

html

 

 

Motivation & Introduction to Fault Tolerance Terms

 

pdf

Lec 1
Lec 2
Lec 3

Bug detection ALA [ pdf ]

Availability calculation [ pdf ] Solution [ pdf ]

Reliability Analysis: Discrete case   pdf Lec 1
Lec 2
Lec 3

Problem sheet [ pdf ]

KST book Chapter 2: pp. 61-93
Reliability Analysis: Continuous case  

pdf - Part 1

pdf - Part 2

Video lecture 1

Video lecture 2

Video lecture 3

Lec 1
Lec 2
Lec 3

Problem sheet [ pdf ]

KST book Chapter 3: pp. 115-165

Calculus cheat sheet [ pdf ]

Case study - LANL failure data [ pdf ]

Hardware  

pdf

Video lecture 1

Video lecture 2

Lec 1
Lec 2
Lec 3
 
Software  

pdf

Video lecture: basics

Video lecture: Process pairs

Video lecture: Robust data structures: structural integrity

Lec 1
Lec 2
"Robust search methods for B-trees" Fujimura and Jalote, FTCS-88. [ Paper ] [ Slides ]
Distributed protocol primitives - broadcast and agreement  

pdf

 

Lec 1
Lec 2
Lec 3
Lec 4

“Fault Tolerance in Distributed Systems” by Pankaj Jalote, Prentice Hall. Chapter 4 – Broadcast. [ pdf ]

“Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 8 – Agreement. [ pdf ]

“Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 13 – Commit Protocols. [ pdf ]

Modeling  

pdf

Stochastic Activity Networks

Lecture 3 (on video) – Stochastic Activity Networks

Lecture 4 (on video) – Stochastic Activity Networks

Lec 1

Markov process: Trivedi book, pp. 337-356

Ken Keefe's presentation and support material for Mobius: URL

Recovery

 

pdf Lec 1-2

D. K. Pradhan, “Fault Tolerant Computer System Design”, Chapter 3.10 (“Forward recovery”).

Singhal and Shivaratri, “Advanced Concepts in Operating Systems”, Chapter 12: Recovery (“Backward recovery”).

Synchronous Checkpointing: “Checkpointing and Rollback-Recovery for Distributed Systems” by R. Koo and S. Toueg, IEEE Transactions on Software Engineering, Jan 1987, pp. 23-31.

Asynchronous Checkpointing: “Crash Recovery with Little Overhead” by T. Juang and S. Venkatesan, 11th International Conference on Distributed Computing Systems, May 1991, pp. 454-461.

Validation  

pdf

 

 

Lec 1  
Replication  

pdf

Lecture 1 (online lecture) – Basics, Optimistic replication (scan of handout)

Lecture 2 (online lecture) – Pessimistic replication (scan of handout)

Lecture 3 (online lecture) – Pessimistic concurrency control: hierarchical voting approach, dynamic voting approach

Lecture 4 (online lecture) – From dynamic voting up to quantitative determination of optimal replication degree

 

 

 

Lec 1
Lec 2
Lec 3
Lec 4

 

Pankaj Jalote book Chapter 7 "Data replication and resiliency" [ pdf ]

Putting it all together: AWS, Hadoop   pdf  

“Amazon Web Services - Architecting for The Cloud: Best Practices,” Jinesh Varia (Amazon), January 2011. [ pdf ]

“On Designing and Deploying Internet-Scale Services,” James Hamilton - Windows Live Services Platform, pp. 231-242 of the Proceedings of the 21st Large Installation System Administration Conference (LISA '07).

Secure Coding   pdf   Not covered
Putting it all together   pdf   Not covered

           

Administrative Announcements:

Title

Date

Document

Course Information Aug 28 html
Anonymous feedback form Aug 28 html
Regrade policy Aug 28 html
Link to internal/private part of site   html
Announcements (running document, updated through the semester) Continuous html

 

“Dependability in the News” Discussions

Date

Topic

Link

Aug 21 Knight Capital trading error (Aug 2012) pdf (NYT article)
Sep 22 Ransomware attacks in 2016-17

html (Wired magazine)
html (NY Times)
html (NY Times)

Nov 13 Internet down (or how easy it is to take down DNS) html

 

Assignments:

Title

Handout Date

Due Date

Document

Solution

Getting to know you Aug 21 Aug 26 On Blackboard  
Quantitative assessment Oct 1 Oct 8 On Blackboard On Blackboard
SAN modeling Nov 7 Nov 17 On Blackboard On Blackboard
Chaos engineering Dec 1 Dec 6 On Blackboard On Blackboard

Exams:

Title

Handout Date

Due Date

Questions

Clarifications

Solution

Sample Midterm Exam

Questions

Oct 30

pdf

(password protected)

Midterm exam stats are: Mean = 76.66, Median = 74, Std. Dev. = 14.17.

 

Class Notes

Active Learning Activity:

Title

Handout Date

Document

Bug detection August 24 pdf

Mudboard: Bug detection, Parity September 8 pdf
Mudboard: Dependent failures September 29 pdf
Mudboard: Non series-parallel system analysis October 29 pdf
Mudboard: Broadcast protocols Dec 10 pdf
Security bug detection Dec 9 pdf

 

Course Projects:

Title

Document

Grading template for final presentation and report txt
Sample project reports html
Project list html (Password needed) pdf (Password needed)

Project proposal structure html

Grading template for final project presentation and report txt
Sample project list from 2011 html
Sample project list from 2009 html
Sample project list from 2007 html pdf
Sample project list from 2005 html

Project Presentations

Title

Group Members

Document

 

References

Date

Topic

File

 

[ ECE 695B Home | Description | Handouts | Announcements ]


Maintained by Saurabh Bagchi
sbagchi@purdue.edu

Last updated: December 22, 2017