Things handed out or covered in class:
(The Boilermaker shaded row indicates where we are in the current semester. The material after that row is from the last time I taught the class and is for your reference only; it will likely change somewhat this semester.)
Title | Start Date | Document | Notes | Reference Material |
Course Information | Aug 13 | | | |
Course FAQ | Aug 13 | | | |
Motivation & Introduction to Fault Tolerance Terms |
|
Lec 1 Lec 2 Lec 3 |
Bug detection ALA [ pdf ] |
|
Reliability Analysis: Discrete case | Lec 1 Lec 2 Lec 3 |
Problem sheet [ pdf ] KST book Chapter 2: pp. 61-93 |
||
Reliability Analysis: Continuous case | Lec 1 Lec 2 Lec 3 |
Problem sheet [ pdf ] KST book Chapter 3: pp. 115-165 Calculus cheat sheet [ pdf ] Case study - LANL failure data [ pdf ] |
||
Hardware | Lec 1 Lec 2 Lec 3 |
|||
Software | Lec 1 Lec 2 |
"Robust search methods for B-trees" Fujimura and Jalote, FTCS-88. [ Paper ] [ Slides ] | ||
Distributed protocol primitives - broadcast and agreement |
|
Lec 1 Lec 2 Lec 3 Lec 4 |
“Fault Tolerance in Distributed Systems” by Pankaj Jalote, Prentice Hall. Chapter 4 – Broadcast. [ pdf ] “Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 8 – Agreement. [ pdf ] “Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 13 – Commit Protocols. [ pdf ] |
|
Modeling |
Stochastic Activity Networks Lecture 3 (on video) – Stochastic Activity Networks Lecture 4 (on video) – Stochastic Activity Networks |
Lec 1 | Markov process: Trivedi book, pp. 337-356 Ken Keefe's presentation and support material for Mobius: URL |
|
Recovery |
|
Lec 1-2 | D. K. Pradhan, “Fault Tolerant Computer System Design”, Chapter 3.10 (“Forward recovery”). Singhal and Shivaratri, “Advanced Concepts in Operating Systems”, Chapter 12: Recovery (“Backward recovery”). Synchronous Checkpointing: “Checkpointing and Rollback-Recovery for Distributed Systems” by R. Koo and S. Toueg, IEEE Transactions on Software Engineering, Jan 1987, pp. 23-31. Asynchronous Checkpointing: “Crash Recovery with Little Overhead” by T. Juang and S. Venkatesan, 11th International Conference on Distributed Computing Systems, May 1991, pp. 454-461. |
|
Validation |
|
Lec 1 | ||
Replication |
Lecture 1 (online lecture) – Basics, Optimistic replication (scan of handout)
Lecture 2 (online lecture) – Pessimistic replication (scan of handout)
Lecture 3 (online lecture) – Pessimistic concurrency control - hierarchical voting approach, dynamic voting approach.
Lecture 4 (online lecture) – From dynamic voting up to quantitative determination of optimal replication degree
|
Pankaj Jalote book Chapter 7 "Data replication and resiliency" [ pdf ] |
||
Putting it all together: AWS, Hadoop | “Amazon Web Services - Architecting for The Cloud: Best Practices,” Jinesh Varia (Amazon), January 2011. [ pdf ] “On Designing and Deploying Internet-Scale Services,” James Hamilton - Windows Live Services Platform, pp. 231-242 of the Proceedings of the 21st Large Installation System Administration Conference (LISA '07). |
|||
Secure Coding | Not covered | |||
Putting it all together | Not covered |
Administrative Announcements:
Title | Date | Document |
Course Information | Aug 28 | |
Anonymous feedback form | Aug 28 | |
Regrade policy | Aug 28 | |
Link to internal/private part of site | html | |
Announcements (running document, updated through the semester) | Continuous |
“Dependability in the News” Discussions
Date | Topic | Link |
Aug 21 | Knight Capital trading error (Aug 2012) | pdf (NYT article) |
Sep 22 | Ransomware attacks in 2016-17 | |
Nov 13 | Internet down (or how easy it is to take down DNS) | html |
Title | Handout Date | Due Date | Document | Solution |
Getting to know you | Aug 21 | Aug 26 | On Blackboard | |
Quantitative assessment | Oct 1 | Oct 8 | On Blackboard | On Blackboard |
SAN modeling | Nov 7 | Nov 17 | On Blackboard | On Blackboard |
Chaos engineering | Dec 1 | Dec 6 | On Blackboard | On Blackboard |
Exams:
Title | Handout Date | Due Date | Questions | Clarifications | Solution |
Sample Midterm Exam Questions | Oct 30 | (password protected) | Midterm exam stats: Mean = 76.66, Median = 74, Std. Dev. = 14.17. | |
Active Learning Activity:
Title | Handout Date | Document |
Bug detection | August 24 | |
Mudboard: Bug detection, Parity | September 8 | |
Mudboard: Dependent failures | September 29 | |
Mudboard: Non series-parallel system analysis | October 29 | |
Mudboard: Broadcast protocols | Dec 10 | |
Security bug detection | Dec 9 |
Title | Document |
Grading template for final presentation and report | txt |
Sample project reports | |
Project list | html, pdf |
Project proposal structure | html |
Grading template for final project presentation and report | |
Sample project list from 2011 | html |
Sample project list from 2009 | |
Sample project list from 2007 | |
Sample project list from 2005 |
Project Presentations
Title | Group Members | Document |
References
Date | Topic | File |
[ ECE 695B Home | Description | Handouts | Announcements ]
Maintained by Saurabh Bagchi (sbagchi@purdue.edu)
Last updated: December 22, 2017