Things handed out or covered in class:
(The Boilermaker shaded row indicates where we are in the current semester. The material after that row is from the last time I taught the class and is for your reference only. It will likely change somewhat this semester.)
|
Title |
Start Date |
Document |
Notes |
Reference Material |
|
Course Information |
Aug 13 |
|
|
|
|
Course FAQ |
Aug 13 |
|
|
|
|
Motivation & Introduction to Fault Tolerance Terms |
|
Lec 1 Lec 2 Lec 3 |
Bug detection ALA [ pdf ] |
|
| Reliability Analysis: Discrete case | Lec 1 Lec 2 Lec 3 |
Problem sheet [ pdf ] KST book Chapter 2: pp. 61-93 |
||
| Reliability Analysis: Continuous case | Lec 1 Lec 2 Lec 3 |
Problem sheet [ pdf ] KST book Chapter 3: pp. 115-165 Calculus cheat sheet [ pdf ] Case study - LANL failure data [ pdf ] |
||
| Hardware | Lec 1 Lec 2 Lec 3 |
|||
| Software | Lec 1 Lec 2 |
"Robust search methods for B-trees" Fujimura and Jalote, FTCS-88. [ Paper ] [ Slides ] | ||
| Distributed protocol primitives - broadcast and agreement |
|
Lec 1 Lec 2 Lec 3 Lec 4 |
“Fault Tolerance in Distributed Systems” by Pankaj Jalote, Prentice Hall. Chapter 4 – Broadcast. [ pdf ] “Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 8 – Agreement. [ pdf ] “Advanced Concepts in Operating Systems” by Singhal and Shivaratri, McGraw Hill. Chapter 13 – Commit Protocols. [ pdf ] |
|
| Modeling |
Stochastic Activity Networks Lecture 3 (on video) – Stochastic Activity Networks Lecture 4 (on video) – Stochastic Activity Networks |
Lec 1 | Markov process: Trivedi book, pp. 337-356 Ken Keefe's presentation and support material for Mobius: URL |
|
| Recovery |
|
Lec 1-2 | D. K. Pradhan, “Fault Tolerant Computer System Design”, Chapter 3.10 (“Forward recovery”). Singhal and Shivaratri, “Advanced Concepts in Operating Systems”, Chapter 12: Recovery (“Backward recovery”). Synchronous Checkpointing: “Checkpointing and Rollback-Recovery for Distributed Systems” by R. Koo and S. Toueg, IEEE Transactions on Software Engineering, Jan 1987, pp. 23-31. Asynchronous Checkpointing: “Crash Recovery with Little Overhead” by T. Juang and S. Venkatesan, 11th International Conference on Distributed Computing Systems, May 1991, pp. 454-461. |
|
| Validation |
|
Lec 1 | ||
| Replication |
Lecture 1 (online lecture) – Basics, Optimistic replication (scan of handout)
Lecture 2 (online lecture) – Pessimistic replication (scan of handout)
Lecture 3 (online lecture) – Pessimistic concurrency control: hierarchical voting approach, dynamic voting approach
Lecture 4 (online lecture) – From dynamic voting up to quantitative determination of optimal replication degree
|
Pankaj Jalote book Chapter 7 "Data replication and resiliency" [ pdf ] |
||
| Putting it all together: AWS, Hadoop | “Amazon Web Services - Architecting for The Cloud: Best Practices,” Jinesh Varia (Amazon), January 2011. [ pdf ] “On Designing and Deploying Internet-Scale Services,” James Hamilton - Windows Live Services Platform, pp. 231-242 of the Proceedings of the 21st Large Installation System Administration Conference (LISA '07). |
|||
| Secure Coding | Not covered | |||
| Putting it all together | Not covered |
Administrative Announcements:
|
Title |
Date |
Document |
|
Course Information |
Aug 28 |
|
|
Anonymous feedback form |
Aug 28 |
|
|
Regrade policy |
Aug 28 |
|
| Link to internal/private part of site | html | |
|
Announcements (running document, updated through the semester) |
Continuous |
“Dependability in the News” Discussions
|
Date |
Topic |
Link |
| Aug 21 | Knight Capital trading error (Apr 2012) | pdf (NYT article) |
| Sep 22 | Ransomware attacks in 2016-17 | |
| Nov 13 | Internet down (or how easy it is to take down DNS) | html |
|
Title |
Handout Date |
Due Date |
Document |
Solution |
| Getting to know you | Aug 21 | Aug 26 | On Blackboard | |
| Quantitative assessment | Oct 1 | Oct 8 | On Blackboard | On Blackboard |
| SAN modeling | Nov 7 | Nov 17 | On Blackboard | On Blackboard |
| Chaos engineering | Dec 1 | Dec 6 | On Blackboard | On Blackboard |
Exams:
|
Title |
Handout Date |
Due Date |
Questions |
Clarifications |
Solution |
|
Sample Midterm Exam Questions |
Oct 30 |
(password protected) |
Midterm exam stats are: Mean = 76.66, Median = 74, Std. Dev. = 14.17. |
|
Active Learning Activity:
|
Title |
Handout Date |
Document |
| Bug detection | August 24 | |
| Mudboard: Bug detection, Parity | September 8 | |
| Mudboard: Dependent failures | September 29 | |
| Mudboard: Non series-parallel system analysis | October 29 | |
| Mudboard: Broadcast protocols | Dec 10 | |
| Security bug detection | Dec 9 |
|
Title |
Document |
| Grading template for final presentation and report | txt |
| Sample project reports | |
|
Project list |
html
pdf |
| Project proposal structure | html |
|
Grading template for final project presentation and report |
|
| Sample project list from 2011 | html |
|
Sample project list from 2009 |
|
|
Sample project list from 2007 |
|
|
Sample project list from 2005 |
Project Presentations
|
Title |
Group Members |
Document |
References
|
Date |
Topic |
File |
[ ECE 695B Home | Description | Handouts | Announcements ]
Maintained by Saurabh Bagchi
sbagchi@purdue.edu
Last updated: December 22, 2017