Good Enough Dependability: A New Paradigm For Dependable Systems Design
|Event Date:||November 3, 2017|
|Speaker Affiliation:||University of British Columbia|
|Contact Name:||Saurabh Bagchi|
|School or Program:||Electrical and Computer Engineering|
Traditionally, software is designed under the assumption that the hardware is fault-free, so it rarely has to deal with hardware errors. However, this assumption is becoming increasingly difficult to satisfy as CMOS devices scale to smaller and smaller sizes and as manufacturing variations increase. In addition, traditional solutions such as guard-banding and dual modular redundancy (DMR) are challenging to apply in commodity systems due to stringent power constraints. Therefore, there is a compelling need to develop low-overhead software approaches for protecting commodity software from hardware errors.
In this talk, I will describe our approach to building software systems that are resilient to hardware faults. We call this approach “good enough dependability” to emphasize that imperfect protection is acceptable in exchange for low cost. First, I will present a compiler-based approach to identify critical data in soft-computing applications, that is, applications that have inherent resilience to errors. We call errors that cause significant deviation from the correct output of the application Egregious Data Corruptions (EDCs), and we provide targeted protection against such errors. I will then discuss how we extend this approach, using machine learning techniques, to Silent Data Corruptions (SDCs) in general-purpose applications, which are not inherently error resilient. Finally, I will present our work on detecting long-latency, crash-causing errors in applications through static analysis, and on modelling hardware error propagation in programs.
This is joint work with my graduate students, colleagues at UBC, and industry collaborators.
Karthik Pattabiraman received his M.S. and Ph.D. degrees from the University of Illinois at Urbana-Champaign (UIUC) in 2004 and 2009, respectively. After a post-doctoral stint at Microsoft Research (Redmond), Karthik joined the University of British Columbia (UBC) in 2010, where he is now an associate professor of electrical and computer engineering. Karthik's research interests are in building error-resilient software systems, and in software engineering and security. Karthik has won distinguished paper (or runner-up) awards at the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2008; the IEEE International Conference on Software Testing (ICST), 2013; the IEEE/ACM International Conference on Software Engineering (ICSE), 2014; and the European Dependable Computing Conference (EDCC), 2015 and 2016. He is a recipient of the 2016 Killam Faculty Research Fellowship at UBC. Karthik is a senior member of the IEEE and a member of the IFIP Working Group on Dependable Computing (10.4). Find out more about him at: http://blogs.ubc.ca/karthik