SWAT: Battling the Dark Side of Moore's Law

Event Date: April 27, 2009
Speaker: Alex (Man-Lap) Li
Speaker Affiliation: Department of Computer Science
University of Illinois at Urbana-Champaign
Sponsor: ECE Faculty Candidate
CE Area Seminar
Time: 10:00 AM
Location: MSEE 239
Open To: Acceptable for ECE694A

While devices continue to scale in accordance with Moore's Law to
provide increasing processing capability, their reliability is a growing
concern. This dark side of Moore's Law will cause future hardware to
fail for many reasons. The pervasiveness of the reliability problem
across the broad computing market demands low-cost and general
solutions, precluding the use of expensive solutions that involve
excessive redundancy or piecemeal solutions that target specific failure
modes. This talk will present the SWAT (SoftWare Anomaly Treatment)
system, a low-cost and general reliability solution that automatically
detects, diagnoses, recovers from, and repairs around failed components.
By observing that error detection mechanisms must be extremely low cost
because they are "always on," SWAT effectively optimizes the overall
system cost by only handling hardware faults that propagate and appear
as software anomalies. SWAT therefore detects a variety of hardware
faults by watching for anomalous software behavior, using novel zero- to
low-cost hardware and software monitors. In the infrequent case that a
fault is detected, SWAT invokes a comprehensive diagnosis procedure to
isolate the root cause of the fault, repair or reconfigure around it,
and invoke recovery. The long-term goal of the SWAT project is to
develop a hardware-software codesigned solution that treats both
hardware and software faults with a common framework optimized for
overall system reliability.

Alex (Man-Lap) Li is a Ph.D. candidate in the Department of Computer
Science at the University of Illinois at Urbana-Champaign (UIUC). His
research interest lies in computer architecture and systems, with a
focus on parallel and reliable computing. Well before the widespread use
of multicore systems, he authored the ALPBench benchmark suite that
exploits the thread- and data-level parallelism in
popular (then-emerging) media applications (publicly available, over 700
downloads). He is the recipient of the 2008 W. J. Poppelbaum Memorial
Award, granted annually by the UIUC CS department to a graduate student
in computer architecture based on academic merit and creativity.