Dependability for Computer Systems Meets Data Analytics
Abstract
We live in a data-driven world, as everyone around us has been telling us of late.
Everything is generating data, sometimes in huge volumes, from the sensors
embedded in our physical spaces to the large number of machines in data centers
that are monitored for a wide variety of metrics. The question that we pose is:
can this volume of data be used to improve the dependability of computing systems?
Dependability is the property that a system continues to provide its functionality
despite the introduction of faults, whether accidental (design defects,
environmental effects, etc.) or maliciously introduced (security attacks, either
external or internal). The computing systems that we target have been increasing
in scale, both in the number of executing elements and in the amount of data they
must process. For example, the proliferation of data-spewing sensors on mobile and
embedded devices, multiplied across a growing number of such devices, illustrates
this increase in scale. We have been addressing this dependability challenge
through large-scale data analytics in three broad domains: embedded and mobile
networks, distributed scientific computing clusters and applications, and
computational genomics. In this talk,
I will first give a high-level view of the dependability challenges in these
three domains and some of our key results.
I will then go into two recent developments: dependability in a cellular network
and dependability through approximating computation. In the first development, we
answer the question: can the cellular network and the smart mobile devices,
working together, mitigate the problem of network outages or reduced data
bandwidth? In the second development, we answer the question: can the
limitations of human perception be leveraged to approximate certain computations
and thus allow the computation to meet timing guarantees, even when executing
on resource-constrained platforms? A common example is video processing, where
the human visual system is forgiving of certain kinds of inaccuracy.
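To make the idea concrete, here is a minimal sketch (not the speaker's actual system) of one common perception-aware approximation technique, loop perforation: skipping pixels while estimating a frame's average brightness trades a small, visually negligible error for a proportional reduction in work, which is the kind of trade-off that helps meet timing guarantees on resource-constrained platforms. The frame, the stride, and the brightness metric are all illustrative assumptions.

```python
def avg_brightness(frame, stride=1):
    """Average pixel value, sampling only every `stride`-th pixel per row.

    stride=1 is the exact computation; stride=2 visits half the pixels,
    halving the work at the cost of a small estimation error.
    """
    total, count = 0, 0
    for row in frame:
        for px in row[::stride]:
            total += px
            count += 1
    return total / count

# Hypothetical 8x8 grayscale frame with a smooth brightness gradient.
frame = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]

exact = avg_brightness(frame)             # visits all 64 pixels
approx = avg_brightness(frame, stride=2)  # visits only 32 pixels

error = abs(exact - approx) / exact       # relative error of the approximation
```

On this smooth gradient the perforated loop does half the work while the relative error stays under 2 percent, well below what a viewer would notice in a brightness estimate; real systems tune the perforation rate to keep such errors within a perceptual budget.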
Bio
Saurabh Bagchi is a Professor in the School of Electrical and Computer Engineering and
the Department of Computer Science (by courtesy) at Purdue University in West
Lafayette, Indiana. He is the founding Director of a university-wide resiliency
center at Purdue called CRISP (2017-present). He is an ACM Distinguished
Scientist (2013), a Senior Member of IEEE (2007) and of ACM (2009), a
Distinguished Speaker for ACM (2012), and an IMPACT Faculty Fellow at Purdue.
He is the recipient of an IBM Faculty Award (2014), a Google Faculty Award (2015),
and the AT&T Labs VURI Award (2016). He was elected to the IEEE Computer
Society Board of Governors for the 2017-19 term.
Saurabh's research interests are in distributed systems and dependable computing. He is
proudest of the 18 PhD students who have graduated from his research group and
are in various stages of building wonderful careers in industry or academia. In
his group, he and his students have far too much fun building and breaking real
systems. Saurabh received his MS and PhD degrees from the University of
Illinois, Urbana-Champaign and his BS degree from the Indian Institute of
Technology Kharagpur, all in Computer Science.