October 4, 2016

Professor Saurabh Bagchi receives AT&T Labs Award for work on data center reliability

Professor Saurabh Bagchi
Subrata Mitra
Rajesh Panta
Alicia Abella

Professor Saurabh Bagchi has received an award from AT&T Labs through its Virtual University Research Initiative (VURI) program, which facilitates collaborations between AT&T researchers and universities. Professor Bagchi’s project began in 2015 and focuses on diagnosing and repairing quality-of-service (QoS) issues in a multi-tenant cloud. Three researchers from AT&T Labs (Rajesh Panta, Kaustubh Joshi, and Moo Ryong Ra) work with Professor Bagchi’s group.

One notable result of the work is an efficient distributed repair technique for storage failures in data centers. The work was published at EuroSys 2016 with graduate researcher Subrata Mitra as the lead author, and a patent filing on it is under way. Professor Bagchi described the work as follows:

"With the explosion of data in applications all around us, erasure coded storage has emerged as an attractive alternative to replication because even with significantly lower storage overhead, they provide better reliability against data loss. Reed-Solomon code is the most widely used erasure code because it provides maximum reliability for a given storage overhead and is flexible in the choice of coding parameters that determine the achievable reliability. However, reconstruction time for unavailable data becomes prohibitively long mainly because of network bottlenecks. We have devised a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation to small partial operations and schedules them on multiple nodes already involved in the data reconstruction. Then a distributed protocol progressively combines these partial results to reconstruct the unavailable data blocks and this technique reduces the network pressure. Our experiments carried out on AT&T's data center clusters show that PPR reduces repair time and degraded read time significantly. Importantly, this technique is compatible with existing erasure codes."
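The idea of splitting a reconstruction into partial operations and combining them progressively can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the published PPR implementation: it uses a single XOR parity chunk in place of Reed-Solomon coding (which would multiply each chunk by a Galois-field coefficient before combining), and the function names `make_parity`, `centralized_repair`, and `partial_parallel_repair` are hypothetical. The sketch compares how many chunks a single repair node must pull over its own link with how many rounds a pairwise, tree-style combination needs.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equally sized chunks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks):
    """Single XOR parity chunk (a stand-in for a real erasure code)."""
    return reduce(xor_bytes, chunks)


def centralized_repair(surviving):
    """Baseline: the repair node fetches every surviving chunk itself.

    Returns the rebuilt chunk and the number of chunks that crossed
    the repair node's incoming link.
    """
    return reduce(xor_bytes, surviving), len(surviving)


def partial_parallel_repair(surviving):
    """Tree-style repair: nodes combine partial results pairwise.

    Slot 0 represents the repair destination and starts with an
    all-zero partial result. In each round a node receives at most
    one partial result, so no single link carries all the traffic.
    Returns the rebuilt chunk and the number of rounds used.
    """
    chunk_len = len(surviving[0])
    partials = [bytes(chunk_len)] + list(surviving)
    rounds = 0
    stride = 1
    while stride < len(partials):
        # Receivers sit at multiples of 2 * stride; each one absorbs
        # the partial result held by the node `stride` slots away.
        for receiver in range(0, len(partials) - stride, 2 * stride):
            sender = receiver + stride
            partials[receiver] = xor_bytes(partials[receiver], partials[sender])
        stride *= 2
        rounds += 1
    return partials[0], rounds
```

With six data chunks plus one parity chunk and one data chunk lost, the baseline pulls all six surviving chunks into the repair node, whereas the tree-style version finishes in three rounds with each node receiving at most one chunk per round. Both return the same rebuilt chunk.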

Rajesh Panta, Principal Inventive Scientist at AT&T Labs, had this to say about the award and the collaboration:

"We are excited about this collaboration. We have been working with Professor Bagchi and his team on cloud and network areas for several years. We believe that we will be able to use Professor Bagchi’s expertise in distributed and fault tolerant systems to develop innovative and practical solutions for cloud platforms."

Alicia Abella, Assistant Vice President for Cloud Technologies and Services Research at AT&T Labs, said:

“We are very pleased to be working with Professor Saurabh Bagchi and his students on this innovative project. AT&T Labs has a long and successful history of collaborative work with the academic community and we look forward to continuing that tradition in concert with Professor Bagchi.”