SEMINAR: "Cloud Storage Systems: Latency, Reliability, and Cost"

Event Date: February 22, 2016

Monday, February 22nd, 2016
10:30am
MSEE 239

Vaneet Aggarwal, Assistant Professor, Industrial Engineering, Purdue

Abstract

Consumers are engaging in more social networking and e-commerce activities and are increasingly storing their documents and media in online storage. Businesses rely on Big Data analytics for business intelligence and are migrating their traditional IT infrastructure to the cloud. These trends are causing demand for online data storage to rise faster than Moore's Law. Erasure coding techniques are widely used for distributed data storage because they provide space-optimal data redundancy to protect against data loss.

Cost-effective, network-accessible storage is a strategic infrastructural capability that can serve many businesses. These customers, however, have very diverse requirements for latency, reliability, cost, security, etc. In this talk, I will describe how to characterize latency, reliability, and cost, and the trade-offs among them. To characterize latency, I will present and analyze a novel scheduling algorithm. Further, I will present a novel concept of functional caching in storage systems and describe its advantages over the duplication used in current caching systems. I will then describe how limited bandwidth between data centers allows us to design new coding schemes that help improve the mean time to data loss of the system by a factor of 10^20 for a (51,30) erasure code as compared to a standard Reed-Solomon code. Finally, I will focus on joint optimization of customer requirements, present new approaches for content placement and content access, and validate the results using implementations on an open-source distributed file system on a public test grid.