Faraz Ahmad
Purdue University (Alumni)
Software Engineer, Teradata Aster
999 Skyway Road, Suite 100
San Carlos, CA 94070
Phone: (765) 491-0424
Email: faraz.ahmad@gmail.com

I
received my Ph.D. from the Department of Electrical and Computer Engineering at Purdue University, working in the Computer Systems Architecture group under my advisor T. N. Vijaykumar. I earned my Bachelor's degree from the University of Engineering and Technology, Lahore, Pakistan. My research interests include
cloud computing, data mining, statistical analysis, data center
architectures, big-data and energy-aware computing, distributed systems and
computer architecture. My current work focuses on big-data analytics. My past projects from my graduate studies include ShuffleWatcher, shuffle-aware scheduling for multi-tenant MapReduce clusters (USENIX ATC 2014); Tarazu, optimizing MapReduce on heterogeneous clusters (ASPLOS 2012); PowerTrade, a joint optimization of idle power and cooling power to reduce overall data center power (ASPLOS 2010); and MaRCO, a runtime performance optimization for MapReduce, the well-known programming model for large-volume data analysis in data centers (Tech Report 2007). During this work, I also developed a benchmark suite for MapReduce (details below). I have also worked on architecture support for debugging multithreaded programs on multicores (TimeTraveler, ISCA 2010).
MapReduce

MapReduce is a well-known programming model, developed at Google, for processing large amounts of raw data, for example, crawled documents or web request logs. This data is usually so large that it must be distributed across thousands of machines to be processed in a reasonable time. The ease of programmability, automatic data management, and transparent fault tolerance have made MapReduce a favorable choice for batch processing in large-scale data centers. Map, written by a user of the MapReduce library, takes an input pair and produces a set of intermediate key/value pairs. The library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function through an all-map-to-all-reduce communication step called Shuffle. Reduce, also written by the user, receives an intermediate key along with its set of values from Map and merges these values to produce the final output.

Hadoop is an open-source implementation of MapReduce that is regularly improved and extended by software developers and researchers, and is maintained by the Apache Software Foundation.
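To make the model concrete, below is a minimal sketch of the canonical word-count job written against Hadoop's Java API (the org.apache.hadoop.mapreduce classes available since Hadoop-0.20). This is an illustrative example, not code from the benchmark suite described below: Map emits a (word, 1) pair for every word, the Shuffle groups all pairs sharing the same word, and Reduce sums the counts.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each input line, emit an intermediate (word, 1) pair per word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // intermediate key/value pair
      }
    }
  }

  // Reduce: the Shuffle has already grouped all values for the same word;
  // sum them to produce the final count.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);  // final (word, count) output
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");  // Job.getInstance() in later Hadoop versions
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // map-side pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The combiner is set here because summing is associative and commutative, so partial sums can be computed on the map side, reducing the volume of data crossing the Shuffle. Such a job would typically be launched with something like: hadoop jar wordcount.jar WordCount <input dir> <output dir>.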
Despite vast efforts on the development of Hadoop MapReduce, little rigorous work has been done on the benchmarking side. During our work on MapReduce, we developed a benchmark suite that represents a broad range of MapReduce applications, covering combinations of high/low computation and high/low shuffle volume. Details of the applications, their code (compatible with Hadoop-0.20 and Hadoop-1.0.0), and the input datasets can be found below.