Sponsors: Purdue Research Foundation
Collaborators: Nanoelectronics Research Lab (Purdue ECE)
The design process used to convert algorithms to hardware or software
implementations has traditionally obeyed the axiom that the
specification and implementation need to be equivalent in a Boolean or
numerical sense. This assumption is built into design, verification,
and testing methodologies. However, algorithms from several
interesting application domains exhibit the property of inherent
resilience to “errors” from extrinsic or intrinsic sources, offering
entirely new avenues for performance and power optimization by
relaxing the requirement of exact numerical or Boolean
equivalence. While inherent resilience is present in a wide range of
application domains (multimedia, digital signal processing, wireless
communications), emerging workloads of the future, such as
Recognition, Mining, and Synthesis, take this inherent resilience to a
different level due to the massive amounts of data they process,
the statistical nature of their algorithms, and built-in user expectations of
less-than-perfect results.
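As a minimal sketch of relaxed numerical equivalence (not a technique from this project specifically), the snippet below uses loop perforation: an aggregate statistic is computed over a subsample of the data, cutting work by roughly the skip factor while incurring only a small, often acceptable, error. The function names and parameters are illustrative.

```python
import random

def mean_exact(xs):
    """Exact mean: the traditional notion of Boolean/numerical correctness."""
    return sum(xs) / len(xs)

def mean_perforated(xs, skip=4):
    """Approximate mean via loop perforation: process only every
    `skip`-th sample, trading a small error for ~skip-x less work."""
    sampled = xs[::skip]
    return sum(sampled) / len(sampled)

random.seed(0)
data = [random.gauss(100.0, 10.0) for _ in range(100_000)]
exact = mean_exact(data)
approx = mean_perforated(data)
rel_err = abs(approx - exact) / abs(exact)
print(f"exact={exact:.3f} approx={approx:.3f} rel_err={rel_err:.2%}")
```

For data with statistical regularity, such as the media and mining workloads mentioned above, the relative error typically stays well under one percent, which is exactly the kind of slack that exact-equivalence design methodologies cannot exploit.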
Collaborators: MESDAT Lab, U. C. San Diego
The scaling of integrated circuits (ICs) into the nanometer regime has presented designers with new challenges, foremost among which are variations in the characteristics of IC components. Variations threaten to diminish the fundamental benefits of technology scaling, such as improvements in cost-per-transistor, performance and power consumption. Variation-aware design techniques that have been proposed thus far are being stretched to their limits, and cannot contain the incessant increase in variations. Therefore, it is important to develop new design approaches for systems that are inherently resilient to variations in the underlying components.
This project develops a framework based on adaptive applications and architectures for the design of variation-tolerant application-specific systems. It advances the state-of-the-art by (i) adopting a cross-layer approach at the system architecture and application layers, (ii) leveraging the inherent "elasticity" of a wide class of applications to adapt to variations in the underlying hardware while still producing acceptable performance and maintaining end-user experience, and (iii) exploring a hybrid (design-time and post-fabrication) design methodology, enabling more accurate and effective system adaptation in response to variations. The developed technologies will significantly extend our ability to reap the benefits of technology scaling in the face of increasing variations.
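The adaptation idea can be sketched as a simple feedback loop: the application exposes a quality "knob" (e.g., number of filter taps or iterations), and the system lowers or raises it depending on whether the variation-affected hardware meets its timing budget. All names and thresholds below are illustrative assumptions, not the project's actual API.

```python
def adapt_knob(measured_latency_ms, budget_ms, knob, knob_min=1, knob_max=8):
    """One step of a simple feedback controller: if the hardware
    misses its latency budget (e.g., due to a slow process corner),
    lower the quality knob; if there is ample slack, raise it."""
    if measured_latency_ms > budget_ms and knob > knob_min:
        return knob - 1
    if measured_latency_ms < 0.8 * budget_ms and knob < knob_max:
        return knob + 1
    return knob

# A slow outlier chip backs off until it meets timing, then recovers
# quality as measured latency improves.
knob = 8
for latency in [12.0, 11.5, 11.0, 9.0, 7.0]:  # ms, varying run to run
    knob = adapt_knob(latency, budget_ms=10.0, knob=knob)
print("final quality knob:", knob)
```

A design-time analysis would fix the knob for the worst-case chip; the hybrid methodology described above instead lets each fabricated part settle at the highest quality it can sustain.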
Sponsors: NSF, NEC Labs America
Collaborators: Paramount Group, Purdue ECE, NEC Labs America
Graphics Processing Units (GPUs) have emerged as a significant force in the transition of the computing industry to mainstream parallel computing. Although they have traditionally been deployed in computing platforms to exclusively perform graphics computations, their increasingly general-purpose (GPGPU) architecture makes them capable of executing a wide range of compute-intensive applications. Current GPGPUs are "manycore" processors that incorporate up to 240 (soon to be 512) cores in a single chip and feature very high memory bandwidths. Researchers have demonstrated exciting speedups on GPUs for a wide range of application domains. Thanks to the pervasiveness of GPUs, most computing systems (desktops, laptops, and even mobile devices) will soon evolve into heterogeneous parallel computers that contain manycore computing engines. However, enabling applications and end users to benefit from this vast untapped computing potential requires that we make GPGPU programming easy and accessible to the average programmer. Despite significant advances in the state-of-the-art in GPGPU programming, from graphics-specific APIs such as OpenGL to general-purpose models such as NVIDIA's CUDA, OpenCL, and AMD's Stream SDK, achieving the twin goals of easy programmability and efficient GPU execution remains a major challenge. Due to the relative infancy of programmable GPU architectures, there are several important open research challenges in this area that we will address in the proposed research.
This project addresses the critical challenges of (i) making GPGPU programming easier by investigating new high-level programming models for GPGPUs, and (ii) enabling efficient GPGPU execution by developing compilation frameworks for programs written to these models. We propose two complementary, synergistic models for GPGPU programming: the OpenMP programming model, which has been widely used for shared-memory parallel programming, and Parallel Operator Data-Flow Graphs (PO-DFGs), which naturally represent the structure of algorithms in a wide range of current and emerging application domains such as audio, video, and image processing, and recognition and mining. For programs written to these programming models, we will develop various optimization techniques, including: partitioning of the program between the host CPU and GPU and across multiple GPUs; stream optimizations that render the program's memory access characteristics more amenable to the GPU's memory system; minimizing data transfer between the host and GPU memory; and various GPU architecture-specific optimizations. We will also focus on the challenge of enabling applications to execute efficiently on GPUs in the face of scaling data sizes, when the application's footprint does not fit in the GPU memory. Our research will significantly contribute to the evolution of GPGPU programming from manual ports of applications using low-level APIs to the use of high-level parallel programming models. We will demonstrate the benefits of the developed programming models and frameworks using standard parallel benchmarks, as well as applications from important domains such as audio, image and video processing, cryptography, and recognition and mining.
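To illustrate the data-flow-graph abstraction behind PO-DFGs (the project's actual model is richer than this sketch), the snippet below represents a tiny graph of coarse-grained operators and executes it in dependency order. It is this explicit graph structure, with operators as nodes and data on edges, that makes partitioning across a host CPU and one or more GPUs tractable for a compiler; the node names and operator functions here are purely illustrative.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Toy operator data-flow graph: each entry maps an operator name to
# (function, list of predecessor operators whose outputs it consumes).
ops = {
    "load":   (lambda: list(range(8)),                []),
    "square": (lambda xs: [x * x for x in xs],        ["load"]),
    "halve":  (lambda xs: [x / 2 for x in xs],        ["load"]),
    "sum":    (lambda a, b: sum(a) + sum(b),          ["square", "halve"]),
}

# Schedule operators so every node runs after its predecessors; in a
# real compiler, independent nodes ("square", "halve") could be mapped
# to different devices and run concurrently.
deps = {name: preds for name, (_, preds) in ops.items()}
results = {}
for name in TopologicalSorter(deps).static_order():
    fn, preds = ops[name]
    results[name] = fn(*(results[p] for p in preds))
print("sum:", results["sum"])
```

The same graph also exposes where host-device data transfers would occur (on the edges), which is where the stream and transfer-minimization optimizations described above would apply.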