Programming Parallel Machines
ECE56300
Credit Hours: 3
Learning Objective:
At the end of the course you will be able to:
- Write a parallel program using MPI (a minimal sketch follows this list)
- Write a parallel program using OpenMP
- Write a parallel program using explicit threads
- Write a GPU program using CUDA
- Compute the performance, efficiency, and speedup of a parallel program
- Decide on the suitability of a parallel algorithm for a particular parallel programming model
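To give a concrete flavor of the first objective, here is a minimal MPI hello-world sketch. It is illustrative only, not course material, and assumes a standard MPI installation with the usual `mpicc`/`mpirun` tooling.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);               /* start the MPI runtime */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */

    printf("hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut the runtime down */
    return 0;
}
```

Compile with `mpicc hello.c -o hello` and launch, for example, four processes with `mpirun -np 4 ./hello`.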
Description:
This course will enable you to write programs targeting parallel machines, using any of the four major parallel programming paradigms: MPI (message passing), OpenMP (for shared-memory machines), Pthreads (explicit thread programming for shared-memory machines), and GPU programming (using CUDA). We will also discuss system architecture and memory and programming-language coherency models, as these are necessary to develop correct parallel programs and to debug them when they are not correct. We will also spend time on sequential performance optimizations.
This is not a course in parallel algorithms, although you will need to implement one or more parallel algorithms for the course project.
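For contrast with the message-passing sketch above, here is a minimal shared-memory example in the OpenMP style. It is a sketch, assuming a compiler with OpenMP support (e.g. `gcc -fopenmp`), not an excerpt from the course itself.

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* The pragma splits the loop iterations across threads.  The
       reduction clause gives each thread a private partial sum and
       combines the partials at the end, avoiding a race on `sum`. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("H(%d) ~= %f with up to %d threads\n",
           n, sum, omp_get_max_threads());
    return 0;
}
```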
Topics Covered:
- Introduction to parallelism and the course
- Shared and distributed memory architectures, multicores, dependence, and the relationship to parallelism
- Hardware and software coherence in shared-memory systems; the focus will be on software memory models, as 565 does a better job of teaching hardware coherence
- Sequential Optimizations
- OpenMP and shared memory programming
- Pthreads, Java and shared memory programming
- MPI and distributed memory programming
- GPU architecture and programming
- Tuning applications and speedup theory: Amdahl's law, strong and weak scaling, etc. (a worked example of Amdahl's law follows this list)
- Algorithms and techniques for fast reductions, recurrences, parallel prefix, divide and conquer, superlinear speedup, etc.
- Parallelizing compilers and their limitations
- New programming models: some of Cilk, Stream, UPC, Galois, and X10
- Tests
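As a pointer to the speedup-theory topic above, here is Amdahl's law in its standard form with a small worked example; the 10% serial fraction is a made-up figure for illustration.

```latex
% Amdahl's law: a serial fraction s bounds the speedup on p processors.
\[
  S(p) = \frac{1}{s + \frac{1 - s}{p}}
\]
% Example: if s = 0.1 (90\% of the work parallelizes), then
\[
  S(8) = \frac{1}{0.1 + 0.9/8} \approx 4.7,
  \qquad
  \lim_{p \to \infty} S(p) = \frac{1}{0.1} = 10.
\]
```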