Programming Parallel Machines

ECE56300

Credit Hours:

3

Learning Objective:

At the end of the course you will be able to
  • Write a parallel program using MPI (see the sketch after this list)
  • Write a parallel program using OpenMP
  • Write a parallel program using explicit threads
  • Write a GPU program using CUDA
  • Compute the speedup, efficiency, and performance of a parallel program
  • Decide on the suitability of a parallel algorithm for a particular parallel programming model
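
To make the MPI objective concrete, here is a minimal MPI "hello world" in C. It is an illustrative sketch only, not course material, and the file name used below is invented.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);               /* start the MPI runtime */

      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
      MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */

      printf("hello from rank %d of %d\n", rank, size);

      MPI_Finalize();                       /* shut down the runtime */
      return 0;
  }

On most installations this builds and runs with something like "mpicc hello.c -o hello" and "mpirun -np 4 ./hello", though the exact commands vary by cluster.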

Description:

This course will enable you to write programs targeting parallel machines using any of the four major parallel programming paradigms: MPI (message passing), OpenMP (for shared-memory machines), explicit thread programming with Pthreads (also for shared-memory machines), and GPU programming (using CUDA). We will also discuss system architecture and memory and programming-language coherency models, as these are necessary to develop correct parallel programs and to debug parallel programs when they are not correct. We will also spend time on sequential performance optimizations.
This is not a course in parallel algorithms, although you will need to implement one or more parallel algorithms for the course project.
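
As a flavor of the shared-memory style, here is a minimal OpenMP sketch in C: a dot product parallelized with a work-sharing loop and a reduction. It is illustrative only; the array size and variable names are invented for this example.

  #include <omp.h>
  #include <stdio.h>

  #define N 1000000

  int main(void) {
      static double a[N], b[N];  /* static keeps the large arrays off the stack */
      double sum = 0.0;

      /* Loop iterations are divided among the threads; the reduction
         clause combines the per-thread partial sums safely. */
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < N; i++) {
          a[i] = 0.5 * i;
          b[i] = 2.0;
          sum += a[i] * b[i];
      }

      printf("dot product = %f (up to %d threads)\n", sum, omp_get_max_threads());
      return 0;
  }

Without the reduction clause, the concurrent updates to sum would be a data race; reasoning about exactly such races is part of the memory-model material in this course.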

Topics Covered:

  • Introduction to parallelism and the course
  • Shared and distributed memory architectures, multicores, dependence, and the relationship to parallelism
  • Hardware and software coherence in shared memory systems. The focus will be on software memory models, as ECE 565 covers hardware coherence in more depth
  • Sequential Optimizations
  • OpenMP and shared memory programming
  • Pthreads, Java and shared memory programming
  • MPI and distributed memory programming
  • GPU architecture and programming
  • Tuning applications and speedup theory: Amdahl's law, strong and weak scaling, etc. (a worked example follows this list)
  • Algorithms and techniques for fast reductions, recurrences, parallel prefix, divide and conquer, superlinear speedup, etc.
  • Parallelizing compilers and their limitations
  • New programming models: a selection from Cilk, Stream, UPC, Galois, and X10
  • Tests
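
As a taste of the speedup theory listed above, Amdahl's law in its standard form: if a fraction f of a program's execution can be parallelized and the remainder is serial, the speedup on p processors is bounded by

  S(p) = 1 / ((1 - f) + f / p)

For example, with f = 0.9 and p = 8, S(8) = 1 / (0.1 + 0.9/8) ≈ 4.7; even with unlimited processors the speedup cannot exceed 1 / (1 - f) = 10.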

Prerequisites:

Competency in C, C++ or Fortran.

Applied / Theory:

85 / 15

Web Address:

https://engineering.purdue.edu/~smidkiff/ece563/

Homework:

Homework will be assigned periodically. It will be graded on whether you turn in a reasonable attempt, not on correctness.

Projects:

The project will be a larger MPI or shared-memory program implementing a more substantial algorithm.

Exams:

Two exams: the first will be take-home, the second in class.

Textbooks:

There is no required text. The lectures will follow the book Parallel Programming in C with MPI and OpenMP, along with supplemental material that will be provided.

Computer Requirements:

Students will use the Purdue Scholar cluster for programming assignments. Students with access to multi-node parallel machines in their lab or workplace may do the programming on those machines as well. Students will need a computer from which they can remotely log in to the Scholar cluster. Course discussion will be hosted on Piazza.