Parallel computing allows multiple processing units to work simultaneously on a common task. For certain types of applications, parallelization can reduce execution time roughly in proportion to the number of computers or processors used. This is a significant advantage for applications with performance or memory bottlenecks, such as those typically encountered in financial modeling, physics, engineering, and other applied science domains.
This is a fast-paced applied programming course aimed at students with a general interest in high-speed computing and significant development experience in C, C++, or FORTRAN. No specialized prior knowledge is assumed. Students should, however, have both an interest and some previous experience in algorithmic development, numerical methods, applied mathematics, or a physics or engineering discipline. A brief overview of parallel computing will be presented at the outset, but the course will be less a survey of HPC architectures and more a practicum on algorithmic implementation and performance tuning. The goal of the course is to give students experience in developing efficient, scalable parallel algorithms for both distributed memory architectures (using MPI) and shared memory architectures (using OpenMP and CUDA). Assignments will be designed with some flexibility so that students can explore applying parallel techniques to applications in their own field of interest.
The topics will interleave programming model studies with application exemplars designed to give students experience in implementing these parallel programming strategies:
- Overview of high performance computing platforms and programming models
- Single core cache-based performance optimization
- Introduction to Message Passing: MPI
- Non-blocking MPI point-to-point communication semantics in depth
- Strategies for parallel solution of explicit and implicit PDEs
- Nearest neighbor ghost cell filling for discretized PDEs
- MPI Collectives
- Techniques for "one-sided" programming for unpredictable communication
- Techniques for implementing domain decomposition for particle codes
- MPI User defined types
- N-body problems
- On-core Threading: OpenMP and Pthreads
- GPGPU programming: CUDA
This is a project-based class with no quizzes or tests. Coursework consists of four to five homework assignments, given approximately bi-weekly.
This course has no required textbooks. Students may find the following references useful as a complement to lecture notes:
- Using MPI by Gropp, Lusk, and Skjellum. MIT Press. 2nd edition
- Numerical Recipes in C (or F77 or F90). Online version.
- MPI: The Complete Reference. Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker, and Jack Dongarra. The MIT Press. Online version.
- Using OpenMP: Portable Shared Memory Parallel Programming by Chapman, Jost, and van der Pas
- CUDA by Example: An Introduction to General-Purpose GPU Programming by Sanders and Kandrot