Intro to high-performance computing (HPC)

Abstract: This course is an introduction to High-Performance Computing (HPC) on the Alliance clusters. We will start with an overview of the cluster hardware, then talk about some basic tools and the software environment on our clusters. Next we’ll give a quick tour of various parallel programming frameworks such as OpenMP, MPI and Python’s Dask, as well as newer parallel languages such as Chapel and Julia, and we’ll compile some serial, shared-memory and distributed-memory codes using makefiles. We’ll then proceed to working with the Slurm scheduler, submitting and benchmarking our previously compiled codes. We will learn about batch and interactive cluster usage, best practices for submitting a large number of jobs, estimating your job’s resource requirements, and managing file permissions in shared cluster filesystems. There will be many demos and hands-on exercises on our training cluster.

Instructor: Alex Razoumov (SFU)

Prerequisites: Working knowledge of the Linux Bash shell. We will provide guest accounts on one of our Linux systems.

Software: All attendees will need a remote secure shell (SSH) client installed on their computer in order to participate in the course exercises. On macOS and Linux computers, SSH is usually pre-installed (type ssh in a terminal to make sure it is there). Many versions of Windows also provide an OpenSSH client by default: open PowerShell and type ssh to see if it is available. If not, we recommend installing the free Home Edition of MobaXterm.
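For example, checking for an SSH client and logging in to the training cluster might look like the sketch below; the hostname and username are placeholders, and the actual guest account details will be handed out at the start of the course.

```bash
# confirm that an SSH client is installed by printing its version
ssh -V

# log in to the training cluster; the hostname and username below are
# placeholders, so use the guest account details provided by the instructors
ssh user45@hpc-training.example.ca
```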

Materials: Please download a ZIP file with all slides (single PDF combining all chapters) and sample codes. A copy of this file is also available on the training cluster.

Videos: introduction

These videos (recorded in 2020) cover the same material as the course, and you can watch them at your own pace.

Updates:

  1. Since April 1st, 2022, your instructors in this course have been based at Simon Fraser University.
  2. Some of the slides and links in the videos have changed, so please make sure to download the latest version of the slides (ZIP file).
  3. Compute Canada has been replaced by the Digital Research Alliance of Canada (the Alliance). All Compute Canada hardware and services are now provided to researchers by the Alliance and its regional partners. However, you will still see many references to Compute Canada in our documentation and support system.
  4. New systems were added (e.g. Narval at Calcul Québec), and some older systems were replaced (Cedar → Fir, Béluga → Rorqual, Graham → Nibi, Niagara → Trillium).

Videos: overview of parallel programming frameworks

Here we give you a brief overview of various parallel programming tools. Our goal is not to learn how to use these tools, but rather to tell you at a high level what each of them does, so that you understand the difference between the shared- and distributed-memory parallel programming models and know which tools you can use for each. Later, in the scheduler session, you will use this knowledge to submit parallel jobs to the queue.
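As a small preview of the scheduler session, here is a minimal sketch (with settings typical of Alliance clusters) of how the two memory models translate into Slurm resource requests; the executable names are placeholders for codes compiled earlier in the course. A shared-memory (e.g. OpenMP) job runs a single task whose threads must all fit on one node:

```bash
#!/bin/bash
#SBATCH --ntasks=1            # one process
#SBATCH --cpus-per-task=4     # with four threads, all on the same node
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=0:10:0
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match the thread count to the request
./shared_memory_code          # placeholder for an OpenMP executable
```

A distributed-memory (MPI) job instead asks for several tasks, which the scheduler may place on different nodes:

```bash
#!/bin/bash
#SBATCH --ntasks=4            # four processes, possibly spread across nodes
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=0:10:0
srun ./distributed_memory_code   # placeholder for an MPI executable
```

Both scripts would be submitted with sbatch; the flags and the reasoning behind them are covered in detail in the scheduler session.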

Feel free to skip some of these videos if you are not interested in parallel programming.

Videos: Slurm job scheduler