From cloud to HPC

… building a GPU-enabled container on Arbutus

This webinar (about 90% ready) was scheduled for 2025-Mar-11 but was cancelled because I was sick.

Abstract

In this beginner-friendly webinar I will walk through the steps of creating a GPU container on a virtual machine (VM) on Arbutus that can be deployed on HPC clusters with NVIDIA GPUs. This webinar will teach you how to:

  1. create a VM on Arbutus cloud,
  2. create an Apptainer sandbox from scratch in Linux,
  3. install NVIDIA drivers and CUDA into the container,
  4. compile GPU-enabled, single-locale Chapel in the container,
  5. convert the sandbox into a SIF container,
  6. compile and run Chapel codes inside the container, both on the VM and on production HPC clusters.

Spinning up a VM with a GPU

  1. https://arbutus.cloud.alliancecan.ca/project/instances # log in with CCDB credentials
  2. switch to the CCInternal-magic-castle-training project
  3. Compute | Instances | Launch Instance
  4. Instance Name = alex-gpu-webinar
  5. Source = AlmaLinux-9.4-x64-2024-05
  6. Flavor = g1-8gb-c4-22gb
  7. Networks = CCInternal-magic-castle-training-network
  8. Security Groups = default
  9. Key Pairs = alex20240821
  10. click Launch (may take a few mins)
  11. Instances | in the drop-down menu select Associate Floating IP
  12. Network | Security Groups | on the "default" row click Manage Rules
  13. Add Rule | pick SSH from the first drop-down menu | click Add
chmod 600 ~/.ssh/alex20240821.pem
ssh -i ~/.ssh/alex20240821.pem almalinux@206.12.91.229
sudo dnf check-update              # check which packages have pending updates
sudo dnf update -y                 # update these
sudo dnf install -y epel-release   # enable Extra Packages for Enterprise Linux (EPEL)
sudo dnf install -y git apptainer cmake bat
sudo dnf install -y htop nano wget tmux emacs-nox netcdf netcdf-devel
sudo reboot
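
Later commands use `ssh alma` as a shortcut to this VM; that assumes an alias in ~/.ssh/config on your own machine, filled in from the ssh command above (a hypothetical sketch):

cat >> ~/.ssh/config <<'EOF'
Host alma
    HostName 206.12.91.229
    User almalinux
    IdentityFile ~/.ssh/alex20240821.pem
EOF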

git clone git@bitbucket.org:razoumov/synchpc.git syncHPC
/bin/rm -f ~/.bashrc && ln -s syncHPC/bashrc ~/.bashrc && source ~/.bashrc
/bin/rm -f ~/.emacs && ln -s syncHPC/emacs ~/.emacs
ln -s syncHPC/startSingleLocaleGPU.sh startSingleLocaleGPU.sh
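
The startSingleLocaleGPU.sh script (sourced later to set up the single-locale GPU Chapel environment) lives in the private syncHPC repo; a minimal sketch of what such a script might contain, assuming the install prefix used in the native build below:

# startSingleLocaleGPU.sh -- hypothetical sketch; the actual script is in syncHPC
source $HOME/c1/chapel-2.3.0/util/setchplenv.bash   # assumed Chapel install prefix
export CHPL_COMM=none                               # single-locale
export CHPL_LOCALE_MODEL=gpu
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/usr/local/cuda-12.4
export PATH=$CHPL_CUDA_PATH/bin:$PATH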

  1. Volumes | Volumes | Create a volume: name = razoumovVol, no source (empty volume), type = default, size = 300 GB, click Create
  2. Compute | Instances | in the instance's drop-down menu select Attach Volume, pick razoumovVol, click Attach
  3. Compute | Instances | under Volumes Attached check which device it is attached to, e.g. /dev/vdb

ssh alma
sudo fdisk -l               # find your volume/device
sudo fdisk /dev/vdb         # type "g" to create a new GPT partition table, then "w" to write it and exit
sudo mkfs.ext4 /dev/vdb     # create an ext4 filesystem directly on the volume
sudo mkdir /data
sudo mount /dev/vdb /data   # mount the volume
df -hT /data                # check it
sudo mkdir /data/work
sudo chown -R almalinux:almalinux /data/work
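
The mount above does not survive a reboot (and the VM gets rebooted below), so you may want to add an /etc/fstab entry to remount the volume automatically:

echo "/dev/vdb  /data  ext4  defaults,nofail  0  2" | sudo tee -a /etc/fstab
sudo mount -a   # verify that the new entry mounts cleanly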

“The Nouveau GPU driver is an open-source graphics driver for NVIDIA GPUs, developed as part of the Linux kernel. It provides support for NVIDIA graphics cards without requiring NVIDIA’s proprietary driver. However, Nouveau is often slower and lacks full support for newer GPU features compared to the official NVIDIA driver.”

The GPU driver details below are from https://docs.alliancecan.ca/wiki/Using_cloud_vGPUs

# prevent loading of the buggy Nouveau GPU driver when the system boots
sudo sh -c "echo 'blacklist nouveau' > /etc/modprobe.d/blacklist-nouveau.conf"
sudo sh -c "echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf"
sudo dracut -fv --omit-drivers nouveau
sudo dnf -y update
# sudo dnf -y install epel-release   # already done
sudo reboot
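
After the reboot, confirm that the Nouveau module is no longer loaded:

lsmod | grep -i nouveau   # should print nothing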

# install the vGPU driver
# sudo dnf remove libglvnd-gles-1:1.3.4-1.el9.x86_64
# sudo dnf remove libglvnd-glx-1:1.3.4-1.el9.x86_64
sudo dnf -y install http://repo.arbutus.cloud.alliancecan.ca/pulp/repos/alma9/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el9.noarch.rpm   # install the Arbutus vGPU Cloud repository
sudo dnf -y install nvidia-vgpu-gridd.x86_64 nvidia-vgpu-tools.x86_64 nvidia-vgpu-kmod.x86_64
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID V100D-8C                  On  |   00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0             N/A /  N/A  |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
# install CUDA
SRC=https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers
wget $SRC/cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm   # 4.6GB download
sudo dnf -y install cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm
sudo dnf clean all
sudo dnf -y install cuda-toolkit-12-4
>>> do not delete cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm: it will be needed later to install CUDA inside the container

Now nvcc is available at /usr/local/cuda-12.4/bin/nvcc.
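
You can verify the toolkit and, optionally, put it on your PATH for interactive use (the installer does not do this for you):

/usr/local/cuda-12.4/bin/nvcc --version      # should report release 12.4
export PATH=/usr/local/cuda-12.4/bin:$PATH   # optional convenience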

Installing Chapel with GPU support natively

This step is optional: it lets you test GPU Chapel natively on the VM, without involving the container.

# on the VM, as almalinux@alex-gpu-testing
wget https://github.com/chapel-lang/chapel/releases/download/2.3.0/chapel-2.3.0.tar.gz
tar xvfz chapel-2.3.0.tar.gz
cd chapel-2.3.0
source util/setchplenv.bash
export CHPL_LLVM=bundled
export CHPL_COMM=none
export CHPL_TARGET_CPU=none   # full list https://chapel-lang.org/docs/usingchapel/chplenv.html#chpl-target-cpu
export CHPL_LOCALE_MODEL=gpu
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/usr/local/cuda-12.4
mkdir -p ~/c1/chapel-2.3.0 && /bin/rm -rf ~/c1/chapel-2.3.0/*
./configure --chpl-home=$HOME/c1/chapel-2.3.0   # inspect the settings
make -j4
make install
source ~/startSingleLocaleGPU.sh
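# optional sanity check: confirm the compiler version and its GPU settings
chpl --version                # should report version 2.3.0
printchplenv | grep -i gpu    # expect CHPL_LOCALE_MODEL: gpu and CHPL_GPU: nvidia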
git clone git@bitbucket.org:razoumov/chapel.git ~/chapelCourse
cd ~/chapelCourse/gpu
chpl --fast probeGPU.chpl
./probeGPU
cd ../juliaSet
chpl --fast juliaSetSerial.chpl
chpl --fast juliaSetGPU.chpl
./juliaSetSerial --n=8000   # 9.28693s
./juliaSetGPU --n=8000      # 0.075753s
cd ../primeFactorization
chpl --fast primesGPU.chpl
./primesGPU --n=10_000_000   # 0.04065s; A = 4561 1428578 5000001 4894 49

Building a Chapel GPU container

--nv does not work with --writable, so you cannot create a writable sandbox that mounts the host's GPU drivers and libraries and into which you would install GPU Chapel. There are several ways around this:

  1. You could do this via a writable overlay image: create an immutable SIF container, start it with --nv and the overlay attached, and install GPU Chapel into the overlay (a minimal sketch follows). It works, but in my experience this is not the best option from the performance standpoint.
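
A sketch of the overlay workflow, assuming you already have some GPU-capable image base.sif (the name is a placeholder):

apptainer overlay create --size 4096 overlay.img          # 4 GB writable ext3 overlay image
sudo apptainer shell --nv --overlay overlay.img base.sif
# anything you install in this shell goes into overlay.img, not into base.sif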

  2. You can install the NVIDIA Container Toolkit:

URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
curl -s -L $URL | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit

and then combine --nv and --writable:

apptainer build --sandbox almalinux.dir docker://almalinux
export NVIDIA_DRIVER_CAPABILITIES="compute,utility"
sudo apptainer shell --writable --nv --nvccli almalinux.dir

but I don’t see nvcc inside the container (maybe I am missing something obvious?).

  3. You can bootstrap from an NVIDIA development image and compile GPU Chapel in there (this is what the last section below does).

  4. You can install CUDA into the container and use it to compile GPU Chapel; this is the approach taken in the rest of this section:

cd /data/work
mkdir tmp
export APPTAINER_TMPDIR=/data/work/tmp
apptainer build --sandbox almalinux.dir docker://almalinux   # sudo not yet required
mkdir almalinux.dir/source
mv ~/cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm almalinux.dir/source/
sudo apptainer shell --writable almalinux.dir
Apptainer> dnf check-update
           dnf update -y
           dnf install -y cmake gcc g++ python3 wget
           # install CUDA inside the container
           dnf -y install /source/cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm
           dnf clean all   # remove cached package data
           dnf -y install cuda-toolkit-12-4
           /bin/rm /source/cuda-repo-rhel9-12-4-local-12.4.0_550.54.14-1.x86_64.rpm

mkdir almalinux.dir/c1/
cp ~/chapel-2.3.0.tar.gz almalinux.dir/source
sudo apptainer shell --writable almalinux.dir
Apptainer> cd /source
           tar xvfz chapel-2.3.0.tar.gz
           cd chapel-2.3.0
           source util/setchplenv.bash
           export CHPL_LLVM=bundled
           export CHPL_COMM=none
           export CHPL_TARGET_CPU=none
           export CHPL_LOCALE_MODEL=gpu
           export CHPL_GPU=nvidia
           export CHPL_CUDA_PATH=/usr/local/cuda-12.4
           mkdir -p /c1/chapel-2.3.0 && /bin/rm -rf /c1/chapel-2.3.0/*
           ./configure --chpl-home=/c1/chapel-2.3.0
           make -j4
           make install
           /bin/rm -r /source

sudo apptainer build almalinux.sif almalinux.dir
sudo apptainer shell --nv almalinux.sif
Apptainer> nvidia-smi   # should show the same info as above

Testing on the VM

cd
sudo apptainer shell --nv /data/work/almalinux.sif
cd chapelCourse/gpu
source /c1/chapel-2.3.0/util/setchplenv.bash
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/usr/local/cuda-12.4
export PATH=$CHPL_CUDA_PATH/bin:$PATH
make clean
chpl --fast probeGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./probeGPU
cd ../juliaSet
make clean
chpl --fast juliaSetSerial.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
chpl --fast juliaSetGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./juliaSetSerial --n=8000   # 9.36665s
./juliaSetGPU --n=8000      # 0.06692s
cd ../primeFactorization
chpl --fast primesGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./primesGPU --n=10_000_000   # 0.039443s; A = 4561 1428578 5000001 4894 49
cd /data/work
scp almalinux.sif razoumov@fir.alliancecan.ca:/project/6003910/razoumov/apptainerImages/chapelGPU202503
scp almalinux.sif razoumov@rorqual.alliancecan.ca:scratch

Testing on Fir

fir
cd /project/6003910/razoumov/apptainerImages/chapelGPU202503
# salloc --time=0:30:0 --mem-per-cpu=3600 --gpus-per-node=1 --account=def-razoumov-ac
salloc --time=0:30:0 --mem-per-cpu=3600 --gpus-per-node=v100l:1 \
       --account=cc-debug --reservation=asasfu_756
nvidia-smi
module load apptainer
apptainer shell --nv -B $SLURM_TMPDIR almalinux.sif
source /c1/chapel-2.3.0/util/setchplenv.bash
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/usr/local/cuda-12.4
export PATH=$PATH:/usr/local/cuda-12.4/bin
cp -r ~/chapelCourse $SLURM_TMPDIR
cd $SLURM_TMPDIR/chapelCourse/gpu
make clean
chpl --fast probeGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./probeGPU
cd ../juliaSet
chpl --fast juliaSetSerial.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
chpl --fast juliaSetGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./juliaSetSerial --n=8000   # 12.4182s
./juliaSetGPU --n=8000      # 0.083108s
cd ../primeFactorization
chpl --fast primesGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./primesGPU --n=10_000_000   # 0.166764s; A = 4561 1428578 5000001 4894 49
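
The same test can also run as a batch job instead of an interactive salloc session; a hypothetical sketch, submitted from the directory that holds almalinux.sif, with the account and GPU request as placeholders:

#!/bin/bash
#SBATCH --time=0:30:0
#SBATCH --mem-per-cpu=3600
#SBATCH --gpus-per-node=1
# placeholder account below: use your own allocation
#SBATCH --account=def-someuser
module load apptainer
cp -r ~/chapelCourse $SLURM_TMPDIR
apptainer exec --nv -B $SLURM_TMPDIR almalinux.sif bash -c '
  source /c1/chapel-2.3.0/util/setchplenv.bash
  export CHPL_GPU=nvidia CHPL_CUDA_PATH=/usr/local/cuda-12.4
  export PATH=$PATH:$CHPL_CUDA_PATH/bin
  cd $SLURM_TMPDIR/chapelCourse/gpu
  chpl --fast probeGPU.chpl -L$CHPL_CUDA_PATH/targets/x86_64-linux/lib/stubs
  ./probeGPU'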

Testing on Rorqual

rorqual
cd ~/chapelCourse/codes
salloc --time=0:30:0 --nodes=1 --cpus-per-task=1 --mem-per-cpu=3600 --gpus-per-node=1 --account=cc-debug
nvidia-smi
module load apptainer
export CDIR=~/scratch/chapelGPU20240826
apptainer shell --nv --overlay ${CDIR}/extra.img:ro ${CDIR}/almalinux.sif
source /extra/c1/chapel-2.3.0/util/setchplenv.bash
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/usr/local/cuda-12.4
export PATH=$PATH:/usr/local/cuda-12.4/bin
cd ~/tmp/2024/
chpl --fast probeGPU.chpl -L/usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs
./probeGPU
chpl --fast juliaSetGPU.chpl -L/usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs
./juliaSetGPU
./juliaSetGPU --height=8000

Using Grigory’s 2-step build, now as a 1-step build

  • multi-stage builds https://docs.sylabs.io/guides/latest/user-guide/definition_files.html#multi-stage-builds
  • available containers https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nvhpc/tags
  • newer tag: nvcr.io/nvidia/nvhpc:25.1-devel-cuda_multi-ubuntu24.04
  • older tag: nvcr.io/nvidia/nvhpc:23.11-devel-cuda_multi-ubuntu22.04
cd /data/work
export APPTAINER_TMPDIR=/data/work/tmp
apptainer build --nv test.sif docker://nvcr.io/nvidia/nvhpc:24.3-devel-cuda_multi-ubuntu22.04
sudo apptainer shell --nv test.sif
find / -name nvcc   # check if their CUDA installation includes `nvcc` (needed for GPU Chapel runtime)
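# the cuda_multi images bundle several CUDA versions under the HPC SDK tree;
# listing them helps pick CHPL_CUDA_PATH for the definition file below
ls -d /opt/nvidia/hpc_sdk/Linux_x86_64/*/cuda/*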
cd /data/work
>>> create single.def with the following content:
BootStrap: docker
From: nvcr.io/nvidia/nvhpc:24.3-devel-cuda_multi-ubuntu22.04
Stage: build
%post
    . /.singularity.d/env/10-docker*.sh
    apt update -y
    apt install -y python3
    mkdir /source && cd /source
    wget https://github.com/chapel-lang/chapel/releases/download/2.3.0/chapel-2.3.0.tar.gz
    tar xvfz chapel-2.3.0.tar.gz
    cd chapel-2.3.0
    . util/setchplenv.sh
    export CHPL_LLVM=bundled
    export CHPL_COMM=none
    export CHPL_TARGET_CPU=none
    export CHPL_LOCALE_MODEL=gpu
    export CHPL_GPU=nvidia
    export CHPL_CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.3
    mkdir -p /c1/chapel-2.3.0
    ./configure --chpl-home=/c1/chapel-2.3.0
    make -j4
    make install
    /bin/rm -r /source
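
Optionally, single.def could also export the Chapel environment at container start time via an %environment section, so that you do not have to source setchplenv manually in every shell; a sketch (the paths are assumptions based on the build above, and the exact bin subdirectory may differ):

%environment
    # assumed install locations; adjust if your layout differs
    export CHPL_HOME=/c1/chapel-2.3.0
    export CHPL_GPU=nvidia
    export CHPL_CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.3
    export PATH=$CHPL_HOME/bin/linux64-x86_64:$CHPL_CUDA_PATH/bin:$PATH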
>>> build the image from single.def
export APPTAINER_TMPDIR=/data/work/tmp
apptainer build --nv ubuntu.sif single.def
cd
sudo apptainer shell --nv /data/work/ubuntu.sif
cd chapelCourse/gpu
source /c1/chapel-2.3.0/util/setchplenv.bash
export CHPL_GPU=nvidia
export CHPL_CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/12.3
export PATH=$CHPL_CUDA_PATH/bin/:$PATH
make clean
chpl --fast probeGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./probeGPU
cd ../juliaSet
make clean
chpl --fast juliaSetSerial.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
chpl --fast juliaSetGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./juliaSetSerial --n=8000   # 9.36665s
./juliaSetGPU --n=8000      # 0.06692s
cd ../primeFactorization
chpl --fast primesGPU.chpl -L${CHPL_CUDA_PATH}/targets/x86_64-linux/lib/stubs
./primesGPU --n=10_000_000   # 0.039443s; A = 4561 1428578 5000001 4894 49
cd /data/work
scp ubuntu.sif razoumov@fir.alliancecan.ca:/project/6003910/razoumov/apptainerImages/chapelGPU202503
scp ubuntu.sif razoumov@rorqual.alliancecan.ca:scratch