What is Apptainer (formerly Singularity)
Until November 2021, Apptainer was known as Singularity. In 2021, stewardship of parts of the Singularity project was transferred to the Linux Foundation, and the fully open-source version was renamed Apptainer, while the commercial fork continues to be called Singularity.
- Apptainer is an open-source project developed within the research community since 2015, started at the Lawrence Berkeley National Lab.
- Its goal is to create a portable system to run Linux applications on HPC clusters independently of the specific host Linux version and distro
- in other words, distribute software and its compute environment
- Apptainer creates a custom, secure virtual Linux environment (a container) that is different from the host Linux system.
- e.g., on a CentOS/Rocky Linux machine you can create a virtual Ubuntu system where you can install any precompiled packaged software from the Ubuntu repositories
- in a sense, gives you control of your software environment without being root on the host system (with a catch: creating containers from scratch usually requires root)
- Apptainer is different from Docker, as it does not require root access on the host system to run it
- specifically designed for running containers on multi-user HPC clusters
- On a Linux host Apptainer is very lightweight compared to a full virtual machine (VM).
- On MacOS or Windows hosts Apptainer can be deployed inside a VM, as you still need a Linux host layer
- Apptainer quickly became a way to package and deploy scientific software and its dependencies to different HPC systems.
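The "Ubuntu inside a CentOS/Rocky host" idea from the list above can be checked with a short sketch like the following. It assumes that a SIF image called ubuntu.sif (a hypothetical file name) already exists in the current directory and that the apptainer command is on your PATH; if either is missing, the container part is simply skipped. The helper function distro_name is introduced here for illustration only.

```shell
# Hypothetical image name; replace it with the path to your own SIF file.
image="ubuntu.sif"

# Extract the human-readable distro name from os-release content read on stdin.
distro_name() {
  grep '^PRETTY_NAME=' | cut -d= -f2 | tr -d '"'
}

# Distro of the host system.
if [ -r /etc/os-release ]; then
  echo "host:      $(distro_name < /etc/os-release)"
fi

# Distro seen inside the container (only if Apptainer and the image are available).
if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
  echo "container: $(apptainer exec "$image" cat /etc/os-release | distro_name)"
else
  echo "apptainer or $image not found, skipping the container check"
fi
```

On a cluster, run this inside a scheduled job rather than on a login node.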
Technical details
From the technical standpoint, Apptainer uses:
- kernel namespaces to virtualize and isolate OS resources – mount points, process IDs, network access, users and groups – so that processes inside the container see only a specific, virtualized set of resources
- Linux control groups (cgroups) to control and limit the use of resources such as CPU, memory and disk I/O
- overlay images to enable writable filesystems in otherwise read-only containers
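On any Linux machine you can look at these kernel features directly through /proc. The sketch below prints the namespaces and the control group of the current shell and then, if Apptainer and an image are available, repeats the mount namespace check inside a container. The image name ubuntu.sif is a placeholder, and which namespaces actually differ inside the container depends on how Apptainer is installed and invoked, so treat this as an exploration aid rather than a definitive description.

```shell
# Hypothetical image name; replace it with the path to your own SIF file.
image="ubuntu.sif"

# Each entry in /proc/self/ns is one namespace the current process belongs to.
if [ -d /proc/self/ns ]; then
  ls -l /proc/self/ns
fi

# The control group(s) the current process is accounted under.
if [ -r /proc/self/cgroup ]; then
  cat /proc/self/cgroup
fi

# Identifier of the mount namespace on the host (falls back to a marker string
# on systems without /proc/self/ns).
host_mnt=$(readlink /proc/self/ns/mnt 2>/dev/null || echo "unavailable")
echo "host mount namespace:      $host_mnt"

# The same query from inside a container, for comparison.
if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
  container_mnt=$(apptainer exec "$image" readlink /proc/self/ns/mnt)
  echo "container mount namespace: $container_mnt"
else
  echo "apptainer or $image not found, skipping the container check"
fi
```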
Why use a container
Idea: package and distribute the software environment along with the application, i.e. create a portable software environment.
Why:
- avoid compiling complex software chains from scratch for the host’s Linux OS and run it instead in the environment where it is available as a package,
- run older software that is now hard to compile (missing dependencies),
- use a familiar software environment across different HPC centres, independently of the underlying system
- popular, but somewhat dubious reason: data reproducibility (use the same software environment as the authors ⮕ same result)
Why/when not to use a container
Do not use Apptainer if your software is already installed on the Alliance clusters. Learning and understanding Apptainer is more difficult than learning how to use our software modules or pre-compiled Python packages.
In most cases, an off-the-shelf Apptainer image will meet your needs. Only build your own Apptainer images if you have a strong reason to require a custom one. In my experience, most users who believe they need a custom image actually don’t – but when you truly do, it is an excellent solution!
Installing/running Apptainer on your own computer
Apptainer was originally developed for use on HPC clusters, but you can also run it on your own computer:
| Host OS | Run Apptainer |
|---|---|
| Linux | Install and use an Apptainer package [1] |
| Any host OS | In a VM running Linux |
| Windows or MacOS | Inside Vagrant |
| Windows or MacOS | Inside Docker (download a Docker image with Apptainer installed) |
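For the Linux row of the table, the exact install command depends on your distribution. The sketch below only prints a suggestion and does not install anything. The function name suggest_apptainer_install is made up for this example, and the repositories it mentions (EPEL on Rocky/Alma/CentOS, the Apptainer PPA on Ubuntu) are assumptions based on common setups, so please check the official installation guide before running any of the printed commands.

```shell
# Print a suggested install command for a distro ID as found in /etc/os-release.
# The mapping is an assumption for illustration, not an authoritative list.
suggest_apptainer_install() {
  case "$1" in
    ubuntu)
      echo "sudo add-apt-repository -y ppa:apptainer/ppa && sudo apt update && sudo apt install -y apptainer" ;;
    rocky|almalinux|centos)
      echo "sudo dnf install -y epel-release && sudo dnf install -y apptainer" ;;
    fedora)
      echo "sudo dnf install -y apptainer" ;;
    *)
      echo "no suggestion for '$1', see the Apptainer installation guide" ;;
  esac
}

# Detect the distro ID of this machine, if possible.
distro_id="unknown"
if [ -r /etc/os-release ]; then
  distro_id=$(. /etc/os-release && echo "${ID:-unknown}")
fi

suggest_apptainer_install "$distro_id"
```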
Glossary
An image is a bundle of files including an operating system, software and potentially data and other application-related files. Apptainer uses the Singularity Image Format (SIF), and images are provided as single .sif files.
A container is a virtual environment that is based on an image. You can start multiple container instances from an image.
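One way to see several containers running from the same image is Apptainer's instance subcommand, which starts named background containers. This is a minimal sketch: the image ubuntu.sif and the instance names first and second are placeholders, and the block does nothing if Apptainer or the image is missing. Every plain apptainer exec or apptainer shell call also creates its own container from the image, so instances are just the most visible illustration.

```shell
# Hypothetical image and instance names.
image="ubuntu.sif"
names="first second"

if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
  # Start two independent containers from the same image.
  for name in $names; do
    apptainer instance start "$image" "$name"
  done

  # Both should now be listed as running.
  apptainer instance list

  # Run a command inside each running instance.
  for name in $names; do
    apptainer exec "instance://$name" hostname
  done

  # Clean up.
  for name in $names; do
    apptainer instance stop "$name"
  done
else
  echo "apptainer or $image not found, skipping the instance demo"
fi
```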
An operating system (OS) is all the software that lets you interact with a computer, run applications, use the UI, etc. It consists of the “kernel” and “userland” parts.
A kernel is the central piece of software that manages hardware and provides resources (CPU, I/O, memory, devices, filesystems) to the processes it is running.
A filesystem is an organized collection of files. Under UNIX/Linux, there is a single hierarchy under /, and additional filesystems are “mounted” somewhere under that hierarchy.
Containers vs virtual machines
- Container = the OS-level mechanism to isolate some parts of the OS along with a given application.
- virtualizes an operating system
- lets you run an application compiled for a specific Linux OS on another Linux OS
- almost no performance overhead
- Virtual machine (VM) = complete isolation from the host OS via virtualized hardware
- virtualizes hardware
- maximum flexibility, can mix any combination of host and guest OS’s
- significant performance overhead, as you run on simulated hardware
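A quick way to see the difference in practice: a container shares the host's kernel, so the kernel release reported inside it should match the host, whereas a VM boots its own kernel. The sketch below assumes a placeholder image ubuntu.sif and skips the container part if Apptainer or the image is not available.

```shell
# Hypothetical image name; replace it with the path to your own SIF file.
image="ubuntu.sif"

# Kernel release of the host.
host_kernel=$(uname -r)
echo "host kernel:      $host_kernel"

if command -v apptainer >/dev/null 2>&1 && [ -f "$image" ]; then
  # Same query from inside the container.
  container_kernel=$(apptainer exec "$image" uname -r)
  echo "container kernel: $container_kernel"
  if [ "$host_kernel" = "$container_kernel" ]; then
    echo "same kernel: the container virtualizes the OS userland, not the hardware"
  fi
else
  echo "apptainer or $image not found, skipping the container check"
fi
```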
Docker: a container platform for services. It runs as root on the host system and uses cgroups for resource management between different containers on a given node. It is very popular with software developers, but you can’t really use it on HPC systems: users on clusters have no root or sudo access, and Docker’s cgroups resource management would conflict with HPC resource managers.
Apptainer: runs containers entirely in user space, as a regular user. It can use existing Docker containers (Apptainer will convert them to proper SIF images for you) and works seamlessly with the schedulers.
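As a rough illustration of the Docker-to-SIF conversion, the following sketch pulls a public Docker image and stores it as a single SIF file. The source docker://ubuntu:22.04 and the output name ubuntu.sif are just example choices, and the block is skipped when Apptainer is not installed. Pulling can take a while and use a fair amount of memory and disk, so on a cluster do it inside a scheduled job.

```shell
# Example source and a hypothetical output file name.
source_uri="docker://ubuntu:22.04"
image="ubuntu.sif"

if command -v apptainer >/dev/null 2>&1; then
  # Download the Docker layers and convert them into one SIF file
  # (skipped if the file is already there).
  [ -f "$image" ] || apptainer pull "$image" "$source_uri"

  # The result is an ordinary file that you can copy between systems.
  ls -lh "$image"

  # Show the metadata stored in the image.
  apptainer inspect "$image"
else
  echo "apptainer not found, skipping the pull"
fi
```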
There are a few other container engines focusing on specific features.
Apptainer on HPC systems
We will now distribute usernames and passwords for our training cluster.
Let’s log in to the training cluster apptainer.vastcloud.org and try loading Apptainer:
module load apptainer/1.3.5 # the default version at the time of writing
apptainer --version
which apptainer
apptainer   # show the list of available commands

Apart from this short example, please do not run Apptainer on a cluster’s login node. Apptainer can be quite resource-demanding, so we will run on a compute node inside a Slurm job. I will explain how to do that in the next section. The same applies to our production clusters: always schedule either an interactive or a batch job to run Apptainer workflows.
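To give a rough idea of what a batch job might look like, here is a minimal sketch that writes a Slurm job script to a file. The file name apptainer_job.sh, the image ubuntu.sif and all resource values are placeholders chosen for this example; your cluster may also require additional options such as an account name.

```shell
# Write a small Slurm batch script (file name and resource values are placeholders).
cat > apptainer_job.sh << 'EOF'
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=3600M
#SBATCH --cpus-per-task=1

module load apptainer/1.3.5

# Run a single command inside the container on the compute node.
apptainer exec ubuntu.sif cat /etc/os-release
EOF

echo "wrote apptainer_job.sh"

# To submit it as a batch job:
#   sbatch apptainer_job.sh
# To work interactively on a compute node instead:
#   salloc --time=00:30:00 --mem-per-cpu=3600M
```

The submission commands are left as comments so that running the block only creates the file and does not queue anything.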
Footnotes
[1] When running a longer version of this course after a Cloud course, we install Apptainer as a package inside our VM.