Parallelizing the Julia set with ThreadsX
So far with ThreadsX, most of our parallel codes featured reduction – recall the functions ThreadsX.mapreduce() and ThreadsX.sum(). However, in the Julia set problem we want to fill an array without reduction.
Let’s first modify the serial code! We will use another function from the Base library:
?map
map(x -> x * 2, [1, 2, 3])
map(+, [1, 2, 3], [10, 20, 30, 400, 5000])   # not a reduction!
Let’s modify our serial code juliaSetSerial.jl:
- Define the complex array of points:
point = zeros(Complex{Float32}, height, width);
- Precompute these points:
for i in 1:height, j in 1:width
    point[i,j] = (2*(j-0.5)/width-1) + (2*(i-0.5)/height-1)im   # rescale to -1:1 in the complex plane
end
- Replace the loop for computing stability with
stability = @btime map(pixel, point);   # no const keyword this time!
and remove the original definition of the stability array: it is now defined in the line above.
Running this new, vectorized version of the serial code on my laptop, I see @btime report 1.011 s.
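Here is a minimal, self-contained sketch of the vectorized serial version described above. The pixel() function below is only an assumed stand-in for the one defined in juliaSetSerial.jl (its constant and iteration limit are illustrative), the grid is kept small so that it runs quickly, and the @btime timing is left out so that BenchmarkTools is not needed.

```julia
# Stand-in for the workshop's pixel() function (an assumption, not the original):
# returns the iteration at which |z| first reaches 4, or 255 if it never does.
function pixel(z)
    c = 0.355 + 0.355im            # illustrative constant
    for i in 1:255
        z = z^2 + c
        abs(z) >= 4 && return i
    end
    return 255
end

height, width = 200, 200           # small grid, chosen only for this sketch

# Precompute the points, rescaled to -1:1 in the complex plane
point = zeros(Complex{Float32}, height, width)
for i in 1:height, j in 1:width
    point[i,j] = (2*(j-0.5)/width-1) + (2*(i-0.5)/height-1)im
end

# Apply pixel() to every element; no reduction takes place
stability = map(pixel, point)
println("size: ", size(stability), ", range: ", extrema(stability))
```

Because map() returns a new array with one result per input element, there is no need to preallocate stability beforehand.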
Parallelizing the vectorized code
- Load the ThreadsX library.
- Replace map() with ThreadsX.map().
With 8 threads on my laptop, the runtime went down to 180.815 ms, a 5.6X speedup. On 8 cores on the training cluster I see a 6.5X speedup.
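The two steps above can be sketched as follows. This assumes the ThreadsX package is installed and that Julia was started with several threads (e.g. julia -t 8). As before, pixel() is an assumed stand-in for the workshop's function, and the grid size is chosen only to keep the sketch fast. The results are compared element by element to check that the threaded call computes the same values as the serial one.

```julia
using ThreadsX                     # assumes ThreadsX has been added to your environment

# Stand-in for the workshop's pixel() function (an assumption, not the original)
function pixel(z)
    c = 0.355 + 0.355im            # illustrative constant
    for i in 1:255
        z = z^2 + c
        abs(z) >= 4 && return i
    end
    return 255
end

height, width = 200, 200           # small grid, chosen only for this sketch
point = zeros(Complex{Float32}, height, width)
for i in 1:height, j in 1:width
    point[i,j] = (2*(j-0.5)/width-1) + (2*(i-0.5)/height-1)im
end

serial   = map(pixel, point)            # single-threaded
threaded = ThreadsX.map(pixel, point)   # work is split across the available threads

println("threads available: ", Threads.nthreads())
println("results agree: ", vec(serial) == vec(threaded))
```

This works without any locking because pixel() depends only on its own input element, so the elements can be processed in any order.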
Alternative parallel solution
Jeremiah O’Neil suggested an alternative, slightly faster (and not requiring the point array) implementation using ThreadsX.foreach (not covered in this workshop):
function juliaSet(height, width)
    stability = zeros(Int32, height, width)
    ThreadsX.foreach(1:height) do i
        for j = 1:width
            point = (2*(j-0.5)/width-1) + (2*(i-0.5)/height-1)im
            stability[i,j] = pixel(point)
        end
    end
    return stability
end
Running multi-threaded Julia codes on a production cluster
Before we jump to multi-processing in Julia, let us remind you how to run multi-threaded Julia codes on an HPC cluster.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=...
#SBATCH --mem-per-cpu=3600M
#SBATCH --time=00:10:00
#SBATCH --account=def-user
module load julia
julia -t $SLURM_CPUS_PER_TASK juliaSetThreadsX.jl
There may be some other lines after loading the Julia module, e.g. setting some variables, if you have installed packages into a non-standard location (see our introduction).
Running the last example on the Cedar cluster with julia/1.7.0, @btime reported 2.467 s (serial) and 180.003 ms (16 cores), a 13.7X speedup.
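When running under Slurm, it may be worth confirming that Julia actually started with the number of threads requested via --cpus-per-task. The lines below are a small sketch of such a check that could be placed near the top of a script. They rely only on Threads.nthreads() and the SLURM_CPUS_PER_TASK environment variable that Slurm sets inside a job; the messages are illustrative.

```julia
using Base.Threads: nthreads

# SLURM_CPUS_PER_TASK is set by Slurm inside a job; outside of one it is absent
requested = get(ENV, "SLURM_CPUS_PER_TASK", "")
println("Julia is running with ", nthreads(), " thread(s)")

if isempty(requested)
    println("SLURM_CPUS_PER_TASK is not set (not inside a Slurm job?)")
elseif parse(Int, requested) != nthreads()
    println("Warning: requested ", requested, " CPUs but Julia has ", nthreads(), " thread(s)")
end
```

If the two numbers differ, the most likely cause is a missing -t flag on the julia command line in the job script.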