Julia set on a single locale

Recall the serial code juliaSetSerial.chpl (with the output code omitted):

use Time;

config const c = 0.355 + 0.355i;

proc pixel(z0) {
  var z = z0*1.2;   // zoom out
  for i in 1..255 {
    z = z*z + c;
    if abs(z) >= 4 then
      return i;
  }
  return 255;
}

config const n = 2_000;   // vertical and horizontal size of our image
var y: real;
var point: complex;
var watch: stopwatch;

writeln("Computing ", n, "x", n, " Julia set ...");
var stability: [1..n,1..n] int;
watch.start();
for i in 1..n {
  y = 2*(i-0.5)/n - 1;
  for j in 1..n {
    point = 2*(j-0.5)/n - 1 + y*1i;   // rescale to -1:1 in the complex plane
    stability[i,j] = pixel(point);
  }
}
watch.stop();
writeln('It took ', watch.elapsed(), ' seconds');

Now let’s parallelize this code with forall in shared memory (single locale). Copy juliaSetSerial.chpl into juliaSetParallel.chpl and start modifying it:

For the outer loop, replace for with forall. This will produce an error about the scope of variables y and point:

error: cannot assign to const variable
note: The shadow variable 'y' is constant due to task intents in this loop
error: cannot assign to const variable
note: The shadow variable 'point' is constant due to task intents in this loop

Discussion

Why do you think this message was produced? How do we solve this problem?
Hint: each task needs its own copy of these two variables.
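
One possible fix is sketched below; it is not necessarily the intended solution. It declares y and point inside the loop bodies, so that every iteration works on its own variables instead of assigning to the outer ones, and the outer declarations of y and point can then be dropped. The small default n, the explicit `with (ref stability)` intent (added so that the loop may write into the array regardless of the compiler's default array intent), and the final writeln are assumptions made for this sketch.

```chapel
config const c = 0.355 + 0.355i;
config const n = 200;   // small size for a quick check (assumption, not the lesson's value)

proc pixel(z0) {
  var z = z0*1.2;   // zoom out
  for i in 1..255 {
    z = z*z + c;
    if abs(z) >= 4 then
      return i;
  }
  return 255;
}

var stability: [1..n,1..n] int;

// explicit ref intent: the loop body writes into the outer array
forall i in 1..n with (ref stability) {
  const y = 2*(i-0.5)/n - 1;                 // declared per iteration
  for j in 1..n {
    const point = 2*(j-0.5)/n - 1 + y*1i;    // declared per iteration
    stability[i,j] = pixel(point);
  }
}

writeln("largest iteration count: ", max reduce stability);
```

Because y and point are now declared inside the loop, no task ever assigns to a variable it shares with other tasks, which is what the compiler was complaining about.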

What do we do next?

Compile and run the code on several CPU cores on 1 node:

# module load chapel-multicore/2.4.0
chpl --fast juliaSetParallel.chpl
srun --mem-per-cpu=3600 --cpus-per-task=4 ./juliaSetParallel
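
To check how many cores a run actually uses, a line like the following could be added near the top of juliaSetParallel.chpl. This is a minimal sketch: `here.maxTaskPar` reports the number of tasks the current locale runs in parallel by default, and the expectation that it matches the value passed to --cpus-per-task is an assumption about how the Chapel runtime sees the Slurm allocation.

```chapel
// Report the default degree of parallelism on the current locale.
writeln("locale ", here.name, " can run up to ", here.maxTaskPar, " tasks in parallel");
```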

Once you have the working shared-memory parallel code, study its performance.

Here are my timings on the training cluster:

ncores	1	2	4	8
wallclock runtime (sec)	1.181	0.568	0.307	0.197

Discussion

Why do you think the speedup is not linear in the number of cores (only ~6X on 8 cores)?
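
To put numbers on this, the speedup and parallel efficiency can be computed from the timings in the table above. The short program below is only an illustration; the array names are made up for this sketch, and the runtimes are copied from the table.

```chapel
const ncores  = [1, 2, 4, 8];
const runtime = [1.181, 0.568, 0.307, 0.197];   // seconds, copied from the table

const t1 = runtime[runtime.domain.low];         // single-core runtime

for (p, t) in zip(ncores, runtime) {
  const speedup = t1 / t;               // how many times faster than 1 core
  const efficiency = speedup / p;       // 1.0 would be perfect linear scaling
  writeln(p, " cores: speedup = ", speedup, ", efficiency = ", efficiency);
}
```

Worked by hand, the speedups are roughly 2.08, 3.85 and 5.99 on 2, 4 and 8 cores, so the efficiency drops from about 1.04 on 2 cores to about 0.75 on 8 cores.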