# Parallel Processing with Metakernel

Metakernel uses the `ipyparallel` system for running code in parallel. This notebook demonstrates the process using the Calysto Scheme kernel, although other Metakernel-based kernels may also work. The kernel must be able to return values and must implement `kernel.set_variable()` and `kernel.env`.
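
To make those requirements concrete, here is a toy sketch in plain Python of the kind of object the parallel magic needs on each node. It keeps its own variables in an `env`, lets a caller inject values with `set_variable()`, and returns the value of the code it runs rather than only printing it. `ToyKernel` is hypothetical and evaluates Python expressions, not Scheme. The method name `do_execute_direct` is borrowed from the hook that Metakernel kernels implement, but the class itself is not Metakernel code.

```python
# Toy stand-in for a Metakernel-based kernel (hypothetical; not Metakernel code).
# It evaluates Python expressions instead of Scheme.


class ToyKernel:
    def __init__(self):
        # The kernel's own variables, playing the role of `kernel.env`.
        self.env = {}

    def set_variable(self, name, value):
        # Lets the caller inject a value, e.g. this node's cluster_rank.
        self.env[name] = value

    def do_execute_direct(self, code):
        # Return the result so that the caller can collect it.
        return eval(code, {}, self.env)


kernel = ToyKernel()
kernel.set_variable("cluster_rank", 3)
print(kernel.do_execute_direct("cluster_rank * cluster_rank"))  # 9
```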

Before opening the notebook, run the following command from a shell. This example starts a cluster of 10 nodes:

```shell
ipcluster start -n 10 --ip=10.0.0.190
```

We then start the parallel communication using the `%parallel` line magic. In this example, the 10 distributed kernels are the same type of kernel as the host kernel, namely Calysto Scheme. We call `%parallel` with the name of the kernel's module and the name of its class:

In [1]:

```scheme
%parallel calysto_scheme CalystoScheme
```

```
got unknown result: 834e76c8-bee8d3c516994eed98c32d1c
```

To check that everything is working, we can "parallel execute" a cell using the `%%px` cell magic:

In [2]:

```scheme
%%px

(* cluster_rank cluster_rank)
```

```
#10(0 1 4 9 16 25 36 49 64 81)
```
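
Each of the 10 engines evaluates the same cell, but each has its own value of `cluster_rank` (0 through 9), and the host collects the values in rank order; hence the squares from 0 to 81. The sketch below imitates that sequentially in plain Python. `px` is a hypothetical helper, not part of ipyparallel or Metakernel, and the real engines run as separate processes.

```python
# Sequential imitation of `%%px` (hypothetical helper, not ipyparallel's API):
# run the same "cell" once per engine, each with its own cluster_rank,
# and gather the results in rank order.


def px(cell, cluster_size):
    results = []
    for cluster_rank in range(cluster_size):
        # Every engine sees the same cluster_size but a different cluster_rank.
        results.append(cell(cluster_rank, cluster_size))
    return results


squares = px(lambda cluster_rank, cluster_size: cluster_rank * cluster_rank, 10)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```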

To store the results in a variable instead of displaying them, use the `--set_variable` flag followed by a variable name. This sets the variable in the host kernel:

In [3]:

```scheme
%%px --set_variable results

(* cluster_rank cluster_rank)
```

And we then have access to it:

In [4]:

```scheme
results
```

```
#10(0 1 4 9 16 25 36 49 64 81)
```

Now we use the Metakernel `%%%` triple magic to make a magic "sticky": the `%%px` magic stays active for the following cells until we turn it off, so each of the next few cells is sent to all of the cluster kernels. The `-e` flag means the cell is also evaluated in the host kernel:

In [5]:

```scheme
%%%px -e

cluster_size
```

```
px added to session magics.

#10(10 10 10 10 10 10 10 10 10 10)
```
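
The toggling behavior can be modeled in a few lines of Python: a session remembers which magics are sticky and prepends them to every later cell, and running `%%%px` a second time turns the magic off again (as we do further down). This is only a toy model of the behavior described here, built around a hypothetical `Session` class; Metakernel's actual implementation may differ.

```python
# Toy model of sticky magics (hypothetical `Session` class, not Metakernel code).


class Session:
    def __init__(self):
        self.sticky = {}  # magic name -> its arguments, e.g. {"px": "-e"}

    def run(self, cell):
        """Return the cell as it would effectively be executed."""
        first_line, _, body = cell.partition("\n")
        if first_line.startswith("%%%"):
            name, _, args = first_line[3:].partition(" ")
            if name in self.sticky:
                del self.sticky[name]  # a second `%%%name` turns it off
            else:
                self.sticky[name] = args  # the first `%%%name` turns it on
            cell = body  # the rest of the cell still runs
        # Prepend each active sticky magic, as if the user had typed it.
        prefix = "".join(
            f"%%{magic} {magic_args}".rstrip() + "\n"
            for magic, magic_args in self.sticky.items()
        )
        return prefix + cell


session = Session()
print(repr(session.run("%%%px -e\ncluster_size")))  # '%%px -e\ncluster_size'
print(repr(session.run("%%%px")))                   # ''
```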

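Now we define some useful Scheme code that adds a `for` loop syntax. Because the sticky `%%px -e` magic is still active, these definitions are sent to every cluster kernel and also evaluated in the host kernel, so they exist in both places: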

In [6]:

```scheme
(define-syntax for
  [(for ?exp times do . ?bodies)
   (for-repeat ?exp (lambda () . ?bodies))]
  [(for ?var in ?exp do . ?bodies)
   (for-iterate1 ?exp (lambda (?var) . ?bodies))]
  [(for ?var at (?i) in ?exp do . ?bodies)
   (for-iterate2 0 ?exp (lambda (?var ?i) . ?bodies))]
  [(for ?var at (?i ?j . ?rest) in ?exp do . ?bodies)
   (for ?var at (?i) in ?exp do
     (for ?var at (?j . ?rest) in ?var do . ?bodies))])

(define for-repeat
  (lambda (n f)
    (if (< n 1)
        'done
        (begin
          (f)
          (for-repeat (- n 1) f)))))

(define for-iterate1
  (lambda (values f)
    (if (null? values)
        'done
        (begin
          (f (car values))
          (for-iterate1 (cdr values) f)))))

(define for-iterate2
  (lambda (i values f)
    (if (null? values)
        'done
        (begin
          (f (car values) i)
          (for-iterate2 (+ i 1) (cdr values) f)))))
```
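
For readers more at home in Python, the three `for` forms correspond roughly to the loops below; the nested `at (?i ?j ...)` form simply expands into nested loops. The function names mirror the Scheme helpers, except that `for_iterate2` starts its index at 0 itself instead of taking it as an argument. This is a sketch for comparison, not code the notebook runs.

```python
# Python counterparts of the three `for` forms; each returns "done"
# like the Scheme helpers.


def for_repeat(n, f):
    # (for ?exp times do ...): call f() n times (n a non-negative integer).
    for _ in range(n):
        f()
    return "done"


def for_iterate1(values, f):
    # (for ?var in ?exp do ...): call f(item) for each item.
    for item in values:
        f(item)
    return "done"


def for_iterate2(values, f):
    # (for ?var at (?i) in ?exp do ...): call f(item, index), index from 0.
    for i, item in enumerate(values):
        f(item, i)
    return "done"


seen = []
for_repeat(3, lambda: seen.append("hi"))
for_iterate1([10, 20], lambda v: seen.append(v))
for_iterate2(["a", "b"], lambda v, i: seen.append((i, v)))
print(seen)  # ['hi', 'hi', 'hi', 10, 20, (0, 'a'), (1, 'b')]
```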

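Now let's solve a problem in parallel! For this example, we will compute the Mandelbrot set. `MAX` is the maximum number of iterations, `n` is the width and height of the picture in characters, (`xc`, `yc`) is the center of the region, and `size` is its width: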

In [7]:

```scheme
(define MAX 255)
(define n 30)
(define xc -0.5)
(define yc 0)
(define size 2)

(define mandelbrot
  (lambda (z0 limit)
    (let loop ((z z0)
               (t 0))
      (if (> (abs z) 2.0)
          t
          (if (> t limit)
              limit
              (loop (+ (* z z) z0) (+ t 1)))))))

(define alphabet "@MHWRmEBSQKUGgqyp$8XDPFwdbkA&0ZTNhe9654YV*Cnsyza%3OLxo2JufIrc][vt}{71lji?|+)(=;-_!~:/,^.` ")

(define ascii
  (lambda (i)
    (get-item alphabet (int (* (/ i 256) 90)))))
```
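
`mandelbrot` is the standard escape-time test: starting from `z0`, it repeatedly computes z ← z² + z0 and returns the step at which |z| first exceeds 2 (the point is outside the set), or `limit` if that never happens. The rendering loop later computes `(- MAX (mandelbrot z0 MAX))`, so points inside the set get gray level 0, which `ascii` maps to the first character, `@`. A Python sketch of the two functions follows. It uses a `while` loop because Python, unlike Scheme, does not eliminate tail calls, and it renames `ascii` to `ascii_char` to avoid shadowing Python's built-in `ascii()`. It also uses plain floats throughout, whereas Calysto Scheme may keep some intermediate values exact.

```python
# Python sketch of `mandelbrot` and `ascii` from the cell above.

ALPHABET = "@MHWRmEBSQKUGgqyp$8XDPFwdbkA&0ZTNhe9654YV*Cnsyza%3OLxo2JufIrc][vt}{71lji?|+)(=;-_!~:/,^.` "


def mandelbrot(z0, limit):
    """Return the step at which |z| first exceeds 2, or `limit` if it never does."""
    z, t = z0, 0
    while True:
        if abs(z) > 2.0:
            return t  # escaped: z0 is outside the set
        if t > limit:
            return limit  # gave up: treat z0 as inside the set
        z, t = z * z + z0, t + 1


def ascii_char(i):
    # Map a gray level 0..255 onto the 90-character alphabet (0 -> '@').
    return ALPHABET[int(i / 256 * 90)]


print(mandelbrot(complex(-1.5, -1.0), 255))  # 1
print(repr(ascii_char(0)))                  # '@'
```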

We need a data structure to store our results, so we use a 2D matrix (a vector of vectors):

In [8]:

```scheme
(define matrix
  (list->vector
    (map list->vector
         (map range
              (map (lambda (v) n)
                   (range n))))))
```

We also define helper functions to get and set the element at a 2D location:

In [9]:

```scheme
(define mget
  (lambda (matrix x y)
    (vector-ref (vector-ref matrix x) y)))

(define mset!
  (lambda (matrix x y value)
    (vector-set! (vector-ref matrix x) y value)))
```
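
In `matrix`, `(map (lambda (v) n) (range n))` builds a list of `n` copies of `n`, `(map range ...)` turns each copy into the list 0 through n-1, and `list->vector` converts the rows and then the outer list into vectors. So every row holds 0 through 29, and `(mget matrix x y)` initially returns `y`, which is why location (10, 10) holds 10 below. A rough Python equivalent, with a list of lists standing in for the vector of vectors:

```python
# Python counterpart of `matrix`, `mget`, and `mset!` (a sketch).

n = 30
# One independent row per x; row x holds 0..n-1, so matrix[x][y] == y at first.
matrix = [list(range(n)) for _ in range(n)]


def mget(matrix, x, y):
    return matrix[x][y]


def mset(matrix, x, y, value):
    matrix[x][y] = value  # mutates in place, like vector-set!


print(mget(matrix, 10, 10))  # 10
mset(matrix, 10, 10, "a")
print(mget(matrix, 10, 10))  # a
```

The list comprehension matters: `[list(range(n))] * n` would make every row the same list object, so a single `mset` would appear to change all rows.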

Let's get location (10, 10) from every node's copy of the matrix:

In [10]:

```scheme
(mget matrix 10 10)
```

```
#10(10 10 10 10 10 10 10 10 10 10)
```

Good: they are all 10. Now let's set each of these locations to the string "a":

In [11]:

```scheme
(mset! matrix 10 10 "a")
```

And check:

In [12]:

```scheme
(mget matrix 10 10)
```

```
#10("a" "a" "a" "a" "a" "a" "a" "a" "a" "a")
```

OK, good: everything is working. Now let's run some code on the host only, so we turn off the sticky magic by running `%%%px` again:

In [13]:

```scheme
%%%px
```

```
%%px removed from session magics.
```

How long does it take to compute a small picture of the Mandelbrot set on the host alone?

In [14]:

```scheme
%%time
(for row in (range n) do
    (for col in (range n) do
         (let* ((x0 (+ (- xc (/ size 2)) (* size (/ col n))))
                (y0 (+ (- yc (/ size 2)) (* size (/ row n))))
                (z0 (complex x0 y0))
                (gray (- MAX (mandelbrot z0 MAX))))
            (printf "~a" (ascii gray))))
    (printf "~%"))
```

```
 ``````````````````.^^^```````
```````````````````.^~..``````
``````````````````.;/@^^``````
````````````````...l@@@^.`````
``````````````....^!@@@^....``
`````````````.~^^-o/@@:,^^../`
````````````..:@/@@@@@@@@/^/:.
```````````..^/@@@@@@@@@@@2@,.
````.````...~:@@@@@@@@@@@@@~^.
```.^.......^;@@@@@@@@@@@@@@,^
``..,^^/^^.^!@@@@@@@@@@@@@@@@~
``..^-:!!!^^)@@@@@@@@@@@@@@@@^
``..,;@@@@1/@@@@@@@@@@@@@@@@@=
..^^~@@@@@@!@@@@@@@@@@@@@@@@@.
.^!;@@@@@@@@@@@@@@@@@@@@@@@@^.
@@@@@@@@@@@@@@@@@@@@@@@@@@@,..
.^!;@@@@@@@@@@@@@@@@@@@@@@@@^.
..^^~@@@@@@!@@@@@@@@@@@@@@@@@.
``..,;@@@@1/@@@@@@@@@@@@@@@@@=
``..^-:!!!^^)@@@@@@@@@@@@@@@@^
``..,^^/^^.^!@@@@@@@@@@@@@@@@~
```.^.......^;@@@@@@@@@@@@@@,^
````.````...~:@@@@@@@@@@@@@~^.
```````````..^/@@@@@@@@@@@2@,.
````````````..:@/@@@@@@@@/^/:.
`````````````.~^^-o/@@:,^^../`
``````````````....^!@@@^....``
````````````````...l@@@^.`````
``````````````````.;/@^^``````
```````````````````.^~..``````
Time: 75.23063731193542 seconds.

done
```
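
The rendering loop maps each character position to a point in the complex plane: the column sets the real part and the row the imaginary part, so the picture covers a square of side `size` centered at (`xc`, `yc`). With the values above, real parts start at -1.5 and imaginary parts at -1, in steps of 2/30. The function below is a Python sketch of that mapping; `pixel_to_complex` is a hypothetical name, and its defaults copy the notebook's `n`, `xc`, `yc`, and `size`.

```python
# Sketch of the pixel-to-complex-plane mapping used in the rendering loop.


def pixel_to_complex(row, col, n=30, xc=-0.5, yc=0.0, size=2.0):
    # Left/top edge of the square, plus the fraction of the way across it.
    x0 = (xc - size / 2) + size * (col / n)
    y0 = (yc - size / 2) + size * (row / n)
    return complex(x0, y0)


print(pixel_to_complex(0, 0))    # (-1.5-1j)
print(pixel_to_complex(15, 15))  # (-0.5+0j)
```

Position (15, 15) lands exactly on the center, -0.5, which lies inside the set; that is the middle of the solid `@` row in the picture.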

Wow, that took 75 seconds on my computer. Each character of the picture is computed independently of all the others, so this code is easy to parallelize: each node in the cluster can compute a different portion of the rows.
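
The split used in the next cell gives each node `portion` = `n / cluster_size` rows (truncated to an integer), and node k takes rows `k * portion` through `(k + 1) * portion - 1`. That covers every row here because 30 is a multiple of 10; otherwise the leftover rows would never be computed. The Python sketch below reproduces that formula and, for comparison, a hypothetical `balanced_rows` variant that hands the remainder to the first few nodes.

```python
# How the rows are divided among the nodes (a sketch).


def rows_for_rank(rank, cluster_size, n):
    # The notebook's formula: equal blocks of `portion` rows.
    portion = n // cluster_size  # like (int (/ n cluster_size))
    return range(rank * portion, (rank + 1) * portion)


def balanced_rows(rank, cluster_size, n):
    # Hypothetical variant: the first n % cluster_size ranks get one extra row.
    base, extra = divmod(n, cluster_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


print(list(rows_for_rank(3, 10, 30)))  # [9, 10, 11]
covered = [r for k in range(10) for r in rows_for_rank(k, 10, 32)]
print(len(covered))  # 30: rows 30 and 31 are never computed
```

With uneven blocks like those from `balanced_rows`, the host could no longer find a row's owner with `(// row portion)` and would need to record each node's range instead.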

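Now we execute the following cell on all nodes. Each node uses its own `cluster_rank` to pick a different block of `portion` rows (with `n` = 30 and 10 nodes, 3 rows each), stores its characters in its own copy of `matrix`, and returns that matrix: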

In [15]:

```scheme
%%time
%%px --set_variable results

(define portion (int (/ n cluster_size)))

(for col in (range n) do
    (for row in (range (* cluster_rank portion)
                       (* (+ cluster_rank 1) portion)) do
         (let* ((x0 (+ (- xc (/ size 2)) (* size (/ col n))))
                (y0 (+ (- yc (/ size 2)) (* size (/ row n))))
                (z0 (complex x0 y0))
                (gray (- MAX (mandelbrot z0 MAX))))
            (mset! matrix col row (ascii gray))
            (printf ".")))
    (printf "\n"))

matrix
```

```
Time: 19.94576096534729 seconds.
```

That was much faster, though not 10 times faster, mostly because my computer does not actually have 10 CPUs, so the nodes had to share fewer CPUs. Collecting the results also adds a little overhead. Still, 20 seconds is much faster than 75 seconds.
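
To put numbers on it, the usual measures are the speedup S = T₁ / Tₚ (serial time over parallel time) and the parallel efficiency E = S / p for p nodes. With the timings above and p = 10:

```latex
S = \frac{T_1}{T_{10}} = \frac{75.23\,\text{s}}{19.95\,\text{s}} \approx 3.77,
\qquad
E = \frac{S}{p} = \frac{3.77}{10} \approx 0.38
```

An efficiency well below 1 is consistent with the explanation above: the 10 nodes were sharing fewer physical CPUs.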

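Did the cluster produce the same results as a single computer? Each node computed only its own block of rows, so on the host we read each row from node `(// row portion)`. We define `portion` again here because the previous cell defined it only on the cluster nodes: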

In [17]:

```scheme
(define portion (int (/ n cluster_size)))

(for row in (range n) do
  (for col in (range n) do
     (let ((vec (// row portion)))
         (printf "~a" (mget (vector-ref results vec) col row))))
    (printf "\n"))
```

```
 ``````````````````.^^^```````
```````````````````.^~..``````
``````````````````.;/@^^``````
````````````````...l@@@^.`````
``````````````....^!@@@^....``
`````````````.~^^-o/@@:,^^../`
````````````..:@/@@@@@@@@/^/:.
```````````..^/@@@@@@@@@@@2@,.
````.````...~:@@@@@@@@@@@@@~^.
```.^.......^;@@@@@@@@@@@@@@,^
``..,^^/^^.^!@@@@@@@@@@@@@@@@~
``..^-:!!!^^)@@@@@@@@@@@@@@@@^
``..,;@@@@1/@@@@@@@@@@@@@@@@@=
..^^~@@@@@@!@@@@@@@@@@@@@@@@@.
.^!;@@@@@@@@@@@@@@@@@@@@@@@@^.
@@@@@@@@@@@@@@@@@@@@@@@@@@@,..
.^!;@@@@@@@@@@@@@@@@@@@@@@@@^.
..^^~@@@@@@!@@@@@@@@@@@@@@@@@.
``..,;@@@@1/@@@@@@@@@@@@@@@@@=
``..^-:!!!^^)@@@@@@@@@@@@@@@@^
``..,^^/^^.^!@@@@@@@@@@@@@@@@~
```.^.......^;@@@@@@@@@@@@@@,^
````.````...~:@@@@@@@@@@@@@~^.
```````````..^/@@@@@@@@@@@2@,.
````````````..:@/@@@@@@@@/^/:.
`````````````.~^^-o/@@:,^^../`
``````````````....^!@@@^....``
````````````````...l@@@^.`````
``````````````````.;/@^^``````
```````````````````.^~..``````

done
```

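Great! The picture assembled from the cluster's results is identical to the one computed on the host alone.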
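
Why does indexing `results` with `(// row portion)` work? Each node returned its whole `matrix`, but only the rows in its own block were computed there; the other entries still hold their initial values (or the `"a"` stored earlier). So the host has to fetch each row from the node that computed it. The sequential Python simulation below checks that logic. `pixel` is a hypothetical stand-in for the Mandelbrot character, and `None` marks entries a node never computed.

```python
# Sequential simulation of the gather-and-stitch step (a sketch).

n, cluster_size = 30, 10
portion = n // cluster_size


def pixel(row, col):
    return (row * 7 + col * 3) % 10  # any deterministic per-pixel function


def node_result(rank):
    # Each node fills only its own block of rows in a private grid.
    grid = [[None] * n for _ in range(n)]  # indexed [col][row], as in mset!
    for col in range(n):
        for row in range(rank * portion, (rank + 1) * portion):
            grid[col][row] = pixel(row, col)
    return grid


results = [node_result(rank) for rank in range(cluster_size)]

# Reassemble on the host: row `row` was computed by node row // portion.
stitched = [[results[row // portion][col][row] for col in range(n)] for row in range(n)]
serial = [[pixel(row, col) for col in range(n)] for row in range(n)]
print(stitched == serial)  # True
```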