File: Dataflow.pod

package info (click to toggle)
pdl 1%3A2.100-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 6,816 kB
  • sloc: perl: 22,587; ansic: 14,969; sh: 31; makefile: 30; sed: 6
file content (388 lines) | stat: -rw-r--r-- 10,701 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
=head1 NAME

PDL::Dataflow -- description of the dataflow implementation and philosophy

=head1 SYNOPSIS

	pdl> $x = zeroes(10);
	pdl> $y = $x->slice("2:4:2");
	pdl> $y ++;
	pdl> print $x;
	[0 0 1 0 1 0 0 0 0 0]

=head1 DESCRIPTION

As of 2.079, this is now a description of the current implementation,
together with some design thoughts from its original author, Tuomas Lukka.

Two-directional dataflow (which implements C<< ->slice() >> etc.)
is fully functional, as shown in the SYNOPSIS. One-way is implemented,
but with restrictions.

=head1 TWO-WAY

Just about any function which returns some subset of the values in some
ndarray will make a binding.  C<$y> has become effectively a window to
some sub-elements of C<$x>. You can also define your own routines that
do different types of subsets. If you don't want C<$y> to be a window
to C<$x>, you must do

	$y = $x->slice("some parts")->sever;

The C<sever> destroys the C<slice> transform, thereby turning off all dataflow
between the two ndarrays.

=head2 Type conversions

This works, thanks to a two-way flowing transform that implements
type-conversions, particularly for supplied outputs of the "wrong"
type for the given transform:

  pdl> $a_bad = pdl double, '[1 BAD 3]';
  pdl> $b_float = zeroes float, 3;
  pdl> $a_bad->assgn($b_float); # could be written as $b_float .= $a_bad
  pdl> p $b_float->badflag;
  1
  pdl> p $b_float;
  [1 BAD 3]

=head1 ONE-WAY

You need to explicitly turn on one-way dataflow on an ndarray to activate
it for non-flowing operations (see L<PDL::Core/flowing>), so

	pdl> $x = pdl 2,3,4;
	pdl> $y = $x->flowing * 2;
	pdl> print $y;
	[4 6 8]
	pdl> $x->set(0,5);
	pdl> print $y;
	[10 6 8]

C<flowing> operates by turning on a flag that the core turns off
immediately after acting on it (rather like C<inplace>), so it only
operates on the next operation (aka transform) called on that ndarray,
making that operation be a forward-flowing one. That lasts until the
output(s) of that operation get C<sever>ed.

It is not possible to turn on backwards dataflow (such as is used by
C<slice>-type operations), because there is no general way for PDL (or
maths, in fact) to know how to reverse most operations - consider
C<$z = $x * $y>, then adding one to C<$z>.

Consider the following code:

	$u = sequence(3,3);
	$v = ones(3,3);
	$w = $u->flowing + $v->flowing;
	$y = $w->flowing + 1;
	$x = $w->diagonal(0,1);
	$z = $w->flowing + 2;

As of 2.090, C<$x> will I<not> have backward dataflow. This is because
PDL turns that off, on detecting an input with forward-only dataflow. This
means PDL will only create directed I<acyclic> graphs of dataflow.
It also means you can only modify the contents of C<$x> by modifying its
"upstream" data sources.

What do C<$x>, C<$y>, and C<$z> contain now?

	pdl> p $x
	[1 5 9]
	pdl> p $y
	[
	 [ 2  3  4]
	 [ 5  6  7]
	 [ 8  9 10]
	]
	pdl> p $z
	[
	 [ 3  4  5]
	 [ 6  7  8]
	 [ 9 10 11]
	]

What about when C<$u> is changed and a recalculation is triggered?

	pdl> $u += 7;
	pdl> p $x
	[8 12 16]
	pdl> p $y
	[
	 [ 9 10 11]
	 [12 13 14]
	 [15 16 17]
	]
	pdl> p $z
	[
	 [10 11 12]
	 [13 14 15]
	 [16 17 18]
	]

=head2 Example of dataflow to implement 3D space calculations

This is a complete, working example that demonstrates the use of
enduring C<flowing> relationships to model 3D entities, through a few
transformations:

  {package PDL::3Space;
  use PDL;
  sub new { my ($class, $parent) = @_;
    my $self = bless {basis_local=>identity(3), origin_local=>zeroes(3)}, $class;
    if (defined $parent) {
      $self->{parent} = $parent;
      $self->{basis} = $self->{basis_local}->flowing x $parent->{basis}->flowing;
      $self->{origin} = ($self->{origin_local}->flowing x $self->{basis}->flowing)->flowing + $parent->{origin}->flowing;
    } else {
      $self->{basis} = $self->{basis_local};
      $self->{origin} = $self->{origin_local}->flowing x $self->{basis}->flowing;
    }
    $self;
  }
  use overload '""' => sub {$_[0]{basis}->glue(1,$_[0]{origin}).''};
  sub basis_update { $_[0]{basis_local} .= $_[1] x $_[0]{basis_local} }
  sub origin_move { $_[0]{origin_local} += $_[1] }
  sub local { my $local = PDL::3Space->new; $local->{$_} .= $_[0]{$_} for qw(basis_local origin_local); $local}
  }

This is the class, heavily inspired by L<Math::3Space>, and
following discussions on interoperation between that and PDL (see
L<https://github.com/nrdvana/perl-Math-3Space/pull/8>). The C<basis>
and C<origin> members are "subscribed" to both their own local basis
and origin, and their parent's if any. The C<basis_update> and
C<origin_move> methods only update the local members, and C<basis_update>
does so in terms of its previous value.

The demonstrating code has a boat, and bird within its frame of
reference. Note that the "local" origin still gets affected by its
local basis.

The basis and origin are always in global coordinates, and thanks to
dataflow, are only recalculated on demand.

  $rot_90_about_z = PDL->pdl([0,1,0], [-1,0,0], [0,0,1]);

  $boat = PDL::3Space->new;
  print "boat=$boat";
  $bird = PDL::3Space->new($boat);
  print "bird=$bird";
  # boat=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]
  # bird=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]

  $boat->basis_update($rot_90_about_z);
  print "after boat rot:\nboat=$boat";
  print "bird=$bird";
  # after boat rot:
  # boat=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  0  0]
  # ]
  # bird=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  0  0]
  # ]

  $boat->origin_move(PDL->pdl(1,0,0));
  print "after boat move:\nboat=$boat";
  print "bird=$bird";
  print "bird local=".$bird->local;
  # after boat move:
  # boat=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  0]
  # ]
  # bird=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  0]
  # ]
  # bird local=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]

  $bird->basis_update($rot_90_about_z);
  $bird->origin_move(PDL->pdl(1,0,1));
  print "after bird rot and move:\nbird=$bird";
  print "bird local=".$bird->local;
  # after bird rot and move:
  # bird=
  # [
  #  [-1  0  0]
  #  [ 0 -1  0]
  #  [ 0  0  1]
  #  [-1  1  1]
  # ]
  # bird local=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  1]
  # ]

  $boat->basis_update(PDL::MatrixOps::identity(3) * 2);
  print "after boat expand:\nboat=$boat";
  print "bird=$bird";
  # after boat expand:
  # boat=
  # [
  #  [ 0  2  0]
  #  [-2  0  0]
  #  [ 0  0  2]
  #  [ 0  2  0]
  # ]
  # bird=
  # [
  #  [-2  0  0]
  #  [ 0 -2  0]
  #  [ 0  0  2]
  #  [-2  2  2]
  # ]

=head1 LAZY EVALUATION

In one-way flow context like the above, with:

	pdl> $y = $x * 2;

nothing will have been calculated at this point. Even the memory for
the contents of C<$y> has not been allocated. Only the command

	pdl> print $y

will actually cause C<$y> to be calculated. This is important to bear
in mind when doing performance measurements and benchmarks as well
as when tracking errors.

There is an explanation for this behaviour: it may save cycles
but more importantly, imagine the following:

	$x = pdl 2,3,4;
	$y = pdl 5,6,7;
	$z = $x->flowing + $y->flowing;
	$x->setdims([4]);
	$y->setdims([4]);
	print $z;

Now, if C<$z> were evaluated between the two resizes, an error condition
of incompatible sizes would occur.

What happens in the current version is that resizing C<$x> raises
a flag in C<$z>: C<PDL_PARENTDIMSCHANGED> and C<$y> just raises the same flag
again. When C<$z> is next evaluated, the flags are checked and it is found
that a recalculation is needed.

Of course, lazy evaluation can sometimes make debugging more painful
because errors may occur somewhere where you'd not expect them.

=head1 FAMILIES

This is one of the more intricate concepts of dataflow.
In order to make dataflow work like you'd expect, a rather strange
concept must be introduced: families. Let us make a diagram of the one-way
flow example - it uses a hypergraph because the transforms (with C<+>)
are connectors between ndarrays (with C<*>):

       u*   *v
         \ /
          +(plus)
          |
   1*     *w
     \   /|\
      \ / | \
 (plus)+  |  +(diagonal)
       |  |  |
      y*  |  *x
          |
          | *1
          |/
          +(plus)
          |
         z*

This is what PDL actually has in memory after the first three lines.
When C<$x> is changed, C<$w> changes due to C<diagonal> being a two-way operation.

If you want flow from C<$w>, you opt in using C<< $w->flowing >> (as shown
in this scenario). If you didn't, then don't enable it. If you have it
but want to stop it, call C<< $ndarray->sever >>. That will destroy the
ndarray's C<trans_parent> (here, a node marked with C<+>), and as you
can visually tell, will stop changes flowing thereafter. If you want to
leave the flow operating, but get a copy of the ndarray at that point,
use C<< $ndarray->copy >> - it will have the same data at that moment,
but have no flow relationships.

=head1 EVENTS

There is the start of a mechanism to bind events onto changed data,
intended to allow this to work:

	pdl> $x = pdl 2,3,4
	pdl> $y = $x + 1;
	pdl> $c = $y * 2;
	pdl> $c->bind( sub { print "A now: $x, C now: $c\n" } )
	pdl> PDL::dowhenidle();
	A now: [2,3,4], C now: [6 8 10]
	pdl> $x->set(0,1);
	pdl> $x->set(1,1);
	pdl> PDL::dowhenidle();
	A now: [1,1,4], C now: [4 4 10]

This hooks into PDL's C<magic> which resembles Perl's, but does not
currently operate.

There would be many kinds of uses for this feature: self-updating charts,
for instance. It is not yet fully clear whether it would be most useful
to queue up changes (useful for doing asynchronously, e.g. when idle),
or to activate things immediately.

In the 2022 era of both GPUs and multiple cores, it is a pity that
Perl's dominant model remains single-threaded on CPU, but PDL can use
multi-cores for CPU processing (albeit controlled in a single-threaded
style) - see L<PDL::ParallelCPU>. It is planned that PDL will gain the
ability to use GPUs, and there might be a way to hook that up albeit
probably with an event loop to "subscribe" to GPU events.

=head1 TRANSFORMATIONS

PDL implements nearly everything (except for XS oddities like
C<set>) using transforms which connect ndarrays. This includes data
transformations like addition, "slicing" to access/operate on subsets,
and data-type conversions (which have two-way dataflow, see
L</Type conversions>).

This does not currently include a resizing transformation, and C<setdims>
mutates its input. This is intended to change.

=head1 AUTHOR

Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu).
Same terms as the rest of PDL.