File: chap5.txt

package info (click to toggle)
gap-hap 1.73%2Bds-1
  • links: PTS
  • area: main
  • in suites: forky, sid
  • size: 58,508 kB
  • sloc: xml: 16,467; sh: 197; javascript: 155; makefile: 121; ansic: 47; perl: 24
file content (829 lines) | stat: -rw-r--r-- 48,208 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
  
  5 Topological data analysis
  
  
  5.1 Persistent homology
  
  Pairwise  distances  between  74  points  from  some  metric space have been
  recorded  and  stored  in a 74× 74 matrix D. The following commands load the
  matrix,  construct a filtration of length 100 on the first two dimensions of
  the assotiated clique complex (also known as the Vietoris-Rips Complex), and
  display  the  resulting  degree 0 persistent homology as a barcode. A single
  bar  with  label  n denotes n bars with common starting point and common end
  point.
  
    Example  
    gap> file:=HapFile("data253a.txt");;
    gap> Read(file);
    
    gap> G:=SymmetricMatrixToFilteredGraph(D,100);
    Filtered graph on 74 vertices.
    
    gap> K:=FilteredRegularCWComplex(CliqueComplex(G,2));
    Filtered regular CW-complex of dimension 2
    
    gap> P:=PersistentBettiNumbers(K,0);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The first 54 terms in the filtration each have 74 path components -- one for
  each  point  in the sample. During the next 9 filtration terms the number of
  path components reduces, meaning that sample points begin to coalesce due to
  the  formation  of  edges  in  the  simplicial  complexes.  Then,  two  path
  components  persist  over  an  interval  of 18 filtration terms, before they
  eventually coalesce.
  
  The  next  commands  display the resulting degree 1 persistent homology as a
  barcode.
  
    Example  
    gap> P:=PersistentBettiNumbers(K,1);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  Interpreting  short bars as noise, we see for instance that the 65th term in
  the  filtration  could  be  regarded as noiseless and belonging to a "stable
  interval"  in  the  filtration  with  regards  to  first and second homology
  functors.  The following command displays (up to homotopy) the 1 skeleton of
  the  simplicial  complex  arizing as the 65-th term in the filtration on the
  clique complex.
  
    Example  
    gap> Y:=FiltrationTerm(K,65);
    Regular CW-complex of dimension 1
    
    gap> Display(HomotopyGraph(Y));
    
  
  
  These  computations  suggest  that  the dataset contains two persistent path
  components  (or  clusters),  and  that  each path component is in some sense
  periodic. The final command displays one possible representation of the data
  as points on two circles.
  
  
  5.1-1 Background to the data
  
  Each  point  in the dataset was an image consisting of 732× 761 pixels. This
  point  was regarded as a vector in R^557052= R^732× 761 and the matrix D was
  constructed using the Euclidean metric. The images were the following:
  
  
  5.2 Mapper clustering
  
  The  following  example  reads in a set S of vectors of rational numbers. It
  uses  the  Euclidean  distance  d(u,v) between vectors. It fixes some vector
  u_0∈  S  and  uses  the associated function f: D→ [0,b] ⊂ R, v↦ d(u_0,v). In
  addition,  it  uses  an  open  cover of the interval [0,b] consisting of 100
  uniformly  distributed overlapping open subintervals of radius r=29. It also
  uses a simple clustering algorithm implemented in the function cluster.
  
  These  ingredients are input into the Mapper clustering procedure to produce
  a simplicial complex M which is intended to be a representation of the data.
  The  complex M is 1-dimensional and the final command uses GraphViz software
  to  visualize  the graph. The nodes of this simplicial complex are "buckets"
  containing  data  points.  A  data  point may reside in several buckets. The
  number  of  points  in the bucket determines the size of the node. Two nodes
  are connected by an edge when they contain common data points.
  
    Example  
    gap> file:=HapFile("data134.txt");;
    gap> Read(file);
    gap> dx:=EuclideanApproximatedMetric;;
    gap> dz:=EuclideanApproximatedMetric;;
    gap> L:=List(S,x->Maximum(List(S,y->dx(x,y))));;
    gap> n:=Position(L,Minimum(L));;
    gap> f:=function(x) return [dx(S[n],x)]; end;;
    gap> P:=30*[0..100];; P:=List(P, i->[i]);;
    gap> r:=29;;
    gap> epsilon:=75;;
    gap>  cluster:=function(S)
    >   local Y, P, C;
    >   if Length(S)=0 then return S; fi;
    >   Y:=VectorsToOneSkeleton(S,epsilon,dx);
    >   P:=PiZero(Y);
    >   C:=Classify([1..Length(S)],P[2]);
    >   return List(C,x->S{x});
    >  end;;
    gap> M:=Mapper(S,dx,f,dz,P,r,cluster);
    Simplicial complex of dimension 1.
    
    gap> Display(GraphOfSimplicialComplex(M));
    
  
  
  
  5.2-1 Background to the data
  
  The  datacloud  S  consists  of  the  400  points  in the plane shown in the
  following picture.
  
  
  5.3 Some tools for handling pure complexes
  
  A CW-complex X is said to be pure if all of its top-dimensional cells have a
  common  dimension.  There  are  instances  where  such  a space X provides a
  convenient  ambient  space whose subspaces can be used to model experimental
  data.  For  instance,  the  plane  X= R^2 admits a pure regular CW-structure
  whose  2-cells  are  open  unit squares with integer coordinate vertices. An
  alternative,  and sometimes preferrable, pure regular CW-structure on R^2 is
  one where the 2-cells are all reguar hexagons with sides of unit length. Any
  digital  image  can  be  thresholded to produce a black-white image and this
  black-white  image  can  naturally  be  regared  as  a  finite pure cellular
  subcomplex  of either of the two proposed CW-structures on R^2. Analogously,
  thresholding  can  be  used  to  represent 3-dimensional greyscale images as
  finite  pure cellular subspaces of cubical or permutahedral CW-structures on
  R^3,  and  to  represent RGB colour photographs as analogous subcomplexes of
  R^5.
  
  In  this  section we list a few functions for performing basic operations on
  n-dimensional  pure  cubical and pure permutahedral finite subcomplexes M of
  X=R^n.  We  refer  to  M simply as a pure complex. In subsequent sections we
  demonstrate  how  these  few  functions on pure complexes allow for in-depth
  analysis of experimental data.
  
  (Aside.  The  basic  operations  could equally well be implemented for other
  CW-decompositions of X= R^n such as the regular CW-decompositions arising as
  the  tessellations  by  a fundamental domain of a Bieberbach group (=torsion
  free  crytallographic  group).  Moreover, the basic operations could also be
  implemented  for  other manifolds such as an n-torus X=S^1× S^1 × ⋯ × S^1 or
  n-sphere  X=S^n  or for X the universal cover of some interesting hyperbolic
  3-manifold.  An example use of the ambient manifold X=S^1× S^1× S^1 could be
  for  the  construction of a cellular subspace recording the time of day, day
  of week and week of the year of crimes committed in a population.)
  
  Basic operations returning pure complexes. ( Function descriptions available
  here (../doc/chap1_mj.html#X7FD50DF6782F00A0).)
  
      PureCubicalComplex(binary array)
  
      PurePermutahedralComplex(binary array)
  
      ReadImageAsPureCubicalComplex(file,threshold)
  
      ReadImageSquenceAsPureCubicalComplex(file,threshold)
  
      PureComplexBoundary(M)
  
      PureComplexComplement(M)
  
      PureComplexRandomCell(M)
  
      PureComplexThickened(M)
  
      ContractedComplex(M, optional subcomplex of M)
  
      ExpandedComplex(M, optional supercomplex of M)
  
      PureComplexUnion(M,N)
  
      PureComplexIntersection(M,N)
  
      PureComplexDifference(M,N)
  
      FiltrationTerm(F,n)
  
  Basic operations returning filtered pure complexes.
  
      PureComplexThickeningFiltration(M,length)
  
      ReadImageAsFilteredPureCubicalComplex(file,length)
  
  
  5.4 Digital image analysis and persistent homology
  
  The  following  example  reads in a digital image as a filtered pure cubical
  complexex.  The  filtration  is  obtained  by  thresholding at a sequence of
  uniformly  spaced  values on the greyscale range. The persistent homology of
  this  filtered complex is calculated in degrees 0 and 1 and displayed as two
  barcodes.
  
    Example  
    gap> file:=HapFile("image1.3.2.png");;
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,40);
    Filtered pure cubical complex of dimension 2.
    gap> P:=PersistentBettiNumbers(F,0);;
    gap> BarCodeCompactDisplay(P);
    
  
  
    Example  
    gap> P:=PersistentBettiNumbers(F,1);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The 20 persistent bars in the degree 0 barcode suggest that the image has 20
  objects.  The  degree  1 barcode suggests that there are 14 (or possibly 17)
  holes in these 20 objects.
  
  
  5.4-1 Naive example of image segmentation by automatic thresholding
  
  Assuming that short bars and isolated points in the barcodes represent noise
  while  long  bars represent essential features, a "noiseless" representation
  of  the image should correspond to a term in the filtration corresponding to
  a column in the barcode incident with all the long bars but incident with no
  short  bars  or  isolated  points.  There  is no noiseless term in the above
  filtration  of  length 40. However (in conjunction with the next subsection)
  the  following  commands  confirm  that  the  64th term in the filtration of
  length 500 is such a term and display this term as a binary image.
  
    Example  
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,500);;
    gap> Y:=FiltrationTerm(F,64);            
    Pure cubical complex of dimension 2.
    gap> BettiNumber(Y,0);
    20
    gap> BettiNumber(Y,1);
    14
    gap> Display(Y);
    
  
  
  
  5.4-2 Refining the filtration
  
  The first filtration for the image has 40 terms. One may wish to investigate
  a  filtration with more terms, say 500 terms, with a view to analysing, say,
  those  1-cycles  that  are  born  by  term 25 of the filtration and that die
  between terms 50 and 60. The following commands produce the relevant barcode
  showing that there is precisely one such 1-cycle.
  
    Example  
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,500);;
    gap> L:=[20,60,61,62,63,64,65,66,67,68,69,70];;          
    gap> T:=FiltrationTerms(F,L);;
    gap> P0:=PersistentBettiNumbers(T,0);;
    gap> BarCodeCompactDisplay(P0);
    gap> P1:=PersistentBettiNumbers(T,1);;
    gap> BarCodeCompactDisplay(P1);
    
  
  
  β_0:
  
  β_1:
  
  
  5.4-3 Background to the data
  
  The following image was used in the example.
  
  
  5.5 A second example of digital image segmentation
  
  In order to automatically count the number of coins in this picture
  
  we  can  load  the  image as a filtered pure cubical complex F of filtration
  length  40  say,  and  observe  the  degree zero persistent Betti numbers to
  establish  that the 28-th term or so of F seems to be 'noise free' in degree
  zero.  We can then set M equal to the 28-th term of F and thicken M a couple
  of times say to remove any tiny holes it may have. We can then construct the
  complement  C  of  M.  Then  we  can  construct a 'neighbourhood thickening'
  filtration  T  of  C  with  say  50  consecutive thickenings. The degree one
  persistent  barcode  for  T  has  24 long bars, suggesting that the original
  picture consists of 24 coins.
  
    Example  
    gap> F:=ReadImageAsFilteredPureCubicalComplex("my_coins.png",40);;
    gap> M:=FiltrationTerm(F,24);;  #Chosen after viewing degree 0 barcode for F
    gap> M:=PureComplexThickened(M);;
    gap> M:=PureComplexThickened(M);;
    gap> C:=PureComplexComplement(M);;
    gap> T:=ThickeningFiltration(C,50);;
    gap> P:=PersistentBettiNumbers(T,1);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The  pure cubical complex W:=PureComplexComplement(FiltrationTerm(T,25)) has
  the  correct  number  of path components, namely 25, but its path components
  are  very  much  subsets of the regions in the image corresponding to coins.
  The complex W can be thickened repeatedly, subject to no two path components
  being  allowed  to  merge,  in  order  to  obtain  a  more  realistic  image
  segmentation  with path components corresponding more closely to coins. This
  is  done  in  the  follow  commands which use a makeshift function Basins(L)
  available  here  (tutex/basins.g).  The  commands  essentially  implement  a
  standard watershed segmentation algorithm but do so by using the language of
  filtered pure cubical complexes.
  
    Example  
    gap> W:=PureComplexComplement(FiltrationTerm(T,25));;
    gap> L:=[];;
    gap> for i in [1..PathComponentOfPureComplex(W,0)] do
    gap> P:=PathComponentOfPureComplex(W,i);;
    gap> Q:=ThickeningFiltration(P,50,M);;
    gap> Add(L,Q);;
    gap> od;;
    
    gap> B:=Basins(L);
    gap> Display(B);
    
  
  
  
  5.6 A third example of digital image segmentation
  
  The  following  image  is  number  3096  in  the  BSDS500 database of images
  (https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/) [MFTM01].
  
  A  common  first  step  in  segmenting  such  an  image  is to appropriately
  threshold the corresponding gradient image.
  
  The  following  commands  use  the  thresholded gradient image to produce an
  outline  of  the  aeroplane.  The outline is a pure cubical complex with one
  path component and with first Betti number equal to 1.
  
    Example  
    gap> file:=Filename(DirectoriesPackageLibrary("HAP"),"../tutorial/images/3096b.jpg");;
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,30);;
    gap> F:=ComplementOfFilteredPureCubicalComplex(F);;
    gap> M:=FiltrationTerm(F,27);;  #Thickening chosen based on degree 0 barcode
    gap> Display(M);;
    gap> P:=List([1..BettiNumber(M,0)],n->PathComponentOfPureComplex(M,n));;
    gap> P:=Filtered(P,m->Size(m)>10);;
    gap> M:=P[1];;
    gap> for m in P do
    > M:=PureComplexUnion(M,m);;
    > od;
    gap> T:=ThickeningFiltration(M,50);;
    gap> BettiNumber(FiltrationTerm(T,11),0);
    1
    gap> BettiNumber(FiltrationTerm(T,11),1);
    1
    gap> BettiNumber(FiltrationTerm(T,12),1);
    0
    gap> #Confirmation that 11-th filtration term has one hole and the 12-th term is contractible.
    gap> C:=FiltrationTerm(T,11);;
    gap> for n in Reversed([1..10]) do
    > C:=ContractedComplex(C,FiltrationTerm(T,n));
    > od;
    gap> C:=PureComplexBoundary(PureComplexThickened(C));;
    gap> H:=HomotopyEquivalentMinimalPureCubicalSubcomplex(FiltrationTerm(T,12),C);;
    gap> B:=ContractedComplex(PureComplexBoundary(H));;
    gap> Display(B);
    
  
  
  
  5.7 Naive example of digital image contour extraction
  
  The  following  greyscale  image  is  available  from  the  online  appendix
  (http://www.ipol.im/pub/art/2014/74/FrechetAndConnectedCompDemo.tgz)  to the
  paper [CKL14].
  
  The  following  commands produce a picture of contours from this image based
  on greyscale values. They also produce a picture of just the closed contours
  (the non-closed contours having been homotopy collapsed to points).
  
    Example  
    gap> file:=Filename(DirectoriesPackageLibrary("HAP"),"../tutorial/images/circularGradient.png");;
    gap> L:=[];;                                                          
    gap> for n in [1..15] do
    > M:=ReadImageAsPureCubicalComplex(file,n*30000);
    > M:=PureComplexBoundary(M);;
    > Add(L,M);
    > od;;
    gap> C:=L[1];;
    gap> for n in [2..Length(L)] do C:=PureComplexUnion(C,L[n]); od;
    gap> Display(C);
    gap> Display(ContractedComplex(C));
    
  
  
  Contours from the above greyscale image:
  
  Closed contours from the above greyscale image:
  
  Very   similar   results   are   obtained   when   applied   to   the   file
  circularGradientNoise.png,  containing  noise,  available  from  the  online
  appendix
  (http://www.ipol.im/pub/art/2014/74/FrechetAndConnectedCompDemo.tgz)  to the
  paper [CKL14].
  
  The  number  of  distinct  "light sources" in the image can be read from the
  countours.  Alternatively, this number can be read directly from the barcode
  produced by the following commands.
  
    Example  
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,20);;
    gap> P:=PersistentBettiNumbersAlt(F,1);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The seventeen bars in the barcode correspond to seventeen light sources. The
  length of a bar is a measure of the "persistence" of the corresponding light
  source. A long bar may initially represent a cluster of several lights whose
  members  may  eventually  be  distinguished  from each other as new bars (or
  persistent homology classes) are created.
  
  Here  the  command  PersistentBettiNumbersAlt has been used. This command is
  explained in the following section.
  
  The  follwowing  commands  use  a  watershed method to partition the digital
  image  into  regions,  one  region  per  light  source. A makeshift function
  Basins(L),  available  here  (tutex/basins.g), is called. (The efficiency of
  the  example  could  be  easily  improved.  For  simplicity  it uses generic
  commands  which,  in  principle, can be applied to cubical or permutarhedral
  complexes of higher dimensions.)
  
    Example  
    gap> file:=Filename(DirectoriesPackageLibrary("HAP"),"../tutorial/images/circularGradient.png");;
    gap> F:=ReadImageAsFilteredPureCubicalComplex(file,20);;
    gap> FF:=ComplementOfFilteredPureCubicalComplex(F);
    
    gap> W:=(FiltrationTerm(FF,3));
    gap> for n in [4..23] do
    > L:=[];;
    > for i in [1..PathComponentOfPureComplex(W,0)] do
    >  P:=PathComponentOfPureComplex(W,i);;
    >  Q:=ThickeningFiltration(P,150,FiltrationTerm(FF,n));;
    >  Add(L,Q);;
    > od;;
    > W:=Basins(L);
    > od;
    
    gap> C:=PureComplexComplement(W);;
    gap> T:=PureComplexThickened(C);; C:=ContractedComplex(T,C);;  
    gap> Display(C);
    
  
  
  
  5.8 Alternative approaches to computing persistent homology
  
  From any sequence X_0 ⊂ X_1 ⊂ X_2 ⊂ ⋯ ⊂ X_T of cellular spaces (such as pure
  cubical complexes, or cubical complexes, or simplicial complexes, or regular
  CW  complexes) we can construct a filtered chain complex C_∗ X_0 ⊂ C_∗ X_1 ⊂
  C_∗  X_2  ⊂  ⋯ C_∗ X_T. The induced homology homomorphisms H_n(C_∗ X_0, F) →
  H_n(C_∗ X_1, F) → H_n(C_∗ X_2, F) → ⋯ → H_n(C_∗ X_T, F) with coefficients in
  a  field F can be computed by applying an appropriate sequence of elementary
  row operations to the boundary matrices in the chain complex C_∗ X_T⊗ F; the
  boundary  matrices  are  sparse  and  are  best represented as such; the row
  operations  need  to  be  applied in a fashion that respects the filtration.
  This method is used in the above examples of persistent homology. The method
  is not practical when the number of cells in X_T is large.
  
  An  alternative approach is to construct an admissible discrete vector field
  on  each  term  X_k  in  the  filtration.  For  each vector field there is a
  non-regular  CW-complex  Y_k whose cells correspond to the critical cells in
  X_k  and  for which there is a homotopy equivalence X_k≃ Y_k. For each k the
  composite  homomorphism H_n(C_∗ Y_k, F) stackrel≅→ H_n(C_∗ X_k, F) → H_n(C_∗
  X_k+1,  F)  stackrel≅→  H_n(C_∗ Y_k+1, F) can be computed and the persistent
  homology  can  be  derived from these homology homomorphisms. This method is
  implemented  in the function PersistentBettiNUmbersAlt(X,n,p) where p is the
  characteristic  of  the  field,  n  is  the  homology degree, and X can be a
  filtered  pure  cubical  complex,  or  a  filtered  simplicial complex, or a
  filtered regular CW complex, or indeed a filtered chain complex (represented
  in    sparse    form).    This    function    incorporates   the   functions
  ContractedFilteredPureCubicalComplex(X)                                  and
  ContractedFilteredRegularComplex(X) which respectively input a filtered pure
  cubical  complex  and  filtered  regular  CW-complex  and  return a filtered
  complex of the same data type in which each term of the output filtration is
  a deformation retract of the corresponding term in the input filtration.
  
  In  this approach the vector fields on the various spaces X_k are completely
  independent  and so the method lends itself to a degree of easy parallelism.
  This is not incorporated into the current implementation.
  
  As  an  illustration  we  consider a synthetic data set S consisting of 3527
  points  sampled, with errors, from an `unknown' manifold M in R^3. From such
  a  data set one can associate a 3-dimensional cubical complex X_0 consisting
  of   one  unit  cube  centred  on  each  (suitably  scaled)  data  point.  A
  visualization of X_0 is shown below.
  
  Given  a  pure  cubical  complex  X_s  we  construct  X_s+1 =X_s ∪ {overline
  e^3_λ}_λ∈  Λ  by  adding  to X_s each closed unit cube overline e^3_λ in R^3
  that  intersects  non-trivially  with X_s. We construct the filtered cubical
  complex  X_∗ ={X_i}_0≤ i≤ 19 and compute the persistence matrices β_d^∗∗ for
  d=0,1,2  and  for Z_2 coefficients. The filtered complex X_∗ is quite large.
  In  particular,  the  final  space  X_19  in the filtration involves 1092727
  vertices,  3246354  edges, 3214836 faces of dimension 2 and 1061208 faces of
  dimension  3.  The  usual  matrix reduction approach to computing persistent
  Betti  numbers would involve an appropriate row reduction of sparse matrices
  one of which has over 3 million rows and 3 million columns.
  
    Example  
    gap> file:=HapFile("data247.txt");;
    gap> Read(file);;
    gap> F:=ThickeningFiltration(T,20);;
    gap> P:=PersistentBettiNumbersAlt(F,[0,1,2]);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The  barcodes  suggest  that  the data points might have been sampled from a
  manifold with the homotopy type of a torus.
  
  
  5.8-1 Non-trivial cup product
  
  Of  course,  a  wedge  S^2∨ S^1∨ S^1 has the same homology as the torus S^1×
  S^1.  By establishing that a 'noise free' model for our data points, say the
  10-th  term  X_10  in  the  filtration,  has  a  non-trivial  cup product ∪:
  H^1(X_10, Z) × H^1(X_10, Z) → H^2(X_10, Z) we can eliminate S^2∨ S^1∨ S^1 as
  a candidate from which the data was sampled.
  
    Example  
    gap> X10:=RegularCWComplex(FiltrationTerm(F,10));;
    gap> cup:=LowDimensionalCupProduct(X10);;
    gap> cup([1,0],[0,1]);
    [ 1 ]
    
  
  
  
  5.8-2 Explicit homology generators
  
  It  could  be desirable to obtain explicit representatives of the persistent
  homology  generators  that  "persist"  through  a  significant  sequence  of
  filtration  terms.  There  are  two such generators in degree 1 and one such
  generator  in  degree  2.  The  explicit  representatives  in degree n could
  consist  of  an inclusion of pure cubical complexes Y_n ⊂ X_10 for which the
  incuced  homology homomorphism H_n(Y_n, Z) → H_n(X_10, Z) is an isomorphism,
  and  for which Y_n is minimal in the sense that its homotopy type changes if
  any  one or more of its top dimensional cells are removed. Ideally the space
  Y_n  should  be  "close to the original dataset" X_0. The following commands
  first  construct an explicit degree 2 homology generator representative Y_2⊂
  X_10  where  Y_2  is  homotopy  equivalent  to  X_10. They then construct an
  explicit  degree 1 homology generators representative Y_1⊂ X_10 where Y_1 is
  homotopy  equivalent  to  a wedge of two circles. The final command displays
  the homology generators representative Y_1.
  
    Example  
    gap> Y2:=FiltrationTerm(F,10);;                   
    gap> for t in Reversed([1..9]) do
    > Y2:=ContractedComplex(Y2,FiltrationTerm(F,t));
    > od;
    gap> Y2:=ContractedComplex(Y2);;
    
    gap> Size(FiltrationTerm(F,10));
    918881
    gap> Size(Y2);                  
    61618
    
    gap> Y1:=PureComplexDifference(Y2,PureComplexRandomCell(Y2));;
    gap> Y1:=ContractedComplex(Y1);;
    gap> Size(Y1);
    474
    gap> Display(Y1);
    
  
  
  
  5.9 Knotted proteins
  
  The  Protein  Data  Bank  (https://www.rcsb.org/)  contains a wealth of data
  which  can  be  investigated  with  respect to knottedness. Information on a
  particular  protein  can be downloaded as a .pdb file. Each protein consists
  of  one  or  more  chains  of  amino  acids and the file gives 3-dimensional
  Euclidean  coordinates  of  the  atoms in amino acids. Each amino acid has a
  unique  "alpha  carbon"  atom  (labelled  as "CA" in the pdb file). A simple
  3-dimensional  curve,  the  protein backbone, can be constructed through the
  sequence  of  alpha carbon atoms. Typically the ends of the protein backbone
  lie near the "surface" of the protein and can be joined by a path outside of
  the protein to obtain a simple closed curve in Euclidean 3-space.
  
  The  following  command  reads  in  the pdb file for the T.thermophilus 1V2X
  protein, which consists of a single chain of amino acids, and uses Asymptote
  software  to  produce  an  interactive visualization of its backbone. A path
  joining the end vertices of the backbone is displayed in blue.
  
    Example  
    gap> file:=HapFile("data1V2X.pdb");;
    gap> DisplayPDBfile(file);
    
  
  
  The  next  command reads in the pdb file for the T.thermophilus 1V2X protein
  and represents it as a 3-dimensional pure cubical complex K. A resolution of
  r=5  is  chosen and this results in a representation as a subcomplex K of an
  ambient  rectangular  box  of  volume equal to 184× 186× 294 unit cubes. The
  complex K should have the homotopy type of a circle and the protein backbone
  is a 1-dimenional curve that should lie in K. The final command displays K.
  
    Example  
    gap> r:=5;;
    gap> K:=ReadPDBfileAsPureCubicalComplex(file,r);;      
    gap> K:=ContractedComplex(K);;
    gap> K!.properties;
    [ [ "dimension", 3 ], [ "arraySize", [ 184, 186, 294 ] ] ]
    
    gap> Display(K);
    
  
  
  Next  we  create a filtered pure cubical complex by repeatedly thickening K.
  We  perform  15 thickenings, each thickening being a term in the filtration.
  The  β_1  barcode  for  the  filtration  is  displayed.  This  barcode  is a
  descriptor for the geometry of the protein. For current purposes it suffices
  to note that the first few terms of the filtration have first homology equal
  to  that  of  a circle. This indicates that the Euclidean coordinates in the
  pdb file robustly determine some knot.
  
    Example  
    gap> F:=ThickeningFiltration(K,15);;
    gap> F:=FilteredPureCubicalComplexToCubicalComplex(F);;
    gap> F:=FilteredCubicalComplexToFilteredRegularCWComplex(F);;
    gap> P:=PersistentBettiNumbersAlt(F,1);;
    gap> BarCodeCompactDisplay(P);
    
  
  
  The next commands compute a presentation for the fundamental group π_1( R^3∖
  K)  and  the  Alexander  polynomial for the knot. This is the same Alexander
  polynomial as for the trefoil knot. Also, Tietze transformations can be used
  to see that the fundamental group is the same as for the trefoil knot.
  
    Example  
    gap> C:=PureComplexComplement(K);;
    gap> C:=ContractedComplex(C);;
    gap> G:=FundamentalGroup(C);;
    gap> GeneratorsOfGroup(G);
    [ f1, f2 ]
    gap> RelatorsOfFpGroup(G);
    [ f2*f1^-1*f2^-1*f1^-1*f2*f1 ]
    
    gap> AlexanderPolynomial(G);
    x_1^2-x_1+1
    
  
  
  
  5.10 Random simplicial complexes
  
  For  a  positive  integer  n  and  probability  p  we  denote  by Y(n,p) the
  Linial-Meshulam  random simplicial 2-complex. Its 1-skeleton is the complete
  graph  on n vertices; each possible 2-simplex is included independently with
  probability p.
  
  The  following  commands  first compute the number h_i of non-trivial cyclic
  summands  in  H_i(Y(100,p),  Z) for a range of probabilities p and i=1,2 and
  then  produce  a  plot of h_i versus p. The plot for h_1 is red and the plot
  for  h_2  is blue. A plot for the Euler characteristic 1-h_1+h_2 is shown in
  green.
  
    Example  
    gap> L:=[];;M:=[];;
    gap> for p in [1..100] do
    > K:=RegularCWComplex(RandomSimplicialTwoComplex(100,p/1000));;
    > h1:=Length(Homology(K,1));;
    > h2:=Length(Homology(K,2));;
    > Add(L, [1.0*(p/1000),h1,"red"]);
    > Add(L, [1.0*(p/1000),h2,"blue"]);
    > Add(M, [1.0*(p/1000),1-h1+h2,"green"]);
    > od;
    gap> ScatterPlot(L);
    gap> ScatterPlot(M);
    
  
  
  From  this  plot  it  seems that there is a phase change threshold at around
  p=0.025.  An  inspection of the first homology groups H_1(Y(100,p), Z) shows
  that  in  most cases there is no torsion. However, around the threshold some
  of the complexes do have torsion in their first homology.
  
  Similar commands for Y(75,p) suggest a phase transition at around p=0.035 in
  this  case.  The  following  commands compute H_1(Y(75,p), Z) for 900 random
  2-complexes  with p in a small interval around 0.035 and, in each case where
  there  is  torsion, the torsion coefficients are stored in a list. The final
  command  prints  these  lists  --  all but one of which are of length 1. For
  example,  there is one 2-dimensional simplicial complex on 75 vertices whose
  first  homology contains the summand Z_107879661870516800665161182578823128.
  The  largest  prime  factor  is  80555235907994145009690263  occuring in the
  summand Z_259837760616287294231081766978855.
  
    Example  
    gap> torsion:=function(n,p)
    > local H, Y;
    > Y:=RegularCWComplex(RandomSimplicialTwoComplex(n,p));
    > H:=Homology(Y,1);
    > H:=Filtered(H,x->not x=0);
    > return H;
    > end;
    function( n, p ) ... end
    
    
    gap> L:=[];;for n in [73000..73900] do
    > t:=torsion(75,n/2000000);  
    > if not t=[] then Add(L,t); fi;
    > od;
    
    gap> Display(L);
    [ [                                     2 ],
      [                                    26 ],
      [     259837760616287294231081766978855 ],
      [                                     2 ],
      [                                     3 ],
      [                                     2 ],
      [          2761642698060127444812143568 ],
      [       2626355281010974663776273381976 ],
      [                                     2 ],
      [                                     3 ],
      [         33112382751264894819430785350 ],
      [                                    16 ],
      [                                     4 ],
      [                                     3 ],
      [                                     2 ],
      [                                     3 ],
      [                                     2 ],
      [      85234949999183888967763100590977 ],
      [                                     2 ],
      [      24644196130785821107897718662022 ],
      [                                     2,                                     2 ],
      [                                     2 ],
      [           416641662889025645492982468 ],
      [         41582773001875039168786970816 ],
      [                                     2 ],
      [            75889883165411088431747730 ],
      [         33523474091636554792305315165 ],
      [  107879661870516800665161182578823128 ],
      [          5588265814409119568341729980 ],
      [                                     2 ],
      [          5001457249224115878015053458 ],
      [                                    10 ],
      [                                    12 ],
      [                                     2 ],
      [                                     2 ],
      [                                     3 ],
      [          7757870243425246987971789322 ],
      [       8164648856993269673396613497412 ],
      [                                     2 ] ]
    
  
  
  
  5.11 Computing homology of a clique complex (Vietoris-Rips complex)
  
  Topological data analysis provides one motivation for wanting to compute the
  homology of a clique complex. Consider for instance the cloud of data points
  shown in Example 5.2-1. This data is a set S of 400 points in the plane. Let
  Γ  be the graph with vertex set S and with two vertices joined by an edge if
  they lie within a Euclidean distance of 40 of each other. The clique complex
  K=K(Γ) could be studied to see what it reveals about the data. The following
  commands construct K and show that it is a 23-dimensional simplicial complex
  consisting of a total of 36191976 simplices.
  
    Example  
    gap> file:=HapFile("data134.txt");;                              
    gap> Read(file);
    gap> A:=VectorsToSymmetricMatrix(S,EuclideanApproximatedMetric);;
    gap> threshold:=40;; 
    gap> grph:=SymmetricMatrixToGraph(A,threshold);;
    gap> dimension_cap:=100;; 
    gap> K:=CliqueComplex(grph,dimension_cap);
    Simplicial complex of dimension 23.
    
    gap> Size(K);
    36191976
    
  
  
  The  computation  of  the  homology  of this clique complex K is a challenge
  because  of  its size. If we are only interested in K up to homotopy then we
  could  try to modify the graph Γ in such a way that the homotopy type of the
  clique  complex  is unchanged but the size of the clique complex is reduced.
  This  is  done in the following commands, producing a smaller 19-dimensional
  simplicial complex K with 4180652 simplices.
  
    Example  
    gap> ContractGraph(grph);;
    
    gap> dimension_cap:=100;; 
    gap> K:=CliqueComplex(grph,dimension_cap);
    Simplicial complex of dimension 19.
    
    gap> Size(K);
    4180652
    
  
  
  To  compute the homology of K in degrees 0 to 5 say, we could represent K as
  a  regular  CW-complex  Y and then compute the homology of Y as follows. The
  homology  H_n(K)= Z for n=0,1 and H_n(K)= 0 for n=2,3,4,5 is consistent with
  the  data  having  been  sampled  from  a  space with the homotopy type of a
  circle.
  
    Example  
    gap> Y:=RegularCWComplex(K);
    Regular CW-complex of dimension 19
    
    gap> Homology(Y,0);
    [ 0 ]
    gap> Homology(Y,1);
    [ 0 ]
    gap> Homology(Y,2);
    [  ]
    gap> Homology(Y,3);
    [  ]
    gap> Homology(Y,4);
    [  ]
    gap> Homology(Y,5)
    [  ]