File: doxy.h

package info (click to toggle)
vspline 1.0.2-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm, bullseye
  • size: 1,436 kB
  • sloc: cpp: 12,411; ansic: 413; sh: 12; makefile: 2
file content (765 lines) | stat: -rw-r--r-- 51,517 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
/************************************************************************/
/*                                                                      */
/*    vspline - a set of generic tools for creation and evaluation      */
/*              of uniform b-splines                                    */
/*                                                                      */
/*            Copyright 2015 - 2020 by Kay F. Jahnke                    */
/*                                                                      */
/*    The git repository for this software is at                        */
/*                                                                      */
/*    https://bitbucket.org/kfj/vspline                                 */
/*                                                                      */
/*    Please direct questions, bug reports, and contributions to        */
/*                                                                      */
/*    kfjahnke+vspline@gmail.com                                        */
/*                                                                      */
/*    Permission is hereby granted, free of charge, to any person       */
/*    obtaining a copy of this software and associated documentation    */
/*    files (the "Software"), to deal in the Software without           */
/*    restriction, including without limitation the rights to use,      */
/*    copy, modify, merge, publish, distribute, sublicense, and/or      */
/*    sell copies of the Software, and to permit persons to whom the    */
/*    Software is furnished to do so, subject to the following          */
/*    conditions:                                                       */
/*                                                                      */
/*    The above copyright notice and this permission notice shall be    */
/*    included in all copies or substantial portions of the             */
/*    Software.                                                         */
/*                                                                      */
/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */
/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */
/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */
/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */
/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */
/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */
/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */
/*    OTHER DEALINGS IN THE SOFTWARE.                                   */
/*                                                                      */
/************************************************************************/

// This header doesn't contain any code, only the text for the main page of the documentation.

/*! \mainpage

 \section intro_sec Introduction

 vspline is a header-only generic C++ library for the creation and use of uniform B-splines. It aims to be as comprehensive as feasibly possible, yet at the same time producing code which performs well, so that it can be used in production. vspline is CPU-based, it does not use the GPU.
 
 vspline's main focus is interpolation of bulk nD raster data. It was developed originally to be used for image processing software. In image processing, oftentimes large amounts of pixels need to be submitted to identical operations, suggesting a functional approach. vspline offers functional programming elements to implement such programs.
 
 vspline was developed on a Linux system using clang++ and g++. It has not been tested much with other systems or compilers, and as of this writing I am aware that the code probably isn't fully portable. The code uses the C++11 standard.
 
 Note: in November 2017, with help from Bernd Gaucke, vspline's companion program pv, which uses vspline heavily, was successfully compiled with 'Visual Studio Platform toolset V141'.
 
 In 2019 I have ported pv, vspline's 'companion program', first to MacOS and then to to an msys2/mingw environment. These ports have established that vspline works on these platforms, as well, with no code modifications.
 
 vspline relies on one other library:
 
 - <a href="http://ukoethe.github.io/vigra/">VIGRA</a>, mainly for data handling using vigra's MultiArrayView and TinyVector types
 
 Optionally, vspline can make use of Vc
 
 - <a href="https://github.com/VcDevel/Vc">Vc</a>, for the use of explicit vectorization
 
 I find VIGRA indispensible, omitting it from vspline is not really an option. Use of Vc is optional, though, and may be activated by defining 'USE_VC'. This should be done by passing -D USE_VC to the compiler; defining USE_VC only for parts of a project may or may not work. Please note that vspline uses Vc's 1.3 or 1.4 branch, not the master branch. 1.3 is what you are likely to find in your distro's packet repositories; if you check out Vc from github, make sure you pick the 1.4 branch. Vc support for new vector units may peter out as std::simd (in c++17) emerges, but for now it's a good choice.
 
 I have made an attempt to generalize the code so that it can handle

 - fundamental numeric data types and their aggregates (like, pixels)
 
 - a reasonable selection of boundary conditions
 
 - arbitrary spline degrees (up to 45)
 
 - arbitrary dimensions of the spline
 
 - in multithreaded code
 
 - using the CPU's vector units if possible

On the evaluation side I provide

 - evaluation of the spline at point locations in the defined range
 
 - evaluation of the spline's derivatives

 - mapping of arbitrary coordinates into the defined range
 
 - evaluation of nD arrays of coordinates (remap function)
 
 - generalized 'transform' and 'apply' functions
 
 - restoration of the original data from the coefficients

To produce maximum perfomance, vspline also has a fair amount of collateral code,
and some of this code may be helpful beyond vspline:

 - range-based multithreading with a thread pool
 
 - functional constructs using vspline::unary_functor*
 
 - forward-backward n-pole recursive filtering*
 
 - separable convolution*
 
 - efficient access to the b-spline basis functions
 
 - extremely precise precalculated constants

  \*note that the filtering code and the transform-type routines multithread and vectorize automatically.

 \section install_sec Installation
 
 Note that vspline's takes it's claim to being 'generic' seriously: when template arguments are offered to produce variant code, you can depend on actually getting code for every allowed value, rather than having to implement the special cases yourself and then introducing them to some generic dispatch mechanism. So you want to create a 4-D b-spline over five-channel long double data? No problem. And spline degree is even a run-time parameter and not a template argument, making the code more flexible than code which requires that you specify the spline degree at compile-time.

 vspline is available as a debian package, so the easiest way to install vspline is to install the vspline package, if it is provided by your package manager. this will also take care of the dependencies and install the examples. ubuntu should provide a vspline package from 18.04 on. As of this writing, vspline is still under development, and the master branch in the git repository may offer new/different functionality to the code in the package. While I try to keep the 'higher-level' functions' and objects' interface consistent, the signatures may still change, especially in their default arguments, so switching from the packaged version to the master branch, or upgrading to a newer packaged version may require changes to the calling code.
 
 If you're installing manually, note that vspline is header-only, so it's sufficient to place the headers where your code can access them. They must all reside in the same folder. VIGRA (and optionally Vc) are supposed to be installed in a location where they can be found so that includes along the lines of #include <vigra/...> succeed.

 \section compile_sec Compilation
 
 While your distro's packages may be sufficient to get vspline up and running, you may need newer versions of VIGRA and Vc. At the time of this writing the latest versions commonly available were Vc 1.3.0 and VIGRA 1.11.0; I compiled Vc and VIGRA from source, using up-to-date pulls from their respective repositories. Vc 0.x.x will not work with vspline, and only Vc's 1.3 and 1.4 branches are compatible with vspline. Using Vc only makes sense if you are aiming for maximum performance; vspline's code autovectorizes well and oftentimes performs just about as well without Vc, but if you are writing more complex pipelines, the gains from using explicit vectorization tend to increase. Some components of vspline, like prefiltering and the genralized digital filter code, do not benefit from Vc use.
 
 update: ubuntu 17.04 has vigra and Vc packages which are sufficiently up-to-date.
 
 To compile software using vspline, I preferably use clang++. I found that clang++ often produces faster code than other compilers.
 
~~~~~~~~~~~~~~

 clang++ -pthread -O3 -march=native --std=c++11 your_code.cc
 
 // or, using Vc:

 clang++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc
 
~~~~~~~~~~~~~~
 
 -pthread is needed to run the automatic multithreading used for most of vspline's bulk data processing, like prefiltering or the transform routines. It's possible to disable multithreading by passing -D VSPLINE_SINGLETHREAD, in which case you can omit -pthread:
 
~~~~~~~~~~~~~~

 clang++ -DVSPLINE_SINGLETHREAD --std=c++11 your_code.cc
 
~~~~~~~~~~~~~~ 
 
 optimization is essential to get decent performance, but if you just work with a few values or have trouble debugging optimzed code, you can omit it. I have found using -Ofast usually works as well.
 
 vspline uses the C++11 standard and will not compile with lower standard versions. using C++14 is fine.
 
 vigraimpex (VIGRA's image import/export library) is used by some of the examples, for these you'll need an additional -lvigraimpex

 Please note that an executable using vectorization produced for your system may likely not work on a machine with another CPU. It's best to compile for the intended target architecture explicitly using -march... 'Not work' in this context means that it may as well crash due to an illegal instruction or wrong alignment. If you're compiling for the same system you're on, you can use -march=native. If you don't specify the architecture for the compilation, you'll likely get some lowest-common-denominator binary (like, SSE) which will not exploit your hardware optimally, but run on a wider range of systems. If you intend to build software which can run on several vector ISAs, there is a simple trick: build a separate object file for each ISA where you use a preprocessor statement which #defines vspline to be some ISA-specific value, the use a dispatcher class calling ISA-specific variants via a pure virtual base class. In pv, vspline's companion program, this technique is used and you can study it in more detail.
 
 If you tell the compiler to do so, autovectorization will still be used even without Vc being present, and if you compile for a given target architecture, the binary may contain vector instructions specific to the architecture and will crash on targets not supporting the relevant ISA.
 
 If you don't want to use clang++, g++ will also work. In my tests I found that clang++ produced faster code, but this may or may not be the case for specific compiler versions or taget architectures. You'll have to try it out. 
 
 \section license_sec License

 vspline is free software, licensed under this license:
 
<BLOCKQUOTE>

    vspline - a set of generic tools for creation and evaluation
              of uniform b-splines

            Copyright 2015 - 2020 by Kay F. Jahnke

    Permission is hereby granted, free of charge, to any person
    obtaining a copy of this software and associated documentation
    files (the "Software"), to deal in the Software without
    restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or
    sell copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following
    conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the
    Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    OTHER DEALINGS IN THE SOFTWARE.

</BLOCKQUOTE>

 \section strategy_sec Strategy
 
 This section lines out the strategies used in vspline to get from
 original data to an interpolated result. If you prefer to start
 coding straight away, continue to the next section.
 
 When you are using vspline, your information goes through several
 stages. Starting out with your original data, the first step is to
 'prefilter' these data to obtain 'b-spline coefficients'. vspline
 keeps these coefficients in a 'bspline' object. Next you obtain an
 'evaluator' using these coefficients. This is a functor; you call it
 with (real) coordinates and receive interpolated values. While you can
 produce your results by repeated calls to the evaluator, vspline also
 offers mass evaluation, a typical example is a 'classic' remap, where
 you provide an array full of coordinates and receive an array full of
 interpolated values. Using vspline's mass evaluation methods is a
 good idea, because this is done very efficiently using multithreading
 and hardware vectorization. I'll go through these stages in detail.
 
 'direct' interpolation uses some arithmetic construct to derive an
 interpolated value from several data points in the 'vicinity' of the
 interpolation locus. The value of this arithmetic construct collapses
 to the data point itself if it coincides with the interpolation locus.
 b-splines are *not* direct interpolators: the interpolated value is not
 directly derived from an arithmetic construct involving the data points,
 instead it is derived from a set of coefficients which have been
 produced from the data points by 'prefiltering' them. This implies
 a sequence of operations which has to be followed in order to use
 b-spline interpolation:
 
 - the 'original' data (aka 'knot point data') are 'prefiltered',
   yielding the b-spline's 'coefficients'
   
 - for evaluation, a subset of the coefficients is selected and
   processed to yield the interpolated value.
   
 This two-step process has one distict disadvantage: the coefficients
 have to be held in memory until all evaluations of the spline are done.
 While there is no way around storing the coefficients, storing the
 original data is *not* necessary to evaluate the spline. So if the
 original data aren't used somewhere 'further down the line', they can
 be discarded as soon as the coefficients are known. Discarding the
 original data can even be done *while* calculating the coefficients,
 and it is possible to replace the original data with the corresponding
 coefficients in the process of prefiltering: the operation can be done
 'in place'. vspline can exploit situations where the original data
 won't be used after the coefficients have been produced, reducing
 storage requirements to just above what is needed for the original
 data. This can be done in two ways:
 
 - when first obtaining the original data (like, from a file), they
   can be immediately placed into a bspline object's 'core', where they
   are prefiltered in-place. This is the best way since it requires no
   further intermediate storage. For an example, see ca_correct.cc,
   where this way of handling data is used in main(), look for
   'reading image'. another example is in 'gradient.cc', where the
   original data are produced arithmetically and directly placed
   into the bspline object's 'core'.
   
 - if the data are already present in memory, they can be 'sucked into'
   a bspline object during prefiltering. This is achieved by creating the
   bspline object and passing the data to it's prefilter routine. This
   way is second best: the original data will only be read once, and as
   soon as the coefficients are ready, they may or may not be discarded.
   You can find an example for this strategy in n_shift.cc, look for
   invocations of 'prefilter'.
   
 Of course, if you don't care for maximum performance, you can use a third
 way, which is the most legible way to code the process:
 
 - Start out with the data in memory and *assign* them to the bspline
   object's 'core', then call prefilter.
   Performance-wise this is the worst way of doing it, since the data
   are read from one part of memory, written to another part (the bspline
   objet's core) and reread from there as soon as prefiltering starts.

 Now you might ask why a bspline object shouldn't be created by somehow 
 transforming the raw data. This is due to the fact that vspline holds
 more coefficients than original data: for efficient evaluation without
 needs for bounds checking, a small 'frame' of additional coefficients
 is put around the 'core' coefficients. This frame is handled by the
 bspline object, and the space in memory holding the 'core' coefficients
 is a 'subarray' of the total amount of coefficients held by the bspline
 object. It is possible to 'adopt' storage holding original data with a
 bspline object, but this can only be done if the raw data are already
 surrounded by a suitably-sized frame of additional storage to accept
 the 'frame' after prefiltering. If you set up your data so that the
 'frame' is present, you can make the spline 'adopt' the data when it
 is created, but you might as well create the bspline object first and
 populate it's core afterwards, instead of creating, populating and
 adpoting the data.
 
 When you set up a bspline object, you instantiate class bspline with
 the data type it should use for coefficients. Oftentimes this will
 be the same data type as your original data. But you can use a different
 type for the coefficients, as long as the types are convertible. When
 using a different type, you can still use the overload of 'prefilter'
 'sucking in' the original data: they will be converted in the process.
 This is especially useful if your source data are of a type with
 small dynamic range (like 8 bit pixels), which aren't well-suited
 for arithmetic operations. While you get most precise results with
 real data types (like float) for b-spline coefficients, you are free
 to operate with integral b-spline coefficients: per default, all
 internal calculations (prefiltering and evaluation) are done with
 suitably 'promoted' data types, so when using integral coefficients,
 you only suffer from quantization errors. You have freedom of choice
 here: your original data, the b-spline coefficients, the type used
 for internal calculations and the result type of the evaluation may
 all be different, while there are sensible defaults if you don't want
 to concern yourself with the details. The only constraint is that
 arithmetics have to be done with a real data type.
 
 Note that prefiltering is highly optimized: the operation is
 multithreaded automatically, and it is also vectorized, provided
 that the data types you're using can be handled by your processor's
 vector units. Even when you aren't using Vc, prefiltering will use
 the CPU's vector units via a process I call 'goading': The data are
 presented in small aggregates of vector-friendly size, and with the
 filter code being reasonably trivial, the operation is auto-vectorized
 if permitted by the given compiler flags (like -mavx).
 
 Whichever way of producing the coefficients you may use, eventually
 you'll have a 'bspline' object holding the coefficients in a form
 which can be used by vspline's evaluation mechanism. Evaluation
 involves two steps:
 
 - first you create an 'evaluator' object. This is a functor which
   is created from a bspline object and provides methods to evaluate
   the spline.
   
 - next you use the evaluator to obtain interpolated values.
 
 Yet again you might ask why there isn't simply an evaluation method
 in class bspline itself. This is due to the fact that evaluation
 can be done in different ways, which are specified when the 'evaluator'
 object is created. And when I say 'evaluator' here, I use the term
 for a whole set of objects capable of evaluating a b-spline, where
 the capabilities may go beyond what class vspline::evaluator provides.
 Consider, for example, the factory functions make_evaluator() and
 make_safe_evaluator() in eval.h. They create functors which contain
 vspline::evaluator objects specialized for the task at hand, and
 extra arithmetics on the input for 'safe' evaluators - and they
 return the functors as type-erased objects in a common type.
 
 Once an evaluator is ready, it can be used without further
 specialization: it's immutable after creation and constitutes a
 'pure' functor in a functional-programming sense. It is an object
 which can be passed around (it's trivially copyable) even between
 several threads. Creating an evaluator is a cheap operation, so
 the strategy is to create an evaluator specific to the task at hand
 and use this specific evaluator for the specific task only.
 Evaluator objects are also lightweight, so having many evaluators
 does not use much memory. Evaluators refer to the coefficients, and
 having an evaluator for a given b-spline object does not automatically
 keep the coefficients 'alive'. If you destroy the bspline object,
 evaluators referring to the coefficients hold dangling pointers and
 using them will produce undefined behaviour. It's your responsibility
 to keep the bspline object alive until the coefficients won't be
 used anymore.
 
 vspline uses RAII, and the way to create evaluators fits this scheme.
 Typically you'd create the evaluator in some inner scope just before it
 is used, letting it perish just afterwards when leaving the scope.
 
 There is a third aspect to the strategy proposed by vspline. While
 the obvious way of using an evaluator would be to write a loop in
 which the evaluator is called repeatedly, this is only a good idea
 if performance is not an issue. If, on the other hand, you have large
 amounts of data to process, you want to employ methods like multithreading
 and vectorization to get better performance. You are of course free to
 use whatever multithreading and vectorization code you like, but you
 might as well use what vspline offers: I call it 'wielding' code. In
 an evaluation context, it means applying functors to large data sets.
 
 vspline offers code for the purpose which automatically multithreads
 and vectorizes the operation of evaluating a b-spline at many locations.
 You can pass an array full of coordinates and receive an array of values,
 an operation called 'remapping'. vspline even goes further and uses
 an abstraction of this process: it has code to feed large amounts of
 data to a functor, where the effect is that of processing all data in
 turn, while the operation is in fact multithreaded and vectorized
 automatically. The result is stored in a (possibly multidimensional)
 array. A 'classic' remap is merely a special case of this procedure,
 which, in vspline, is done by vspline::transform().
 
 When coding geometric transformations, the task can often be reduced to
 formulating a function which, given a 'target' coordinate - a coordinate
 into the result array - can produce a 'source' coordinate, namely the
 locus of the interpolation. For such situations, vspline offers an
 overload of vspline::transform which feeds the functor with discrete
 target coordinates. If the functor is merely an evaluator, the result
 will be a 'restoration' of the original data. But if you use a functor
 transforming the 'incoming' coordinate and feeding the transformed
 coordinate to en evaluator, the result will be a geometrically
 transformed image, volume, etc...
 
 Creating such functors is actually quite simple, and the examples
 have code to show how it's done. Have a look at gsm.cc, which produces
 an image holding the gradient squared magnitude of an input image.
 There, the target pixels are created one by one in a loop. Now look
 at gsm2.cc. The result is the same, but here it is obtained by
 coding the 'interesting' bit as a functor derived from
 vspline::unary_functor and the coordinate-fed overload of
 vspline::transform is used. You can see that the coding is quite
 straightforward. What is not immediately apparent is the fact that
 doing it this way is also much faster: the operation is automatically
 multithreaded and, possibly, vectorized. For a single image,
 the difference is not a big deal, but when you're producing images
 for real-time display, every millisecond counts.
   
 My use of functors in vspline isn't as straightforward as using,
 say, std::function. This is due to the fact that the functors used in
 vspline can process both unvectorized and vectorized data. The relevant
 'eval' routines provide the dual functionality either as two distinct
 overloads or as a template if the operation can be coded as one.
 This is why I had to code my own version of the subset of functional
 programming constructs used in vspline, rather than using some ready-made
 expression template code.
   
 There is one last step which I'll mention briefly. Suppose you have
 a bspline object and you want to recreate your original data from the
 coefficients it holds. vspline offers functions called 'restore' for
 the purpose. Restoration will only be possible within arithmetic
 precision - if you are using integral data, you may suffer from
 quantization errors, with real data, the restored data will be very
 close to the original data. Restoration is done with a separable
 convolution, which is multithreaded and vectorized automatically,
 just like prefiltering or transformations, so it's also very fast.
 
 vspline's prefiltering code, which is simply a forward-backward
 recursive n-pole filter, and the convolution code used for restoration,
 can be used to apply other filters than the b-spline prefilter or
 restoration, with the same benefits of multithreading and vectorization,
 requiring little more than the specification of the filter itself,
 like the convolution kernel to use. And vspline's 'range-based
 multithreading' is also commodified so that you can use it for your own
 purposes with just a few lines of code, see a simple example in
 eval.cc, where the speed test is run with a thread pool to get
 a measurement of the overall evaluation performance.
 
 How about processing 1D data like sounds or simple time series?
 vspline has specialized code for the purpose. While simply iterating
 over the 1D sequence is trivial, it's slow. So vspline's specialized
 1D code uses mathematical tricks to process the 1D data with multiple
 threads and vectorization just as images or volumes. These specializations
 exist for prefiltering, transformations and restoration, which are
 therefore just as fast as for nD data, and, as far as the filters are
 concerned, even faster, since the filter has to be applied along one axis
 only. As a user, again you don't have to 'do' anything, the 1D code is
 used automatically if your data are 1D. The mathematical trickery only
 fails if you require 'zero tolerance' for the coefficient generation,
 which forces the data to be prefiltered en bloc - an option which only
 makes sense in rare circumstances.
 
 vspline will accept high-dimension data, but when dimensionality gets
 very high, vspline's use of 'braced' coefficient arrays has some drawbacks:
 with increasing dimensionality, the 'surface' of an array grows very large,
 and the 'brace' around the coefficients is just such a surface, possibly
 several units wide. So with high-D splines, the braced coefficient array
 is (potentially much) larger than the knot point data. You've been warned ;)
 
 An important specialization is code for degree-0 and degree-1
 b-splines. These are old acquaintances in disguise: a degree-0 b-spline
 is simply another name for a nearest-neigbour interpolation, and it's
 degree-1 cousin encodes linear interpolation. While the 'general-purpose'
 b-spline evaluation code can be used to evaluate degree-0 and degree-1
 b-splines, doing so wastes some potential for optimization. vspline's
 evaluator objects can be 'specialized' for degree-0 and degree-1 splines,
 and when you're using the factory functions like make_evaluator, this
 is done automatically. With the optimizations for low-degree splines and
 1D data, I feel that the 'problem' at hand - bspline processing - is
 dealt with exhaustively, leaving no use scenario with second-best
 solutions but providing good performance throughout.
 
 \section quickstart_sec Quickstart
 
 vspline uses vigra to handle data. There are two vigra data types which are used throughout: vigra::MultiArrayView is used for multidimensional arrays. It's a thin wrapper around the three parameters needed to describe arbitrary n-dimensional arrays of data in memory: a pointer to some 'base' location coinciding with the coordinate origin, a shape and a stride. If your code does not use vigra MultiArrayViews, it's easy to create them for the data you have: vigra offers a constructor for MultiArrayViews taking these three parameters. Very similar is vigra's MultiArray, which is is a MultiArrayView owning and allocating the data it refers to.
 
 The other vigra data type used throughout vspline is vigra::TinyVector, a small fixed-size container type used to hold things like multidimensional coordinates or pixels. This type is also just a wrapper around a small 1D C array. It's zero overhead and contains nothing else, but offers lots of functionality like arithmetic operations. I recommend looking into vigra's documentation to get an idea about these data types, even if you only wrap your extant data in them to interface with vspline. vspline follows vigra's default axis ordering scheme: the fastest-varying index is first, so coordinates are (x,y,z...). This is important to keep in mind when evaluating a b-spline. Always pass the x coordinate first. Coordinates, strides and shapes are given in units of to the MultiArrayView's value_type. So if you have a MultiArrayView of, say, float RGB pixels, a stride of one means an offset of ( 3 * sizeof ( float ) ) in bytes. This is important to note, since there is no standard for expressing strides - some systems use bytes, some use the elementary type. vigra uses the array's value_type for the unit step.
 
 If you stick with the high-level code, using class bspline or the transform functions, most of the parametrization is easy. Here are a few quick examples what you can do. This is really just to give you a first idea - there is example code covering most features of vspline where things are covered in more detail, with plenty of comments. the code in this text is also there, see quickstart.cc.
 
 Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle integer and real data like float and double, and also their 'aggregates', meaning data types like pixels or vigra's TinyVectors. But for now, let's assume you have plain float data. Creating the bspline object is easy:
 
~~~~~~~~~~~~~~

#include "vspline/vspline.h"

// given a 10 X 20 vigra::MultiArray of data

vigra::MultiArray < 2 , float > a ( 10 , 20 ) ;

// let's initialize the whole array with 42

a = 42 ;

// fix the type of the corresponding b-spline. We want a bspline
// object with float coefficients and two dimensions:

typedef vspline::bspline < float , 2 > spline_type ;
 
// create bspline object 'bspl' fitting the shape of your data

spline_type bspl ( a.shape() ) ;
 
// copy the source data into the bspline object's 'core' area

bspl.core = a ;
 
// run prefilter() to convert original data to b-spline coefficients

bspl.prefilter() ;

~~~~~~~~~~~~~~
 
 The memory needed to hold the coefficients is allocated when the bspline object is constructed.
 
 Obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used per default. The spline is 'braced' so that it can be evaluated with vspline's evaluation routines, and the process is automatically partitioned and run in parallel by a thread pool. The only mandatory template arguments are the value type and the number of dimensions, which have to be known at compile time.
 
 While the sequence of operations indicated here looks a bit verbose (why not create the bspline object by a call like bspl(a) ?), in 'real' code you would use bspl.core straight away as the space to contain your data - you might get the data from a file or by some other process or do something like this  where the bspline object provides the array and you interface it via a view to it's 'core':
   
~~~~~~~~~~~~~~

vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;
 
auto v1 = bsp.core ; // get a view to the bspline's 'core'
 
for ( auto & r : v1 ) r = ... ; // assign some values
 
bsp.prefilter() ; // perform the prefiltering

~~~~~~~~~~~~~~
 
 This is a common idiom, because it reflects a common mode of operation where you don't need the original, unfiltered data any more after creating the spline, so the prefiltering is done in-place overwriting the original data. If you do need the original data later, you can also use a third idiom:
 
~~~~~~~~~~~~~~

vigra::MultiArrayView < 3 , double > my_data ( vigra::Shape3 ( 5 , 6 , 7 ) ) ;
 
vspline::bspline < double , 3 > bsp ( my_data.shape() ) ;
 
bsp.prefilter ( my_data ) ;

~~~~~~~~~~~~~~
 
 Here, the bspline object is first created with the appropriate 'core' size, and prefilter() is called with an array matching the bspline's core. This results in my_data being read into the bspline object during the first pass of the prefiltering process.
 
 There are more ways of setting up a bspline object, please refer to class bspline's constructor. Of course you are also free to directly use vspline's lower-level routines to create a set of coefficients. The lowest level of filtering routine is simply a forward-backward recursive filter with a set of arbitrary poles. This code is in prefilter.h.
 
 Next you may want to evaluate the spline from the first example at some pair of coordinates x, y. Evaluation of the spline can be done using vspline's 'evaluator' objects. Using the highest level of access, these objects are set up with a bspline object and, after being set up, provide methods to evaluate the spline at given coordinates. Technically, evaluator objects are functors which don't have mutable state (all state is created at creation time and remains constant afterwards), so they are thread-safe and 'pure' in a functional-programming sense. The evaluation is done by calling the evaluator's eval() member function, which takes it's first argument (the coordinate) as a const reference and writes the result to it's second argument, which is a reference to a variable capable of holding the result.
 
~~~~~~~~~~~~~~

// for a 2D spline, we want 2D coordinates
 
typedef vigra::TinyVector < float , 2 > coordinate_type ;
 
// get the appropriate evaluator type
 
typedef vspline::evaluator < coordinate_type , float > eval_type ;
 
// create the evaluator
 
eval_type ev ( bspl ) ;

// create variables for input and output,

coordinate_type coordinate ( 3 , 4 ) ;
float result ;

// use the evaluator to evaluate the spline at ( 3 , 4 )
// storing the result in 'result'

ev.eval ( coordinate , result ) ;

~~~~~~~~~~~~~~

 Again, some things have happened by default. The evaluator was constructed from a bspline object, making sure that the evaluator is compatible.
 
 Class evaluator does provide operator(). So do all the other functors vspline
 can generate. This is for convenience, so you can use vspline's unary_functors
 like a 'normal' function:
 
~~~~~~~~~~~~~
 
// evaluators can be called as a function

auto r = ev ( coordinate ) ;

assert ( r == result ) ;

~~~~~~~~~~~~~
 
 vspline offers a limited set of functional programming constructs - as of this
 writing, it provides just those constructs which it uses itself, usually in
 factory functions. You can find the functional constructs in unary_functor.h.
 While it's tempting to implement more along the lines of expression templates,
 I have tried to keep things limited to a comfortable minimum. Most of the time
 user code may remain ignorant of the functional programming aspects - the
 functional constructs obtained from the factory functions can just be assigned
 to auto variables; it's usually not necessary to make these types explicit.
 
 What about the remap function? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap does. Instead of a single coordinate, you pass a whole vigra::MultiArrayView full of coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every coordinate in the first array. Here's a simple example, using the same array 'a' as above:

~~~~~~~~~~~~

// create a 1D array containing three (2D) coordinates

vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;

// we initialize the coordinate array by hand...

coordinate_array = coordinate ;

// create an array to accommodate the result of the remap operation

vigra::MultiArray < 1 , float > target_array ( 3 ) ;
 
// perform the remap

vspline::remap ( a , coordinate_array , target_array ) ;

// reassure yourself the result is correct

auto ic = coordinate_array.begin() ;
for ( auto k : target_array )
  assert ( k == ev ( *(ic++) ) ;

~~~~~~~~~~~~

 This is an 'ad-hoc' remap, passing source data as an array. You can also set up a bspline object and perform a 'transform' using an evaluator for this bspline object, with the same effect:
 
~~~~~~~~~~~~

// instead of the remap, we can use transform, passing the evaluator for
// the b-spline over 'a' instead of 'a' itself. the result is the same.

vspline::transform ( ev , coordinate_array , target_array ) ; 

~~~~~~~~~~~~

 This routine has wider scope: while in this example, ev is a b-spline evaluator, ev's type can be any functor capable of yielding a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be any data type suitable for processing with vspline, you're not limited to things which can be thought of as 'coordinates' etc.
 
 This generalization of remap is named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called vspline::apply.

 There is one variation of transform(). This overload doesn't take a 'coordinate array', but instead feeds the unary_functor with discrete coordinates of the target location that is being filled in.
 It's probably easiest to understand this variant if you start out thinking of feeding the previous transform() with an array which contains discrete indices. In 2D, this array would contain
 
<BLOCKQUOTE>

    (0,0) , (1,0) , (2,0) ...
    (0,1) , (1,1) , (2,1) ...
    ...

</BLOCKQUOTE>

 So why would you set up such an array, if it merely contains the coordinates of every cell? You might as well create these values on-the-fly and omit the coordinate array. This is precisely what the second variant of transform does:
 
~~~~~~~~~~~~~

// create a 2D array for the result of the index-based transform operation

vigra::MultiArray < 2 , float > target_array_2d ( 3 , 4 ) ;

// use transform to evaluate the spline for the coordinates of
// all values in this array

vspline::transform ( ev , target_array_2d ) ;

// verify

for ( int x = 0 ; x < 3 ; x ++ )
{
  for ( int y = 0 ; y < 4 ; y++ )
  {
    coordinate_type c { x , y } ;
    assert ( target_array_2d [ c ] == ev ( c ) ) ;
  }
}

~~~~~~~~~~~~~

 If you use this variant of transform directly with a vspline::evaluator, it will reproduce your original data - within arithmetic precision of the evaluation. While this is one way to restore the original data, there's also a (more efficient) routine called 'restore'.
 
~~~~~~~~~~~~~

// use an index-based transform to restore the original data to 'b'

vigra::MultiArray < 2 , float > b ( a.shape() ) ;
vspline::transform ( ev , b ) ;

// confirm the restoration has succeeded

auto ia = a.begin() ;
for ( auto r : b )
  assert ( vigra::closeAtTolerance ( *(ia++) , r , .00001 ) ) ;

// now use vspline::restore to restore the original data into 'c'

vigra::MultiArray < 2 , float > c ( a.shape() ) ;
vspline::restore ( bspl , c ) ;

// confirm that both methods produced similar results

auto ib = b.begin() ;
for ( auto & ic : c )
  assert ( vigra::closeAtTolerance ( *(ib++) , ic , .00001 ) ) ;

~~~~~~~~~~~~~
 
 Class vspline::unary_functor is coded to make it easy to implement functors for things like image processing pipelines. For more complex operations, you'd code a functor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use transform() to bulk-process your data with this functor. This is about as efficient as it gets, since the data are only accessed once, and vspline's transform code does the tedious work of multithreading, deinterleaving and interleaving for you, while you are left to concentrate on the interesting bit, writing the processing pipeline code. vspline::unary_functors are reasonably straightforward to set up; you can start out with scalar code, and you'll see that writing vectorized code with Vc isn't too hard either - if your code doesn't need conditionals, you can often even get away with using the same code for vectorized and unvectorized operation by simply making 'eval' a function template. Please refer to the examples. vspline offers some functional programming constructs for functor combination, like feeding one functor's output as input to the next (vspline::chain) or translating coordinates to a different range (vspline::domain).
 
 And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arbitrarily shaped and strided nD arrays of data. If you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level objects 'bspline' and 'evaluator' and the transform-like functions.
 
 While one might argue that the remap/transform routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that I can only make them really fast by tightly coupling them with the b-spline code. And the hardware can be exploited fully only by processing several values at once (by multithreading and vectorization).
 
\section speed_sec Speed

 While performance will vary from system to system and between different compile(r)s, I'll quote some measurements from my own system. I include benchmarking code (like roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:

<BLOCKQUOTE>

    testing bc code MIRROR    spline degree 3 using SIMD emulation
    avg 32 x prefilter:............................ 12.93750 ms
    avg 32 x restore original data from 1D:........ 6.03125  ms
    avg 32 x transform with ready-made bspline:.... 46.18750 ms
    avg 32 x restore original data: ............... 15.90625 ms

    testing bc code MIRROR    spline degree 3 using Vc
    avg 32 x prefilter:............................ 13.12500 ms
    avg 32 x restore original data from 1D:........ 5.50000  ms
    avg 32 x transform with ready-made bspline:.... 21.40625 ms
    avg 32 x restore original data: ............... 10.15625 ms

</BLOCKQUOTE>

As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higher-degree splines doesn't take much more time:

<BLOCKQUOTE>

    testing bc code MIRROR spline degree 5 using Vc
    avg 32 x prefilter:............................ 13.59375 ms

    testing bc code MIRROR spline degree 7 using Vc
    avg 32 x prefilter:............................ 15.00000 ms

</BLOCKQUOTE>

Using double precision arithmetics, vectorization doesn't help so much. Note that it is entirely possible to work in long double, but since these operations can't be vectorized in hardware, this is slow. vspline will - by default - use vector code for all operations - if hardware vectorization can't be used, vspline will use code to emulate the hardware vectorization, and use data types which are compatible with 'proper' SIMD data. This way, having stages in a processing pipeline using types which can't be vectorized in hardware is no problem and will not force the whole pipeline to be run in scalar code. To have vspline use scalar code, you can fix the vectorization width at 1, and at times this may even produce faster code. Again you'll have to try out what best suits your needs.

\section design_sec Design
 
 You can probably do everything vspline does with other software - there are several freely available implementations of b-spline interpolation and remap/transform routines. What I wanted to create was an implementation which was as general as possible and at the same time as fast as possible, and, on top of that, comprehensive and easy to use.

 These demands are not easy to satisfy at the same time, but I feel that my design comes  close. While generality is achieved by generic programming, speed needs exploitation of hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was from multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to efficiently multithread, the problem has to be partitioned so that it can be processed by independent threads. You can see the partitioning both in prefiltering and later in the transform routines.
 
 Another speedup method is data-parallel processing. This is often thought to be the domain of GPUs, but modern CPUs also offer it in the form of vector units. I chose implementing data-parallel processing in the CPU, as it offers tight integration with unvectorized CPU code. It's almost familiar terrain, and the way from writing conventional CPU code to vector unit code is not too far. Using horizontal vectorization does require some rethinking, though - mainly a conceptual shift from an AoS to an SoA approach. vspline doesn't use vertical vectorization at all, so the code may look odd to someone looking for vector representations of, say, pixels: instead of finding SIMD vectors with three elements, there are structures of three SIMD vectors of vsize elements. I chose to code so that vectorization is manifest in the program's structure. The specific mode of vectorization - emulation or explicit vectorization - can be chosen at compile time: while I implemented the vectorized code, I noticed that vectorization is, to a high degree, something that expresses itself in the code's structure: the data have to be 'presented' as SoA of vector-friendly size. If this is done, use of explicit vector code may not even be necessary: the structure of the data flow implies vectorization, and if the implicit vectorization is expressed in  a way the compiler can 'understand', it will result in vector code produced by autovectorization.
 
 To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved  memory. Keeping the data in interleaved memory is even desirable, because it preserves locality, and usually processing accesses all parts of a value (i.e. all three channels of an RGB value) at once. After the evaluation is complete, data have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time and the best strategy is to load once from interleaved memory, perform all necessary operations on deinterleaved, vectorized data and finally store the result back to interleaved memory. The sequence of operations performed on the vectorized data constitute a processing pipeline, and some data access code will feed the pipeline and dispose of it's result. vspline's unary_functor class is designed to occupy the niche of pipeline code, while remap, apply and transform provide the feed-and-dispose code - a task which I like to call 'wielding'. So with the framework of these routines, setting up vectorized processing pipelines becomes easy, since all the boilerplate code is there already, and only the point operations/operations on single vectors need to be provided by deriving from unary_functor.

 Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via transform from a precalculated coordinate array. On my system, I have almost reached that goal - my transform times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. My main testing ground is <a href="https://bitbucket.org/kfj/pv/">pv</a>, my panorama viewer. Here I can often take the spline degree up to two (a quadratic spline) and still have smooth animation in full HD. Using pv has another benefit: it makes it possible to immediately *see* the results of vspline's operation. If anything is amiss, it'll likely be visible.
 
 Even without using Vc, the code is certainly fast enough for most purposes. This way, vigra becomes the only dependency.
 
 \section Literature
 
 There is a large amount of literature on b-splines available online. Here's a pick:
 
 http://bigwww.epfl.ch/thevenaz/interpolation/
 
 http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html
 
 http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-ex-1.html
*/