/************************************************************************/
/* */
/* vspline - a set of generic tools for creation and evaluation */
/* of uniform b-splines */
/* */
/* Copyright 2015 - 2020 by Kay F. Jahnke */
/* */
/* The git repository for this software is at */
/* */
/* https://bitbucket.org/kfj/vspline */
/* */
/* Please direct questions, bug reports, and contributions to */
/* */
/* kfjahnke+vspline@gmail.com */
/* */
/* Permission is hereby granted, free of charge, to any person */
/* obtaining a copy of this software and associated documentation */
/* files (the "Software"), to deal in the Software without */
/* restriction, including without limitation the rights to use, */
/* copy, modify, merge, publish, distribute, sublicense, and/or */
/* sell copies of the Software, and to permit persons to whom the */
/* Software is furnished to do so, subject to the following */
/* conditions: */
/* */
/* The above copyright notice and this permission notice shall be */
/* included in all copies or substantial portions of the */
/* Software. */
/* */
/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */
/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */
/* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */
/* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */
/* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */
/* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */
/* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */
/* OTHER DEALINGS IN THE SOFTWARE. */
/* */
/************************************************************************/
// This header doesn't contain any code, only the text for the main page of the documentation.
/*! \mainpage
\section intro_sec Introduction
vspline is a header-only generic C++ library for the creation and use of uniform B-splines. It aims to be as comprehensive as feasible while producing code which performs well enough for production use. vspline is CPU-based; it does not use the GPU.
vspline's main focus is interpolation of bulk nD raster data. It was developed originally to be used for image processing software. In image processing, oftentimes large amounts of pixels need to be submitted to identical operations, suggesting a functional approach. vspline offers functional programming elements to implement such programs.
vspline was developed on a Linux system using clang++ and g++. It has not been tested much with other systems or compilers, and as of this writing the code probably isn't fully portable. The code uses the C++11 standard.
Note: in November 2017, with help from Bernd Gaucke, vspline's companion program pv, which uses vspline heavily, was successfully compiled with 'Visual Studio Platform toolset V141'.
In 2019 I ported pv, vspline's 'companion program', first to macOS and then to an msys2/mingw environment. These ports established that vspline works on these platforms as well, with no code modifications.
vspline relies on one other library:
- <a href="http://ukoethe.github.io/vigra/">VIGRA</a>, mainly for data handling using vigra's MultiArrayView and TinyVector types
Optionally, vspline can make use of Vc
- <a href="https://github.com/VcDevel/Vc">Vc</a>, for the use of explicit vectorization
I find VIGRA indispensable; omitting it from vspline is not really an option. Use of Vc is optional, though, and may be activated by defining 'USE_VC'. This should be done by passing -D USE_VC to the compiler; defining USE_VC for only parts of a project may or may not work. Please note that vspline uses Vc's 1.3 or 1.4 branch, not the master branch. 1.3 is what you are likely to find in your distro's package repositories; if you check out Vc from github, make sure you pick the 1.4 branch. Vc support for new vector units may peter out as std::simd emerges in future C++ standards, but for now it's a good choice.
I have made an attempt to generalize the code so that it can handle
- fundamental numeric data types and their aggregates (like, pixels)
- a reasonable selection of boundary conditions
- arbitrary spline degrees (up to 45)
- arbitrary dimensions of the spline
- in multithreaded code
- using the CPU's vector units if possible
On the evaluation side I provide
- evaluation of the spline at point locations in the defined range
- evaluation of the spline's derivatives
- mapping of arbitrary coordinates into the defined range
- evaluation of nD arrays of coordinates (remap function)
- generalized 'transform' and 'apply' functions
- restoration of the original data from the coefficients
To produce maximum performance, vspline also contains a fair amount of collateral code,
some of which may be helpful beyond vspline:
- range-based multithreading with a thread pool
- functional constructs using vspline::unary_functor*
- forward-backward n-pole recursive filtering*
- separable convolution*
- efficient access to the b-spline basis functions
- extremely precise precalculated constants
\*note that the filtering code and the transform-type routines multithread and vectorize automatically.
\section install_sec Installation
Note that vspline takes its claim to being 'generic' seriously: when template arguments are offered to produce variant code, you can depend on actually getting code for every allowed value, rather than having to implement the special cases yourself and then introducing them to some generic dispatch mechanism. So you want to create a 4-D b-spline over five-channel long double data? No problem. And spline degree is even a run-time parameter and not a template argument, making the code more flexible than code which requires that you specify the spline degree at compile time.
vspline is available as a debian package, so the easiest way to install vspline is to install the vspline package, if it is provided by your package manager. This will also take care of the dependencies and install the examples. Ubuntu provides a vspline package from 18.04 on. As of this writing, vspline is still under development, and the master branch in the git repository may offer functionality that is new or different compared to the code in the package. While I try to keep the interfaces of the 'higher-level' functions and objects consistent, the signatures may still change, especially in their default arguments, so switching from the packaged version to the master branch, or upgrading to a newer packaged version, may require changes to the calling code.
If you're installing manually, note that vspline is header-only, so it's sufficient to place the headers where your code can access them. They must all reside in the same folder. VIGRA (and optionally Vc) are supposed to be installed in a location where they can be found so that includes along the lines of #include <vigra/...> succeed.
\section compile_sec Compilation
While your distro's packages may be sufficient to get vspline up and running, you may need newer versions of VIGRA and Vc. At the time of this writing the latest versions commonly available were Vc 1.3.0 and VIGRA 1.11.0; I compiled Vc and VIGRA from source, using up-to-date pulls from their respective repositories. Vc 0.x.x will not work with vspline; only Vc's 1.3 and 1.4 branches are compatible. Using Vc only makes sense if you are aiming for maximum performance; vspline's code autovectorizes well and oftentimes performs just about as well without Vc, but if you are writing more complex pipelines, the gains from using explicit vectorization tend to increase. Some components of vspline, like prefiltering and the generalized digital filter code, do not benefit from Vc use.
Update: Ubuntu 17.04 has VIGRA and Vc packages which are sufficiently up-to-date.
To compile software using vspline, I preferably use clang++. I found that clang++ often produces faster code than other compilers.
~~~~~~~~~~~~~~
clang++ -pthread -O3 -march=native --std=c++11 your_code.cc
// or, using Vc:
clang++ -D USE_VC -pthread -O3 -march=native --std=c++11 your_code.cc -lVc
~~~~~~~~~~~~~~
-pthread is needed to run the automatic multithreading used for most of vspline's bulk data processing, like prefiltering or the transform routines. It's possible to disable multithreading by passing -D VSPLINE_SINGLETHREAD, in which case you can omit -pthread:
~~~~~~~~~~~~~~
clang++ -DVSPLINE_SINGLETHREAD --std=c++11 your_code.cc
~~~~~~~~~~~~~~
Optimization is essential to get decent performance, but if you just work with a few values or have trouble debugging optimized code, you can omit it. I have found that using -Ofast usually works as well.
vspline uses the C++11 standard and will not compile with earlier standard versions. Using C++14 is fine.
vigraimpex (VIGRA's image import/export library) is used by some of the examples; for these you'll need an additional -lvigraimpex.
Please note that an executable using vectorization produced for your system will likely not work on a machine with another CPU. It's best to compile for the intended target architecture explicitly using -march... 'Not work' in this context means that it may well crash due to an illegal instruction or wrong alignment. If you're compiling for the same system you're on, you can use -march=native. If you don't specify the architecture for the compilation, you'll likely get some lowest-common-denominator binary (like, SSE) which will not exploit your hardware optimally, but will run on a wider range of systems. If you intend to build software which can run on several vector ISAs, there is a simple trick: build a separate object file for each ISA, where you use a preprocessor statement which #defines vspline to be some ISA-specific value, then use a dispatcher class calling the ISA-specific variants via a pure virtual base class. pv, vspline's companion program, uses this technique, and you can study it there in more detail.
Even without Vc, the compiler will autovectorize if optimization is enabled, and if you compile for a given target architecture, the binary may contain vector instructions specific to that architecture and will crash on targets which don't support the relevant ISA.
If you don't want to use clang++, g++ will also work. In my tests I found that clang++ produced faster code, but this may or may not be the case for specific compiler versions or target architectures. You'll have to try it out.
\section license_sec License
vspline is free software, licensed under this license:
<BLOCKQUOTE>
vspline - a set of generic tools for creation and evaluation
of uniform b-splines
Copyright 2015 - 2020 by Kay F. Jahnke
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the
Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
</BLOCKQUOTE>
\section strategy_sec Strategy
This section outlines the strategies used in vspline to get from
original data to an interpolated result. If you prefer to start
coding straight away, continue to the next section.
When you are using vspline, your information goes through several
stages. Starting out with your original data, the first step is to
'prefilter' these data to obtain 'b-spline coefficients'. vspline
keeps these coefficients in a 'bspline' object. Next you obtain an
'evaluator' using these coefficients. This is a functor; you call it
with (real) coordinates and receive interpolated values. While you can
produce your results by repeated calls to the evaluator, vspline also
offers mass evaluation, a typical example is a 'classic' remap, where
you provide an array full of coordinates and receive an array full of
interpolated values. Using vspline's mass evaluation methods is a
good idea, because this is done very efficiently using multithreading
and hardware vectorization. I'll go through these stages in detail.
'direct' interpolation uses some arithmetic construct to derive an
interpolated value from several data points in the 'vicinity' of the
interpolation locus. The value of this arithmetic construct collapses
to the data point itself if it coincides with the interpolation locus.
b-splines are *not* direct interpolators: the interpolated value is not
directly derived from an arithmetic construct involving the data points,
instead it is derived from a set of coefficients which have been
produced from the data points by 'prefiltering' them. This implies
a sequence of operations which has to be followed in order to use
b-spline interpolation:
- the 'original' data (aka 'knot point data') are 'prefiltered',
yielding the b-spline's 'coefficients'
- for evaluation, a subset of the coefficients is selected and
processed to yield the interpolated value.
This two-step process has one distinct disadvantage: the coefficients
have to be held in memory until all evaluations of the spline are done.
While there is no way around storing the coefficients, storing the
original data is *not* necessary to evaluate the spline. So if the
original data aren't used somewhere 'further down the line', they can
be discarded as soon as the coefficients are known. Discarding the
original data can even be done *while* calculating the coefficients,
and it is possible to replace the original data with the corresponding
coefficients in the process of prefiltering: the operation can be done
'in place'. vspline can exploit situations where the original data
won't be used after the coefficients have been produced, reducing
storage requirements to just above what is needed for the original
data. This can be done in two ways:
- when first obtaining the original data (like, from a file), they
can be immediately placed into a bspline object's 'core', where they
are prefiltered in-place. This is the best way since it requires no
further intermediate storage. For an example, see ca_correct.cc,
where this way of handling data is used in main(), look for
'reading image'. Another example is in 'gradient.cc', where the
original data are produced arithmetically and directly placed
into the bspline object's 'core'.
- if the data are already present in memory, they can be 'sucked into'
a bspline object during prefiltering. This is achieved by creating the
bspline object and passing the data to its prefilter routine. This
way is second best: the original data will only be read once, and as
soon as the coefficients are ready, they may or may not be discarded.
You can find an example for this strategy in n_shift.cc, look for
invocations of 'prefilter'.
Of course, if you don't care for maximum performance, you can use a third
way, which is the most legible way to code the process:
- Start out with the data in memory and *assign* them to the bspline
object's 'core', then call prefilter.
Performance-wise this is the worst way of doing it, since the data
are read from one part of memory, written to another part (the bspline
object's core) and reread from there as soon as prefiltering starts.
Now you might ask why a bspline object shouldn't be created by somehow
transforming the raw data. This is due to the fact that vspline holds
more coefficients than original data: for efficient evaluation without
needs for bounds checking, a small 'frame' of additional coefficients
is put around the 'core' coefficients. This frame is handled by the
bspline object, and the space in memory holding the 'core' coefficients
is a 'subarray' of the total amount of coefficients held by the bspline
object. It is possible to 'adopt' storage holding original data with a
bspline object, but this can only be done if the raw data are already
surrounded by a suitably-sized frame of additional storage to accept
the 'frame' after prefiltering. If you set up your data so that the
'frame' is present, you can make the spline 'adopt' the data when it
is created, but you might as well create the bspline object first and
populate its core afterwards, instead of creating, populating and
adopting the data.
When you set up a bspline object, you instantiate class bspline with
the data type it should use for coefficients. Oftentimes this will
be the same data type as your original data. But you can use a different
type for the coefficients, as long as the types are convertible. When
using a different type, you can still use the overload of 'prefilter'
'sucking in' the original data: they will be converted in the process.
This is especially useful if your source data are of a type with
small dynamic range (like 8 bit pixels), which aren't well-suited
for arithmetic operations. While you get most precise results with
real data types (like float) for b-spline coefficients, you are free
to operate with integral b-spline coefficients: per default, all
internal calculations (prefiltering and evaluation) are done with
suitably 'promoted' data types, so when using integral coefficients,
you only suffer from quantization errors. You have freedom of choice
here: your original data, the b-spline coefficients, the type used
for internal calculations and the result type of the evaluation may
all be different, while there are sensible defaults if you don't want
to concern yourself with the details. The only constraint is that
arithmetics have to be done with a real data type.
Note that prefiltering is highly optimized: the operation is
multithreaded automatically, and it is also vectorized, provided
that the data types you're using can be handled by your processor's
vector units. Even when you aren't using Vc, prefiltering will use
the CPU's vector units via a process I call 'goading': The data are
presented in small aggregates of vector-friendly size, and with the
filter code being reasonably trivial, the operation is auto-vectorized
if permitted by the given compiler flags (like -mavx).
Whichever way of producing the coefficients you may use, eventually
you'll have a 'bspline' object holding the coefficients in a form
which can be used by vspline's evaluation mechanism. Evaluation
involves two steps:
- first you create an 'evaluator' object. This is a functor which
is created from a bspline object and provides methods to evaluate
the spline.
- next you use the evaluator to obtain interpolated values.
Yet again you might ask why there isn't simply an evaluation method
in class bspline itself. This is due to the fact that evaluation
can be done in different ways, which are specified when the 'evaluator'
object is created. And when I say 'evaluator' here, I use the term
for a whole set of objects capable of evaluating a b-spline, where
the capabilities may go beyond what class vspline::evaluator provides.
Consider, for example, the factory functions make_evaluator() and
make_safe_evaluator() in eval.h. They create functors which contain
vspline::evaluator objects specialized for the task at hand, and
extra arithmetics on the input for 'safe' evaluators - and they
return the functors as type-erased objects in a common type.
Once an evaluator is ready, it can be used without further
specialization: it's immutable after creation and constitutes a
'pure' functor in a functional-programming sense. It is an object
which can be passed around (it's trivially copyable) even between
several threads. Creating an evaluator is a cheap operation, so
the strategy is to create an evaluator specific to the task at hand
and use this specific evaluator for the specific task only.
Evaluator objects are also lightweight, so having many evaluators
does not use much memory. Evaluators refer to the coefficients, and
having an evaluator for a given b-spline object does not automatically
keep the coefficients 'alive'. If you destroy the bspline object,
evaluators referring to the coefficients hold dangling pointers and
using them will produce undefined behaviour. It's your responsibility
to keep the bspline object alive until the coefficients won't be
used anymore.
vspline uses RAII, and the way to create evaluators fits this scheme.
Typically you'd create the evaluator in some inner scope just before it
is used, letting it perish just afterwards when leaving the scope.
There is a third aspect to the strategy proposed by vspline. While
the obvious way of using an evaluator would be to write a loop in
which the evaluator is called repeatedly, this is only a good idea
if performance is not an issue. If, on the other hand, you have large
amounts of data to process, you want to employ methods like multithreading
and vectorization to get better performance. You are of course free to
use whatever multithreading and vectorization code you like, but you
might as well use what vspline offers: I call it 'wielding' code. In
an evaluation context, it means applying functors to large data sets.
vspline offers code for the purpose which automatically multithreads
and vectorizes the operation of evaluating a b-spline at many locations.
You can pass an array full of coordinates and receive an array of values,
an operation called 'remapping'. vspline even goes further and uses
an abstraction of this process: it has code to feed large amounts of
data to a functor, where the effect is that of processing all data in
turn, while the operation is in fact multithreaded and vectorized
automatically. The result is stored in a (possibly multidimensional)
array. A 'classic' remap is merely a special case of this procedure,
which, in vspline, is done by vspline::transform().
When coding geometric transformations, the task can often be reduced to
formulating a function which, given a 'target' coordinate - a coordinate
into the result array - can produce a 'source' coordinate, namely the
locus of the interpolation. For such situations, vspline offers an
overload of vspline::transform which feeds the functor with discrete
target coordinates. If the functor is merely an evaluator, the result
will be a 'restoration' of the original data. But if you use a functor
transforming the 'incoming' coordinate and feeding the transformed
coordinate to an evaluator, the result will be a geometrically
transformed image, volume, etc...
Creating such functors is actually quite simple, and the examples
have code to show how it's done. Have a look at gsm.cc, which produces
an image holding the gradient squared magnitude of an input image.
There, the target pixels are created one by one in a loop. Now look
at gsm2.cc. The result is the same, but here it is obtained by
coding the 'interesting' bit as a functor derived from
vspline::unary_functor and the coordinate-fed overload of
vspline::transform is used. You can see that the coding is quite
straightforward. What is not immediately apparent is the fact that
doing it this way is also much faster: the operation is automatically
multithreaded and, possibly, vectorized. For a single image,
the difference is not a big deal, but when you're producing images
for real-time display, every millisecond counts.
My use of functors in vspline isn't as straightforward as using,
say, std::function. This is due to the fact that the functors used in
vspline can process both unvectorized and vectorized data. The relevant
'eval' routines provide the dual functionality either as two distinct
overloads or as a template if the operation can be coded as one.
This is why I had to code my own version of the subset of functional
programming constructs used in vspline, rather than using some ready-made
expression template code.
There is one last step which I'll mention briefly. Suppose you have
a bspline object and you want to recreate your original data from the
coefficients it holds. vspline offers functions called 'restore' for
the purpose. Restoration will only be possible within arithmetic
precision - if you are using integral data, you may suffer from
quantization errors, with real data, the restored data will be very
close to the original data. Restoration is done with a separable
convolution, which is multithreaded and vectorized automatically,
just like prefiltering or transformations, so it's also very fast.
vspline's prefiltering code, which is simply a forward-backward
recursive n-pole filter, and the convolution code used for restoration,
can be used to apply other filters than the b-spline prefilter or
restoration, with the same benefits of multithreading and vectorization,
requiring little more than the specification of the filter itself,
like the convolution kernel to use. And vspline's 'range-based
multithreading' is also commodified so that you can use it for your own
purposes with just a few lines of code, see a simple example in
eval.cc, where the speed test is run with a thread pool to get
a measurement of the overall evaluation performance.
How about processing 1D data like sounds or simple time series?
vspline has specialized code for the purpose. While simply iterating
over the 1D sequence is trivial, it's slow. So vspline's specialized
1D code uses mathematical tricks to process the 1D data with multiple
threads and vectorization just as images or volumes. These specializations
exist for prefiltering, transformations and restoration, which are
therefore just as fast as for nD data, and, as far as the filters are
concerned, even faster, since the filter has to be applied along one axis
only. As a user, again you don't have to 'do' anything, the 1D code is
used automatically if your data are 1D. The mathematical trickery only
fails if you require 'zero tolerance' for the coefficient generation,
which forces the data to be prefiltered en bloc - an option which only
makes sense in rare circumstances.
vspline will accept high-dimension data, but when dimensionality gets
very high, vspline's use of 'braced' coefficient arrays has some drawbacks:
with increasing dimensionality, the 'surface' of an array grows very large,
and the 'brace' around the coefficients is just such a surface, possibly
several units wide. So with high-D splines, the braced coefficient array
is (potentially much) larger than the knot point data. You've been warned ;)
An important specialization is code for degree-0 and degree-1
b-splines. These are old acquaintances in disguise: a degree-0 b-spline
is simply another name for nearest-neighbour interpolation, and its
degree-1 cousin encodes linear interpolation. While the 'general-purpose'
b-spline evaluation code can be used to evaluate degree-0 and degree-1
b-splines, doing so wastes some potential for optimization. vspline's
evaluator objects can be 'specialized' for degree-0 and degree-1 splines,
and when you're using the factory functions like make_evaluator, this
is done automatically. With the optimizations for low-degree splines and
1D data, I feel that the 'problem' at hand - bspline processing - is
dealt with exhaustively, leaving no use scenario with second-best
solutions but providing good performance throughout.
\section quickstart_sec Quickstart
vspline uses vigra to handle data. There are two vigra data types which are used throughout: vigra::MultiArrayView is used for multidimensional arrays. It's a thin wrapper around the three parameters needed to describe arbitrary n-dimensional arrays of data in memory: a pointer to some 'base' location coinciding with the coordinate origin, a shape and a stride. If your code does not use vigra MultiArrayViews, it's easy to create them for the data you have: vigra offers a constructor for MultiArrayViews taking these three parameters. Very similar is vigra's MultiArray, which is a MultiArrayView owning and allocating the data it refers to.
The other vigra data type used throughout vspline is vigra::TinyVector, a small fixed-size container type used to hold things like multidimensional coordinates or pixels. This type is also just a wrapper around a small 1D C array. It's zero overhead and contains nothing else, but offers lots of functionality like arithmetic operations. I recommend looking into vigra's documentation to get an idea about these data types, even if you only wrap your extant data in them to interface with vspline. vspline follows vigra's default axis ordering scheme: the fastest-varying index is first, so coordinates are (x,y,z...). This is important to keep in mind when evaluating a b-spline. Always pass the x coordinate first. Coordinates, strides and shapes are given in units of the MultiArrayView's value_type. So if you have a MultiArrayView of, say, float RGB pixels, a stride of one means an offset of ( 3 * sizeof ( float ) ) in bytes. This is important to note, since there is no standard for expressing strides - some systems use bytes, some use the elementary type. vigra uses the array's value_type for the unit step.
If you stick with the high-level code, using class bspline or the transform functions, most of the parametrization is easy. Here are a few quick examples of what you can do. This is really just to give you a first idea - there is example code covering most features of vspline in more detail, with plenty of comments. The code in this text is also there; see quickstart.cc.
Let's suppose you have data in a 2D vigra MultiArray 'a'. vspline can handle integer and real data like float and double, and also their 'aggregates', meaning data types like pixels or vigra's TinyVectors. But for now, let's assume you have plain float data. Creating the bspline object is easy:
~~~~~~~~~~~~~~
#include "vspline/vspline.h"
// given a 10 X 20 vigra::MultiArray of data
vigra::MultiArray < 2 , float > a ( 10 , 20 ) ;
// let's initialize the whole array with 42
a = 42 ;
// fix the type of the corresponding b-spline. We want a bspline
// object with float coefficients and two dimensions:
typedef vspline::bspline < float , 2 > spline_type ;
// create bspline object 'bspl' fitting the shape of your data
spline_type bspl ( a.shape() ) ;
// copy the source data into the bspline object's 'core' area
bspl.core = a ;
// run prefilter() to convert original data to b-spline coefficients
bspl.prefilter() ;
~~~~~~~~~~~~~~
The memory needed to hold the coefficients is allocated when the bspline object is constructed.
Obviously many things have been done by default here: The default spline degree was used - it's 3, for a cubic spline. Also, boundary treatment mode 'MIRROR' was used by default. The spline is 'braced' so that it can be evaluated with vspline's evaluation routines, and the process is automatically partitioned and run in parallel by a thread pool. The only mandatory template arguments are the value type and the number of dimensions, which have to be known at compile time.
While the sequence of operations shown here looks a bit verbose (why not create the bspline object with a call like bspl(a)?), in 'real' code you would use bspl.core straight away as the space to contain your data - you might get the data from a file or by some other process, or do something like this, where the bspline object provides the array and you interface it via a view to its 'core':
~~~~~~~~~~~~~~
int degree = 3 ; // the spline degree; here, a cubic spline
vspline::bspline < double , 1 > bsp ( 10001 , degree , vspline::MIRROR ) ;
auto v1 = bsp.core ; // get a view to the bspline's 'core'
for ( auto & r : v1 ) r = ... ; // assign some values
bsp.prefilter() ; // perform the prefiltering
~~~~~~~~~~~~~~
This is a common idiom, because it reflects a common mode of operation where you don't need the original, unfiltered data any more after creating the spline, so the prefiltering is done in-place overwriting the original data. If you do need the original data later, you can also use a third idiom:
~~~~~~~~~~~~~~
vigra::MultiArrayView < 3 , double > my_data ( vigra::Shape3 ( 5 , 6 , 7 ) ) ;
vspline::bspline < double , 3 > bsp ( my_data.shape() ) ;
bsp.prefilter ( my_data ) ;
~~~~~~~~~~~~~~
Here, the bspline object is first created with the appropriate 'core' size, and prefilter() is called with an array matching the bspline's core. This results in my_data being read into the bspline object during the first pass of the prefiltering process.
There are more ways of setting up a bspline object, please refer to class bspline's constructor. Of course you are also free to directly use vspline's lower-level routines to create a set of coefficients. The lowest-level filtering routine is simply a forward-backward recursive filter with a set of arbitrary poles. This code is in prefilter.h.
Next you may want to evaluate the spline from the first example at some pair of coordinates x, y. Evaluation of the spline can be done using vspline's 'evaluator' objects. Using the highest level of access, these objects are set up with a bspline object and, after being set up, provide methods to evaluate the spline at given coordinates. Technically, evaluator objects are functors which don't have mutable state (all state is created at creation time and remains constant afterwards), so they are thread-safe and 'pure' in a functional-programming sense. The evaluation is done by calling the evaluator's eval() member function, which takes its first argument (the coordinate) as a const reference and writes the result to its second argument, which is a reference to a variable capable of holding the result.
~~~~~~~~~~~~~~
// for a 2D spline, we want 2D coordinates
typedef vigra::TinyVector < float , 2 > coordinate_type ;
// get the appropriate evaluator type
typedef vspline::evaluator < coordinate_type , float > eval_type ;
// create the evaluator
eval_type ev ( bspl ) ;
// create variables for input and output,
coordinate_type coordinate ( 3 , 4 ) ;
float result ;
// use the evaluator to evaluate the spline at ( 3 , 4 )
// storing the result in 'result'
ev.eval ( coordinate , result ) ;
~~~~~~~~~~~~~~
Again, some things have happened by default. The evaluator was constructed from a bspline object, making sure that the evaluator is compatible.
Class evaluator does provide operator(). So do all the other functors vspline
can generate. This is for convenience, so you can use vspline's unary_functors
like a 'normal' function:
~~~~~~~~~~~~~
// evaluators can be called as a function
auto r = ev ( coordinate ) ;
assert ( r == result ) ;
~~~~~~~~~~~~~
vspline offers a limited set of functional programming constructs - as of this
writing, it provides just those constructs which it uses itself, usually in
factory functions. You can find the functional constructs in unary_functor.h.
While it's tempting to implement more along the lines of expression templates,
I have tried to keep things limited to a comfortable minimum. Most of the time
user code may remain ignorant of the functional programming aspects - the
functional constructs obtained from the factory functions can just be assigned
to auto variables; it's usually not necessary to make these types explicit.
What about the remap function? The little introduction demonstrated how you can evaluate the spline at a single location. Most of the time, though, you'll require evaluation at many coordinates. This is what remap does. Instead of a single coordinate, you pass a whole vigra::MultiArrayView full of coordinates to it - and another MultiArrayView of the same dimension and shape to accept the results of evaluating the spline at every coordinate in the first array. Here's a simple example, using the same array 'a' as above:
~~~~~~~~~~~~
// create a 1D array containing three (2D) coordinates
vigra::MultiArray < 1 , coordinate_type > coordinate_array ( 3 ) ;
// initialize all entries with 'coordinate' from above
coordinate_array = coordinate ;
// create an array to accommodate the result of the remap operation
vigra::MultiArray < 1 , float > target_array ( 3 ) ;
// perform the remap
vspline::remap ( a , coordinate_array , target_array ) ;
// reassure yourself the result is correct
auto ic = coordinate_array.begin() ;
for ( auto k : target_array )
assert ( k == ev ( *(ic++) ) ) ;
~~~~~~~~~~~~
This is an 'ad-hoc' remap, passing source data as an array. You can also set up a bspline object and perform a 'transform' using an evaluator for this bspline object, with the same effect:
~~~~~~~~~~~~
// instead of the remap, we can use transform, passing the evaluator for
// the b-spline over 'a' instead of 'a' itself. the result is the same.
vspline::transform ( ev , coordinate_array , target_array ) ;
~~~~~~~~~~~~
This routine has wider scope: while in this example, ev is a b-spline evaluator, ev's type can be any functor capable of yielding a value of the type held in 'target_array' for a value held in 'coordinate_array'. Here, you'd typically use an object derived from class vspline::unary_functor, and vspline::evaluator is in fact derived from this base class. A unary_functor's input and output can be any data type suitable for processing with vspline, you're not limited to things which can be thought of as 'coordinates' etc.
This generalization of remap is named 'transform' and is similar to vigra's point operator code, but uses vspline's automatic multithreading and vectorization to make it very efficient. There's a variation of it where the 'coordinate array' and the 'target array' are the same, effectively performing an in-place transformation, which is useful for things like coordinate transformations or colour space manipulations. This variation is called vspline::apply.
There is one more variation of transform(). This overload doesn't take a 'coordinate array', but instead feeds the unary_functor with the discrete coordinates of the target locations being filled in.
It's probably easiest to understand this variant if you start out thinking of feeding the previous transform() with an array which contains discrete indices. In 2D, this array would contain
<BLOCKQUOTE>
(0,0) , (1,0) , (2,0) ...
(0,1) , (1,1) , (2,1) ...
...
</BLOCKQUOTE>
So why would you set up such an array, if it merely contains the coordinates of every cell? You might as well create these values on-the-fly and omit the coordinate array. This is precisely what the second variant of transform does:
~~~~~~~~~~~~~
// create a 2D array for the result of the index-based transform operation
vigra::MultiArray < 2 , float > target_array_2d ( 3 , 4 ) ;
// use transform to evaluate the spline for the coordinates of
// all values in this array
vspline::transform ( ev , target_array_2d ) ;
// verify
for ( int x = 0 ; x < 3 ; x ++ )
{
  for ( int y = 0 ; y < 4 ; y++ )
  {
    coordinate_type c ( x , y ) ;
    assert ( target_array_2d [ c ] == ev ( c ) ) ;
  }
}
~~~~~~~~~~~~~
If you use this variant of transform directly with a vspline::evaluator, it will reproduce your original data - within arithmetic precision of the evaluation. While this is one way to restore the original data, there's also a (more efficient) routine called 'restore'.
~~~~~~~~~~~~~
// use an index-based transform to restore the original data to 'b'
vigra::MultiArray < 2 , float > b ( a.shape() ) ;
vspline::transform ( ev , b ) ;
// confirm the restoration has succeeded
auto ia = a.begin() ;
for ( auto r : b )
assert ( vigra::closeAtTolerance ( *(ia++) , r , .00001 ) ) ;
// now use vspline::restore to restore the original data into 'c'
vigra::MultiArray < 2 , float > c ( a.shape() ) ;
vspline::restore ( bspl , c ) ;
// confirm that both methods produced similar results
auto ib = b.begin() ;
for ( auto & ic : c )
assert ( vigra::closeAtTolerance ( *(ib++) , ic , .00001 ) ) ;
~~~~~~~~~~~~~
Class vspline::unary_functor is coded to make it easy to implement functors for things like image processing pipelines. For more complex operations, you'd code a functor representing your processing pipeline - often by delegating to 'inner' objects also derived from vspline::unary_functor - and finally use transform() to bulk-process your data with this functor. This is about as efficient as it gets, since the data are only accessed once, and vspline's transform code does the tedious work of multithreading, deinterleaving and interleaving for you, while you are left to concentrate on the interesting bit, writing the processing pipeline code. vspline::unary_functors are reasonably straightforward to set up; you can start out with scalar code, and you'll see that writing vectorized code with Vc isn't too hard either - if your code doesn't need conditionals, you can often even get away with using the same code for vectorized and unvectorized operation by simply making 'eval' a function template. Please refer to the examples. vspline offers some functional programming constructs for functor combination, like feeding one functor's output as input to the next (vspline::chain) or translating coordinates to a different range (vspline::domain).
And that's about it - vspline aims to provide all possible variants of b-splines, code to create and evaluate them and to do so for arbitrarily shaped and strided nD arrays of data. If you dig deeper into the code base, you'll find that you can stray off the default path, but there should rarely be any need not to use the high-level objects 'bspline' and 'evaluator' and the transform-like functions.
While one might argue that the remap/transform routines I present shouldn't be lumped together with the 'proper' b-spline code, I feel that I can only make them really fast by tightly coupling them with the b-spline code. And the hardware can be exploited fully only by processing several values at once (by multithreading and vectorization).
\section speed_sec Speed
While performance will vary from system to system and between different compile(r)s, I'll quote some measurements from my own system. I include benchmarking code (like roundtrip.cc in the examples folder). Here are some measurements done with "roundtrip", working on a full HD (1920*1080) RGB image, using single precision floats internally - the figures are averages of 32 runs:
<BLOCKQUOTE>
testing bc code MIRROR spline degree 3 using SIMD emulation
avg 32 x prefilter:............................ 12.93750 ms
avg 32 x restore original data from 1D:........ 6.03125 ms
avg 32 x transform with ready-made bspline:.... 46.18750 ms
avg 32 x restore original data: ............... 15.90625 ms
testing bc code MIRROR spline degree 3 using Vc
avg 32 x prefilter:............................ 13.12500 ms
avg 32 x restore original data from 1D:........ 5.50000 ms
avg 32 x transform with ready-made bspline:.... 21.40625 ms
avg 32 x restore original data: ............... 10.15625 ms
</BLOCKQUOTE>
As can be seen from these test results, using Vc on my system speeds evaluation up a good deal. When it comes to prefiltering, a lot of time is spent buffering data to make them available for fast vector processing. The time spent on actual calculations is much less. Therefore prefiltering for higher-degree splines doesn't take much more time:
<BLOCKQUOTE>
testing bc code MIRROR spline degree 5 using Vc
avg 32 x prefilter:............................ 13.59375 ms
testing bc code MIRROR spline degree 7 using Vc
avg 32 x prefilter:............................ 15.00000 ms
</BLOCKQUOTE>
Using double precision arithmetic, vectorization doesn't help as much. Note that it is entirely possible to work in long double, but since these operations can't be vectorized in hardware, this is slow. vspline will - by default - use vector code for all operations; if hardware vectorization can't be used, vspline will use code which emulates the hardware vectorization and uses data types compatible with 'proper' SIMD data. This way, having stages in a processing pipeline which use types that can't be vectorized in hardware is no problem and will not force the whole pipeline to be run in scalar code. To have vspline use scalar code, you can fix the vectorization width at 1, and at times this may even produce faster code. Again you'll have to try out what best suits your needs.
\section design_sec Design
You can probably do everything vspline does with other software - there are several freely available implementations of b-spline interpolation and remap/transform routines. What I wanted to create was an implementation which was as general as possible and at the same time as fast as possible, and, on top of that, comprehensive and easy to use.
These demands are not easy to satisfy at the same time, but I feel that my design comes close. While generality is achieved by generic programming, speed needs exploitation of hardware features, and merely relying on the compiler is not enough. The largest speedup I saw was from multithreading the code. This may seem like a trivial observation, but my design is influenced by it: in order to efficiently multithread, the problem has to be partitioned so that it can be processed by independent threads. You can see the partitioning both in prefiltering and later in the transform routines.
Another speedup method is data-parallel processing. This is often thought to be the domain of GPUs, but modern CPUs also offer it in the form of vector units. I chose to implement data-parallel processing on the CPU, as it offers tight integration with unvectorized CPU code. It's almost familiar terrain, and the way from writing conventional CPU code to vector unit code is not too far. Using horizontal vectorization does require some rethinking, though - mainly a conceptual shift from an AoS to an SoA approach. vspline doesn't use vertical vectorization at all, so the code may look odd to someone looking for vector representations of, say, pixels: instead of finding SIMD vectors with three elements, there are structures of three SIMD vectors of vsize elements. I chose to code so that vectorization is manifest in the program's structure. The specific mode of vectorization - emulation or explicit vectorization - can be chosen at compile time. While I implemented the vectorized code, I noticed that vectorization is, to a high degree, something that expresses itself in the code's structure: the data have to be 'presented' as SoA of vector-friendly size. If this is done, use of explicit vector code may not even be necessary: the structure of the data flow implies vectorization, and if the implicit vectorization is expressed in a way the compiler can 'understand', it will result in vector code produced by autovectorization.
To use vectorized evaluation efficiently, incoming data have to be presented to the evaluation code in vectorized form, but usually they will come from interleaved memory. Keeping the data in interleaved memory is even desirable, because it preserves locality, and usually processing accesses all parts of a value (i.e. all three channels of an RGB value) at once. After the evaluation is complete, data have to be stored again to interleaved memory. The deinterleaving and interleaving operations take time and the best strategy is to load once from interleaved memory, perform all necessary operations on deinterleaved, vectorized data and finally store the result back to interleaved memory. The sequence of operations performed on the vectorized data constitute a processing pipeline, and some data access code will feed the pipeline and dispose of it's result. vspline's unary_functor class is designed to occupy the niche of pipeline code, while remap, apply and transform provide the feed-and-dispose code - a task which I like to call 'wielding'. So with the framework of these routines, setting up vectorized processing pipelines becomes easy, since all the boilerplate code is there already, and only the point operations/operations on single vectors need to be provided by deriving from unary_functor.
Using all these techniques together makes vspline fast. The target I was roughly aiming at was to achieve frame rates of ca. 50 fps in RGB and full HD, producing the images via transform from a precalculated coordinate array. On my system, I have almost reached that goal - my transform times are around 25 msec (for a cubic spline), and with memory access etc. I come up to frame rates over half of what I was aiming at. My main testing ground is <a href="https://bitbucket.org/kfj/pv/">pv</a>, my panorama viewer. Here I can often take the spline degree up to two (a quadratic spline) and still have smooth animation in full HD. Using pv has another benefit: it makes it possible to immediately *see* the results of vspline's operation. If anything is amiss, it'll likely be visible.
Even without using Vc, the code is certainly fast enough for most purposes - and without Vc, vigra remains the only dependency.
\section literature_sec Literature
There is a large amount of literature on b-splines available online. Here's a small selection:
http://bigwww.epfl.ch/thevenaz/interpolation/
http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-basis.html
http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/spline/B-spline/bspline-ex-1.html
*/