1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512
|
/************************************************************************/
/* */
/* vspline - a set of generic tools for creation and evaluation */
/* of uniform b-splines */
/* */
/* Copyright 2015 - 2023 by Kay F. Jahnke */
/* */
/* The git repository for this software is at */
/* */
/* https://bitbucket.org/kfj/vspline */
/* */
/* Please direct questions, bug reports, and contributions to */
/* */
/* kfjahnke+vspline@gmail.com */
/* */
/* Permission is hereby granted, free of charge, to any person */
/* obtaining a copy of this software and associated documentation */
/* files (the "Software"), to deal in the Software without */
/* restriction, including without limitation the rights to use, */
/* copy, modify, merge, publish, distribute, sublicense, and/or */
/* sell copies of the Software, and to permit persons to whom the */
/* Software is furnished to do so, subject to the following */
/* conditions: */
/* */
/* The above copyright notice and this permission notice shall be */
/* included in all copies or substantial portions of the */
/* Software. */
/* */
/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */
/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */
/* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */
/* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */
/* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */
/* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */
/* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */
/* OTHER DEALINGS IN THE SOFTWARE. */
/* */
/************************************************************************/
/*! \file vector.h
\brief code for horizontal vectorization in vspline
vspline currently has three ways of approaching vectorization:
- no vectorization. Scalar code is less complex since it does not
have to aggregate the data into vectorization-friendly parcels,
and for some data types, the performance is just as good as with
vectorization. Use of scalar code results from setting the
vectorization width to 1. This is usually a template argument
going by the name 'vsize'.
- Use of Vc for vectorization. This requires the presence of Vc
during compilation and results in explicit vectorization for all
elementary types Vc can handle. Vc provides code for several
operations which are outside the scope of autovectorization,
most prominently hardware gather and scatter operations, and the
explicit vectorization with Vc makes sure that vectorization is
indeed used whenever possible, rather than having to rely on the
compiler to recognize the opportunity. Use of Vc has to be
explicitly activated by defining USE_VC during compilation.
Using this option usually produces the fastest code. The downside
is the dependence on an external library which may or may not
actually implement the intended vector operations with vector
code for a given target: Newer processors may not yet be supported,
or support may be implemented for part of the instructions only.
Also, the Vc version coming from the distro's packet management
may not be up-to-date. Building processing pipelines based on
Vc::SimdArray is, on the other hand, straightforward - the type
is well-thought-out and there is good library support for many
operations. Use of Vc triggers use of fallback code for elementary
types which Vc can't vectorize - such types are pseudo-vectorized:
- The third option is to produce code which is designed to be
easily recognized by the compiler as amenable to autovectorization.
This option is implemented in simd_type.h, which defines an
arithmetic type 'vspline::simd_type' holding data in a C vector.
This is a technique I call 'goading': data are processed in small
aggregates of vector friendly size, resulting in inner loops
which oftentimes are recognized by the autovectorization stage,
resulting in hardware vector code if the compiler flags allow
for it and the compiler can generate code for the intended target.
Since this approach relies entirely on the compiler's capability
to autovectorize the (deliberately vectorization-friendly) code,
the mileage varies. If it works, this is a clean and simple
solution. A disadvantage is the use of class simd_type for
vectorization, which is mildly exotic and very much a vspline
creature - building processing pipelines using this type will
not be as effortless as using Vc::SimdArray. As long as you're
not building your own functors to be used with vspline's family
of transform-like functions, the precise mode of vectorization
remains an internal issue and you needn't concern yourself with
with it beyond choosing whether you want vspline to use Vc or not,
and choosing a suitable vectorization width if the default does
not suit you. Class vspline::simd_type can 'vectorize' every
fundamental and is used as fallback type when Vc use is allowed
but Vc can't provide a vectorized data type, like for 'long double'
data, so it will be used even with Vc active when the need arises.
It's important to understand that using SIMD is not simply mapping,
say, pixels of three floats to a vector of three floats - that would
be 'vertical' vectorization, which is represented by vspline's *scalar*
code. Instead, vspline is coded to use *horizontal* vectorization,
which produces vector data fitting the size of the vector unit's
registers, where each element held by the vector has exactly the same
meaning as every other: rather than vectors holding, like, the colour
channels of a pixel, we have a 'red', a 'green' and a 'blue' vector
holding, say, eight floats each. Horizontal vectorization is best
explicitly coded, and if it is coded explicitly, the code structure
itself suggests vectorization to the compiler. Using code like Vc
gives more structure to this process and adds capabilities beyond the
scope of autovectorization, but having the horizontal vectorization
manifest in the code's structure already goes a long way, and if the
'structurally' vectorized code autovectorizes well, that may well be
'good enough' as it is. In my experience, it is often significantly
faster than scalar code - provided the processor has vector units.
So it turns out that successful vectorization is, to a large degree,
a *conceptual change* making the intended vectorization explicit by
choosing appropriate data types. I am indebted to Matthis Kretz, the
author of the Vc library, who has opened my eyes to this fact with his
thesis: 'Extending C++ for explicit data-parallel programming via SIMD
vector types'.
With vspline::simd_type 'in the back hand' vspline code can rely on
the presence of a vectorized type for every fundamental, and, by
extension, vectorized 'xel' data - i.e. vectorized pixels, voxels
etc. which are implemented as vigra::TinyVectors of vectorized
fundamentals. This allows vspline to be coded so that it relies
on vectorization, but not necessarily on Vc: Vc is an option to
provide extra-fast, tailormade vector code for some operations,
but when it can't be used, vspline's own vector code will be used
instead, providing *the same interface*. This makes maintainance
much easier compared to a scenario where, without Vc, the code
would have to fall back to a scalar version - as indeed it did in
early vspline versions, giving me plenty of headaches.
Note that this header is included by vspline/common.h, so this code
is available throughout vspline.
*/
#ifndef VSPLINE_VECTOR_H
#define VSPLINE_VECTOR_H
#include "common.h"
// we have several SIMD back-ends to provide SIMD code, which are all
// activated by preprocessor #defines. 'Proper' SIMD code may be
// generated with Vc (USE_VC), highway (USE_HWY) and std::simd
// (USE_STDSIMD). Only one of these should be defined - it may be
// possible to mix them, but this is uncharted territory. If none
// of the 'proper' SIMD backends are specified, vspline will fall
// back to it's own 'goading' implementation, which codes SIMD
// operations as small loops, hoping that they will be autovectorized.
// when std::simd is used as backend, this directly implements
// vspline::simd_type, which is possible because std::simd supports
// all elementary types, so vspline::simd_type is not needed as
// fallback for some elementary types.
#if defined USE_STDSIMD
#include "std_simd_type.h"
// #define VSPLINE_SIMD_TYPE_H
#else // defined USE_STDIMD
// simd_type.h is needed as a fallback for those fundamentals
// which can't be vectorized, and it's also used if no SIMD library
// is available, in which case the operations are vectorized with
// 'goading', relying on autovectorization of small loops.
#include "simd_type.h"
#if defined USE_VC
#include "vc_simd_type.h"
#elif defined USE_HWY
#include "hwy_simd_type.h"
#endif
#endif // else to defined USE_STDIMD
namespace vspline
{
#ifndef VSPLINE_VECTOR_NBYTES
#if defined USE_VC
#define VSPLINE_VECTOR_NBYTES (2*sizeof(Vc::Vector<float>))
#else
#define VSPLINE_VECTOR_NBYTES 64
#endif
#endif
/// traits class simd_traits provides three traits:
///
/// - 'hsize' holds the hardware vector width if applicable (used only with Vc)
/// - 'type': template yielding the vector type for a given vectorization width
/// - 'default_size': the default vectorization width to use for T
/// default simd_traits: without further specialization, T will be vectorized
/// as a vspline::simd_type. This way, *all* types will be vectorized, there is
/// no fallback to scalar code for certain types. Scalar code will only be
/// produced if the vectorization width is set to 1 in code taking this
/// datum as a template argument. Note that the type which simd_traits produces
/// for sz == 1 is T itself, not a simd_type of one element.
template < typename T >
struct simd_traits
{
template < size_t sz > using type =
typename std::conditional < sz == 1 ,
T ,
vspline::simd_type < T , sz >
> :: type ;
static const size_t hsize = 0 ;
// the default vector size picked here comes out as 16 for floats,
// which is twice the vector size used by AVX2.
enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES
? 1
: VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ;
} ;
// next, for some SIMD backends, we specialize simd_traits for a given
// set of fundamentals. fundamental T which we don't mark this way
// will be routed to use vspline::simd_type, the 'goading' implementation.
#if defined USE_VC
// in Vc ML discussion M. Kretz states that the set of types Vc can vectorize
// (with 1.3) is consistent throughout all ABIs, so we can just list the
// acceptable types without having to take the ABI into account.
// So, for these types we specialize 'simd_traits', resulting in the use of
// the appropriate Vc::SimdArray.
#define VC_SIMD(T) \
template<> struct simd_traits<T> \
{ \
static const size_t hsize = Vc::Vector < T > :: size() ; \
template < size_t sz > using type = \
typename std::conditional \
< sz == 1 , \
T , \
vc_simd_type < T , sz > \
> :: type ; \
enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES \
? 1 \
: VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ; \
} ;
VC_SIMD(float)
VC_SIMD(double)
VC_SIMD(int)
VC_SIMD(unsigned int)
VC_SIMD(short)
VC_SIMD(unsigned short)
#undef VC_SIMD
#endif // USE_VC
#if defined USE_HWY
// with highway, we route pretty much all fundamentals to hwy_simd_type;
// highway seems to have a wider 'feeding spectrum' of fundamentals.
// additionally, route only vector widths which are a power of two
// to produce hwy_simd_type. hwy_simd_type does not handle other
// vector sizes.
#define HWY_SIMD(T) \
template<> struct simd_traits<T> \
{ \
static const size_t hsize = 0 ; \
template < size_t sz > using type = \
typename std::conditional \
< sz == 1 , \
T , \
typename std::conditional \
< ( sz & ( sz - 1 ) ) == 0 , \
hwy_simd_type < T , sz > , \
vspline::simd_type < T , sz > \
> :: type \
> :: type ; \
enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES \
? 1 \
: VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ; \
} ;
HWY_SIMD(float)
HWY_SIMD(double)
HWY_SIMD(long)
HWY_SIMD(unsigned long)
HWY_SIMD(int)
HWY_SIMD(unsigned int)
HWY_SIMD(short)
HWY_SIMD(unsigned short)
HWY_SIMD(signed char)
HWY_SIMD(unsigned char)
#undef HWY_SIMD
#endif // USE_HWY
/// with the definition of 'simd_traits', we can proceed to implement
/// 'vector_traits':
/// struct vector_traits is a traits class fixing the types used for
/// vectorized code in vspline. These types go beyond mere vectors of
/// fundamentals: most of the time, the data vspline has to process
/// are *not* fundamentals, but what I call 'xel' data: pixels, voxels,
/// stereo sound samples, etc. - so, small aggregates of a fundamental
/// type. vector_traits defines how fundamentals and 'xel' data are to
/// be vectorized.
/// with the types defined by vector_traits, a system of type names is
/// introduced which uses a set of patterns:
/// - 'ele' stands for 'elementary', the type of an aggregate's component
/// - 'nd' stands for 'n-dimensional', a type of an aggregate of one or
/// more components of a given elementary type.
/// - 'v' suffix indicates a 'simdized' type, vspline uses Vc::SimdArrays
/// and vigra::TinyVectors of Vc::SimdArrays if Vc is used and the type
/// can be used with Vc::SimdArray, and the equivalent types using
/// vspline::simd_type instead of Vc::SimdArray otherwise.
/// the unspecialized definition of class vector_traits will vectorize
/// by concatenating instances of T into the type simd_traits produces,
/// taking, per default, as many T as the default_size given there.
/// This will work with any type T, even though it makes most sense with
/// fundamentals.
template < typename T ,
size_t _vsize = 0 ,
typename Enable = void >
struct vector_traits
{
// T is not 'element-expandable', so 'dimension' is 1 and T is ele_type
enum { dimension = 1 } ;
typedef T ele_type ;
// find the right vectorization width
enum { size = _vsize == 0
? simd_traits < ele_type > :: default_size
: _vsize } ;
enum { vsize = size } ;
enum { hsize = simd_traits < ele_type > :: hsize } ;
// produce the 'synthetic' type,
typedef vigra::TinyVector < ele_type , 1 > nd_ele_type ;
// the vectorized type
template < typename U , size_t sz >
using vector = typename simd_traits < U > :: template type < sz > ;
typedef vector < ele_type , vsize > ele_v ;
// and the 'synthetic' vectorized type
typedef vigra::TinyVector < ele_v , 1 > nd_ele_v ;
// for not 'element-expandable' T, we produce ele_v as 'type'
typedef ele_v type ;
} ;
/// specialization of vector_traits for 'element-expandable' types.
/// These types are recognized by vigra's ExpandElementResult mechanism,
/// resulting in the formation of a 'vectorized' version of the type.
/// These data are what I call 'xel' data. As explained above, vectorization
/// is *horizontal*, so if T is, say, a pixel of three floats, the type
/// generated here will be a TinyVector of three vectors of vsize floats.
template < typename T , size_t _vsize >
struct vector_traits
< T ,
_vsize ,
typename std::enable_if
< vspline::is_element_expandable < T > :: value
> ::type
>
{
// T is 'element-expandable' - meaning it can be element-expanded
// with vigra's ExpandElementResult mechanism. We use that to obtain
// the elementary type and the dimension of T. Note that, if T is
// fundamental, the resulting traits are the same as they would be
// for the unspecialized case. What we're interested in here are
// multi-channel types; that fundamentals are routed through here
// is just as good as if they were routed through the unspecialized
// case above.
enum { dimension = vigra::ExpandElementResult < T > :: size } ;
typedef typename vigra::ExpandElementResult < T > :: type ele_type ;
// given the elementary type, we define nd_ele_type as a vigra::TinyVector
// of ele_type. This is the 'synthetic' type.
typedef vigra::TinyVector < ele_type , dimension > nd_ele_type ;
// next we glean the number of elements a 'vector' should contain.
// if the template argument 'vsize' was passed as 0, which is the default,
// We use the default vector size which simd_traits provides. For
// explicitly specified _vsize we take the explicitly specified value.
enum { size = _vsize == 0
? simd_traits < ele_type > :: default_size
: _vsize } ;
// I prefer to use 'vsize' as it is more specific than mere 'size'
enum { vsize = size } ;
// hardware vector register size, if applicable - only used with Vc
enum { hsize = simd_traits < T > :: hsize } ;
// now we obtain the template for a vector of a given size. This will
// be either Vc::SimdArray or vspline::simd_type
template < typename U , size_t sz >
using vector = typename simd_traits < U > :: template type < sz > ;
// using this template and the vectorization width we have established,
// we obtain the vectorized data type for a component:
typedef vector < ele_type , vsize > ele_v ;
// nd_ele_v is the 'synthetic' vectorized type, which is always a
// TinyVector of the vectorized component type, possibly with only
// one element:
typedef vigra::TinyVector < ele_v , dimension > nd_ele_v ;
// finally, 'type' is the 'canonical' vectorized type, meaning that if
// T is a fundamental we produce the component vector type itself, but if
// it is some aggregate (like a TinyVector) we produce a TinyVector of the
// component vector data type. So if T is float, 'type' is a vector of float,
// If T is a TinyVector of one float, 'type' is a TinyVector of one vector
// of float.
typedef typename std::conditional
< std::is_fundamental < T > :: value ,
ele_v ,
nd_ele_v
> :: type type ;
} ;
/// this alias is used as a shorthand to pick the vectorized type
/// for a given type T and a size N from 'vector_traits':
template < typename T , size_t N >
using simdized_type = typename vector_traits < T , N > :: type ;
// In order to avoid syntax which is specific to a specific vectorization
// method, I use some the free function 'assign' for assignments, which
// avoids member functions of the vector objects. While this produces some
// notational inconvenience, it allows a formulation which is independent
// of the vector type used. this way I can use Vc::SimdArray as a target
// of an assignment from another vectorized data type, which would be
// impossible with operator=, which has to be a member function.
// the fallback is to use an assignment via operator=. Most of the time,
// this is defined and works as expected.
template < typename T , typename U >
void assign ( T & t , const U & u )
{
t = T ( u ) ;
}
// assignment between two vigra::TinyVectors of equal size. This 'rolls
// out' the per-element assignment
template < typename T , typename U , int N >
void assign ( vigra::TinyVector < T , N > & t ,
const vigra::TinyVector < U , N > & u )
{
for ( int i = 0 ; i < N ; i++ )
vspline::assign ( t [ i ] , u [ i ] ) ;
}
// conditional assignment as a free function. This is helpful to code
// uniformly, avoiding the idiomatic difference between
// if ( p ) t = s ; and t ( p ) = s ;
template < typename VT1 , typename PT , typename VT2 >
void assign_if ( VT1 & target ,
const PT & predicate ,
const VT2 & source )
{
target ( predicate ) = source ;
}
template < typename T >
void assign_if ( T & target ,
const bool & predicate ,
const T & source )
{
if ( predicate )
target = source ;
}
} ; // end of namespace vspline
#endif // #ifndef VSPLINE_VECTOR_H
|