.. Copyright (c) 2016, Johan Mabille and Sylvain Corlay

   Distributed under the terms of the BSD 3-Clause License.

   The full license is in the file LICENSE, distributed with this software.

Writing vectorized code
=======================

Assume that we have a simple function that computes the element-wise mean of two vectors, such as:

.. literalinclude:: ../../test/doc/writing_vectorized_code.cpp

How can we use `xsimd` to take advantage of vectorization?

Explicit use of an instruction set
----------------------------------

`xsimd` provides the template class :cpp:class:`xsimd::batch`, parametrized by two types ``T`` and ``A``, where ``T`` is the type of the values involved in SIMD
instructions and ``A`` is the target architecture. If you know which instruction set is available on your machine, you can directly use the corresponding specialization
of ``batch``. For instance, assuming the AVX instruction set is available, the previous code can be vectorized as follows:

.. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean.cpp


However, if you want to write portable code, you cannot rely on ``batch<double, xsimd::avx>``.
Indeed, this won't compile for a CPU where only the SSE2 instruction set is available, for instance. Fortunately, if you don't set the second template parameter, `xsimd` picks the best architecture among those available, based on the compiler flags you use.


Aligned vs unaligned memory
---------------------------

In the previous example, you may have noticed the :cpp:func:`xsimd::batch::load_unaligned` and :cpp:func:`xsimd::batch::store_unaligned` functions. These
are meant for loading values from contiguous dynamically allocated memory into SIMD registers and
for storing them back. When dealing with memory transfer operations, some instruction sets require the memory
to be aligned on a given boundary, while others can handle both aligned and unaligned modes. In the latter case,
operating on aligned memory is generally faster than operating on unaligned memory.
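
Whether the aligned variants are safe to use depends on the address really being a multiple of the required
alignment. As a minimal sketch in plain C++, such a check could look like the following. The helper name
``is_aligned`` is hypothetical and is not part of `xsimd`, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of xsimd: returns true when the address
// held by ptr is a multiple of alignment (assumed to be a power of two).
inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    // Reject zero and values that are not a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const auto address = reinterpret_cast<std::uintptr_t>(ptr);
    // For a power of two, the low bits of an aligned address are all zero.
    return (address & (alignment - 1)) == 0;
}
```

For example, a ``double`` buffer declared with ``alignas(32)`` (32 bytes being the width of an AVX register) satisfies
``is_aligned(buffer, 32)``, whereas the address of its second element, 8 bytes further, does not.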

`xsimd` provides an aligned memory allocator, namely :cpp:class:`xsimd::aligned_allocator`, which follows the standard requirements, so it can be used
with STL containers. Let's change the previous code so that it can take advantage of this allocator:

.. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean_aligned.cpp


Memory alignment and tag dispatching
------------------------------------

You may need to write code that can operate on any type of vector or array, not only the STL ones. In that
case, you cannot make assumptions about the memory alignment of the container. `xsimd` provides a tag dispatching
mechanism that allows you to easily write such generic code:


.. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean_tag_dispatch.cpp


Here, the ``Tag`` template parameter can be :cpp:class:`xsimd::aligned_mode` or :cpp:class:`xsimd::unaligned_mode`. Assuming the existence
of a ``get_alignment_tag`` meta-function in the code, the previous code can be invoked this way:

.. code::

    mean(a, b, res, get_alignment_tag<decltype(a)>());
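
One possible shape for such a meta-function is sketched below. To keep the sketch compilable on its own, the
types ``aligned_mode``, ``unaligned_mode`` and ``aligned_allocator_placeholder`` are local stand-ins for the
corresponding `xsimd` types, and ``detail::alignment_tag_impl`` is a hypothetical helper. This is only an
illustration of the idea, not the way `xsimd` itself implements it.

```cpp
#include <type_traits>
#include <vector>

// Stand-ins so that the sketch compiles without xsimd. Real code would use
// xsimd::aligned_mode, xsimd::unaligned_mode and xsimd::aligned_allocator.
struct aligned_mode {};
struct unaligned_mode {};
template <class T>
struct aligned_allocator_placeholder; // only named, never instantiated here

namespace detail
{
    // Default: nothing is known about the container, assume unaligned memory.
    template <class C>
    struct alignment_tag_impl
    {
        using type = unaligned_mode;
    };

    // A std::vector that uses the aligned allocator holds aligned memory.
    template <class T>
    struct alignment_tag_impl<std::vector<T, aligned_allocator_placeholder<T>>>
    {
        using type = aligned_mode;
    };
}

// decltype(a) may carry const and reference qualifiers; strip them before
// looking up the tag.
template <class C>
using get_alignment_tag = typename detail::alignment_tag_impl<
    typename std::remove_cv<typename std::remove_reference<C>::type>::type>::type;
```

Because ``get_alignment_tag`` is an alias for the tag type, ``get_alignment_tag<decltype(a)>()`` creates a tag object that
selects the matching overload at compile time. Any container type without a dedicated specialization falls back to the unaligned mode,
which is the safe default.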

Writing arch-independent code
-----------------------------

If your code may target either the SSE2, AVX2 or AVX512 instruction set, `xsimd`
makes it possible to make your code even more generic by using the architecture
as a template parameter:

.. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean_arch_independent.cpp

This can be useful to implement runtime dispatching, based on the instruction set detected at runtime. `xsimd` provides a generic mechanism, :cpp:func:`xsimd::dispatch()`, to implement
this pattern. Based on the above example, instead of calling ``mean{}(arch, a, b, res, tag)``, one can use ``xsimd::dispatch(mean{})(a, b, res, tag)``. More about this can be found in the :ref:`Arch Dispatching` section.
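
To illustrate the pattern itself, here is a simplified sketch in plain C++ that does not use `xsimd`. The
architecture tags ``narrow_arch`` and ``wide_arch``, the ``runtime_dispatcher`` wrapper and its ``wide_available``
flag are all hypothetical stand-ins. The flag replaces whatever CPU feature detection the real mechanism performs, and the inner
loop over one block replaces the batch load, add and store operations.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in architecture tags; with xsimd these would be types such as
// xsimd::sse2 or xsimd::avx2. The lane counts match 128-bit and 256-bit
// registers holding doubles.
struct narrow_arch { static constexpr std::size_t lanes = 2; };
struct wide_arch { static constexpr std::size_t lanes = 4; };

// Functor whose call operator takes the architecture as its first argument.
struct mean
{
    template <class Arch>
    void operator()(Arch, const std::vector<double>& a,
                    const std::vector<double>& b, std::vector<double>& res) const
    {
        assert(a.size() == b.size() && a.size() == res.size());
        constexpr std::size_t inc = Arch::lanes;
        const std::size_t size = res.size();
        // Largest multiple of inc that fits in size.
        const std::size_t vec_size = size - size % inc;
        // Main loop: one block of inc values per iteration. A real
        // implementation would load, add and store a batch here.
        for (std::size_t i = 0; i < vec_size; i += inc)
            for (std::size_t j = i; j < i + inc; ++j)
                res[j] = (a[j] + b[j]) / 2;
        // Scalar tail for the elements that do not fill a whole block.
        for (std::size_t i = vec_size; i < size; ++i)
            res[i] = (a[i] + b[i]) / 2;
    }
};

// Hypothetical dispatcher: wraps a functor and supplies the architecture
// argument at call time, based on a flag known only at runtime.
template <class F>
struct runtime_dispatcher
{
    F functor;
    bool wide_available;

    template <class... Args>
    void operator()(Args&&... args)
    {
        if (wide_available)
            functor(wide_arch{}, std::forward<Args>(args)...);
        else
            functor(narrow_arch{}, std::forward<Args>(args)...);
    }
};
```

With this sketch, ``runtime_dispatcher<mean>{mean{}, true}(a, b, res)`` runs the ``wide_arch`` instantiation, and passing ``false``
runs the ``narrow_arch`` one. Both produce the same result, since only the block size differs. The caller never names the
architecture, which is the property the functor-based design is meant to provide.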