/*! \page concepts_terms_and_techniques Concepts, Terms, and Techniques
This page is primarily a list of definitions and a brief overview of techniques
that have been used successfully to develop VOLK protokernels.
## Definitions and Concepts
### SIMD
SIMD stands for Single Instruction Multiple Data. Leveraging SIMD instructions
is the primary optimization in VOLK.
### Architecture
What VOLK calls an architecture is normally called an Instruction Set
Architecture (ISA). The architectures we target in VOLK usually have SIMD
instructions.
### Vector
A vector in VOLK is the same as a C array. It sometimes, but not always,
coincides with the mathematical definition of a vector.
### Kernel
The 'kernel' part of the VOLK name comes from the high performance computing
use of the word. In this context it is the inner loop of a vector operation.
Since we don't use the word vector in the math sense, a vector operation is an
operation that is performed on a C array.
### Protokernel
A protokernel is an implementation of a kernel. Every kernel has a 'generic'
protokernel that is implemented in C. Other protokernels are optimized for a
particular architecture.
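As a rough illustration of the kernel/protokernel split, the sketch below shows what a 'generic' protokernel for a scalar multiply might look like in plain C. The function name follows the naming pattern used on this page, but the exact name and body are assumptions for illustration, not necessarily the code shipped in VOLK.

```c
// Sketch of a 'generic' protokernel: plain C, no SIMD intrinsics.
// The name mirrors VOLK's naming pattern but is an assumption here.
static inline void volk_32f_s32f_multiply_32f_generic(float* cVector,
                                                      const float* aVector,
                                                      const float scalar,
                                                      unsigned int num_points)
{
    // The loop body is the 'kernel': one multiply per element of the C array.
    for (unsigned int number = 0; number < num_points; number++) {
        cVector[number] = aVector[number] * scalar;
    }
}
```

An architecture-specific protokernel would keep the same signature and replace the loop body with SIMD instructions for that architecture.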
## Techniques
### New Kernels
Add new kernels to the list in lib/kernel_tests.h. This adds the kernel to
VOLK's QA tool as well as the volk profiler. Many kernels are able to
share test parameters, but new kernels might need new ones.
If the VOLK kernel does not 'fit' the standard set of function parameters
expected by the volk_profile structure, you need to create a VOLK puppet
function to help the profiler call the kernel. This is necessary because the
function run_volk_tests can only test a limited number of function prototypes.
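The idea of a puppet can be sketched as follows. Both function names and the kernel itself are hypothetical and exist only for this illustration. The kernel takes an extra state pointer, so the puppet wraps it behind an (output, input, scalar, num_points) signature, which is assumed here to be one of the prototypes the test code can call.

```c
// Hypothetical kernel whose signature does not match a standard test
// prototype: it takes an extra in/out state pointer (a running offset).
static inline void example_32f_s32f_x2_ramp_32f_generic(float* out,
                                                        const float* in,
                                                        const float step,
                                                        float* offset,
                                                        unsigned int num_points)
{
    for (unsigned int n = 0; n < num_points; n++) {
        out[n] = in[n] + *offset; // add the current offset to the sample
        *offset += step;          // advance the state for the next sample
    }
}

// Hypothetical puppet: presents an (out, in, scalar, num_points) signature
// and supplies the extra state itself before calling the real kernel.
static inline void example_32f_s32f_ramppuppet_32f_generic(float* out,
                                                           const float* in,
                                                           const float step,
                                                           unsigned int num_points)
{
    float offset = 0.0f; // fixed initial state chosen for testing
    example_32f_s32f_x2_ramp_32f_generic(out, in, step, &offset, num_points);
}
```

The puppet adds no functionality of its own. It only fixes the extra arguments so that the kernel can be exercised through a signature the test code understands.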
### Protokernels
Adding new protokernels (implementations of VOLK kernels for specific
architectures) is relatively easy. In the relevant kernel header,
volk/include/volk/volk_<input-fingerprint_function-name_output-fingerprint>.h,
add a new #ifdef/#endif block for the LV_HAVE_<arch> corresponding
to the <arch> you are working on (e.g. SSE, AVX, NEON).
For example, for volk_32f_s32f_multiply_32f_u_neon:
\code
#ifdef LV_HAVE_NEON
#include <arm_neon.h>
/*!
\brief Scalar float multiply
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be multiplied
\param scalar the scalar value
\param num_points The number of values in aVector to be multiplied by scalar and stored into cVector
*/
static inline void volk_32f_s32f_multiply_32f_u_neon(float* cVector, const float* aVector, const float scalar, unsigned int num_points){
  unsigned int number = 0;
  const float* inputPtr = aVector;
  float* outputPtr = cVector;
  const unsigned int quarterPoints = num_points / 4;

  float32x4_t aVal, cVal;

  // Process four floats per iteration using the 128-bit NEON registers
  for(number = 0; number < quarterPoints; number++){
    aVal = vld1q_f32(inputPtr); // Load into NEON regs
    cVal = vmulq_n_f32 (aVal, scalar); // Do the multiply
    vst1q_f32(outputPtr, cVal); // Store results back to output
    inputPtr += 4;
    outputPtr += 4;
  }

  // Handle the remaining (num_points % 4) values one at a time
  for(number = quarterPoints * 4; number < num_points; number++){
    *outputPtr++ = (*inputPtr++) * scalar;
  }
}
#endif /* LV_HAVE_NEON */
\endcode
### Allocating Memory
SIMD code can be very sensitive to the alignment of the vectors, which is
generally something like a 16-byte or 32-byte alignment requirement. The
VOLK dispatcher functions, which are what we normally call as users of
VOLK, make sure that the correct aligned or unaligned version is called
depending on the alignment of the vectors passed to them. However, things
typically work faster and more efficiently when the vectors are aligned. As
such, VOLK has memory allocation and free functions to provide us with
properly aligned vectors. We can also ask VOLK to give us the current
machine's alignment requirement, which makes our job even easier when
porting code.
To get the machine's alignment, call size_t volk_get_alignment().
Allocate memory using void* volk_malloc(size_t size, size_t alignment).
Make sure that any memory allocated by VOLK is also freed by VOLK with volk_free(void *p).
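The dispatcher behaviour described above can be sketched in plain C. This is not VOLK's actual dispatcher: every name prefixed with example_ is hypothetical, and both stand-in protokernels use the same scalar loop so that the sketch stays portable. It only shows how a pointer's address can be checked against an alignment and used to choose between an aligned and an unaligned version.

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 when ptr lies on an alignment-byte boundary.
// Assumes alignment is a power of two (e.g. 16 or 32).
static inline int example_is_aligned(const void* ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

// Stand-in for an aligned protokernel (would use aligned SIMD loads).
static inline void example_multiply_a(float* out, const float* in,
                                      const float scalar, unsigned int num_points)
{
    for (unsigned int n = 0; n < num_points; n++) {
        out[n] = in[n] * scalar;
    }
}

// Stand-in for an unaligned protokernel (would use unaligned SIMD loads).
static inline void example_multiply_u(float* out, const float* in,
                                      const float scalar, unsigned int num_points)
{
    for (unsigned int n = 0; n < num_points; n++) {
        out[n] = in[n] * scalar;
    }
}

// Hypothetical dispatcher: calls the aligned version only when every
// buffer is aligned. Returns 1 if the aligned version ran, 0 otherwise.
static inline int example_multiply_dispatch(float* out, const float* in,
                                            const float scalar,
                                            unsigned int num_points,
                                            size_t alignment)
{
    if (example_is_aligned(out, alignment) && example_is_aligned(in, alignment)) {
        example_multiply_a(out, in, scalar, num_points);
        return 1;
    }
    example_multiply_u(out, in, scalar, num_points);
    return 0;
}
```

In real code the alignment argument would come from volk_get_alignment(), and buffers obtained from volk_malloc() with that alignment would take the aligned path.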
*/