1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
|
==========================
Vector Predication Roadmap
==========================
.. contents:: Table of Contents
:depth: 3
:local:
Motivation
==========
This proposal defines a roadmap towards native vector predication in LLVM,
specifically for vector instructions with a mask and/or an explicit vector
length. LLVM currently has no target-independent means to model predicated
vector instructions for modern SIMD ISAs such as AVX512, ARM SVE, the RISC-V V
extension and NEC SX-Aurora. Only some predicated vector operations, such as
masked loads and stores, are available through intrinsics [MaskedIR]_.
The Vector Predication (VP) extensions is a concrete RFC and prototype
implementation to achieve native vector predication in LLVM. The VP prototype
and all related discussions can be found in the VP patch on Phabricator
[VPRFC]_.
Roadmap
=======
1. IR-level VP intrinsics
-------------------------
- There is a consensus on the semantics/instruction set of VP.
- VP intrinsics and attributes are available on IR level.
- TTI has capability flags for VP (``supportsVP()``?,
``haveActiveVectorLength()``?).
Result: VP usable for IR-level vectorizers (LV, VPlan, RegionVectorizer),
potential integration in Clang with builtins.
2. CodeGen support
------------------
- VP intrinsics translate to first-class SDNodes
(eg ``llvm.vp.fdiv.* -> vp_fdiv``).
- VP legalization (legalize explicit vector length to mask (AVX512), legalize VP
SDNodes to pre-existing ones (SSE, NEON)).
Result: Backend development based on VP SDNodes.
3. Lift InstSimplify/InstCombine/DAGCombiner to VP
--------------------------------------------------
- Introduce PredicatedInstruction, PredicatedBinaryOperator, .. helper classes
that match standard vector IR and VP intrinsics.
- Add a matcher context to PatternMatch and context-aware IR Builder APIs.
- Incrementally lift DAGCombiner to work on VP SDNodes as well as on regular
vector instructions.
- Incrementally lift InstCombine/InstSimplify to operate on VP as well as
regular IR instructions.
Result: Optimization of VP intrinsics on par with standard vector instructions.
4. Deprecate llvm.masked.* / llvm.experimental.reduce.*
-------------------------------------------------------
- Modernize llvm.masked.* / llvm.experimental.reduce* by translating to VP.
- DCE transitional APIs.
Result: VP has superseded earlier vector intrinsics.
5. Predicated IR Instructions
-----------------------------
- Vector instructions have an optional mask and vector length parameter. These
lower to VP SDNodes (from Stage 2).
- Phase out VP intrinsics, only keeping those that are not equivalent to
vectorized scalar instructions (reduce, shuffles, ..)
- InstCombine/InstSimplify expect predication in regular Instructions (Stage (3)
has laid the groundwork).
Result: Native vector predication in IR.
References
==========
.. [MaskedIR] `llvm.masked.*` intrinsics,
https://llvm.org/docs/LangRef.html#masked-vector-load-and-store-intrinsics
.. [VPRFC] RFC: Prototype & Roadmap for vector predication in LLVM,
https://reviews.llvm.org/D57504
|