/*========================== begin_copyright_notice ============================
Copyright (C) 2017-2021 Intel Corporation
SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/
#pragma once
#include "Compiler/CISACodeGen/helper.h"
#include "Compiler/CISACodeGen/TranslationTable.hpp"
#include "Compiler/CISACodeGen/ShaderCodeGen.hpp"
#include "Compiler/CISACodeGen/WIAnalysis.hpp"
#include "Compiler/MetaDataUtilsWrapper.h"
#include "Compiler/MetaDataApi/MetaDataApi.h"
#include "common/LLVMWarningsPush.hpp"
#include <llvm/IR/PassManager.h>
#include <llvmWrapper/IR/IRBuilder.h>
#include <llvm/ADT/SmallBitVector.h>
#include <llvmWrapper/Transforms/Utils.h>
#include "common/LLVMWarningsPop.hpp"
#include "common/IGCIRBuilder.h"
#include <map>
/// @brief ConstantCoalescing merges multiple constant loads into a single,
/// wider load:
/// - uses oword loads if the address is uniform
/// - uses gather4 or sampler loads if the address is not uniform
using namespace llvm;
namespace IGC
{
struct BufChunk
{
llvm::Value* bufIdxV; // buffer index when it is indirect
llvm::Value* baseIdxV; // base-address index when it is indirect
uint addrSpace; // resource address space when it is direct
uint elementSize; // size in bytes of the basic data element
uint chunkStart; // offset of the first data element in chunk in units of elementSize
uint chunkSize; // chunk size in elements
llvm::Instruction* chunkIO; // coalesced load
uint loadOrder; // order in which the direct CB is used
};
class ConstantCoalescing : public llvm::FunctionPass
{
public:
static char ID;
ConstantCoalescing();
virtual void getAnalysisUsage(llvm::AnalysisUsage& AU) const override
{
AU.setPreservesCFG();
AU.addPreservedID(WIAnalysis::ID);
AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<WIAnalysis>();
AU.addRequired<MetaDataUtilsWrapper>();
AU.addRequired<CodeGenContextWrapper>();
AU.addRequired<TranslationTable>();
AU.addPreservedID(TranslationTable::ID);
}
void ProcessBlock(llvm::BasicBlock* blk,
std::vector<BufChunk*>& dircb_owlds,
std::vector<BufChunk*>& indcb_owlds,
std::vector<BufChunk*>& indcb_gathers);
void ProcessFunction(llvm::Function* function);
void FindAllDirectCB(llvm::BasicBlock* blk,
std::vector<BufChunk*>& dircb_owloads);
virtual bool runOnFunction(llvm::Function& func) override;
private:
enum ExtensionKind {
EK_NotExtended,
EK_SignExt,
EK_ZeroExt,
};
class IRBuilderWrapper : protected llvm::IGCIRBuilder<>
{
public:
IRBuilderWrapper(LLVMContext& C, TranslationTable* pTT) : llvm::IGCIRBuilder<>(C), m_TT(pTT)
{
}
/// \brief Get a constant 32-bit value.
ConstantInt* getInt32(uint32_t C) {
return IGCIRBuilder<>::getInt32(C);
}
ConstantInt* getFalse() {
return IGCIRBuilder<>::getFalse();
}
using IGCIRBuilder<>::getCurrentDebugLocation;
using IGCIRBuilder<>::getInt32Ty;
using IGCIRBuilder<>::getFloatTy;
using IGCIRBuilder<>::SetInsertPoint;
// Instruction creators:
Value* CreateAdd(Value* LHS, Value* RHS, const Twine& Name = "",
bool HasNUW = false, bool HasNSW = false) {
Value* val = IGCIRBuilder<>::CreateAdd(LHS, RHS, Name, HasNUW, HasNSW);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateOr(Value* LHS, Value* RHS, const Twine& Name = "") {
Value* val = IGCIRBuilder<>::CreateOr(LHS, RHS, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateAnd(Value* LHS, Value* RHS, const Twine& Name = "") {
Value* val = IRBuilder<>::CreateAnd(LHS, RHS, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreatePtrToInt(Value* V, Type* DestTy,
const Twine& Name = "") {
Value* val = IGCIRBuilder<>::CreatePtrToInt(V, DestTy, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
LoadInst* CreateLoad(Value* Ptr, const char* Name) {
LoadInst* val = IGCIRBuilder<>::CreateLoad(Ptr, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
LoadInst* CreateLoad(Value* Ptr, const Twine& Name = "") {
LoadInst* val = IGCIRBuilder<>::CreateLoad(Ptr, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
LoadInst* CreateLoad(Value* Ptr, bool isVolatile, const Twine& Name = "") {
LoadInst* val = IGCIRBuilder<>::CreateLoad(Ptr, isVolatile, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
CallInst* CreateCall2(Value* Callee, Value* Arg1, Value* Arg2,
const Twine& Name = "") {
CallInst* val = IGCIRBuilder<>::CreateCall2(Callee, Arg1, Arg2, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateMul(Value* LHS, Value* RHS, const Twine& Name = "",
bool HasNUW = false, bool HasNSW = false) {
Value* val = IGCIRBuilder<>::CreateMul(LHS, RHS, Name, HasNUW, HasNSW);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateShl(Value* LHS, Value* RHS, const Twine& Name = "",
bool HasNUW = false, bool HasNSW = false) {
Value* val = IGCIRBuilder<>::CreateShl(LHS, RHS, Name, HasNUW, HasNSW);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateLShr(Value* LHS, Value* RHS, const Twine& Name = "",
bool isExact = false) {
Value* val = IGCIRBuilder<>::CreateLShr(LHS, RHS, Name, isExact);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateBitCast(Value* V, Type* DestTy,
const Twine& Name = "") {
Value* val = IGCIRBuilder<>::CreateBitCast(V, DestTy, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateZExt(Value* V, Type* DestTy, const Twine& Name = "") {
Value* val = IRBuilder<>::CreateZExt(V, DestTy, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateSExt(Value* V, Type* DestTy, const Twine& Name = "") {
Value* val = IRBuilder<>::CreateSExt(V, DestTy, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateZExtOrTrunc(Value* V, Type* DestTy, const Twine& Name = "") {
Value* val = IRBuilder<>::CreateZExtOrTrunc(V, DestTy, Name);
if (val != V)
{
m_TT->RegisterNewValueAndAssignID(val);
}
return val;
}
Value* CreateExtractElement(Value* Vec, Value* Idx,
const Twine& Name = "") {
Value* val = IGCIRBuilder<>::CreateExtractElement(Vec, Idx, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
CallInst* CreateCall(Value* Callee, ArrayRef<Value*> Args,
const Twine& Name = "") {
CallInst* val = IGCIRBuilder<>::CreateCall(Callee, Args, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
Value* CreateInsertElement(Value* Vec, Value* NewElt, Value* Idx,
const Twine& Name = "") {
Value* val = IGCIRBuilder<>::CreateInsertElement(Vec, NewElt, Idx, Name);
m_TT->RegisterNewValueAndAssignID(val);
return val;
}
private:
TranslationTable* m_TT;
};
CodeGenContext* m_ctx;
llvm::Function* curFunc;
// builder used to modify the LLVM IR
IRBuilderWrapper* irBuilder;
// provides the uniformity information
WIAnalysis* wiAns;
const llvm::DataLayout* dataLayout;
TranslationTable* m_TT;
/// Examines the uniformity of the load and the number of used elements
/// to determine whether we should try to merge it.
bool isProfitableLoad(const Instruction* I, uint32_t &MaxEltPlus) const;
/// Is this a chunk we should be creating?
static bool profitableChunkSize(uint32_t ub, uint32_t lb, uint32_t eltSizeInBytes);
static bool profitableChunkSize(uint32_t chunkSize, uint32_t eltSizeInBytes);
/// Checks whether two accesses have the same buffer base
static bool CompareBufferBase(const llvm::Value* bufIdxV1, uint bufid1, const llvm::Value* bufIdxV2, uint bufid2);
/// Finds the element base and the element's immediate offset
llvm::Value* SimpleBaseOffset(llvm::Value* elt_idxv, uint& offset, ExtensionKind &Extension);
/// Finds the minimum power-of-2 alignment for an offset in the buffer
uint GetOffsetAlignment(llvm::Value* val) const;
/// Used on the OCL path; based on int2ptr
bool DecomposePtrExp(
llvm::Value* ptr_val, llvm::Value*& buf_idxv,
llvm::Value*& elt_idxv, uint& eltid, ExtensionKind &Extension);
static uint CheckVectorElementUses(const llvm::Instruction* load);
void AdjustChunk(BufChunk* cov_chunk, uint start_adj, uint size_adj, const ExtensionKind &Extension);
void EnlargeChunk(BufChunk* cov_chunk, uint size_adj);
void MoveExtracts(BufChunk* cov_chunk, llvm::Instruction* load, uint start_adj);
llvm::Value* FormChunkAddress(BufChunk* chunk, const ExtensionKind &Extension);
void CombineTwoLoads(BufChunk* cov_chunk, llvm::Instruction* load, uint eltid, uint numelt, const ExtensionKind &Extension);
llvm::Instruction* CreateChunkLoad(
llvm::Instruction* load, BufChunk* chunk, uint eltid, uint alignment, const ExtensionKind &Extension);
llvm::Instruction* AddChunkExtract(llvm::Instruction* load, uint offset);
llvm::Instruction* FindOrAddChunkExtract(BufChunk* cov_chunk, uint eltid);
llvm::Instruction* EnlargeChunkAddExtract(BufChunk* cov_chunk, uint size_adj, uint eltid);
llvm::Instruction* AdjustChunkAddExtract(
BufChunk* cov_chunk, uint start_adj, uint size_adj, uint eltid, const ExtensionKind &Extension);
llvm::Instruction* CreateSamplerLoad(llvm::Value* index, llvm::Value* resourcePtr, uint addrSpace);
void ReplaceLoadWithSamplerLoad(
Instruction* loadToReplace,
Instruction* ldData,
uint offsetInBytes);
void MergeUniformLoad(llvm::Instruction* load,
llvm::Value* bufIdxV, uint addrSpace,
llvm::Value* eltIdxV, uint offsetInBytes,
uint maxEltPlus,
const ExtensionKind &Extension,
std::vector<BufChunk*>& chunk_vec);
void MergeScatterLoad(llvm::Instruction* load,
llvm::Value* bufIdxV, uint bufid,
llvm::Value* eltIdxV, uint offsetInBytes,
uint maxEltPlus,
const ExtensionKind &Extension,
std::vector<BufChunk*>& chunk_vec);
void ScatterToSampler(llvm::Instruction* load,
llvm::Value* bufIdxV, uint bufid,
llvm::Value* eltIdxV, uint eltid,
std::vector<BufChunk*>& chunk_vec);
bool CleanupExtract(llvm::BasicBlock* bb);
void VectorizePrep(llvm::BasicBlock* bb);
bool safeToMoveInstUp(Instruction* inst, Instruction* newLocation);
bool IsSamplerAlignedAddress(Value* addr) const;
Value* GetSamplerAlignedAddress(Value* inst);
uint GetAlignment(Instruction* load) const;
void SetAlignment(Instruction* load, uint alignment);
};
}