/*
 * Copyright (C) 2018-2025 Intel Corporation
 *
 * SPDX-License-Identifier: MIT
 *
 */
#pragma once
#include "shared/source/command_stream/linear_stream.h"
#include "shared/source/command_stream/preemption.h"
#include "shared/source/helpers/register_offsets.h"
#include "shared/source/indirect_heap/indirect_heap.h"
#include "opencl/source/command_queue/cl_local_work_size.h"
#include "opencl/source/command_queue/command_queue.h"
#include "opencl/source/helpers/hardware_commands_helper.h"
namespace NEO {
class Surface;
struct RootDeviceEnvironment;
template <typename GfxFamily>
using MI_STORE_REG_MEM = typename GfxFamily::MI_STORE_REGISTER_MEM_CMD;
struct FlushL3Args {
    bool containsPrintBuffer;
    bool usingSharedObjects;
    bool signalEvent;
    bool blocking;
    bool usingSystemAllocation;
};
template <typename GfxFamily>
class GpgpuWalkerHelper {
    using DefaultWalkerType = typename GfxFamily::DefaultWalkerType;

  public:
    static size_t getSizeForWaDisableRccRhwoOptimization(const Kernel *pKernel);

    template <typename WalkerType>
    static size_t setGpgpuWalkerThreadData(
        WalkerType *walkerCmd,
        const KernelDescriptor &kernelDescriptor,
        const size_t startWorkGroups[3],
        const size_t numWorkGroups[3],
        const size_t localWorkSizesIn[3],
        uint32_t simd,
        uint32_t workDim,
        bool localIdsGenerationByRuntime,
        bool inlineDataProgrammingRequired,
        uint32_t requiredWorkgroupOrder);

    static void dispatchProfilingCommandsStart(
        TagNodeBase &hwTimeStamps,
        LinearStream *commandStream,
        const RootDeviceEnvironment &rootDeviceEnvironment);

    static void dispatchProfilingCommandsEnd(
        TagNodeBase &hwTimeStamps,
        LinearStream *commandStream,
        const RootDeviceEnvironment &rootDeviceEnvironment);

    static void dispatchPerfCountersCommandsStart(
        CommandQueue &commandQueue,
        TagNodeBase &hwPerfCounter,
        LinearStream *commandStream);

    static void dispatchPerfCountersCommandsEnd(
        CommandQueue &commandQueue,
        TagNodeBase &hwPerfCounter,
        LinearStream *commandStream);

    template <typename WalkerType>
    static void setupTimestampPacket(
        LinearStream *cmdStream,
        WalkerType *walkerCmd,
        TagNodeBase *timestampPacketNode,
        const RootDeviceEnvironment &rootDeviceEnvironment);

    template <typename WalkerType>
    static void setupTimestampPacketFlushL3(
        WalkerType &walkerCmd,
        CommandQueue &commandQueue,
        const FlushL3Args &args);

    static void adjustMiStoreRegMemMode(MI_STORE_REG_MEM<GfxFamily> *storeCmd);

  private:
    using PIPE_CONTROL = typename GfxFamily::PIPE_CONTROL;

    template <typename WalkerType>
    static void setSystolicModeEnable(WalkerType *walkerCmd);
};
template <typename GfxFamily>
struct EnqueueOperation {
    using PIPE_CONTROL = typename GfxFamily::PIPE_CONTROL;
    static size_t getTotalSizeRequiredCS(uint32_t eventType, const CsrDependencies &csrDeps, bool reserveProfilingCmdsSpace, bool reservePerfCounters, bool blitEnqueue, CommandQueue &commandQueue, const MultiDispatchInfo &multiDispatchInfo, bool isMarkerWithProfiling, bool eventsInWaitList, bool resolveDependenciesByPipecontrol, cl_event *outEvent);
    static size_t getSizeRequiredCS(uint32_t cmdType, bool reserveProfilingCmdsSpace, bool reservePerfCounters, CommandQueue &commandQueue, const Kernel *pKernel, const DispatchInfo &dispatchInfo);
    static size_t getSizeRequiredForTimestampPacketWrite();
    static size_t getSizeForCacheFlushAfterWalkerCommands(const Kernel &kernel, const CommandQueue &commandQueue);

  private:
    template <typename WalkerType>
    static size_t getSizeRequiredCSKernel(bool reserveProfilingCmdsSpace, bool reservePerfCounters, CommandQueue &commandQueue, const Kernel *pKernel, const DispatchInfo &dispatchInfo);
    static size_t getSizeRequiredCSNonKernel(bool reserveProfilingCmdsSpace, bool reservePerfCounters, CommandQueue &commandQueue);
};
// Computes the total command-stream size required for this enqueue and
// requests a stream of that size from the command queue.
// Note: surfaces and numSurfaces are not used in this function body.
template <typename GfxFamily, uint32_t eventType>
LinearStream &getCommandStream(CommandQueue &commandQueue, const CsrDependencies &csrDeps, bool reserveProfilingCmdsSpace,
                               bool reservePerfCounterCmdsSpace, bool blitEnqueue, const MultiDispatchInfo &multiDispatchInfo,
                               Surface **surfaces, size_t numSurfaces, bool isMarkerWithProfiling, bool eventsInWaitList, bool resolveDependenciesByPipecontrol, cl_event *outEvent) {
    size_t expectedSizeCS = EnqueueOperation<GfxFamily>::getTotalSizeRequiredCS(eventType, csrDeps, reserveProfilingCmdsSpace, reservePerfCounterCmdsSpace, blitEnqueue, commandQueue, multiDispatchInfo, isMarkerWithProfiling, eventsInWaitList, resolveDependenciesByPipecontrol, outEvent);
    return commandQueue.getCS(expectedSizeCS);
}
// Requests the indirect heap of the given type from the command queue, sized
// by the matching HardwareCommandsHelper estimate for the whole multi-dispatch.
template <typename GfxFamily, IndirectHeap::Type heapType>
IndirectHeap &getIndirectHeap(CommandQueue &commandQueue, const MultiDispatchInfo &multiDispatchInfo) {
    size_t expectedSize = 0;
    IndirectHeap *ih = nullptr;

    // Select the size estimate that matches the requested heap type.
    // clang-format off
    switch (heapType) {
    case IndirectHeap::Type::dynamicState:   expectedSize = HardwareCommandsHelper<GfxFamily>::getTotalSizeRequiredDSH(multiDispatchInfo); break;
    case IndirectHeap::Type::indirectObject: expectedSize = HardwareCommandsHelper<GfxFamily>::getTotalSizeRequiredIOH(multiDispatchInfo); break;
    case IndirectHeap::Type::surfaceState:   expectedSize = HardwareCommandsHelper<GfxFamily>::getTotalSizeRequiredSSH(multiDispatchInfo); break;
    }
    // clang-format on

    // ih is still null at this point, so the heap is always taken from the queue.
    if (ih == nullptr) {
        ih = &commandQueue.getIndirectHeap(heapType, expectedSize);
    }

    return *ih;
}
} // namespace NEO