//===- PluginInterface.h - Target independent plugin device interface -----===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
#ifndef OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_PLUGININTERFACE_H
#define OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_PLUGININTERFACE_H
#include <cstddef>
#include <cstdint>
#include <deque>
#include <list>
#include <map>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <string>
#include <vector>
#include "Debug.h"
#include "DeviceEnvironment.h"
#include "GlobalHandler.h"
#include "JIT.h"
#include "MemoryManager.h"
#include "Utilities.h"
#include "omptarget.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Triple.h"
#include "llvm/Frontend/OpenMP/OMPConstants.h"
#include "llvm/Frontend/OpenMP/OMPGridValues.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MemoryBufferRef.h"
namespace llvm {
namespace omp {
namespace target {
namespace plugin {
struct GenericPluginTy;
struct GenericKernelTy;
struct GenericDeviceTy;
/// Class that wraps the __tgt_async_info to simplify its usage. In case the
/// object is constructed without a valid __tgt_async_info, the object will use
/// an internal one and will synchronize the current thread with the pending
/// operations on object destruction.
struct AsyncInfoWrapperTy {
AsyncInfoWrapperTy(Error &Err, GenericDeviceTy &Device,
__tgt_async_info *AsyncInfoPtr)
: Err(Err), ErrOutParam(&Err), Device(Device),
AsyncInfoPtr(AsyncInfoPtr ? AsyncInfoPtr : &LocalAsyncInfo) {}
/// Synchronize with the __tgt_async_info's pending operations if it's the
/// internal one.
~AsyncInfoWrapperTy();
/// Get the raw __tgt_async_info pointer.
operator __tgt_async_info *() const { return AsyncInfoPtr; }
/// Get a reference to the underlying plugin-specific queue type.
template <typename Ty> Ty &getQueueAs() const {
static_assert(sizeof(Ty) == sizeof(AsyncInfoPtr->Queue),
"Queue is not of the same size as target type");
return reinterpret_cast<Ty &>(AsyncInfoPtr->Queue);
}
/// Indicate whether there is a queue.
bool hasQueue() const { return (AsyncInfoPtr->Queue != nullptr); }
private:
Error &Err;
ErrorAsOutParameter ErrOutParam;
GenericDeviceTy &Device;
__tgt_async_info LocalAsyncInfo;
__tgt_async_info *const AsyncInfoPtr;
};
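/// Illustrative sketch (not part of this interface): a plugin-specific
/// dataSubmitImpl() could use the wrapper as follows. `MyDeviceTy`,
/// `MyStreamTy`, `getStream()` and `myAsyncCopy()` are hypothetical names
/// assumed only for this example.
///
/// \code
/// Error MyDeviceTy::dataSubmitImpl(void *TgtPtr, const void *HstPtr,
///                                  int64_t Size,
///                                  AsyncInfoWrapperTy &AsyncInfoWrapper) {
///   // Reuse the queue stored in the async info, or set one lazily.
///   MyStreamTy *&Stream = AsyncInfoWrapper.getQueueAs<MyStreamTy *>();
///   if (!AsyncInfoWrapper.hasQueue())
///     Stream = getStream();
///   // Enqueue the asynchronous copy on the stream.
///   return myAsyncCopy(TgtPtr, HstPtr, Size, Stream);
/// }
/// \endcode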
/// Class wrapping a __tgt_device_image and its offload entry table on a
/// specific device. This class is responsible for storing and managing
/// the offload entries for an image on a device.
class DeviceImageTy {
/// Class representing the offload entry table. The class stores the
/// __tgt_target_table and a map to search in the table faster.
struct OffloadEntryTableTy {
/// Add new entry to the table.
void addEntry(const __tgt_offload_entry &Entry) {
Entries.push_back(Entry);
TTTablePtr.EntriesBegin = &Entries[0];
TTTablePtr.EntriesEnd = TTTablePtr.EntriesBegin + Entries.size();
}
/// Get the raw pointer to the __tgt_target_table.
operator __tgt_target_table *() {
if (Entries.empty())
return nullptr;
return &TTTablePtr;
}
private:
__tgt_target_table TTTablePtr;
llvm::SmallVector<__tgt_offload_entry> Entries;
};
/// Image identifier within the corresponding device. Notice that this id is
/// not unique between different devices; ids may overlap.
int32_t ImageId;
/// The pointer to the raw __tgt_device_image.
const __tgt_device_image *TgtImage;
const __tgt_device_image *TgtImageBitcode;
/// Table of offload entries.
OffloadEntryTableTy OffloadEntryTable;
public:
DeviceImageTy(int32_t Id, const __tgt_device_image *Image)
: ImageId(Id), TgtImage(Image), TgtImageBitcode(nullptr) {
assert(TgtImage && "Invalid target image");
}
/// Get the image identifier within the device.
int32_t getId() const { return ImageId; }
/// Get the pointer to the raw __tgt_device_image.
const __tgt_device_image *getTgtImage() const { return TgtImage; }
void setTgtImageBitcode(const __tgt_device_image *TgtImageBitcode) {
this->TgtImageBitcode = TgtImageBitcode;
}
const __tgt_device_image *getTgtImageBitcode() const {
return TgtImageBitcode;
}
/// Get the image starting address.
void *getStart() const { return TgtImage->ImageStart; }
/// Get the image size.
size_t getSize() const {
return getPtrDiff(TgtImage->ImageEnd, TgtImage->ImageStart);
}
/// Get a memory buffer reference to the whole image.
MemoryBufferRef getMemoryBuffer() const {
return MemoryBufferRef(StringRef((const char *)getStart(), getSize()),
"Image");
}
/// Get a reference to the offload entry table for the image.
OffloadEntryTableTy &getOffloadEntryTable() { return OffloadEntryTable; }
};
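/// Illustrative sketch (not part of this interface): a plugin's
/// loadBinaryImpl() typically allocates one image object per loaded binary.
/// `MyDeviceTy`, `MyDeviceImageTy` (a hypothetical subclass of DeviceImageTy)
/// and `myLoadModule()` are assumed names for this example only.
///
/// \code
/// Expected<DeviceImageTy *>
/// MyDeviceTy::loadBinaryImpl(const __tgt_device_image *TgtImage,
///                            int32_t ImageId) {
///   MyDeviceImageTy *Image = Plugin::get().allocate<MyDeviceImageTy>();
///   new (Image) MyDeviceImageTy(ImageId, TgtImage);
///   // The raw image bytes can be inspected through a MemoryBufferRef.
///   if (auto Err = myLoadModule(Image->getMemoryBuffer()))
///     return std::move(Err);
///   return Image;
/// }
/// \endcode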
/// Class implementing common functionalities of offload kernels. Each plugin
/// should define the specific kernel class, derive from this generic one, and
/// implement the necessary virtual function members.
struct GenericKernelTy {
/// Construct a kernel with a name and an execution mode.
GenericKernelTy(const char *Name, OMPTgtExecModeFlags ExecutionMode)
: Name(Name), ExecutionMode(ExecutionMode),
PreferredNumThreads(0), MaxNumThreads(0) {}
virtual ~GenericKernelTy() {}
/// Initialize the kernel object from a specific device.
Error init(GenericDeviceTy &GenericDevice, DeviceImageTy &Image);
virtual Error initImpl(GenericDeviceTy &GenericDevice,
DeviceImageTy &Image) = 0;
/// Launch the kernel on the specific device. The device must be the same
/// one used to initialize the kernel.
Error launch(GenericDeviceTy &GenericDevice, void **ArgPtrs,
ptrdiff_t *ArgOffsets, KernelArgsTy &KernelArgs,
AsyncInfoWrapperTy &AsyncInfoWrapper) const;
virtual Error launchImpl(GenericDeviceTy &GenericDevice, uint32_t NumThreads,
uint64_t NumBlocks,
KernelArgsTy &KernelArgs, void *Args,
AsyncInfoWrapperTy &AsyncInfoWrapper) const = 0;
/// Get the kernel name.
const char *getName() const { return Name; }
/// Indicate whether an execution mode is valid.
static bool isValidExecutionMode(OMPTgtExecModeFlags ExecutionMode) {
switch (ExecutionMode) {
case OMP_TGT_EXEC_MODE_SPMD:
case OMP_TGT_EXEC_MODE_GENERIC:
case OMP_TGT_EXEC_MODE_GENERIC_SPMD:
return true;
}
return false;
}
private:
/// Prepare the arguments before launching the kernel.
void *prepareArgs(GenericDeviceTy &GenericDevice, void **ArgPtrs,
ptrdiff_t *ArgOffsets, int32_t NumArgs,
llvm::SmallVectorImpl<void *> &Args,
llvm::SmallVectorImpl<void *> &Ptrs,
AsyncInfoWrapperTy &AsyncInfoWrapper) const;
/// Get the default number of threads and blocks for the kernel.
virtual uint32_t getDefaultNumThreads(GenericDeviceTy &Device) const = 0;
virtual uint32_t getDefaultNumBlocks(GenericDeviceTy &Device) const = 0;
/// Get the number of threads and blocks for the kernel based on the
/// user-defined threads and block clauses.
uint32_t getNumThreads(GenericDeviceTy &GenericDevice,
uint32_t ThreadLimitClause[3]) const;
uint64_t getNumBlocks(GenericDeviceTy &GenericDevice,
uint32_t BlockLimitClause[3], uint64_t LoopTripCount,
uint32_t NumThreads) const;
/// Indicate if the kernel works in Generic SPMD, Generic or SPMD mode.
bool isGenericSPMDMode() const {
return ExecutionMode == OMP_TGT_EXEC_MODE_GENERIC_SPMD;
}
bool isGenericMode() const {
return ExecutionMode == OMP_TGT_EXEC_MODE_GENERIC;
}
bool isSPMDMode() const { return ExecutionMode == OMP_TGT_EXEC_MODE_SPMD; }
/// Get the execution mode name of the kernel.
const char *getExecutionModeName() const {
switch (ExecutionMode) {
case OMP_TGT_EXEC_MODE_SPMD:
return "SPMD";
case OMP_TGT_EXEC_MODE_GENERIC:
return "Generic";
case OMP_TGT_EXEC_MODE_GENERIC_SPMD:
return "Generic-SPMD";
}
llvm_unreachable("Unknown execution mode!");
}
/// The kernel name.
const char *Name;
/// The execution flags of the kernel.
OMPTgtExecModeFlags ExecutionMode;
protected:
/// The preferred number of threads to run the kernel.
uint32_t PreferredNumThreads;
/// The maximum number of threads which the kernel could leverage.
uint32_t MaxNumThreads;
};
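/// Illustrative sketch (not part of this interface): a plugin defines its own
/// kernel type by overriding the *Impl hooks. `MyKernelTy`, `MyKernelHandleTy`,
/// `myLookupKernel()` and `myLaunchKernel()` are hypothetical names.
///
/// \code
/// struct MyKernelTy : public GenericKernelTy {
///   MyKernelTy(const char *Name, OMPTgtExecModeFlags ExecMode)
///       : GenericKernelTy(Name, ExecMode) {}
///   Error initImpl(GenericDeviceTy &Device, DeviceImageTy &Image) override {
///     // Resolve the kernel symbol in the loaded image and cache its handle.
///     return myLookupKernel(Image, getName(), Func);
///   }
///   Error launchImpl(GenericDeviceTy &Device, uint32_t NumThreads,
///                    uint64_t NumBlocks, KernelArgsTy &KernelArgs, void *Args,
///                    AsyncInfoWrapperTy &AsyncInfoWrapper) const override {
///     // Enqueue the kernel with the computed grid on the wrapped queue.
///     return myLaunchKernel(Func, NumBlocks, NumThreads, Args,
///                           AsyncInfoWrapper);
///   }
/// private:
///   MyKernelHandleTy Func; // Hypothetical native kernel handle.
/// };
/// \endcode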
/// Class representing a map of host pinned allocations. We track these pinned
/// allocations so that memory transfers involving these buffers can be optimized.
class PinnedAllocationMapTy {
/// Struct representing a map entry.
struct EntryTy {
/// The host pointer of the pinned allocation.
void *HstPtr;
/// The pointer that the device's driver should use to transfer data from/to
/// the pinned allocation. In most plugins, this pointer will be the same as
/// the host pointer above.
void *DevAccessiblePtr;
/// The size of the pinned allocation.
size_t Size;
/// The number of references to the pinned allocation. The allocation should
/// remain pinned and registered to the map until the number of references
/// becomes zero.
mutable size_t References;
/// Create an entry with the host and device accessible pointers, and the
/// buffer size.
EntryTy(void *HstPtr, void *DevAccessiblePtr, size_t Size)
: HstPtr(HstPtr), DevAccessiblePtr(DevAccessiblePtr), Size(Size),
References(1) {}
/// Utility constructor used for std::set searches.
EntryTy(void *HstPtr)
: HstPtr(HstPtr), DevAccessiblePtr(nullptr), Size(0), References(0) {}
};
/// Comparator of map entries. Use the host pointer to enforce an order
/// between entries.
struct EntryCmpTy {
bool operator()(const EntryTy &Left, const EntryTy &Right) const {
return Left.HstPtr < Right.HstPtr;
}
};
typedef std::set<EntryTy, EntryCmpTy> PinnedAllocSetTy;
/// The map of host pinned allocations.
PinnedAllocSetTy Allocs;
/// The mutex to protect accesses to the map.
mutable std::shared_mutex Mutex;
/// Reference to the corresponding device.
GenericDeviceTy &Device;
/// Find an allocation that intersects with \p Buffer pointer. Assume
/// the map's mutex is acquired.
PinnedAllocSetTy::iterator findIntersecting(const void *Buffer) const {
if (Allocs.empty())
return Allocs.end();
// Search the first allocation with starting address that is not less than
// the buffer address.
auto It = Allocs.lower_bound({const_cast<void *>(Buffer)});
// Direct match of starting addresses.
if (It != Allocs.end() && It->HstPtr == Buffer)
return It;
// No direct match, but a previous pinned allocation in the map may contain
// the buffer. Return the end iterator if there is no such previous
// allocation.
if (It == Allocs.begin())
return Allocs.end();
// Move to the previous pinned allocation.
--It;
// The buffer is contained in the previous pinned allocation.
if (advanceVoidPtr(It->HstPtr, It->Size) > Buffer)
return It;
// None found.
return Allocs.end();
}
public:
/// Create the map of pinned allocations corresponding to a specific device.
PinnedAllocationMapTy(GenericDeviceTy &Device) : Device(Device) {}
/// Register a host buffer that was recently locked. None of the already
/// registered pinned allocations should intersect with this new one. The
/// registration requires the host pointer in \p HstPtr, the pointer that the
/// devices should use when transferring data from/to the allocation in
/// \p DevAccessiblePtr, and the size of the allocation in \p Size. Notice
/// that some plugins may use the same pointer for the \p HstPtr and
/// \p DevAccessiblePtr. The allocation must be unregistered using the
/// unregisterHostBuffer function.
Error registerHostBuffer(void *HstPtr, void *DevAccessiblePtr, size_t Size);
/// Unregister a host pinned allocation passing the host pointer which was
/// previously registered using the registerHostBuffer function. When calling
/// this function, the pinned allocation cannot have any other user.
Error unregisterHostBuffer(void *HstPtr);
/// Lock the host buffer at \p HstPtr or register a new user if it intersects
/// with an already existing one. Partially overlapping an existing allocation
/// while extending beyond it is not allowed. The function returns the device
/// accessible pointer of the pinned
/// buffer. The buffer must be unlocked using the unlockHostBuffer function.
Expected<void *> lockHostBuffer(void *HstPtr, size_t Size);
/// Unlock the host buffer at \p HstPtr or unregister a user if other users
/// are still using the pinned allocation. If this was the last user, the
/// pinned allocation is removed from the map and the memory is unlocked.
Error unlockHostBuffer(void *HstPtr);
/// Return the device accessible pointer associated with the host pinned
/// allocation to which \p HstPtr belongs, if any. Return null in case the
/// \p HstPtr does not belong to any host pinned allocation. The device
/// accessible pointer is the one that devices should use for data transfers
/// that involve a host pinned buffer.
void *getDeviceAccessiblePtrFromPinnedBuffer(const void *HstPtr) const {
std::shared_lock<std::shared_mutex> Lock(Mutex);
// Find the intersecting allocation if any.
auto It = findIntersecting(HstPtr);
if (It == Allocs.end())
return nullptr;
const EntryTy &Entry = *It;
return advanceVoidPtr(Entry.DevAccessiblePtr,
getPtrDiff(HstPtr, Entry.HstPtr));
}
/// Check whether a buffer belongs to a registered host pinned allocation.
bool isHostPinnedBuffer(const void *HstPtr) const {
std::shared_lock<std::shared_mutex> Lock(Mutex);
// Return whether there is an intersecting allocation.
return (findIntersecting(const_cast<void *>(HstPtr)) != Allocs.end());
}
};
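/// Illustrative sketch (not part of this interface): a data-submit path can
/// query the map to reuse the device-accessible pointer of an already pinned
/// host buffer. `myPinnedCopy()` and `myStagedCopy()` are hypothetical helpers.
///
/// \code
/// if (void *PinnedPtr =
///         PinnedAllocs.getDeviceAccessiblePtrFromPinnedBuffer(HstPtr))
///   return myPinnedCopy(TgtPtr, PinnedPtr, Size); // Fast path, no staging.
/// return myStagedCopy(TgtPtr, HstPtr, Size);      // Generic path.
/// \endcode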
/// Class implementing common functionalities of offload devices. Each plugin
/// should define the specific device class, derive from this generic one, and
/// implement the necessary virtual function members.
struct GenericDeviceTy : public DeviceAllocatorTy {
/// Construct a device with its device id within the plugin, the number of
/// devices in the plugin and the grid values for that kind of device.
GenericDeviceTy(int32_t DeviceId, int32_t NumDevices,
const llvm::omp::GV &GridValues);
/// Get the device identifier within the corresponding plugin. Notice that
/// this id is not unique between different plugins; ids may overlap.
int32_t getDeviceId() const { return DeviceId; }
/// Set the context of the device if needed, before calling device-specific
/// functions. Plugins may implement this function as a no-op if not needed.
virtual Error setContext() = 0;
/// Initialize the device. After this call, the device should be already
/// working and ready to accept queries or modifications.
Error init(GenericPluginTy &Plugin);
virtual Error initImpl(GenericPluginTy &Plugin) = 0;
/// Deinitialize the device and free all its resources. After this call, the
/// device is no longer considered ready, so no queries or modifications are
/// allowed.
Error deinit();
virtual Error deinitImpl() = 0;
/// Load the binary image into the device and return the target table.
Expected<__tgt_target_table *> loadBinary(GenericPluginTy &Plugin,
const __tgt_device_image *TgtImage);
virtual Expected<DeviceImageTy *>
loadBinaryImpl(const __tgt_device_image *TgtImage, int32_t ImageId) = 0;
/// Set up the device environment if needed. Notice this setup may not be run
/// on some plugins. By default, it will be executed, but plugins can change
/// this behavior by overriding the shouldSetupDeviceEnvironment function.
Error setupDeviceEnvironment(GenericPluginTy &Plugin, DeviceImageTy &Image);
/// Register the offload entries for a specific image on the device.
Error registerOffloadEntries(DeviceImageTy &Image);
/// Synchronize the current thread with the pending operations on the
/// __tgt_async_info structure.
Error synchronize(__tgt_async_info *AsyncInfo);
virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
/// Query for the completion of the pending operations on the __tgt_async_info
/// structure in a non-blocking manner.
Error queryAsync(__tgt_async_info *AsyncInfo);
virtual Error queryAsyncImpl(__tgt_async_info &AsyncInfo) = 0;
/// Allocate data on the device or involving the device.
Expected<void *> dataAlloc(int64_t Size, void *HostPtr, TargetAllocTy Kind);
/// Deallocate data from the device or involving the device.
Error dataDelete(void *TgtPtr, TargetAllocTy Kind);
/// Pin host memory to optimize transfers and return the device accessible
/// pointer that devices should use for memory transfers involving the host
/// pinned allocation.
Expected<void *> dataLock(void *HstPtr, int64_t Size) {
return PinnedAllocs.lockHostBuffer(HstPtr, Size);
}
virtual Expected<void *> dataLockImpl(void *HstPtr, int64_t Size) = 0;
/// Unpin a host memory buffer that was previously pinned.
Error dataUnlock(void *HstPtr) {
return PinnedAllocs.unlockHostBuffer(HstPtr);
}
virtual Error dataUnlockImpl(void *HstPtr) = 0;
/// Submit data to the device (host to device transfer).
Error dataSubmit(void *TgtPtr, const void *HstPtr, int64_t Size,
__tgt_async_info *AsyncInfo);
virtual Error dataSubmitImpl(void *TgtPtr, const void *HstPtr, int64_t Size,
AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Retrieve data from the device (device to host transfer).
Error dataRetrieve(void *HstPtr, const void *TgtPtr, int64_t Size,
__tgt_async_info *AsyncInfo);
virtual Error dataRetrieveImpl(void *HstPtr, const void *TgtPtr, int64_t Size,
AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Exchange data between devices (device to device transfer). Calling this
/// function is only valid if GenericPluginTy::isDataExchangable() passing the
/// two devices returns true.
Error dataExchange(const void *SrcPtr, GenericDeviceTy &DstDev, void *DstPtr,
int64_t Size, __tgt_async_info *AsyncInfo);
virtual Error dataExchangeImpl(const void *SrcPtr, GenericDeviceTy &DstDev,
void *DstPtr, int64_t Size,
AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Run the kernel associated with \p EntryPtr
Error launchKernel(void *EntryPtr, void **ArgPtrs, ptrdiff_t *ArgOffsets,
KernelArgsTy &KernelArgs, __tgt_async_info *AsyncInfo);
/// Initialize a __tgt_async_info structure. Related to interop features.
Error initAsyncInfo(__tgt_async_info **AsyncInfoPtr);
virtual Error initAsyncInfoImpl(AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Initialize a __tgt_device_info structure. Related to interop features.
Error initDeviceInfo(__tgt_device_info *DeviceInfo);
virtual Error initDeviceInfoImpl(__tgt_device_info *DeviceInfo) = 0;
/// Create an event.
Error createEvent(void **EventPtrStorage);
virtual Error createEventImpl(void **EventPtrStorage) = 0;
/// Destroy an event.
Error destroyEvent(void *Event);
virtual Error destroyEventImpl(void *EventPtr) = 0;
/// Start the recording of the event.
Error recordEvent(void *Event, __tgt_async_info *AsyncInfo);
virtual Error recordEventImpl(void *EventPtr,
AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Wait for an event to finish. Notice this wait is asynchronous if the
/// __tgt_async_info is not nullptr.
Error waitEvent(void *Event, __tgt_async_info *AsyncInfo);
virtual Error waitEventImpl(void *EventPtr,
AsyncInfoWrapperTy &AsyncInfoWrapper) = 0;
/// Synchronize the current thread with the event.
Error syncEvent(void *EventPtr);
virtual Error syncEventImpl(void *EventPtr) = 0;
/// Print information about the device.
Error printInfo();
virtual Error printInfoImpl() = 0;
/// Getters of the grid values.
uint32_t getWarpSize() const { return GridValues.GV_Warp_Size; }
uint32_t getThreadLimit() const { return GridValues.GV_Max_WG_Size; }
uint32_t getBlockLimit() const { return GridValues.GV_Max_Teams; }
uint32_t getDefaultNumThreads() const {
return GridValues.GV_Default_WG_Size;
}
uint32_t getDefaultNumBlocks() const {
return GridValues.GV_Default_Num_Teams;
}
uint32_t getDynamicMemorySize() const { return OMPX_SharedMemorySize; }
/// Get target compute unit kind (e.g., sm_80, or gfx908).
virtual std::string getComputeUnitKind() const { return "unknown"; }
/// Post-processing hook run after the JIT backend. Ownership of \p MB is taken.
virtual Expected<std::unique_ptr<MemoryBuffer>>
doJITPostProcessing(std::unique_ptr<MemoryBuffer> MB) const {
return std::move(MB);
}
private:
/// Register offload entry for global variable.
Error registerGlobalOffloadEntry(DeviceImageTy &DeviceImage,
const __tgt_offload_entry &GlobalEntry,
__tgt_offload_entry &DeviceEntry);
/// Register offload entry for kernel function.
Error registerKernelOffloadEntry(DeviceImageTy &DeviceImage,
const __tgt_offload_entry &KernelEntry,
__tgt_offload_entry &DeviceEntry);
/// Allocate and construct a kernel object.
virtual Expected<GenericKernelTy *>
constructKernelEntry(const __tgt_offload_entry &KernelEntry,
DeviceImageTy &Image) = 0;
/// Get and set the stack size and heap size for the device. If not used, the
/// plugin can implement the setters as no-ops and have the getters set the
/// output value to zero.
virtual Error getDeviceStackSize(uint64_t &V) = 0;
virtual Error setDeviceStackSize(uint64_t V) = 0;
virtual Error getDeviceHeapSize(uint64_t &V) = 0;
virtual Error setDeviceHeapSize(uint64_t V) = 0;
/// Indicate whether the device should setup the device environment. Notice
/// that returning false in this function will change the behavior of the
/// setupDeviceEnvironment() function.
virtual bool shouldSetupDeviceEnvironment() const { return true; }
/// Pointer to the memory manager or nullptr if not available.
MemoryManagerTy *MemoryManager;
/// Environment variables defined by the OpenMP standard.
Int32Envar OMP_TeamLimit;
Int32Envar OMP_NumTeams;
Int32Envar OMP_TeamsThreadLimit;
/// Environment variables defined by the LLVM OpenMP implementation.
Int32Envar OMPX_DebugKind;
UInt32Envar OMPX_SharedMemorySize;
UInt64Envar OMPX_TargetStackSize;
UInt64Envar OMPX_TargetHeapSize;
protected:
/// Return the execution mode used for kernel \p Name.
Expected<OMPTgtExecModeFlags> getExecutionModeForKernel(StringRef Name,
DeviceImageTy &Image);
/// Environment variables defined by the LLVM OpenMP implementation
/// regarding the initial number of streams and events.
UInt32Envar OMPX_InitialNumStreams;
UInt32Envar OMPX_InitialNumEvents;
/// Array of images loaded into the device. Images are automatically
/// deallocated by the allocator.
llvm::SmallVector<DeviceImageTy *> LoadedImages;
/// The identifier of the device within the plugin. Notice this is not a
/// global device id and is not the device id visible to the OpenMP user.
const int32_t DeviceId;
/// The default grid values used for this device.
llvm::omp::GV GridValues;
/// Enumeration representing the current peer-access state between two devices
/// (both under the same plugin).
/// The states can be a) PENDING when the state has not been queried and needs
/// to be queried, b) AVAILABLE when the peer access is available to be used,
/// and c) UNAVAILABLE if the system does not allow it.
enum class PeerAccessState : uint8_t { AVAILABLE, UNAVAILABLE, PENDING };
/// Array of peer-access states with the rest of the devices. If device I has
/// PeerAccesses[J] == AVAILABLE, device I can access device J's memory
/// directly. However, notice this does not imply that device J can access
/// device I's memory directly.
llvm::SmallVector<PeerAccessState> PeerAccesses;
std::mutex PeerAccessesLock;
/// Map of host pinned allocations used to optimize device transfers.
PinnedAllocationMapTy PinnedAllocs;
};
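/// Illustrative sketch (not part of this interface): a plugin-specific device
/// derives from GenericDeviceTy and implements the *Impl hooks. `MyDeviceTy`
/// and `MyGridValues` are hypothetical names.
///
/// \code
/// struct MyDeviceTy : public GenericDeviceTy {
///   MyDeviceTy(int32_t DeviceId, int32_t NumDevices)
///       : GenericDeviceTy(DeviceId, NumDevices, MyGridValues) {}
///   // No per-call context switching needed in this hypothetical runtime.
///   Error setContext() override { return Plugin::success(); }
///   // The remaining pure virtual members (initImpl, deinitImpl,
///   // loadBinaryImpl, dataSubmitImpl, dataRetrieveImpl, synchronizeImpl,
///   // ...) are implemented in terms of the native runtime API.
/// };
/// \endcode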
/// Class implementing common functionalities of offload plugins. Each plugin
/// should define the specific plugin class, derive from this generic one, and
/// implement the necessary virtual function members.
struct GenericPluginTy {
/// Construct a plugin instance.
GenericPluginTy(Triple::ArchType TA)
: RequiresFlags(OMP_REQ_UNDEFINED), GlobalHandler(nullptr), JIT(TA) {}
virtual ~GenericPluginTy() {}
/// Initialize the plugin.
Error init();
/// Initialize the plugin and return the number of available devices.
virtual Expected<int32_t> initImpl() = 0;
/// Deinitialize the plugin and release the resources.
Error deinit();
virtual Error deinitImpl() = 0;
/// Get the reference to the device with a certain device id.
GenericDeviceTy &getDevice(int32_t DeviceId) {
assert(isValidDeviceId(DeviceId) && "Invalid device id");
assert(Devices[DeviceId] && "Device is unitialized");
return *Devices[DeviceId];
}
/// Get the number of active devices.
int32_t getNumDevices() const { return NumDevices; }
/// Get the ELF code to recognize the binary image of this plugin.
virtual uint16_t getMagicElfBits() const = 0;
/// Get the target triple of this plugin.
virtual Triple::ArchType getTripleArch() const = 0;
/// Allocate a structure using the internal allocator.
template <typename Ty> Ty *allocate() {
return reinterpret_cast<Ty *>(Allocator.Allocate(sizeof(Ty), alignof(Ty)));
}
/// Get the reference to the global handler of this plugin.
GenericGlobalHandlerTy &getGlobalHandler() {
assert(GlobalHandler && "Global handler not initialized");
return *GlobalHandler;
}
/// Get the reference to the JIT used for all devices connected to this
/// plugin.
JITEngine &getJIT() { return JIT; }
/// Get the OpenMP requires flags set for this plugin.
int64_t getRequiresFlags() const { return RequiresFlags; }
/// Set the OpenMP requires flags for this plugin.
void setRequiresFlag(int64_t Flags) { RequiresFlags = Flags; }
/// Initialize a device within the plugin.
Error initDevice(int32_t DeviceId);
/// Deinitialize a device within the plugin and release its resources.
Error deinitDevice(int32_t DeviceId);
/// Indicate whether data can be exchanged directly between two devices under
/// this same plugin. If this function returns true, it's safe to call the
/// GenericDeviceTy::dataExchange() function on the source device.
virtual bool isDataExchangable(int32_t SrcDeviceId, int32_t DstDeviceId) {
return isValidDeviceId(SrcDeviceId) && isValidDeviceId(DstDeviceId);
}
/// Indicate if an image is compatible with the plugin devices. Notice that
/// this function may be called before the devices are actually initialized,
/// so it cannot be moved into GenericDeviceTy.
virtual Expected<bool> isImageCompatible(__tgt_image_info *Info) const = 0;
/// Indicate whether the plugin supports empty images.
virtual bool supportsEmptyImages() const { return false; }
protected:
/// Indicate whether a device id is valid.
bool isValidDeviceId(int32_t DeviceId) const {
return (DeviceId >= 0 && DeviceId < getNumDevices());
}
private:
/// Number of devices available for the plugin.
int32_t NumDevices;
/// Array of pointers to the devices. Initially, they are all set to nullptr.
/// Once a device is initialized, the pointer is stored in the position given
/// by its device id. A position with nullptr means that the corresponding
/// device was not initialized yet.
llvm::SmallVector<GenericDeviceTy *> Devices;
/// OpenMP requires flags.
int64_t RequiresFlags;
/// Pointer to the global handler for this plugin.
GenericGlobalHandlerTy *GlobalHandler;
/// Internal allocator for different structures.
BumpPtrAllocator Allocator;
/// The JIT engine shared by all devices connected to this plugin.
JITEngine JIT;
};
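/// Illustrative sketch (not part of this interface): a plugin-specific plugin
/// class overrides the hooks above. `MyPluginTy` and `myCountDevices()` are
/// hypothetical names; x86_64 is used only as an arbitrary example target.
///
/// \code
/// struct MyPluginTy : public GenericPluginTy {
///   MyPluginTy() : GenericPluginTy(Triple::x86_64) {}
///   Expected<int32_t> initImpl() override { return myCountDevices(); }
///   Error deinitImpl() override { return Plugin::success(); }
///   uint16_t getMagicElfBits() const override {
///     return /*ELF::EM_X86_64=*/62;
///   }
///   Triple::ArchType getTripleArch() const override { return Triple::x86_64; }
///   Expected<bool> isImageCompatible(__tgt_image_info *Info) const override {
///     return true;
///   }
/// };
/// \endcode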
/// Class for simplifying the getter operation of the plugin. Anywhere in the
/// code, the current plugin can be retrieved by Plugin::get(). The class also
/// declares functions to create plugin-specific object instances. The check(),
/// createPlugin(), createDevice() and createGlobalHandler() functions should be
/// defined by each plugin implementation.
class Plugin {
// Reference to the plugin instance.
static GenericPluginTy *SpecificPlugin;
Plugin() {
if (auto Err = init())
REPORT("Failed to initialize plugin: %s\n",
toString(std::move(Err)).data());
}
~Plugin() {
if (auto Err = deinit())
REPORT("Failed to deinitialize plugin: %s\n",
toString(std::move(Err)).data());
}
Plugin(const Plugin &) = delete;
void operator=(const Plugin &) = delete;
/// Create and initialize the plugin instance.
static Error init() {
assert(!SpecificPlugin && "Plugin already created");
// Create the specific plugin.
SpecificPlugin = createPlugin();
assert(SpecificPlugin && "Plugin was not created");
// Initialize the plugin.
return SpecificPlugin->init();
}
// Deinitialize and destroy the plugin instance.
static Error deinit() {
assert(SpecificPlugin && "Plugin no longer valid");
// Deinitialize the plugin.
if (auto Err = SpecificPlugin->deinit())
return Err;
// Delete the plugin instance.
delete SpecificPlugin;
// Invalidate the plugin reference.
SpecificPlugin = nullptr;
return Plugin::success();
}
public:
/// Initialize the plugin if needed. The plugin could have been initialized by
/// a previous call to Plugin::get().
static Error initIfNeeded() {
// Trigger the initialization if needed.
get();
return Error::success();
}
// Deinitialize the plugin if needed. The plugin could have been deinitialized
// because the plugin library was exiting.
static Error deinitIfNeeded() {
// Do nothing. The plugin is deinitialized automatically.
return Plugin::success();
}
/// Get a reference (or create if it was not created) to the plugin instance.
static GenericPluginTy &get() {
// This static variable will initialize the underlying plugin instance in
// case there was no previous explicit initialization. The initialization is
// thread safe.
static Plugin Plugin;
assert(SpecificPlugin && "Plugin is not active");
return *SpecificPlugin;
}
/// Get a reference to the plugin with a specific plugin-specific type.
template <typename Ty> static Ty &get() { return static_cast<Ty &>(get()); }
/// Indicate whether the plugin is active.
static bool isActive() { return SpecificPlugin != nullptr; }
/// Create a success error. This is the same as calling Error::success(), but
/// it is recommended to use this one for consistency with Plugin::error() and
/// Plugin::check().
static Error success() { return Error::success(); }
/// Create a string error.
template <typename... ArgsTy>
static Error error(const char *ErrFmt, ArgsTy... Args) {
return createStringError(inconvertibleErrorCode(), ErrFmt, Args...);
}
/// Check the plugin-specific error code and return an error or success
/// accordingly. In case of an error, create a string error with the error
/// description. The ErrFmt should follow the format:
/// "Error in <function name>[<optional info>]: %s"
/// The last format specifier "%s" is mandatory and will be used to place the
/// error code's description. Notice this function should be only called from
/// the plugin-specific code.
template <typename... ArgsTy>
static Error check(int32_t ErrorCode, const char *ErrFmt, ArgsTy... Args);
/// Create a plugin instance.
static GenericPluginTy *createPlugin();
/// Create a plugin-specific device.
static GenericDeviceTy *createDevice(int32_t DeviceId, int32_t NumDevices);
/// Create a plugin-specific global handler.
static GenericGlobalHandlerTy *createGlobalHandler();
};
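/// Illustrative sketch (not part of this interface): plugin-specific code can
/// convert native error codes into llvm::Error objects through Plugin::check().
/// `myDriverCall()` is a hypothetical native API returning an int32_t code.
///
/// \code
/// int32_t RC = myDriverCall(Ptr, Size);
/// if (auto Err = Plugin::check(RC, "Error in myDriverCall[%p]: %s", Ptr))
///   return Err;
/// \endcode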
/// Auxiliary interface class for GenericDeviceResourceManagerTy. This class
/// acts as a reference to a device resource, such as a stream, and requires
/// some basic functions to be implemented. The derived class should define an
/// empty constructor that creates an empty and invalid resource reference. Do
/// not create a new resource in the constructor; create it in the create()
/// function instead.
struct GenericDeviceResourceRef {
/// Create a new resource and store a reference to it.
virtual Error create(GenericDeviceTy &Device) = 0;
/// Destroy and release the resources pointed by the reference.
virtual Error destroy(GenericDeviceTy &Device) = 0;
protected:
~GenericDeviceResourceRef() = default;
};
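/// Illustrative sketch (not part of this interface): a stream reference for a
/// hypothetical runtime. `MyStreamTy`, `myStreamCreate()` and
/// `myStreamDestroy()` are assumed names for this example only.
///
/// \code
/// struct MyStreamRef final : public GenericDeviceResourceRef {
///   // Empty and invalid reference; the resource is created in create().
///   MyStreamRef() : Stream(nullptr) {}
///   Error create(GenericDeviceTy &Device) override {
///     return Plugin::check(myStreamCreate(&Stream),
///                          "Error in myStreamCreate: %s");
///   }
///   Error destroy(GenericDeviceTy &Device) override {
///     return Plugin::check(myStreamDestroy(Stream),
///                          "Error in myStreamDestroy: %s");
///   }
///   operator MyStreamTy *() const { return Stream; }
/// private:
///   MyStreamTy *Stream;
/// };
/// \endcode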
/// Class that implements a resource pool belonging to a device. This class
/// operates with references to the actual resources. These references must
/// derive from the GenericDeviceResourceRef class and implement the create
/// and destroy virtual functions.
template <typename ResourceRef> class GenericDeviceResourceManagerTy {
using ResourcePoolTy = GenericDeviceResourceManagerTy<ResourceRef>;
public:
/// Create an empty resource pool for a specific device.
GenericDeviceResourceManagerTy(GenericDeviceTy &Device)
: Device(Device), NextAvailable(0) {}
/// Destroy the resource pool. At this point, the deinit() function should
/// already have been executed so the resource pool should be empty.
virtual ~GenericDeviceResourceManagerTy() {
assert(ResourcePool.empty() && "Resource pool not empty");
}
/// Initialize the resource pool.
Error init(uint32_t InitialSize) {
assert(ResourcePool.empty() && "Resource pool already initialized");
return ResourcePoolTy::resizeResourcePool(InitialSize);
}
/// Deinitialize the resource pool and delete all resources. This function
/// must be called before the destructor.
Error deinit() {
if (NextAvailable)
DP("Missing %d resources to be returned\n", NextAvailable);
// TODO: This prevents a bug on libomptarget to make the plugins fail. There
// may be some resources not returned. Do not destroy these ones.
if (auto Err = ResourcePoolTy::resizeResourcePool(NextAvailable))
return Err;
ResourcePool.clear();
return Plugin::success();
}
/// Get a resource from the pool, creating new resources if needed.
ResourceRef getResource() {
const std::lock_guard<std::mutex> Lock(Mutex);
assert(NextAvailable <= ResourcePool.size() &&
"Resource pool is corrupted");
if (NextAvailable == ResourcePool.size()) {
// By default we double the resource pool every time.
if (auto Err = ResourcePoolTy::resizeResourcePool(NextAvailable * 2)) {
REPORT("Failure to resize the resource pool: %s",
toString(std::move(Err)).data());
// Return an empty reference.
return ResourceRef();
}
}
return ResourcePool[NextAvailable++];
}
/// Return a resource to the pool.
void returnResource(ResourceRef Resource) {
const std::lock_guard<std::mutex> Lock(Mutex);
assert(NextAvailable > 0 && "Resource pool is corrupted");
ResourcePool[--NextAvailable] = Resource;
}
private:
/// The resources between \p OldSize and \p NewSize need to be created or
/// destroyed. The mutex is locked when this function is called.
Error resizeResourcePoolImpl(uint32_t OldSize, uint32_t NewSize) {
assert(OldSize != NewSize && "Resizing to the same size");
if (auto Err = Device.setContext())
return Err;
if (OldSize < NewSize) {
// Create new resources.
for (uint32_t I = OldSize; I < NewSize; ++I) {
if (auto Err = ResourcePool[I].create(Device))
return Err;
}
} else {
// Destroy the obsolete resources.
for (uint32_t I = NewSize; I < OldSize; ++I) {
if (auto Err = ResourcePool[I].destroy(Device))
return Err;
}
}
return Plugin::success();
}
/// Increase or decrease the number of resources. This function should
/// be called with the mutex acquired.
Error resizeResourcePool(uint32_t NewSize) {
uint32_t OldSize = ResourcePool.size();
// Nothing to do.
if (OldSize == NewSize)
return Plugin::success();
if (OldSize < NewSize) {
// Increase the number of resources.
ResourcePool.resize(NewSize);
return ResourcePoolTy::resizeResourcePoolImpl(OldSize, NewSize);
}
// Decrease the number of resources otherwise.
auto Err = ResourcePoolTy::resizeResourcePoolImpl(OldSize, NewSize);
ResourcePool.resize(NewSize);
return Err;
}
/// The device to which the resources belong.
GenericDeviceTy &Device;
/// Mutex for the resource pool.
std::mutex Mutex;
/// The next available resource in the pool.
uint32_t NextAvailable;
/// The actual resource pool.
std::deque<ResourceRef> ResourcePool;
};
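/// Illustrative sketch (not part of this interface): a device can keep a pool
/// of streams using the manager above. `MyStreamRef` is the hypothetical
/// resource reference sketched earlier; the initial size is arbitrary.
///
/// \code
/// using MyStreamManagerTy = GenericDeviceResourceManagerTy<MyStreamRef>;
/// MyStreamManagerTy StreamManager(Device);
/// if (auto Err = StreamManager.init(/*InitialSize=*/32))
///   return Err;
/// MyStreamRef Stream = StreamManager.getResource(); // Borrow a stream.
/// // ... enqueue work on the stream ...
/// StreamManager.returnResource(Stream);             // Return it to the pool.
/// if (auto Err = StreamManager.deinit())            // Before destruction.
///   return Err;
/// \endcode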
} // namespace plugin
} // namespace target
} // namespace omp
} // namespace llvm
#endif // OPENMP_LIBOMPTARGET_PLUGINS_NEXTGEN_COMMON_PLUGININTERFACE_H