1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
|
# ConfigFile Format:
# ==================
# A Transfer is defined as a uni-directional transfer from src memory location to dst memory location
# executed by either CPU or GPU
# Each single line in the configuration file defines a set of Transfers (a Test) to run in parallel
# There are two ways to specify the configuration file:
# 1) Basic
# The basic specification assumes the same number of threadblocks/CUs used per GPU-executed Transfer
# A positive number of Transfers is specified followed by that number of triplets describing each Transfer
# #Transfers #CUs (srcMem1->Executor1->dstMem1) ... (srcMemL->ExecutorL->dstMemL)
# 2) Advanced
# The advanced specification allows different number of threadblocks/CUs used per GPU-executed Transfer
# A negative number of Transfers is specified, followed by quadruples describing each Transfer
# -#Transfers (srcMem1->Executor1->dstMem1 #CUs1) ... (srcMemL->ExecutorL->dstMemL #CUsL)
# Argument Details:
# #Transfers: Number of Transfers to be run in parallel
# #CUs : Number of threadblocks/CUs to use for a GPU-executed Transfer
# srcMemL : Source memory location (Where the data is to be read from). Ignored in memset mode
# Executor : Executor is specified by a character indicating type, followed by device index (0-indexed)
# - C: CPU-executed (Indexed from 0 to # NUMA nodes - 1)
# - G: GPU-executed (Indexed from 0 to # GPUs - 1)
# dstMemL : Destination memory location (Where the data is to be written to)
# Memory locations are specified by a character indicating memory type,
# followed by device index (0-indexed)
# Supported memory locations are:
# - C: Pinned host memory (on NUMA node, indexed from 0 to [# NUMA nodes-1])
# - B: Fine-grain host memory (on NUMA node, indexed from 0 to [# NUMA nodes-1])
# - G: Global device memory (on GPU device indexed from 0 to [# GPUs - 1])
# - F: Fine-grain device memory (on GPU device indexed from 0 to [# GPUs - 1])
# Examples:
# 1 4 (G0->G0->G1) Single Transfer using 4 CUs on GPU0 to copy from GPU0 to GPU1
# 1 4 (C1->G2->G0) Single Transfer using 4 CUs on GPU2 to copy from CPU1 to GPU0
# 2 4 G0->G0->G1 G1->G1->G0 Runs 2 Transfers in parallel. GPU0 to GPU1, and GPU1 to GPU0, each with 4 CUs
# -2 (G0 G0 G1 4) (G1 G1 G0 2) Runs 2 Transfers in parallel. GPU0 to GPU1 with 4 CUs, and GPU1 to GPU0 with 2 CUs
# Round brackets and arrows' ->' may be included for human clarity, but will be ignored and are unnecessary
# Lines starting with # will be ignored. Lines starting with ## will be echoed to output
# Single GPU-executed Transfer between GPUs 0 and 1 using 4 CUs
1 4 (G0->G0->G1)
|