File: README.TXT

== Sample CUDA application for uncoalesced global memory accesses ==
Adds a floating-point constant to each element of an input array of N double3 elements in global memory and writes the results to an output array of double3 in global memory.

Defines two versions of the CUDA kernel:
addConstDouble3 : naive version, which results in uncoalesced global memory accesses
addConstDouble  : version that treats the double3 array as a double array and avoids uncoalesced global memory accesses
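
The difference between the two kernels can be sketched as follows. This is a minimal illustration and not the sample's actual source: the kernel signatures, the constant 1.5, the block size of 256, and the host helper runVersion() are all assumptions made for the example. In the naive kernel each thread handles one whole double3 (24 bytes), so neighbouring threads in a warp touch addresses 24 bytes apart. In the second kernel each thread handles a single double (8 bytes), so neighbouring threads touch neighbouring addresses.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Naive version: one thread per double3 element. Consecutive threads
// read and write locations 24 bytes apart (sizeof(double3) == 24).
__global__ void addConstDouble3(int numElements, const double3 *in,
                                double k, double3 *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numElements) {
        double3 a = in[i];
        a.x += k;
        a.y += k;
        a.z += k;
        out[i] = a;
    }
}

// Second version: the same buffer viewed as 3 * N doubles, one thread
// per double. Consecutive threads touch consecutive 8-byte locations.
__global__ void addConstDouble(int numDoubles, const double *in,
                               double k, double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numDoubles) {
        out[i] = in[i] + k;
    }
}

// Hypothetical host helper (not part of the sample): runs the chosen
// kernel on n double3 elements and checks the result on the host.
bool runVersion(int version, int n)
{
    const double k = 1.5;      // assumed constant, for illustration only
    const int threads = 256;   // assumed block size
    std::vector<double> h(3 * static_cast<size_t>(n));
    for (size_t i = 0; i < h.size(); ++i) h[i] = static_cast<double>(i);

    const size_t bytes = h.size() * sizeof(double);
    double *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    if (version == 0) {
        int blocks = (n + threads - 1) / threads;
        addConstDouble3<<<blocks, threads>>>(
            n, reinterpret_cast<const double3 *>(dIn), k,
            reinterpret_cast<double3 *>(dOut));
    } else {
        int numDoubles = 3 * n;
        int blocks = (numDoubles + threads - 1) / threads;
        addConstDouble<<<blocks, threads>>>(numDoubles, dIn, k, dOut);
    }

    std::vector<double> r(h.size());
    cudaError_t err = cudaMemcpy(r.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < h.size(); ++i) {
        if (r[i] != h[i] + k) return false;
    }
    return true;
}
```

Both kernels produce the same output; only the mapping of threads to memory locations differs. Calling runVersion(0, n) and runVersion(1, n) from a main() function exercises each variant.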

Compiling the code:
==================
  > nvcc -lineinfo -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 uncoalescedGlobalAccesses.cu -o uncoalescedGlobalAccesses

Command line arguments (both are optional):
==========================================
1) <version of kernel to use> Integer value. If not specified, 0 is used.
          0: Use naive version of kernel addConstDouble3()
          1: Use addConstDouble() kernel
2) <N - number of elements in input array> Should be a positive number. Default value: 1048576 (1024 x 1024)
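
As an illustration of the argument handling described above, the defaults could be applied along these lines. This is a sketch only: the SampleArgs struct and parseArgs() function are hypothetical names, and rejecting out-of-range values is an assumption, since this README does not say how the sample treats invalid input.

```cuda
#include <cstdlib>

// Hypothetical container for the two optional arguments.
struct SampleArgs {
    int version;      // 0: addConstDouble3(), 1: addConstDouble()
    int numElements;  // N, number of double3 elements
};

// Fills in the documented defaults, then overrides them from argv.
// Returns false if a value is outside the documented range
// (assumed behaviour, not taken from the sample).
bool parseArgs(int argc, char **argv, SampleArgs *args)
{
    args->version = 0;            // default kernel version
    args->numElements = 1048576;  // default N (1024 x 1024)

    if (argc > 1) {
        args->version = std::atoi(argv[1]);
        if (args->version != 0 && args->version != 1) return false;
    }
    if (argc > 2) {
        args->numElements = std::atoi(argv[2]);
        if (args->numElements <= 0) return false;
    }
    return true;
}
```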

Sample usage:
============
- Run with default arguments - addConstDouble3() kernel and default value of N
  > uncoalescedGlobalAccesses

- Run with the addConstDouble() kernel and default value of N
  > uncoalescedGlobalAccesses 1

- Run with the addConstDouble3() kernel and N=512
  > uncoalescedGlobalAccesses 0 512


Profiling the sample using Nsight Compute command line
======================================================
- Profile addConstDouble3() - the initial version of the kernel
  > ncu --set full --import-source on -o addConstDouble3.ncu-rep ./uncoalescedGlobalAccesses

- Profile addConstDouble() - the updated version of the kernel
  > ncu --set full --import-source on -o addConstDouble.ncu-rep ./uncoalescedGlobalAccesses 1

The profiler report files for the sample are also provided. They can be opened in the
Nsight Compute UI using the "File->Open" menu option.