ISPC Examples README
====================

This directory contains a number of sample ISPC programs ported to Xe. Before building them, install the appropriate
`ispc` compiler binary and runtime, and make sure the `ispc` binary is in your `PATH`. Then do the following:

```
mkdir build
cd build
cmake ../
```

Some of the benchmarks run the `ispc` implementation on CPU/Xe and then a regular serial C++ implementation, printing
the execution time of each.
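
As a rough illustration of how such a timing comparison can be structured, the sketch below times a callable over
several iterations and returns the fastest run. The helper name `min_time_ms` is hypothetical and is not part of the
examples or of ISPC Runtime; the examples themselves may measure time differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: runs `work` `iterations` times and returns the
// fastest run in milliseconds.
double min_time_ms(int iterations, const std::function<void()> &work) {
    assert(iterations > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best)
            best = ms;
    }
    return best;
}
```

Calling the helper once with the serial code path and once with the ISPC code path gives two numbers that can be
printed side by side.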

Simple
======

This is the most basic example. It executes a simple kernel on the target device (which can be an Xe GPU or a CPU) and
demonstrates the basic concepts of the ISPC Runtime API (such as device, module, kernel, and memory view). It uses the
C++ API of ISPC Runtime.

If no command line arguments are provided, the example chooses the device to execute on automatically. A specific
device can be forced with command line options:

`simple [ --cpu | --gpu ]`

Simple-USM
==========

This example corresponds to the Simple example, but uses shared memory mechanisms. The shared memory functionality in
Level Zero allows allocating memory that is shared between the CPU and the GPU and forms Unified Shared Memory (USM):
pointers valid on the CPU are also valid on the GPU. There is no need to explicitly copy data between the host and the
device; Level Zero handles this.

The ISPC Runtime exposes USM via the Array type and provides an allocator that can be used in standard C++
containers, such as `std::vector`.
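
The idea of plugging a custom allocator into `std::vector` can be sketched as follows. This is a minimal illustration
only: `SharedAllocator`, `shared_alloc`, `shared_free` and `make_ramp` are hypothetical names, and the allocation
functions fall back to the host heap instead of calling Level Zero or ISPC Runtime. The allocator shipped with ISPC
Runtime may differ in name and in how it is constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins for a shared-memory allocation API.
// Assumption: here they simply use the host heap.
inline void *shared_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void shared_free(void *p) { std::free(p); }

// Minimal allocator that routes container storage through the functions above.
template <typename T> struct SharedAllocator {
    using value_type = T;

    SharedAllocator() = default;
    template <typename U> SharedAllocator(const SharedAllocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        void *p = shared_alloc(n * sizeof(T));
        if (!p)
            throw std::bad_alloc();
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) noexcept { shared_free(p); }
};

// All instances are interchangeable because the allocator is stateless.
template <typename T, typename U>
bool operator==(const SharedAllocator<T> &, const SharedAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const SharedAllocator<T> &, const SharedAllocator<U> &) { return false; }

// Builds a vector whose storage comes from SharedAllocator and fills it with 0, 1, 2, ...
std::vector<float, SharedAllocator<float>> make_ramp(std::size_t n) {
    assert(n > 0);
    std::vector<float, SharedAllocator<float>> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<float>(i);
    return v;
}
```

With a real USM-backed allocator, the pointer returned by `v.data()` could be handed to a kernel without an explicit
copy, which is the point of the example.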


Simple-fence
============

This example shows how to use ISPCRT to run a computation asynchronously on the GPU while other work is done on the
CPU in parallel. It is derived from the `Simple` example. The key difference is that no `TaskQueue` object is used;
instead, `CommandQueue` and `CommandList` objects are constructed explicitly. Commands are recorded with the
`copyToDevice`, `launch` and `copyToHost` methods of the command list object. Note that barriers must be inserted
explicitly where they are needed between computations or memory copies, using the `barrier` method of the
`CommandList` object. After the command list is filled, the `submit` method is called. It instructs the GPU to execute
the submitted commands and returns a `Fence` object. A fence has two states: unsignalled and signalled. The GPU changes
the fence state to signalled when execution is completed. The fence state is checked via the `status` method. After
submission, the host CPU thread computes the validation result, effectively in parallel with the GPU computation. After
that, the fence is polled in a loop until it is signalled.

`host_simple-fence [--gpu | --cpu ]`
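
The submit-then-poll flow described above can be sketched in plain C++ with a worker thread standing in for the GPU.
`MockFence` and `run_with_fence` are hypothetical names and none of this is ISPCRT API; the sketch only mirrors the
pattern of submitting work, computing the validation result on the host, and then polling the fence status.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fence: unsignalled until the "device" work finishes.
struct MockFence {
    std::atomic<bool> signalled{false};
    bool status() const { return signalled.load(std::memory_order_acquire); }
};

// Doubles every element of `data` on a worker thread (standing in for the GPU)
// while the calling thread computes the expected result. Returns true if both agree.
bool run_with_fence(std::vector<int> &data) {
    assert(!data.empty());
    const std::vector<int> input = data; // host-side copy taken before submission
    MockFence fence;

    // "Submit": start the asynchronous work; it signals the fence when done.
    std::thread device([&data, &fence] {
        for (int &x : data)
            x *= 2;
        fence.signalled.store(true, std::memory_order_release);
    });

    // Host work overlapping with the device work: compute the validation result.
    std::vector<int> expected(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        expected[i] = input[i] * 2;

    // Poll the fence until it is signalled, then compare.
    while (!fence.status())
        std::this_thread::yield();
    device.join();
    return data == expected;
}
```

The host only reads `data` after the fence reports the signalled state, which is what makes the final comparison
safe.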


AOBench
=======

This is an ISPC implementation of the "AO bench" benchmark
(http://syoyo.wordpress.com/2009/01/26/ao-bench-is-evolving/).  The command line arguments are:

`ao (num iterations) (x resolution) (y resolution)`

This example also demonstrates usage of the C interface of ISPC Runtime, so you can see how to execute the same ISPC
kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations, rendering an (xres x yres) image each time and measuring the
computation time with serial and ISPC implementations on CPU and Xe.


Mandelbrot
==========

Mandelbrot set generation. This example is extensively documented at https://ispc.github.io/example.html. The
command line arguments are:

`mandelbrot [--scale=<factor>] [tasks iterations] [serial iterations]`

This example also demonstrates usage of the C++ interface of ISPC Runtime, so you can see how to execute the same ISPC
kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations, rendering an image of fixed size each time and measuring the
computation time with serial and ISPC implementations on CPU and Xe. You can change the scale of the image with the
`--scale` option.
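
For reference, a serial Mandelbrot kernel of the kind such a benchmark compares against is typically an escape-time
loop like the one below. This is a generic sketch, assuming the usual escape radius of 2; `mandel_iterations` is a
hypothetical name and the code is not taken from the example sources.

```cpp
#include <cassert>

// Returns the number of iterations of z = z^2 + c (starting from z = c) performed
// before |z| exceeds 2, capped at `max_iterations`.
int mandel_iterations(float c_re, float c_im, int max_iterations) {
    assert(max_iterations >= 0);
    float z_re = c_re, z_im = c_im;
    int i = 0;
    for (; i < max_iterations; ++i) {
        // |z|^2 > 4 means the point has escaped.
        if (z_re * z_re + z_im * z_im > 4.0f)
            break;
        const float new_re = z_re * z_re - z_im * z_im;
        const float new_im = 2.0f * z_re * z_im;
        z_re = c_re + new_re;
        z_im = c_im + new_im;
    }
    return i;
}
```

Points inside the set, such as the origin, reach the iteration cap, while points far from it escape immediately.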


Noise
=====

This example has an implementation of Ken Perlin's procedural "noise" function, as described in his 2002 "Improving
Noise" SIGGRAPH paper. The command line arguments are:

`noise [niterations] [group threads width] [group threads height]`

This example also demonstrates usage of the C++ interface of ISPC Runtime, so you can see how to execute the same ISPC
kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations in the given thread space, rendering an image of fixed size
each time and measuring the computation time with serial and ISPC implementations on CPU and Xe.


SGEMM
=====

This program uses ISPC to implement a naive version of matrix multiply.

The command line arguments are:

`sgemm (optional)[num iterations] (optional)[group threads width] (optional)[group threads height]`

This example demonstrates usage of pure Level Zero.
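
A naive matrix multiply, in the sense used above, is the plain triple loop without tiling or other optimizations. The
serial sketch below assumes row-major storage; `sgemm_naive` and its parameter names are hypothetical and do not come
from the example sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = A * B for row-major matrices, where A is M x K, B is K x N and C is M x N.
std::vector<float> sgemm_naive(const std::vector<float> &A, const std::vector<float> &B,
                               std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float sum = 0.0f;
            // Dot product of row m of A with column n of B.
            for (std::size_t k = 0; k < K; ++k)
                sum += A[m * K + k] * B[k * N + n];
            C[m * N + n] = sum;
        }
    }
    return C;
}
```

A result computed this way is a convenient reference for validating the output of an accelerated version.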


Simple-DPCPP
============

This simple example demonstrates a basic scenario of interoperability between ISPC and the oneAPI DPC++ Compiler. It
runs an ISPC kernel using ISPC Runtime and then creates a SYCL context using native Level Zero handles obtained from
ISPCRT. Then it runs a corresponding SYCL kernel in that SYCL context. The results are compared to confirm that they
are identical.

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.


Simple-DPCPP-L0
===============

This simple example demonstrates a basic scenario of interoperability between ISPC and the oneAPI DPC++ Compiler. It
runs an ISPC kernel in a Level Zero context and then a corresponding SYCL kernel in a SYCL context created from the same
Level Zero context. Then the results are compared to check that they are identical. The key difference between this and
the previous example is that this one uses the native Level Zero API, whereas the previous one uses ISPCRT.

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.

Pipeline-DPCPP
==============

This example demonstrates how to create a pipeline of kernels, built with ISPC and the oneAPI DPC++ Compiler, that
cooperate on a single problem represented by a memory region. The memory region is shared between the kernels, and it
is also shared between the CPU and the GPU. The Level Zero runtime takes care of the necessary data movements in an
efficient way, and the user does not need to manage copying data to/from the GPU.

This example requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.

Simple-ESIMD
============

This simple example demonstrates a basic scenario of interoperability between ISPC and the Explicit SIMD SYCL*
Extension. It uses ISPC Runtime and runs an ISPC kernel which calls an ESIMD function.

To use the ISPC/ESIMD interoperability feature, you must include the `interop.cmake` file in your `CMakeLists.txt`:

`include(${ISPCRT_DIR}/interop.cmake)`

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.

vadd-esimd
==========

This vector add example demonstrates a basic scenario of interoperability between the Explicit SIMD SYCL* Extension and
ISPC. It uses SYCL Runtime and runs an ESIMD kernel which calls an ISPC function.

To use the ISPC/ESIMD interoperability feature, you must include the `interop.cmake` file in your `CMakeLists.txt`:

`include(${ISPCRT_DIR}/interop.cmake)`

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.

callback-esimd
==============

This example demonstrates usage of callbacks between ISPC and the Explicit SIMD SYCL* Extension by passing a pointer to
an ISPC function to ESIMD and calling that function from ESIMD.

To use the ISPC/ESIMD interoperability feature, you must include the `interop.cmake` file in your `CMakeLists.txt`:

`include(${ISPCRT_DIR}/interop.cmake)`

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running this example may require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++ Compiler
libraries.

invoke-sycl-aobench
===================

This is a simple aobench-like example demonstrating calls to SYCL from ISPC using binary and vISA linking. Its host
part is written with ISPC Runtime, and the ISPC code uses `invoke_sycl` to call a SYCL function.

The provided CMake files build an ISPC-only version (`invoke_sycl_aobench_ispc`), a SYCL reference version
(`aobench_sycl_bin`), and ISPC/SYCL versions (`invoke_sycl_aobench_ispc_sycl_bin` for binary linking and
`invoke_sycl_aobench_ispc_sycl_visa` for vISA linking).

To use the ISPC/SYCL interoperability feature, you must include the `interop.cmake` file in your `CMakeLists.txt`:

`include(${ISPCRT_DIR}/interop.cmake)`

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running the example requires setting one environment variable:

```
IGC_ForceOCLSIMDWidth=<ISPC SIMD width> (which is set to 16 in CMakeLists.txt)
```

Running this example may also require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++
Compiler libraries.

ISPC/SYCL interop tests target Gen12+ hardware.

invoke-simd-vadd
================

This is a simple vector_add example. Its host part is written in SYCL and uses `invoke_simd` to call an ISPC function
with simple SIMD control flow for vector addition/subtraction.

To use the ISPC/SYCL interoperability feature, you must include the `interop.cmake` file in your `CMakeLists.txt`:

`include(${ISPCRT_DIR}/interop.cmake)`

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of ISPC examples using the following command line:

```
cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
      -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>
```

Running the example requires setting two environment variables:

```
IGC_VCSaveStackCallLinkage=1
IGC_ForceOCLSIMDWidth=<ISPC SIMD width> (which is set to 16 in CMakeLists.txt)
```

Running this example may also require setting the `LD_LIBRARY_PATH` environment variable to include oneAPI DPC++
Compiler libraries.

ISPC/SYCL interop tests target Gen12+ hardware.