File: help-accelerator-cuda.txt

# -*- text -*-
#
# Copyright (c) 2011-2015 NVIDIA.  All rights reserved.
# Copyright (c) 2015      Cisco Systems, Inc.  All rights reserved.
# Copyright (c) 2022      Amazon.com, Inc. or its affiliates.
#                         All Rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
[cuCtxGetCurrent failed not initialized]
WARNING: The call to cuCtxGetCurrent() failed while attempting to register
internal memory with the CUDA environment.  The program will continue to run,
but the performance of GPU memory transfers may be reduced.  This failure
indicates that the CUDA environment is not yet initialized.  To eliminate
this warning, ensure that CUDA is initialized prior to calling MPI_Init.

NOTE: You can turn off this warning by setting the MCA parameter
      mpi_common_cuda_warning to 0.
#
[cuCtxGetCurrent failed]
WARNING: The call to cuCtxGetCurrent() failed while attempting to register
internal memory with the CUDA environment.  The program will continue to run,
but the performance of GPU memory transfers may be reduced.
  cuCtxGetCurrent return value:   %d

NOTE: You can turn off this warning by setting the MCA parameter
      mpi_common_cuda_warning to 0.
#
[cuCtxGetCurrent returned NULL]
WARNING: The call to cuCtxGetCurrent() failed while attempting to register
internal memory with the CUDA environment.  The program will continue to run,
but the performance of GPU memory transfers may be reduced.  This failure
indicates that there is no CUDA context yet.  To eliminate this warning,
ensure that there is a CUDA context prior to calling MPI_Init.

NOTE: You can turn off this warning by setting the MCA parameter
      mpi_common_cuda_warning to 0.
#
[cuCtxGetDevice failed]
WARNING: The call to cuCtxGetDevice() failed.
  cuCtxGetDevice return value:   %d

NOTE: You can turn off this warning by setting the MCA parameter
      mpi_common_cuda_warning to 0.
#
[cuMemHostRegister during init failed]
The call to cuMemHostRegister(%p, %d, 0) failed.
  Host:  %s
  cuMemHostRegister return value:  %d
  Registration cache:  %s
#
[cuMemHostRegister failed]
The call to cuMemHostRegister(%p, %d, 0) failed.
  Host:  %s
  cuMemHostRegister return value:  %d
  Registration cache:  %s
#
[cuMemHostUnregister failed]
The call to cuMemHostUnregister(%p) failed.
  Host:  %s
  cuMemHostUnregister return value:  %d
#
[cuIpcGetMemHandle failed]
The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol
cannot be used.
  cuIpcGetMemHandle return value:   %d
  address: %p
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
#
[cuMemGetAddressRange failed]
The call to cuMemGetAddressRange failed. This means the GPU RDMA protocol
cannot be used.
  cuMemGetAddressRange return value:   %d
  address: %p
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
#
[cuMemGetAddressRange failed 2]
The call to cuMemGetAddressRange failed during the GPU RDMA protocol.
  Host:  %s
  cuMemGetAddressRange return value:  %d
  address:  %p
Check the cuda.h file for what the return value means. This is highly
unusual and should not happen. The program will probably abort.
#
[Out of cuEvent handles]
The library has exceeded its maximum number of outstanding CUDA event
handles.  For better performance, this limit should be increased.
  Current maximum handles:   %4d
  Suggested new maximum:     %4d
Rerun with --mca mpi_common_cuda_event_max %d
#
[cuIpcOpenMemHandle failed]
The call to cuIpcOpenMemHandle failed. This is an unrecoverable error
and will cause the program to abort.
  Hostname:                         %s
  cuIpcOpenMemHandle return value:  %d
  address:                          %p
Check the cuda.h file for what the return value means. A possible cause
for this is not enough free device memory.  Try to reduce the device
memory footprint of your application.
#
[cuIpcCloseMemHandle failed]
The call to cuIpcCloseMemHandle failed. This is a warning and the program
will continue to run.
  cuIpcCloseMemHandle return value:   %d
  address: %p
Check the cuda.h file for what the return value means. Perhaps a reboot
of the node will clear the problem.
#
[cuMemcpyAsync failed]
The call to cuMemcpyAsync failed. This is an unrecoverable error and will
cause the program to abort.
  cuMemcpyAsync(%p, %p, %d) returned value %d
Check the cuda.h file for what the return value means.
#
[cuEventCreate failed]
The call to cuEventCreate failed. This is an unrecoverable error and will
cause the program to abort.
  Hostname:                     %s
  cuEventCreate return value:   %d
Check the cuda.h file for what the return value means.
#
[cuEventRecord failed]
The call to cuEventRecord failed. This is an unrecoverable error and will
cause the program to abort.
  Hostname:                     %s
  cuEventRecord return value:   %d
Check the cuda.h file for what the return value means.
#
[cuEventQuery failed]
The call to cuEventQuery failed. This is an unrecoverable error and will
cause the program to abort.
  cuEventQuery return value:   %d
Check the cuda.h file for what the return value means.
#
[cuEventSynchronize failed]
The call to cuEventSynchronize failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  cuEventSynchronize return value:     %d
#
[cuIpcGetEventHandle failed]
The call to cuIpcGetEventHandle failed. This is an unrecoverable error and will
cause the program to abort.
  cuIpcGetEventHandle return value:   %d
Check the cuda.h file for what the return value means.
#
[cuIpcOpenEventHandle failed]
The call to cuIpcOpenEventHandle failed. This is an unrecoverable error and will
cause the program to abort.
  cuIpcOpenEventHandle return value:   %d
Check the cuda.h file for what the return value means.
#
[cuStreamWaitEvent failed]
The call to cuStreamWaitEvent failed. This is an unrecoverable error and will
cause the program to abort.
  cuStreamWaitEvent return value:   %d
Check the cuda.h file for what the return value means.
#
[cuEventDestroy failed]
The call to cuEventDestroy failed. This is an unrecoverable error and will
cause the program to abort.
  cuEventDestroy return value:   %d
Check the cuda.h file for what the return value means.
#
[cuStreamCreate failed]
The call to cuStreamCreate failed. This is an unrecoverable error and will
cause the program to abort.
  Hostname:                      %s
  cuStreamCreate return value:   %d
Check the cuda.h file for what the return value means.
#
[cuStreamDestroy failed]
The call to cuStreamDestroy failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  cuStreamDestroy return value:         %d
Check the cuda.h file for what the return value means.
#
[dlopen disabled]
Open MPI was compiled without dynamic library support (e.g., with the
--disable-dlopen flag), and therefore cannot utilize CUDA support.

If you need CUDA support, reconfigure Open MPI with dynamic library support enabled.
#
[dlopen failed]
The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
%s
If you do not require CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you do
require CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to resolve this issue.
#
[dlsym failed]
An error occurred while trying to map in the address of a function.
  Function Name: %s
  Error string:  %s
CUDA-aware support is disabled.
#
[bufferID failed]
An error occurred while trying to get the BUFFER_ID of a GPU memory
region.  This could cause incorrect results.  Turn off GPU Direct RDMA
support by running with --mca btl_openib_cuda_want_gdr_support 0.
  Hostname:                             %s
  cuPointerGetAttribute return value:   %d
Check the cuda.h file for what the return value means.
#
[cuPointerSetAttribute failed]
The call to cuPointerSetAttribute with CU_POINTER_ATTRIBUTE_SYNC_MEMOPS
failed. This is highly unusual and should not happen.  The program will
continue, but report this error to the Open MPI developers.
  Hostname:                             %s
  cuPointerSetAttribute return value:   %d
  Address:                              %p
Check the cuda.h file for what the return value means.
#
[cuStreamSynchronize failed]
The call to cuStreamSynchronize failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  Hostname:                             %s
  cuStreamSynchronize return value:     %d
Check the cuda.h file for what the return value means.
#
[cuMemcpy failed]
The call to cuMemcpy failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  Hostname:                  %s
  cuMemcpy return value:     %d
Check the cuda.h file for what the return value means.
#
[cuMemcpy2D failed]
The call to cuMemcpy2D failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  Hostname:                  %s
  cuMemcpy2D return value:   %d
Check the cuda.h file for what the return value means.
#
[cuMemAlloc failed]
The call to cuMemAlloc failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  Hostname:                  %s
  cuMemAlloc return value:   %d
Check the cuda.h file for what the return value means.
#
[cuMemFree failed]
The call to cuMemFree failed. This is highly unusual and should
not happen.  Please report this error to the Open MPI developers.
  Hostname:                  %s
  cuMemFree return value:    %d
Check the cuda.h file for what the return value means.
#
[cuDeviceCanAccessPeer failed]
The call to cuDeviceCanAccessPeer failed.
  Hostname:                              %s
  cuDeviceCanAccessPeer return value:    %d
Check the cuda.h file for what the return value means.
#
[No memory]
A call to allocate memory within the CUDA support failed.  This is
an unrecoverable error and will cause the program to abort.
  Hostname:  %s