====================
Clang nvlink Wrapper
====================

.. contents::
   :local:

.. _clang-nvlink-wrapper:

Introduction
============

This tool works as a wrapper around the NVIDIA ``nvlink`` linker. Its purpose
is to provide an interface similar to that of the ``ld.lld`` linker while still
relying on NVIDIA's proprietary linker to produce the final output.

``nvlink`` has a number of known quirks that make it difficult to use in a
unified offloading setting. For example, it does not accept ``.o`` files;
object files must use the ``.cubin`` extension instead. Static archives are not
supported, so passing a ``.a`` file results in a linker error. ``nvlink`` also
does not support link time optimization and ignores many standard linker
arguments. This tool works around these issues.
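
The file-extension quirk can be illustrated with a small POSIX shell sketch.
It assumes that copying each object to a file with a ``.cubin`` name is enough
to satisfy ``nvlink``, which only holds when the object already contains device
code rather than LLVM bitcode. The ``stage_for_nvlink`` helper is hypothetical
and does not necessarily reflect how ``clang-nvlink-wrapper`` is implemented.

.. code-block:: shell

  # Hypothetical helper (not part of clang-nvlink-wrapper): copy every ".o"
  # input into a scratch directory under a ".cubin" name and print the path
  # that would be passed to nvlink. Other inputs are printed unchanged.
  # Note: inputs that share a basename would collide; a real tool would
  # use unique temporary names.
  stage_for_nvlink() {
    dir=$1
    shift
    for f in "$@"; do
      case $f in
        *.o)
          out="$dir/$(basename "$f" .o).cubin"
          cp "$f" "$out" || return 1
          printf '%s\n' "$out"
          ;;
        *)
          printf '%s\n' "$f"
          ;;
      esac
    done
  }

  # Usage sketch: stage the objects, then hand the printed paths to nvlink.
  #   scratch=$(mktemp -d)
  #   staged=$(stage_for_nvlink "$scratch" a.o b.o)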

Usage
=====

This tool accepts the following options. Any argument that is not exclusive to
the wrapper is forwarded to ``nvlink``.

.. code-block:: console

  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
  This enables static linking and LTO handling for NVPTX targets.

  USAGE: clang-nvlink-wrapper [options] <options to be passed to nvlink>

  OPTIONS:
    --arch <value>       Specify the 'sm_' name of the target architecture.
    --cuda-path=<dir>    Set the system CUDA path
    --dry-run            Print generated commands without running.
    --feature <value>    Specify the '+ptx' feature to use for LTO.
    -g                   Specify that this was a debug compile.
    -help-hidden         Display all available options
    -help                Display available options (--help-hidden for more)
    -L <dir>             Add <dir> to the library search path
    -l <libname>         Search for library <libname>
    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    -o <path>            Path to file to write output
    --plugin-opt=jobs=<value>
                         Number of LTO codegen partitions
    --plugin-opt=lto-partitions=<value>
                         Number of LTO codegen partitions
    --plugin-opt=O<O0, O1, O2, or O3>
                         Optimization level for LTO
    --plugin-opt=thinlto<value>
                         Enable the thin-lto backend
    --plugin-opt=<value> Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    --save-temps         Save intermediate results
    --version            Display the version number and exit
    -v                   Print verbose information
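
As a sketch of a direct invocation, the following script asks the wrapper to
print, rather than run, the commands it would generate for a hypothetical
``main.o`` linked against ``./lib/libfoo.a`` for ``sm_70``. The file names and
the architecture are assumptions; only the flags come from the list above.

.. code-block:: shell

  # Hypothetical inputs: an object file main.o and a static library
  # ./lib/libfoo.a, linked for an sm_70 GPU.
  set -- clang-nvlink-wrapper --dry-run --arch sm_70 -L ./lib -l foo -o main main.o

  if command -v clang-nvlink-wrapper >/dev/null 2>&1; then
    # --dry-run prints the generated commands without running them.
    "$@"
  else
    echo "clang-nvlink-wrapper not found in PATH; command was: $*"
  fi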

Example
=======

This tool is intended to be invoked by the Clang driver when targeting the
NVPTX toolchain directly as a cross-compilation target. This makes it possible
to build standalone GPU executables with the same linking semantics as a
standard host compilation.

.. code-block:: console

  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
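
Normal linking semantics include static libraries. The script below is a
sketch that assumes ``clang`` and ``llvm-ar`` from the same LLVM release and a
CUDA installation that provides ``nvlink``. The file names, the ``foo``
function, and ``-march=sm_70`` (used in place of ``-march=native`` so the
commands do not depend on the GPU in the build machine) are illustrative
choices.

.. code-block:: shell

  # Work in a scratch directory to keep generated files separate.
  cd "$(mktemp -d)" || exit 1

  # Hypothetical two-file program: foo() is provided by a static library.
  printf '%s\n' 'int foo(void) { return 42; }' > foo.c
  printf '%s\n' 'int foo(void);' 'int main(void) { return foo(); }' > main.c

  # The build needs clang, llvm-ar, and a CUDA installation (for nvlink).
  if command -v clang >/dev/null 2>&1 && command -v llvm-ar >/dev/null 2>&1; then
    # -flto=full together with -c emits LLVM bitcode into foo.o.
    clang --target=nvptx64-nvidia-cuda -march=sm_70 -flto=full -c foo.c -o foo.o
    # Archive the bitcode object; nvlink on its own could not consume this.
    llvm-ar rcs libfoo.a foo.o
    # The driver links through clang-nvlink-wrapper, which resolves -lfoo
    # and performs LTO before handing the result to nvlink.
    clang --target=nvptx64-nvidia-cuda -march=sm_70 -flto=full main.c -L. -lfoo -o main
  else
    echo "clang or llvm-ar not found; skipping the build" >&2
  fi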