External Hardware Documentation and Resources
=============================================
Information about hardware behavior comes from a mix of official and
reverse-engineered sources.

Command buffers
^^^^^^^^^^^^^^^

* `NVIDIA open-gpu-doc repository`_ is official documentation from NVIDIA that
has been released to the public. The majority of this documentation comes in
the form of class headers which describe the class state registers.
* `NVIDIA open-gpu-kernel-modules repository`_ is the open-source kernel mode
driver that NVIDIA ships on Turing+ GPUs with GSP. The code here can provide
examples of how to use some hardware features. If open-gpu-doc is missing a
class header, sometimes there will be one here.
* Reverse-engineered command names from `envytools`_ are available in Mesa
under, e.g., ``src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h``. These are no
longer updated; NVK uses the open-gpu-doc headers instead.
* `envyhooks`_ is the modern way to dump command sequences from the proprietary
driver.
* ``nv_push_dump`` is part of Mesa and can disassemble command sequences (build
with ``-D tools=nouveau``, run ``src/nouveau/headers/nv_push_dump`` from the
build dir).
.. _NVIDIA open-gpu-doc repository: https://github.com/NVIDIA/open-gpu-doc
.. _NVIDIA open-gpu-kernel-modules repository: https://github.com/NVIDIA/open-gpu-kernel-modules
.. _envyhooks: https://gitlab.freedesktop.org/nouveau/envyhooks

Shader ISA
^^^^^^^^^^

* `NVIDIA PTX documentation`_ is NVIDIA documentation for CUDA's
intermediate representation. We don't use PTX directly, but this often has
hints about how underlying hardware instructions work. For example, the PTX
``redux`` instruction is nearly identical to the hardware instruction of
the same name.
* `CUDA Binary Utilities`_ is documentation for CUDA's disassembler,
``nvdisasm``. It includes a brief description of most hardware instructions.
There's also an `older version`_ that covers older architectures (Kepler
through Volta).
* Kuter Dinel has reverse-engineered instruction encodings for the `Hopper
ISA`_ and `Ada ISA`_, which are autogenerated by his `nv_isa_solver`_
project.
* `nv-shader-tools`_ has some additional tools for disassembling and fuzzing
the hardware ISA.
* Mel has dumped a `list of available instructions <list of avaiable instructions_>`_
and their opcodes on recent architectures by scraping ``nvdisasm`` error
messages.
* The `Volta whitepaper`_ section "Independent Thread Scheduling" has an
overview of the control flow model used on Volta+ GPUs.
* `Dissecting the NVidia Turing T4 GPU via Microbenchmarking`_ has
reverse-engineered info about the Turing instruction encoding. See especially
section "2.1 Control information" for an overview of compiler-inserted delays
and waits on Maxwell and later.
* `Analyzing Modern NVIDIA GPU cores`_ has additional reverse-engineered info
about the semantics of compiler-inserted delays and waits.
* `Control Flow Management in Modern GPUs`_ has more detail about control flow
reconvergence on Volta+.
* `maxas`_ has some reverse-engineered info on the Maxwell ISA.
* `asfermi`_ has some reverse-engineered info on the older Fermi ISA.
* Red Hat has some NDA'd documentation on instruction latencies from NVIDIA.
Bother karolherbst or airlied on IRC if you're missing a latency class for an
instruction on recent architectures.
* Instruction behavior is tested using the hardware tests in
``src/nouveau/compiler/nak/hw_tests.rs`` and the corresponding ``Foldable``
implementations in ``src/nouveau/compiler/nak/ir.rs`` (build with ``-D
build-tests=true`` and run ``src/nouveau/compiler/nak hw_tests`` from the
build dir)
* NAK's instruction encodings are tested against nvdisasm using
``src/nouveau/compiler/nak/nvdisasm_tests.rs`` (build with ``-D
build-tests=true`` and run ``src/nouveau/compiler/nak nvdisasm_tests`` from
the build dir)
* The old GL driver's compiler, under ``src/gallium/drivers/nouveau/codegen``,
has some information. This is especially useful for graphics-only
instructions, which are often not covered by other sources.
* `Compiler explorer`_ is a convenient tool to see what assembly NVIDIA
generates for a given CUDA program.
.. _NVIDIA PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
.. _CUDA Binary Utilities: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-reference
.. _older version: https://docs.nvidia.com/cuda/archive/11.8.0/cuda-binary-utilities/index.html#instruction-set-ref
.. _Hopper ISA: https://kuterdinel.com/nv_isa/
.. _Ada ISA: https://kuterdinel.com/nv_isa_sm89/
.. _nv_isa_solver: https://github.com/kuterd/nv_isa_solver
.. _nv-shader-tools: https://gitlab.freedesktop.org/nouveau/nv-shader-tools
.. _list of avaiable instructions: https://gitlab.freedesktop.org/mhenning/re/-/tree/main/opclass?ref_type=heads
.. _Volta whitepaper: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
.. _Dissecting the NVidia Turing T4 GPU via Microbenchmarking: https://arxiv.org/pdf/1903.07486
.. _Analyzing Modern NVIDIA GPU cores: https://arxiv.org/pdf/2503.20481
.. _Control Flow Management in Modern GPUs: https://arxiv.org/pdf/2407.02944
.. _maxas: https://github.com/NervanaSystems/maxas/wiki
.. _asfermi: https://github.com/hyqneuron/asfermi/wiki
.. _Compiler explorer: https://godbolt.org/z/1jrfhq5G7

Misc
^^^^

* `envytools`_ has reverse-engineered documentation for Maxwell and earlier
hardware.
* The NVIDIA architecture whitepapers give a basic overview of what has changed
between hardware revisions. See, e.g., the `Blackwell whitepaper`_.
* The NVIDIA architecture tuning guides often mention how details of a hardware
generation have changed, typically with information about the memory subsystem
or occupancy. See, e.g., the `Blackwell tuning guide`_.
* `The Nouveau wiki's CodeNames page`_ is useful for mapping NVIDIA marketing
names to engineering names.
* `Matching CUDA arch and CUDA gencode for various NVIDIA architectures`_ has a
useful table comparing SM versions to engineering names.
.. _envytools: https://envytools.readthedocs.io/en/latest/hw/index.html
.. _Blackwell whitepaper: https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf
.. _Blackwell tuning guide: https://docs.nvidia.com/cuda/blackwell-tuning-guide/index.html
.. _The Nouveau wiki's CodeNames page: https://nouveau.freedesktop.org/CodeNames.html
.. _Matching CUDA arch and CUDA gencode for various NVIDIA architectures: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/