.. 
.. Copyright (C) Mellanox Technologies Ltd. 2019.  ALL RIGHTS RESERVED.
..
.. See file LICENSE for terms.
..

.. _ucx_features:

*****************
UCX main features
*****************

High-level API features
***********************
- Select either client/server connection establishment (similar to TCP), or
  connect directly by passing a remote address blob.
- Support sharing resources between threads, or allocating dedicated resources
  per thread.
- Event-driven or polling-driven progress.
- Java and Python bindings.
- Seamless handling of GPU memory.

Main APIs
---------
- Stream-oriented send/receive operations.
- Tag-matched send/receive.
- Remote memory access.
- Remote atomic operations.
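
Tag-matched receives pair each incoming message with a posted receive by comparing 64-bit tags under a mask. Bits set in the mask must be equal in both tags, and cleared bits act as wildcards. UCX's tag receive calls (for example ``ucp_tag_recv_nbx``) take a ``tag`` and a ``tag_mask`` with these semantics. The block below is a minimal standalone sketch of that matching rule in plain C, not UCX code; the context/source tag layout and the helpers ``make_tag`` and ``tag_matches`` are hypothetical and exist only for illustration.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* 64-bit tag, the same width as UCX's ucp_tag_t. */
   typedef uint64_t tag_t;

   /* Hypothetical layout (an assumption): bits 63..32 hold a context id,
    * bits 31..0 hold the source rank. */
   static tag_t make_tag(uint32_t context, uint32_t source)
   {
       return ((tag_t)context << 32) | source;
   }

   /* A receive matches when every bit selected by tag_mask is equal in both
    * tags; bits cleared in the mask are ignored (wildcards). */
   static int tag_matches(tag_t msg_tag, tag_t recv_tag, tag_t tag_mask)
   {
       return (msg_tag & tag_mask) == (recv_tag & tag_mask);
   }

   static const tag_t MASK_EXACT      = UINT64_MAX;            /* compare all bits */
   static const tag_t MASK_ANY_SOURCE = 0xFFFFFFFF00000000ULL; /* ignore source field */

   int main(void)
   {
       tag_t msg = make_tag(7, 3); /* context 7, sent by rank 3 */

       printf("exact, rank 3: %d\n", tag_matches(msg, make_tag(7, 3), MASK_EXACT));
       printf("exact, rank 5: %d\n", tag_matches(msg, make_tag(7, 5), MASK_EXACT));
       printf("any source:    %d\n", tag_matches(msg, make_tag(7, 0), MASK_ANY_SOURCE));
       printf("wrong context: %d\n", tag_matches(msg, make_tag(8, 0), MASK_ANY_SOURCE));

       /* With the source bits masked out, only the context has to agree. */
       assert(tag_matches(msg, make_tag(7, 0), MASK_ANY_SOURCE));
       return 0;
   }

Under this assumed layout, clearing the low 32 bits of the mask turns a receive into an "any source" receive, which is one way an MPI-style wildcard could be layered on top of masked tag matching.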

Fabrics support
***************
- RoCE
- InfiniBand
- TCP sockets
- Shared memory (CMA, knem, xpmem, SysV, mmap)
- Cray Gemini / Aries (ugni)

Platforms support
*****************
- Supported architectures: x86_64, Arm v8, Power.
- Runs on virtual machines (using SR-IOV) and containers (Docker, Singularity).
- Can utilize either MLNX_OFED or inbox RDMA drivers.
- Tested on major Linux distributions (Red Hat/Ubuntu/SLES).

GPU support
***********
- CUDA (for NVIDIA GPUs)
- ROCm (for AMD GPUs)

Protocols, Optimizations and Advanced Features
**********************************************
- Automatic selection of best transports and devices.
- Zero-copy with registration cache.
- Scalable flow control algorithms.
- Optimized memory pools.
- Accelerated direct-verbs transport for Mellanox devices.
- Pipeline protocols for GPU memory.
- QoS and traffic isolation for RDMA transports.
- Platform (micro-architecture) specific optimizations, such as memcpy and memory barriers.
- Multi-rail and RoCE link aggregation group (LAG) support.
- Support for bare-metal, container, and cloud environments.
- Advanced protocols for transferring messages of different sizes.
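
Transferring messages of different sizes efficiently usually means switching protocols as the message grows: small messages are sent eagerly (inline, or copied into a pre-registered buffer), larger ones skip the copy by registering the user buffer, and the largest use a rendezvous handshake before the bulk transfer. UCX exposes related tuning knobs such as the ``UCX_ZCOPY_THRESH`` and ``UCX_RNDV_THRESH`` environment variables, but its real selection also weighs transport and device capabilities. The block below is a simplified, hypothetical model of size-based selection only; the enum names, the ``select_protocol`` helper, and the threshold values are illustrative assumptions, not UCX defaults.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Hypothetical protocol classes, loosely modeled on an eager/rendezvous split. */
   typedef enum {
       PROTO_EAGER_SHORT, /* payload inlined into the send descriptor */
       PROTO_EAGER_BCOPY, /* payload copied into a pre-registered bounce buffer */
       PROTO_EAGER_ZCOPY, /* user buffer registered and sent without a copy */
       PROTO_RNDV         /* rendezvous: handshake first, then bulk transfer */
   } proto_t;

   /* Size thresholds in bytes. */
   typedef struct {
       size_t max_short;    /* largest message sent inline */
       size_t zcopy_thresh; /* messages at least this large skip the copy */
       size_t rndv_thresh;  /* messages at least this large use rendezvous */
   } proto_thresholds_t;

   /* Assumed example values, not UCX defaults. */
   static const proto_thresholds_t demo_thresholds = { 64, 16 * 1024, 64 * 1024 };

   /* Pick a protocol from the message length alone (a deliberate simplification). */
   static proto_t select_protocol(size_t length, const proto_thresholds_t *t)
   {
       if (length <= t->max_short)
           return PROTO_EAGER_SHORT;
       if (length >= t->rndv_thresh)
           return PROTO_RNDV;
       if (length >= t->zcopy_thresh)
           return PROTO_EAGER_ZCOPY;
       return PROTO_EAGER_BCOPY;
   }

   static const char *proto_name(proto_t p)
   {
       switch (p) {
       case PROTO_EAGER_SHORT: return "eager short";
       case PROTO_EAGER_BCOPY: return "eager bcopy";
       case PROTO_EAGER_ZCOPY: return "eager zcopy";
       case PROTO_RNDV:        return "rendezvous";
       }
       return "unknown";
   }

   int main(void)
   {
       const size_t sizes[] = { 8, 1024, 32 * 1024, 1024 * 1024 };

       for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
           proto_t p = select_protocol(sizes[i], &demo_thresholds);
           printf("%8zu bytes -> %s\n", sizes[i], proto_name(p));
       }

       /* Boundary checks on the assumed thresholds. */
       assert(select_protocol(64, &demo_thresholds) == PROTO_EAGER_SHORT);
       assert(select_protocol(65, &demo_thresholds) == PROTO_EAGER_BCOPY);
       return 0;
   }

Keeping the thresholds in a struct rather than hard-coding them reflects the idea that the crossover points depend on the fabric and are better tuned per system than fixed in code.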