# Gloo documentation
Documentation is split by domain. This file contains a general
overview of these domains and how they interact.
## Index
* [Overview](readme.md) -- this file
* [Rendezvous](rendezvous.md) -- creating a `gloo::Context`
* [Algorithms](algorithms.md) -- index of collective algorithms
and their semantics and complexity
* [Transport details](transport.md) -- the transport API and its
implementations
* [CUDA integration](cuda.md) -- integration of CUDA-aware Gloo
algorithms with existing CUDA code
* [Latency optimization](latency.md) -- a number of tips and tricks to
improve performance
## Overview
Gloo algorithms are collective algorithms, meaning they run in
parallel across two or more processes/machines. Before they can execute
across multiple machines, the participating processes first need to
find each other. We call this _rendezvous_, and it is the first thing
to address when integrating Gloo into your code base.
See [`rendezvous.md`](./rendezvous.md) for more information.

Once rendezvous completes, participating machines have set up
connections to one another, either in a full mesh (every machine has a
bidirectional communication channel to every other machine), or some
subset. The required connectivity between machines depends on the type
of algorithm that is used. For example, a ring algorithm only needs
communication channels to a machine's neighbors.
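
To make the difference in required connectivity concrete, here is a minimal sketch that lists the peers a given rank would need a channel to under each topology. The helpers `fullMeshPeers` and `ringPeers` are hypothetical names invented for this illustration and are not part of the Gloo API; the sketch only assumes 0-based ranks and a ring that wraps around at the ends.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a Gloo function): in a full mesh, a rank
// needs a channel to every other rank.
std::vector<int> fullMeshPeers(int rank, int size) {
  assert(size > 0 && rank >= 0 && rank < size);
  std::vector<int> peers;
  for (int i = 0; i < size; i++) {
    if (i != rank) {
      peers.push_back(i);
    }
  }
  return peers;
}

// Hypothetical helper (not a Gloo function): in a ring, a rank only
// needs channels to its left and right neighbors, with wraparound.
std::vector<int> ringPeers(int rank, int size) {
  assert(size > 0 && rank >= 0 && rank < size);
  if (size == 1) {
    return {};  // A single process has no neighbors.
  }
  const int left = (rank + size - 1) % size;
  const int right = (rank + 1) % size;
  if (left == right) {
    return {left};  // With two processes, both neighbors are the same peer.
  }
  return {left, right};
}
```

With these definitions, a full mesh needs `size - 1` channels per process, while a ring needs at most two regardless of how many processes participate.
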

Every participating process knows the number of participating
processes, and its _rank_ (or 0-based index) within the list of
participating processes. This state, as well as the state needed to
store the persistent communication channels, is stored in a
`gloo::Context` class. Gloo does not maintain global state or
thread-local state. This means that you can set up as many contexts as
needed, and introduce as much parallelism as needed by your
application.
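
The idea that all state lives in a context object can be sketched as follows. `ToyContext` and `makeContexts` are hypothetical stand-ins written for this illustration only. They model just the rank and the process count, whereas a real `gloo::Context` also holds the communication channels and is created through rendezvous.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for illustration; not the real gloo::Context.
// All state is held in the object itself: no globals, no thread-locals.
struct ToyContext {
  ToyContext(int rank, int size) : rank(rank), size(size) {
    assert(size > 0 && rank >= 0 && rank < size);
  }

  const int rank;  // 0-based index of this process
  const int size;  // number of participating processes
};

// Creates `count` independent contexts for the same process.
// Because nothing is shared between them, each one could be handed
// to a different thread in the application.
std::vector<std::shared_ptr<ToyContext>> makeContexts(
    int rank, int size, int count) {
  std::vector<std::shared_ptr<ToyContext>> contexts;
  for (int i = 0; i < count; i++) {
    contexts.push_back(std::make_shared<ToyContext>(rank, size));
  }
  return contexts;
}
```

For example, `makeContexts(1, 4, 3)` yields three distinct objects that all report rank 1 of 4, mirroring how an application can keep one context per worker thread.
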
## Anything else?
If you find that particular documentation is missing, please consider
[contributing](../CONTRIBUTING.md).