# Examples for Distributed Training
## Examples with NVIDIA GPUs
| Example | Scalability | Description |
| ---------------------------------------------------------------------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`distributed_batching.py`](./distributed_batching.py) | single-node | Example for training GNNs on multiple graphs. |
| [`distributed_sampling.py`](./distributed_sampling.py) | single-node | Example for training GNNs on a homogeneous graph with neighbor sampling (a minimal sketch of this pattern is shown below the table). |
| [`distributed_sampling_multinode.py`](./distributed_sampling_multinode.py) | multi-node | Example for training GNNs on a homogeneous graph with neighbor sampling on multiple nodes. |
| [`distributed_sampling_multinode.sbatch`](./distributed_sampling_multinode.sbatch) | multi-node | Example for submitting a training job to a Slurm cluster using [`distributed_sampling_multinode.py`](./distributed_sampling_multinode.py). |
| [`papers100m_gcn.py`](./papers100m_gcn.py) | single-node | Example for training GNNs on the `ogbn-papers100M` homogeneous graph with ~1.6B edges. |
| [`papers100m_gcn_cugraph.py`](./papers100m_gcn_cugraph.py) | single-node | Example for training GNNs on `ogbn-papers100M` using [cuGraph](...). |
| [`papers100m_gcn_multinode.py`](./papers100m_gcn_multinode.py) | multi-node | Example for training GNNs on a homogeneous graph on multiple nodes. |
| [`papers100m_gcn_cugraph_multinode.py`](./papers100m_gcn_cugraph_multinode.py) | multi-node | Example for training GNNs on a homogeneous graph on multiple nodes using [cuGraph](...). |
| [`pcqm4m_ogb.py`](./pcqm4m_ogb.py) | single-node | Example for training GNNs for a graph-level regression task. |
| [`mag240m_graphsage.py`](./mag240m_graphsage.py) | single-node | Example for training GNNs on a large heterogeneous graph. |
| [`taobao.py`](./taobao.py) | single-node | Example for training link prediction GNNs on a heterogeneous graph. |
| [`model_parallel.py`](./model_parallel.py) | single-node | Example for model parallelism by manually placing layers on each GPU. |
| [`data_parallel.py`](./data_parallel.py) | single-node | Example for training GNNs on multiple graphs. Note that [`torch_geometric.nn.DataParallel`](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.data_parallel.DataParallel) is deprecated and [discouraged](https://github.com/pytorch/pytorch/issues/65936). |
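Most of the single-node examples above follow the same one-process-per-GPU pattern: shard the training nodes across ranks, sample neighbors locally with `NeighborLoader`, and synchronize gradients with `DistributedDataParallel` over NCCL. The sketch below illustrates that pattern in the spirit of [`distributed_sampling.py`](./distributed_sampling.py); the dataset, model, and hyperparameter choices are illustrative assumptions rather than the exact contents of that file.

```python
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel

from torch_geometric.datasets import Reddit
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GraphSAGE


def run(rank: int, world_size: int, data, num_classes: int):
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12355')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    device = torch.device(f'cuda:{rank}')

    # Shard the training nodes so that every rank samples a disjoint subset:
    train_idx = data.train_mask.nonzero(as_tuple=False).view(-1)
    train_idx = train_idx.split(train_idx.size(0) // world_size)[rank]

    loader = NeighborLoader(data, input_nodes=train_idx,
                            num_neighbors=[25, 10], batch_size=1024,
                            shuffle=True)

    model = GraphSAGE(data.num_node_features, hidden_channels=256,
                      num_layers=2, out_channels=num_classes).to(device)
    model = DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(1, 11):
        model.train()
        for batch in loader:
            batch = batch.to(device)
            optimizer.zero_grad()
            # Only the seed nodes of each mini-batch carry supervision:
            out = model(batch.x, batch.edge_index)[:batch.batch_size]
            loss = F.cross_entropy(out, batch.y[:batch.batch_size])
            loss.backward()  # Gradients are all-reduced across ranks here.
            optimizer.step()
        if rank == 0:
            print(f'Epoch {epoch:02d}, loss: {loss:.4f}')

    dist.destroy_process_group()


if __name__ == '__main__':
    dataset = Reddit('./data/Reddit')  # Illustrative dataset choice.
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(run,
                                args=(world_size, dataset[0],
                                      dataset.num_classes),
                                nprocs=world_size, join=True)
```

The multi-node variants keep the same training loop and mainly change how the process group is initialized (e.g., via `torchrun` or the provided Slurm script), so that `rank` and `world_size` span all participating machines.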
## Examples with Intel GPUs (XPUs)
| Example | Scalability | Description |
| -------------------------------------------------------------- | ---------------------- | ------------------------------------------------------------------------ |
| [`distributed_sampling_xpu.py`](./distributed_sampling_xpu.py) | single-node, multi-gpu | Example for training GNNs on a homogeneous graph with neighbor sampling (the XPU-specific setup is sketched below the table). |
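The same pattern carries over to Intel GPUs: the process group uses the oneCCL backend instead of NCCL, and tensors live on `xpu` devices. Below is a minimal sketch of just that setup, assuming the `intel_extension_for_pytorch` and `oneccl_bindings_for_pytorch` packages are installed; the helper name is hypothetical, and the training loop itself mirrors the CUDA sketch above.

```python
import os

import torch
import torch.distributed as dist

# Both imports register their extensions as side effects (assumed installed):
import intel_extension_for_pytorch as ipex  # noqa: F401
import oneccl_bindings_for_pytorch  # noqa: F401


def init_xpu_process_group(rank: int, world_size: int) -> torch.device:
    """Hypothetical helper: create a `ccl` process group and pick an XPU."""
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12355')
    # oneCCL ('ccl') plays the role that NCCL plays on NVIDIA GPUs:
    dist.init_process_group('ccl', rank=rank, world_size=world_size)
    return torch.device(f'xpu:{rank}')
```

Models and mini-batches are then moved to the returned device exactly as in the CUDA case and wrapped in `DistributedDataParallel` for gradient synchronization.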