1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276
|
<!---
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# In this folder we showcase various full examples using 🤗 Accelerate
## Simple NLP example
The [nlp_example.py](./nlp_example.py) script is a simple example to train a Bert model on a classification task ([GLUE's MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398)).
Prior to running it you should install 🤗 Dataset and 🤗 Transformers:
```bash
pip install datasets evaluate transformers
```
The same script can be run in any of the following configurations:
- single CPU or single GPU
- multi CPUs
- multi GPUs (using PyTorch distributed mode)
- (multi) TPUs
- fp16 (mixed-precision) or fp32 (normal precision)
To run it in each of these various modes, use the following commands:
- single CPU:
* from a server without GPU
```bash
python ./nlp_example.py
```
* from any server by passing `cpu=True` to the `Accelerator`.
```bash
python ./nlp_example.py --cpu
```
* from any server with Accelerate launcher
```bash
accelerate launch --cpu ./nlp_example.py
```
- single GPU:
```bash
python ./nlp_example.py # from a server with a GPU
```
- with fp16 (mixed-precision)
* from any server by passing `mixed_precison=fp16` to the `Accelerator`.
```bash
python ./nlp_example.py --mixed_precision fp16
```
* from any server with Accelerate launcher
```bash
accelerate launch --mixed_precision fp16 ./nlp_example.py
- multi CPUs (requires Open MPI, Intel MPI, or MVAPICH)
* With Accelerate config and launcher, execute the following from node 0:
```bash
accelerate config # Select to have accelerate launch mpirun
accelerate launch ./nlp_example.py # This will run the script on each server
```
* With Intel MPI:
```bash
export CCL_WORKER_COUNT=1
export MASTER_ADDR=xxx.xxx.xxx.xxx #node0 ip
mpirun -f hostfile -n 16 -ppn 4 python ./nlp_example.py
```
- multi GPUs (using PyTorch distributed mode)
* With Accelerate config and launcher
```bash
accelerate config # This will create a config file on your server
accelerate launch ./nlp_example.py # This will run the script on your server
```
* With traditional PyTorch launcher (`python -m torch.distributed.run` can be used instead of `torchrun`)
```bash
torchrun --nproc_per_node 2 ./nlp_example.py
```
- multi GPUs, multi node (several machines, using PyTorch distributed mode)
* With Accelerate config and launcher, on each machine:
```bash
accelerate config # This will create a config file on each server
accelerate launch ./nlp_example.py # This will run the script on each server
```
* With PyTorch launcher only (`python -m torch.distributed.run` can be used instead of `torchrun`). Run this command on each node:
```bash
torchrun \ # python -m torch.distributed.run
--nproc_per_node 2 \
--nnodes 2 \
--rdzv_id 2299 \ # A unique job id
--rdzv_backend c10d \
--rdzv_endpoint master_node_ip_address:29500 \
./nlp_example.py
```
- (multi) TPUs
* With Accelerate config and launcher
```bash
accelerate config # This will create a config file on your TPU server
accelerate launch ./nlp_example.py # This will run the script on each server
```
* In PyTorch:
Add an `xmp.spawn` line in your script as you usually do.
## Simple vision example
The [cv_example.py](./cv_example.py) script is a simple example to fine-tune a ResNet-50 on a classification task ([Ofxord-IIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/)).
The same script can be run in any of the following configurations:
- single CPU or single GPU
- multi CPUs
- multi GPUs (using PyTorch distributed mode)
- (multi) TPUs
- fp16 (mixed-precision) or fp32 (normal precision)
Prior to running it you should install timm and torchvision:
```bash
pip install timm torchvision
```
and you should download the data with the following commands:
```bash
wget https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
tar -xzf images.tar.gz
```
To run it in each of these various modes, use the following commands:
- single CPU:
* from a server without GPU
```bash
python ./cv_example.py --data_dir path_to_data
```
* from any server by passing `cpu=True` to the `Accelerator`.
```bash
python ./cv_example.py --data_dir path_to_data --cpu
```
* from any server with Accelerate launcher
```bash
accelerate launch --cpu ./cv_example.py --data_dir path_to_data
```
- single GPU:
```bash
python ./cv_example.py # from a server with a GPU
```
- with fp16 (mixed-precision)
* from any server by passing `mixed_precison=fp16` to the `Accelerator`.
```bash
python ./cv_example.py --data_dir path_to_data --mixed_precison fp16
```
* from any server with Accelerate launcher
```bash
accelerate launch --mixed_precison fp16 ./cv_example.py --data_dir path_to_data
- multi CPUs (requires Open MPI, Intel MPI, or MVAPICH)
* With Accelerate config and launcher, run the following from node 0:
```bash
accelerate config --config_file config.yaml # Select to have accelerate launch mpirun
accelerate launch ./cv_example.py --data_dir path_to_data # This will run the script on each server
```
* With Intel MPI, execute mpirun from node 0:
```bash
export CCL_WORKER_COUNT=1
export MASTER_ADDR=xxx.xxx.xxx.xxx #node0 ip
mpirun -f hostfile -n 16 -ppn 4 python ./cv_example.py --data_dir path_to_data
```
- multi GPUs (using PyTorch distributed mode)
* With Accelerate config and launcher
```bash
accelerate config --config_file config.yaml # This will create a config file on your server to `config.yaml`
accelerate launch --config_file config.yaml ./cv_example.py --data_dir path_to_data # This will run the script on your server
```
* With traditional PyTorch launcher (`python -m torch.distributed.run` can be used instead of `torchrun`)
```bash
torchrun --nproc_per_node 2 ./cv_example.py --data_dir path_to_data
```
- multi GPUs, multi node (several machines, using PyTorch distributed mode)
* With Accelerate config and launcher, on each machine:
```bash
accelerate config --config_file config.yaml # This will create a config file on your server to `config.yaml`
accelerate launch --config_file config.yaml ./cv_example.py --data_dir path_to_data # This will run the script on each server
```
* With PyTorch launcher only (`python -m torch.distributed.run` can be used instead of `torchrun`). Run this command on each node:
```bash
torchrun \ # python -m torch.distributed.run
--nproc_per_node 2 \
--nnodes 2 \
--rdzv_id 2299 \ # A unique job id
--rdzv_backend c10d \
--rdzv_endpoint master_node_ip_address:29500 \
./cv_example.py --data_dir path_to_data
```
- (multi) TPUs
* With Accelerate config and launcher
```bash
accelerate config --config_file config.yaml # This will create a config file on your server to `config.yaml`
accelerate launch --config_file config.yaml ./cv_example.py --data_dir path_to_data # This will run the script on each server
```
* In PyTorch:
Add an `xmp.spawn` line in your script as you usually do.
### Simple vision example (GANs)
- [huggan project](https://github.com/huggingface/community-events/tree/main/huggan)
### Using AWS SageMaker integration
- [Examples showcasing AWS SageMaker integration of 🤗 Accelerate.](https://github.com/pacman100/accelerate-aws-sagemaker)
## Configuration zoo
In [/config_yaml_templates](./config_yaml_templates/) we have a variety of *minimal* `config.yaml` templates and examples to help you learn
how to create your own configuration files depending on the scenario.
## SLURM Scripts
In [/slurm/submit_multigpu.sh](./slurm/submit_multigpu.sh) and [/slurm/submit_multinode.sh](./slurm/submit_multinode.sh) we present two scripts for running the examples on a machine with [SLURM](https://slurm.schedmd.com/documentation.html) workload manager.
In [/slurm/submit_multigpu.sh](./slurm/submit_multigpu.sh) the only parameter in the launcher that needs to be modified is `--num_processes`, which determines the number of GPUs we will use. In this case, using the environment variable `$SLURM_GPUS`, we indicate that we want to utilize all the GPUs available on the node we have requested.
In [/slurm/submit_multinode.sh](./slurm/submit_multinode.sh) we must specify the number of nodes that will be part of the training (`--num_machines`), how many GPUs we will use in total (`--num_processes`), the [`backend`](https://pytorch.org/docs/stable/elastic/run.html#note-on-rendezvous-backend), `--main_process_ip` which will be the address the master node and the `--main_process_port`.
In [/slurm/submit_multicpu.sh](./slurm/submit_multicpu.sh) we must specify the number of nodes that will be part of the training (`--num_machines`), how many CPU processes we will use in total (`--num_processes`), the [`backend`](https://pytorch.org/docs/stable/elastic/run.html#note-on-rendezvous-backend), `--main_process_ip` which will be the address the master node and the `--main_process_port`. `mpirun_hostfile` specifies to run the job using MPIRun.
In both scripts, we run `activateEnvironment.sh` at the beginning. This script should contain the necessary instructions to initialize the environment for execution. Below, we show an example that loads the necessary libraries ([Environment modules](https://github.com/cea-hpc/modules)), activates the Python environment, and sets up various environment variables, most of them to run the scripts in offline mode in case we don't have internet connection from the cluster.
```bash
# activateEnvironment.sh
module purge
module load anaconda3/2020.02 cuda/10.2 cudnn/8.0.5 nccl/2.9.9 arrow/7.0.0 openmpi
source activate /home/nct01/nct01328/pytorch_antoni_local
export HF_HOME=/gpfs/projects/nct01/nct01328/
export HF_LOCAL_HOME=/gpfs/projects/nct01/nct01328/HF_LOCAL
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export PYTHONPATH=/home/nct01/nct01328/transformers-in-supercomputers:$PYTHONPATH
export GPUS_PER_NODE=4
```
## Simple Multi-GPU Hardware Launcher (using an external platform)
[multigpu_remote_launcher.py](./multigpu_remote_launcher.py) is a minimal script that demonstrates launching accelerate
on multiple remote GPUs, and with automatic hardware environment and dependency setup for reproducibility. You can
easily customize the training function used, training arguments, hyperparameters, and type of compute hardware, and then
run the script to automatically launch multi GPU training on remote hardware.
This script uses [Runhouse](https://github.com/run-house/runhouse) to launch on self-hosted hardware (e.g. in your own
cloud account or on-premise cluster) but there are other options for running remotely as well. Runhouse can be installed
with `pip install runhouse`, and you can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/latest/api/python/cluster.html#hardware-setup)
for hardware setup instructions, or this
[Colab tutorial](https://colab.research.google.com/drive/1qVwYyLTCPYPSdz9ZX7BZl9Qm0A3j7RJe) for a more in-depth walkthrough.
## Finer Examples
While the first two scripts are extremely barebones when it comes to what you can do with accelerate, more advanced features are documented in two other locations.
### `by_feature` examples
These scripts are *individual* examples highlighting one particular feature or use-case within Accelerate. They all stem from the [nlp_example.py](./nlp_example.py) script, and any changes or modifications is denoted with a `# New Code #` comment.
Read the README.md file located in the `by_feature` folder for more information.
### `complete_*` examples
These two scripts contain *every* single feature currently available in Accelerate in one place, as one giant script.
New arguments that can be passed include:
- `checkpointing_steps`, whether the various states should be saved at the end of every `n` steps, or `"epoch"` for each epoch. States are then saved to folders named `step_{n}` or `epoch_{n}`
- `resume_from_checkpoint`, should be used if you want to resume training off of a previous call to the script and passed a `checkpointing_steps` to it.
- `with_tracking`, should be used if you want to log the training run using all available experiment trackers in your environment. Currently supported trackers include TensorBoard, Weights and Biases, and CometML.
|