1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
|
# Reproducible PASCAL VOC2012 training with PyTorch-Ignite
In this example, we provide script and tools to perform reproducible experiments on training neural networks on PASCAL VOC2012
dataset.
Features:
- Distributed training with native automatic mixed precision
- Experiments tracking with [ClearML](https://github.com/allegroai/clearml)
Experiment | Model | Dataset | Val Avg IoU | ClearML Link
---|---|---|---|---
configs/baseline_dplv3_resnet101.py | DeepLabV3 Resnet101 | VOC Only | 0.659161 | [link](https://app.clear.ml/projects/0e9a3a92d3134283b7d5572d516d60c5/experiments/a7254f084a9e47ca9380dfd739f89520/output/execution)
configs/baseline_dplv3_resnet101_sbd.py | DeepLabV3 Resnet101 | VOC+SBD | 0.6853087 | [link](https://app.clear.ml/projects/0e9a3a92d3134283b7d5572d516d60c5/experiments/dc4cee3377a74d19bc2d0e0e4d638c1f/output/execution)
## Setup
```
pip install -r requirements.txt
```
### Docker
For docker users, you can use the following images to run the example:
```bash
docker pull pytorchignite/vision:latest
```
or
```bash
docker pull pytorchignite/hvd-vision:latest
```
and install other requirements as suggested above
### Using Horovod as distributed framework
We do not add `horovod` as a requirement into `requirements.txt`. Please, install it manually following the official guides or
use `pytorchignite/hvd-vision:latest` docker image.
### (Optional) Download Pascal VOC2012 and SDB datasets
Download and extract the datasets:
```bash
python main.py download /path/to/datasets
```
This script will download and extract the following datasets into `/path/to/datasets`
- The [Pascal VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) dataset
- Optionally, the [SBD](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz) evaluation dataset
## Usage
Please, export the `DATASET_PATH` environment variable for the Pascal VOC2012 dataset.
```bash
export DATASET_PATH=/path/to/pascal_voc2012
# e.g. export DATASET_PATH=/data/ where VOCdevkit is located
```
Optionally, if using SBD dataset, export the `SBD_DATASET_PATH` environment variable:
```bash
export SBD_DATASET_PATH=/path/to/SBD/
# e.g. export SBD_DATASET_PATH=/data/SBD/ where "cls img inst train.txt train_noval.txt val.txt" are located
```
### Training
#### Single GPU
- Adjust batch size for your GPU type in the configuration file: `configs/baseline_dplv3_resnet101_sbd.py` or `configs/baseline_dplv3_resnet101.py`
Run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python -u main.py training configs/baseline_dplv3_resnet101_sbd.py
# or without SBD
# CUDA_VISIBLE_DEVICES=0 python -u main.py training configs/baseline_dplv3_resnet101.py
```
#### Multiple GPUs
- Adjust total batch size for your GPUs in the configuration file: `configs/baseline_dplv3_resnet101_sbd.py` or `configs/baseline_dplv3_resnet101.py`
```bash
torchrun --nproc_per_node=2 main.py training configs/baseline_dplv3_resnet101_sbd.py
# or without SBD
# torchrun --nproc_per_node=2 main.py training configs/baseline_dplv3_resnet101.py
```
#### Using Horovod as distributed framework
- Adjust total batch size for your GPUs in the configuration file: `configs/baseline_dplv3_resnet101_sbd.py` or `configs/baseline_dplv3_resnet101.py`
```bash
horovodrun -np=2 python -u main.py training configs/baseline_dplv3_resnet101_sbd.py --backend="horovod"
# or without SBD
# horovodrun -np=2 python -u main.py training configs/baseline_dplv3_resnet101.py --backend="horovod"
```
### Evaluation
#### Single GPU
```bash
CUDA_VISIBLE_DEVICES=0 python -u main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py
```
#### Multiple GPUs
```bash
torchrun --nproc_per_node=2 main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py
```
#### Using Horovod as distributed framework
```bash
horovodrun -np=2 python -u main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py --backend="horovod"
```
## Acknowledgements
Trainings were done using credits provided by AWS for open-source development via NumFOCUS
and using [trainml.ai](trainml.ai) platform.
|