# Basic MNIST Example with Ignite
Ported from [pytorch-examples](https://github.com/pytorch/examples/tree/master/mnist).
#### Minimal requirements:
- [torchvision](https://github.com/pytorch/vision/): `pip install torchvision`
- [tqdm](https://github.com/tqdm/tqdm/): `pip install tqdm`
#### Usage:
Run the example:
```bash
python mnist.py
```
Run the same example with a tqdm progress bar for logging:
```bash
python mnist_with_tqdm_logger.py
```
### Logging with Tensorboard
MNIST example with training and validation monitoring using Tensorboard
#### Additional requirements:
- Tensorboard: `pip install tensorboard`
Run the example:
```bash
python mnist_with_tensorboard.py --log_dir=/tmp/tensorboard_logs
```
Start tensorboard:
```bash
tensorboard --logdir=/tmp/tensorboard_logs/
```
### Logging with Visdom
MNIST example with training and validation monitoring using Visdom
#### Additional requirements:
- [Visdom](https://github.com/facebookresearch/visdom): `pip install visdom`
#### Usage:
Start visdom:
```bash
python -m visdom.server
```
Run the example:
```bash
python mnist_with_visdom.py
```
### Logging with ClearML
#### Additional requirements:
- [ClearML python client](https://clear.ml/docs/latest/docs/): `pip install clearml`
#### Usage:
```bash
python mnist_with_clearml_logger.py
```
### Training save & resume
This example shows how to save a checkpoint of the trainer, model, optimizer, and lr scheduler.
The user can resume training from the latest stored checkpoint. A training crash can also be emulated.
The `--deterministic` option sets up a deterministic trainer as a
[`DeterministicEngine`](https://pytorch.org/ignite/engine.html#ignite.engine.deterministic.DeterministicEngine).
The trainer synchronizes the dataflow on each epoch to ensure the same dataflow when training is resumed.
Please see the documentation for more details.
#### Requirements:
- [torchvision](https://github.com/pytorch/vision/): `pip install torchvision`
- [tqdm](https://github.com/tqdm/tqdm/): `pip install tqdm`
- [TensorboardX](https://github.com/lanpa/tensorboard-pytorch): `pip install tensorboardX`
- Tensorboard: `pip install tensorboard`
#### Usage:
Training
```bash
python mnist_save_resume_engine.py --log_dir=logs/run_1 --epochs=10
# or same in deterministic mode
python mnist_save_resume_engine.py --log_dir=logs-det/run_1 --deterministic --epochs=10
```
Resume the training
```bash
python mnist_save_resume_engine.py --log_dir=logs/run_2 --resume_from=logs/run_1/checkpoint_5628.pt --epochs=10
# or same in deterministic mode
python mnist_save_resume_engine.py --log_dir=logs-det/run_2 --resume_from=logs-det/run_1/checkpoint_5628.pt --deterministic --epochs=10
```
Start tensorboard:
```bash
tensorboard --logdir=.
```
The script logs batch stats (mean/std of images, median of targets), the norms of the model weights, and the norms of the computed gradients to
`run.log` and `resume_run.log` so that training behaviour can be compared in both cases.
With the `--deterministic` option set, the same values can be observed after resuming the training.
[Figures: Non-deterministic | Deterministic]

[Figure: Deterministic `run.log` vs `resume_run.log`]
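
A quick way to compare the two logs without a visual diff tool is a line-by-line diff. The helper below is only a sketch: `diff_logs` is a hypothetical name, and it assumes the log lines are directly comparable (timestamps or other run-specific prefixes would have to be stripped first).

```python
import difflib
from pathlib import Path


def diff_logs(first, second):
    """Return the unified diff between two text log files as a list of lines."""
    a = Path(first).read_text().splitlines()
    b = Path(second).read_text().splitlines()
    return list(
        difflib.unified_diff(a, b, fromfile=str(first), tofile=str(second), lineterm="")
    )
```

An empty result means both files contain the same lines, which is what a deterministic run followed by a resume is expected to produce for the overlapping iterations.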

#### Usage with simulated crash
Initial training with a crash
```bash
python mnist_save_resume_engine.py --crash_iteration 5700 --log_dir=logs/run_3_crash --epochs 10
# or same in deterministic mode
python mnist_save_resume_engine.py --crash_iteration 5700 --log_dir=logs-det/run_3_crash --epochs 10 --deterministic
```
Resume from the latest checkpoint
```bash
python mnist_save_resume_engine.py --resume_from logs/run_3_crash/checkpoint_6.pt --log_dir=logs/run_4 --epochs 10
# or same in deterministic mode
python mnist_save_resume_engine.py --resume_from logs-det/run_3_crash/checkpoint_6.pt --log_dir=logs-det/run_4 --epochs 10 --deterministic
```
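
To avoid looking up the checkpoint file by hand, the value for `--resume_from` can be found with a small helper. This is a sketch that assumes checkpoints follow the `checkpoint_<number>.pt` naming seen in the commands above; `latest_checkpoint` is a hypothetical name and is not part of the example scripts.

```python
import re
from pathlib import Path
from typing import Optional

# Matches file names such as checkpoint_6.pt or checkpoint_5628.pt.
_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)\.pt$")


def latest_checkpoint(log_dir) -> Optional[Path]:
    """Return the checkpoint with the highest numeric suffix, or None if there is none."""
    candidates = []
    for path in Path(log_dir).glob("checkpoint_*.pt"):
        match = _CHECKPOINT_RE.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare by the numeric suffix only, not lexicographically.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `latest_checkpoint("logs/run_3_crash")` would return the path to pass to `--resume_from`, provided the directory contains files with that naming.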
[Figures: Non-deterministic | Deterministic]

[Figure: Deterministic `run.log` vs `resume_run.log`]
