File: README.md

# Basic MNIST Example with Ignite

Ported from [pytorch-examples](https://github.com/pytorch/examples/tree/master/mnist).

#### Minimal requirements:

- [torchvision](https://github.com/pytorch/vision/): `pip install torchvision`
- [tqdm](https://github.com/tqdm/tqdm/): `pip install tqdm`

#### Usage:

Run the example:

```bash
python mnist.py
```

The same example, with logging through a tqdm progress bar:

```bash
python mnist_with_tqdm_logger.py
```
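All of these scripts are built on Ignite's `Engine`: it calls a per-batch function for every batch and fires events such as `ITERATION_COMPLETED` and `EPOCH_COMPLETED`, to which handlers (progress bars, loggers, evaluation, checkpointing) are attached. The library-free toy below, `ToyEngine` (a hypothetical name, not part of Ignite), only mimics that control flow so the structure of the scripts is easier to follow. The real scripts use `ignite.engine.Engine` and `ignite.engine.Events`, whose API is considerably richer.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Iterable, List


class ToyEngine:
    """Toy imitation of an event-driven training loop (NOT ignite.engine.Engine)."""

    def __init__(self, process_fn: Callable[["ToyEngine", Any], Any]) -> None:
        self.process_fn = process_fn  # called once per batch, like a training step
        self.handlers: DefaultDict[str, List[Callable[["ToyEngine"], None]]] = defaultdict(list)
        self.epoch = 0
        self.iteration = 0
        self.output: Any = None

    def on(self, event: str) -> Callable:
        # Decorator registering a handler for a named event
        def register(fn: Callable[["ToyEngine"], None]) -> Callable[["ToyEngine"], None]:
            self.handlers[event].append(fn)
            return fn

        return register

    def fire(self, event: str) -> None:
        for handler in self.handlers[event]:
            handler(self)

    def run(self, data: Iterable[Any], max_epochs: int) -> None:
        for _ in range(max_epochs):
            self.epoch += 1
            for batch in data:
                self.iteration += 1
                self.output = self.process_fn(self, batch)
                self.fire("ITERATION_COMPLETED")
            self.fire("EPOCH_COMPLETED")


# Usage: the "training step" just averages a batch of numbers
outputs: List[float] = []
completed_epochs: List[int] = []
engine = ToyEngine(lambda eng, batch: sum(batch) / len(batch))


@engine.on("ITERATION_COMPLETED")
def record_output(eng: ToyEngine) -> None:
    outputs.append(eng.output)


@engine.on("EPOCH_COMPLETED")
def record_epoch(eng: ToyEngine) -> None:
    completed_epochs.append(eng.epoch)


engine.run([[1, 2, 3], [4, 5, 6]], max_epochs=2)
print(outputs)  # [2.0, 5.0, 2.0, 5.0]
```

Keeping logging and bookkeeping in handlers, rather than inside the per-batch function, is what lets the tqdm, TensorBoard, Visdom and ClearML variants reuse the same training step.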

### Logging with TensorBoard

MNIST example with training and validation monitoring using TensorBoard.

#### Additional requirements:

- TensorBoard: `pip install tensorboard`

#### Usage:

Run the example:

```bash
python mnist_with_tensorboard.py --log_dir=/tmp/tensorboard_logs
```

Start TensorBoard:

```bash
tensorboard --logdir=/tmp/tensorboard_logs/
```

### Logging with Visdom

MNIST example with training and validation monitoring using Visdom.

#### Additional requirements:

- [Visdom](https://github.com/facebookresearch/visdom): `pip install visdom`

#### Usage:

Start visdom:

```bash
python -m visdom.server
```

Run the example:

```bash
python mnist_with_visdom.py
```

### Logging with ClearML

#### Additional requirements:

- [ClearML python client](https://clear.ml/docs/latest/docs/): `pip install clearml`

#### Usage:

```bash
python mnist_with_clearml_logger.py
```

### Training save & resume

This example shows how to save a checkpoint of the trainer, model, optimizer and LR scheduler.
Training can then be resumed from the latest stored checkpoint. A training crash can also be emulated.
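Conceptually, a checkpoint here is one file holding the state of several objects, keyed by name, which can later be loaded back into freshly constructed objects. In Ignite this is the job of the `Checkpoint` handler (given a `to_save` mapping) and of `Checkpoint.load_objects`, both working through each object's `state_dict()`/`load_state_dict()`. The sketch below reproduces only that idea without PyTorch: `Stateful`, `save_checkpoint` and `load_checkpoint` are hypothetical helpers, `pickle` stands in for `torch.save`, and the states are made-up placeholder values.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


class Stateful:
    """Hypothetical stand-in for a trainer, model, optimizer or LR scheduler."""

    def __init__(self, **state: Any) -> None:
        self.state = dict(state)

    def state_dict(self) -> Dict[str, Any]:
        return dict(self.state)

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.state = dict(state)


def save_checkpoint(objects: Dict[str, Stateful], path: str) -> None:
    # A single file stores every object's state under its name
    with open(path, "wb") as f:
        pickle.dump({name: obj.state_dict() for name, obj in objects.items()}, f)


def load_checkpoint(objects: Dict[str, Stateful], path: str) -> None:
    # Restore each object in place from the matching entry
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    for name, obj in objects.items():
        obj.load_state_dict(checkpoint[name])


# Placeholder states, for illustration only
to_save = {
    "trainer": Stateful(epoch=5, iteration=4690),
    "model": Stateful(weight=[0.1, -0.2]),
    "optimizer": Stateful(lr=0.01, momentum=0.5),
    "lr_scheduler": Stateful(last_epoch=5),
}

with tempfile.TemporaryDirectory() as log_dir:
    path = os.path.join(log_dir, "checkpoint.pkl")
    save_checkpoint(to_save, path)
    # "Resume": fresh, empty objects are filled from the file
    restored = {name: Stateful() for name in to_save}
    load_checkpoint(restored, path)

print(restored["trainer"].state_dict())  # {'epoch': 5, 'iteration': 4690}
```

Saving the trainer alongside the model matters: without its epoch and iteration counters, a resumed run would not know where to continue.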

The `--deterministic` option sets up the trainer as a
[`DeterministicEngine`](https://pytorch.org/ignite/engine.html#ignite.engine.deterministic.DeterministicEngine).
This trainer synchronizes the dataflow at every epoch so that resumed training sees the same data as the original run.
Please see the documentation for more details.
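The idea behind this synchronization can be illustrated without Ignite. If the shuffling order of each epoch depends only on a base seed and the epoch number, a run resumed at epoch *k* replays exactly the orders the uninterrupted run would have used. With a single RNG stream seeded once per process, a restarted run instead replays the orders of epochs 0, 1 and so on. The sketch below shows both cases; the helper names are hypothetical and Python's `random` stands in for PyTorch's RNGs. `DeterministicEngine`'s actual mechanism differs in detail.

```python
import random
from typing import List


def epoch_order(num_samples: int, base_seed: int, epoch: int) -> List[int]:
    # The permutation depends only on (base_seed, epoch), not on process history
    rng = random.Random(base_seed + epoch)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


def run_synced(num_samples: int, base_seed: int, start_epoch: int, max_epochs: int) -> List[List[int]]:
    return [epoch_order(num_samples, base_seed, e) for e in range(start_epoch, max_epochs)]


def run_shared_rng(num_samples: int, base_seed: int, start_epoch: int, max_epochs: int) -> List[List[int]]:
    # One RNG stream per process: a restarted run starts the stream over
    rng = random.Random(base_seed)
    orders = []
    for _ in range(start_epoch, max_epochs):
        indices = list(range(num_samples))
        rng.shuffle(indices)
        orders.append(indices)
    return orders


full = run_synced(10, base_seed=12, start_epoch=0, max_epochs=4)
resumed = run_synced(10, base_seed=12, start_epoch=2, max_epochs=4)
print(resumed == full[2:])  # True: epochs 2 and 3 see the same order

full_shared = run_shared_rng(10, base_seed=12, start_epoch=0, max_epochs=4)
resumed_shared = run_shared_rng(10, base_seed=12, start_epoch=2, max_epochs=4)
print(resumed_shared == full_shared[:2])  # True: epochs 0 and 1 are replayed instead
```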

#### Requirements:

- [torchvision](https://github.com/pytorch/vision/): `pip install torchvision`
- [tqdm](https://github.com/tqdm/tqdm/): `pip install tqdm`
- [tensorboardX](https://github.com/lanpa/tensorboard-pytorch): `pip install tensorboardX`
- TensorBoard: `pip install tensorboard`

#### Usage:

Training:

```bash
python mnist_save_resume_engine.py --log_dir=logs/run_1 --epochs=10
# or same in deterministic mode
python mnist_save_resume_engine.py --log_dir=logs-det/run_1 --deterministic --epochs=10
```

Resume training:

```bash
python mnist_save_resume_engine.py --log_dir=logs/run_2 --resume_from=logs/run_1/checkpoint_5628.pt --epochs=10
# or same in deterministic mode
python mnist_save_resume_engine.py --log_dir=logs-det/run_2 --resume_from=logs-det/run_1/checkpoint_5628.pt --deterministic --epochs=10
```
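The commands above pass the checkpoint path explicitly. To pick the newest file automatically, a helper such as the hypothetical `find_latest_checkpoint` below (not part of the example scripts or of Ignite) could sort `checkpoint_<N>.pt` files by their numeric suffix. It assumes the naming pattern seen in these commands; whether *N* counts iterations or epochs depends on how the checkpoint was saved.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical helper; matches names like checkpoint_5628.pt, as used in the commands above
CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)\.pt$")


def find_latest_checkpoint(log_dir: str) -> Optional[Path]:
    """Return the checkpoint_<N>.pt file in log_dir with the largest N, or None."""
    candidates = []
    for path in Path(log_dir).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare numerically: a string sort would rank checkpoint_9.pt above checkpoint_10.pt
    return max(candidates, key=lambda item: item[0])[1]


with tempfile.TemporaryDirectory() as log_dir:
    for name in ["checkpoint_9.pt", "checkpoint_10.pt", "checkpoint_2.pt", "run.log"]:
        (Path(log_dir) / name).touch()
    latest = find_latest_checkpoint(log_dir)

print(latest.name)  # checkpoint_10.pt
```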

Start TensorBoard:

```bash
tensorboard --logdir=.
```

The script logs batch statistics (mean/std of images, median of targets), model weight norms and computed gradient norms to
`run.log` and `resume_run.log`, so that the original and resumed runs can be compared.
With the `--deterministic` option, the same values are observed after resuming training.
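The logged quantities are simple reductions. The standard-library sketch below computes them for a toy flattened batch; `batch_stats` and `l2_norm` are hypothetical helpers, not the script's code. It uses the sample standard deviation and the lower median because those are the defaults of `torch.Tensor.std` and `torch.median`; that the script relies on these defaults is an assumption. If a resumed run sees the same batches and weights, these numbers should match, which is what comparing `run.log` with `resume_run.log` checks.

```python
import math
import statistics
from typing import Dict, Sequence


def batch_stats(images: Sequence[float], targets: Sequence[int]) -> Dict[str, float]:
    # Hypothetical helper. images: flattened pixel values of one batch; targets: class labels
    return {
        "images_mean": statistics.fmean(images),
        "images_std": statistics.stdev(images),  # sample (Bessel-corrected) std
        "targets_median": statistics.median_low(targets),  # lower of the two middle values
    }


def l2_norm(values: Sequence[float]) -> float:
    # Norm of a flattened weight or gradient tensor
    return math.sqrt(sum(v * v for v in values))


stats = batch_stats(images=[0.0, 0.5, 1.0, 0.5], targets=[3, 1, 4, 1])
weight_norm = l2_norm([3.0, 4.0])
print(stats["images_mean"], stats["targets_median"], weight_norm)  # 0.5 1 5.0
```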

| Non-deterministic                 | Deterministic                         |
| --------------------------------- | ------------------------------------- |
| ![img11](assets/logs_run_1_2.png) | ![img12](assets/logs-det_run_1_2.png) |

Deterministic `run.log` vs `resume_run.log`:

![img13](assets/run_vs_resume_run_logs_1_2.png)

#### Usage with simulated crash

Initial training with a crash:

```bash
python mnist_save_resume_engine.py --crash_iteration 5700 --log_dir=logs/run_3_crash --epochs 10
# or same in deterministic mode
python mnist_save_resume_engine.py --crash_iteration 5700 --log_dir=logs-det/run_3_crash --epochs 10 --deterministic
```

Resume from the latest checkpoint:

```bash
python mnist_save_resume_engine.py --resume_from logs/run_3_crash/checkpoint_6.pt --log_dir=logs/run_4 --epochs 10
# or same in deterministic mode
python mnist_save_resume_engine.py --resume_from logs-det/run_3_crash/checkpoint_6.pt --log_dir=logs-det/run_4 --epochs 10 --deterministic
```
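The crash-and-resume flow can be traced with a toy loop. The sketch below is not the script's code: `train` and `SimulatedCrash` are hypothetical, a checkpoint is kept in memory at the end of every epoch, and `crash_iteration` mimics the `--crash_iteration` flag by raising an exception. It assumes 938 iterations per epoch, which is what 60,000 MNIST training images give at a batch size of 64 (an assumption about the script's defaults). Under that assumption epoch 6 ends at iteration 5628, consistent with the checkpoint names used above, and a crash at iteration 5700 leaves the epoch-6 checkpoint as the latest one.

```python
from typing import Dict, List, Optional


class SimulatedCrash(RuntimeError):
    """Hypothetical exception raised to emulate a training crash."""


def train(
    max_epochs: int,
    iters_per_epoch: int,
    checkpoints: Dict[int, Dict[str, int]],
    crash_iteration: Optional[int] = None,
    resume_from: Optional[Dict[str, int]] = None,
) -> List[int]:
    """Toy training loop; returns the iterations executed by this call."""
    state = dict(resume_from) if resume_from else {"epoch": 0, "iteration": 0}
    executed: List[int] = []
    while state["epoch"] < max_epochs:
        for _ in range(iters_per_epoch):
            state["iteration"] += 1
            if state["iteration"] == crash_iteration:
                raise SimulatedCrash(f"crash at iteration {state['iteration']}")
            executed.append(state["iteration"])  # a real step would update the model here
        state["epoch"] += 1
        checkpoints[state["epoch"]] = dict(state)  # checkpoint at the end of every epoch
    return executed


checkpoints: Dict[int, Dict[str, int]] = {}
crash_message = None
try:
    train(max_epochs=10, iters_per_epoch=938, checkpoints=checkpoints, crash_iteration=5700)
except SimulatedCrash as exc:
    crash_message = str(exc)

latest_epoch = max(checkpoints)  # 6, saved at iteration 5628
resumed = train(
    max_epochs=10,
    iters_per_epoch=938,
    checkpoints=checkpoints,
    resume_from=checkpoints[latest_epoch],
)
print(crash_message, resumed[0], resumed[-1])  # crash at iteration 5700 5629 9380
```

Because checkpoints are only taken at epoch boundaries in this sketch, the iterations between the last checkpoint and the crash are simply redone after resuming.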

| Non-deterministic                 | Deterministic                         |
| --------------------------------- | ------------------------------------- |
| ![img21](assets/logs_run_3_4.png) | ![img22](assets/logs-det_run_3_4.png) |

Deterministic `run.log` vs `resume_run.log`:

![img23](assets/run_vs_resume_run_logs_3_4.png)