# Benchmark mixed precision training on CIFAR100

In this notebook we will benchmark 1) the native PyTorch mixed precision module [`torch.cuda.amp`](https://pytorch.org/docs/master/amp.html) and 2) the Nvidia/Apex package.

We will train a Wide-ResNet model on the CIFAR100 dataset using a Turing GPU and compare training times.

**TL;DR**

The ranking by training time, fastest first, is the following:
- 1st place: Nvidia/Apex "O2"
- 2nd place: `torch.cuda.amp`: autocast and scaler
- 3rd place: Nvidia/Apex "O1"
- 4th place: fp32

According to @mcarilli: "Native amp is more like a faster, better integrated, locally enabled O1"

## Installations and setup

1) The recently added [`torch.cuda.amp`](https://pytorch.org/docs/master/notes/amp_examples.html#working-with-multiple-models-losses-and-optimizers) module performs automatic mixed precision training without the Nvidia/Apex package. It is available in PyTorch >=1.6.0.
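
To see why automatic mixed precision pairs autocasting with a gradient scaler, here is a minimal sketch in plain Python (no PyTorch required). It rounds values to fp16 with the standard `struct` module. The gradient value and the scale factor are hypothetical, chosen for illustration only; the point is that a very small gradient underflows to zero in fp16 unless it is multiplied by a scale factor first and divided back afterwards in higher precision.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


grad = 1e-8      # hypothetical tiny gradient value
scale = 65536.0  # hypothetical loss scale (2**16)

# Stored directly in fp16, the gradient underflows to zero.
unscaled = to_fp16(grad)

# Scaled before the cast, it lands in the representable fp16 range.
scaled = to_fp16(grad * scale)

# Unscaling in higher precision recovers (approximately) the original value.
recovered = scaled / scale

print(unscaled, scaled, recovered)
```

This is only an illustration of the idea. In practice the scaling is applied to the loss before the backward pass, and the scale factor is adjusted dynamically during training.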

In this example we only need the `pynvml` and `fire` packages, assuming that `torch` and `ignite` are already installed. We can install them using pip:

In [None]:
!pip install pytorch-ignite pynvml fire

2) Let's install Nvidia/Apex package:

In [None]:
# Install Apex.
# If the torch CUDA version and the nvcc version match:
!pip install --upgrade --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex/
# If the above command fails, install Apex without the C++/CUDA extensions:
# !pip install --upgrade --no-cache-dir git+https://github.com/NVIDIA/apex/

In [None]:
import torch
import torchvision
import ignite
torch.__version__, torchvision.__version__, ignite.__version__

3) The scripts we will execute are located in `examples/cifar100_amp_benchmark` of the ignite GitHub repository. Let's clone the repository and set up `PYTHONPATH` to execute the benchmark scripts:

In [None]:
!git clone https://github.com/pytorch/ignite.git /tmp/ignite
scriptspath="/tmp/ignite/examples/cifar100_amp_benchmark/"
setup=f"cd {scriptspath} && export PYTHONPATH=$PWD:$PYTHONPATH"

4) Download the dataset:

In [None]:
from torchvision.datasets.cifar import CIFAR100
CIFAR100(root="/tmp/cifar100/", train=True, download=True)

## Training in fp32

In [None]:
!{setup} && python benchmark_fp32.py /tmp/cifar100/ --batch_size=256 --max_epochs=20

## Training with `torch.cuda.amp`

In [None]:
!{setup} && python benchmark_torch_cuda_amp.py /tmp/cifar100/ --batch_size=256 --max_epochs=20
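
For orientation, a single training step with `torch.cuda.amp` typically looks like the sketch below. This is not the content of `benchmark_torch_cuda_amp.py`. The tiny linear model, the random batch and the `train_step` helper are hypothetical stand-ins, and mixed precision is only enabled when a CUDA device is available so that the snippet also runs on CPU.

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler, autocast

# Enable mixed precision only when a GPU is present (assumption for portability).
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Hypothetical stand-in for the Wide-ResNet model used in the benchmark.
model = nn.Linear(32, 100).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler(enabled=use_amp)


def train_step(x, y):
    model.train()
    optimizer.zero_grad()
    # Forward pass and loss run under autocast, which picks fp16 or fp32 per op.
    with autocast(enabled=use_amp):
        y_pred = model(x)
        loss = criterion(y_pred, y)
    # Scale the loss before backward to limit fp16 gradient underflow.
    scaler.scale(loss).backward()
    # Unscales gradients and skips the step if they contain inf/NaN.
    scaler.step(optimizer)
    # Adjust the scale factor for the next iteration.
    scaler.update()
    return loss.item()


# Random batch with 100 classes, mimicking the CIFAR100 label space.
x = torch.randn(8, 32, device=device)
y = torch.randint(0, 100, (8,), device=device)
loss_value = train_step(x, y)
print(loss_value)
```

With `enabled=False`, both `autocast` and `GradScaler` become pass-throughs, so the same step function serves for fp32 runs as well.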

## Training with `Nvidia/apex`


- We check two optimization levels: "O1" and "O2"
    - "O1" optimization level: automatic casts around PyTorch functions and tensor methods
    - "O2" optimization level: fp16 training with fp32 batchnorm and fp32 master weights
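
The "fp32 master weights" in "O2" can be motivated with a small plain-Python sketch (it does not use Apex). The weight and update values are hypothetical. An update that is small relative to the weight is rounded away entirely when the weight is stored in fp16, while an fp32 copy of the weight accumulates it.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


update = 1e-4  # hypothetical value of lr * grad
steps = 100

# Weight kept in fp16: near 1.0 the fp16 spacing is about 1e-3,
# so adding 1e-4 rounds back to the same value every time.
fp16_weight = to_fp16(1.0)
for _ in range(steps):
    fp16_weight = to_fp16(fp16_weight + to_fp16(update))

# Master copy kept in fp32: the same updates accumulate as expected.
fp32_master = to_fp32(1.0)
for _ in range(steps):
    fp32_master = to_fp32(fp32_master + to_fp32(update))

print(fp16_weight, fp32_master)
```

The fp16 weight never moves from its initial value, whereas the fp32 master weight ends close to 1.01. This is why the optimizer step is applied to an fp32 copy of the weights even when the forward and backward passes run in fp16.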

In [None]:
!{setup} && python benchmark_nvidia_apex.py /tmp/cifar100/ --batch_size=256 --max_epochs=20 --opt="O1"

In [None]:
!{setup} && python benchmark_nvidia_apex.py /tmp/cifar100/ --batch_size=256 --max_epochs=20 --opt="O2"