.. _transforms:
Transforming and augmenting images
==================================
.. currentmodule:: torchvision.transforms
Transforms are common image transformations available in the
``torchvision.transforms`` module. They can be chained together using
:class:`Compose`.
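
For example, a typical image classification pipeline can be expressed as a single
chained transform. The sketch below is illustrative only; the resize/crop sizes and
normalization statistics are arbitrary placeholder values:

.. code:: python

    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(256),          # resize the shorter side to 256 pixels
        transforms.CenterCrop(224),      # take a 224x224 center crop
        transforms.ToTensor(),           # PIL image -> float tensor in [0, 1]
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # tensor_image = preprocess(pil_image)  # applies the whole chain at once
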
Most transform classes have a function equivalent: :ref:`functional
transforms <functional_transforms>` give fine-grained control over the
transformations.
This is useful if you have to build a more complex transformation pipeline
(e.g. in the case of segmentation tasks).
Most transformations accept both `PIL <https://pillow.readthedocs.io>`_
images and tensor images, although some transformations are :ref:`PIL-only
<transforms_pil_only>` and some are :ref:`tensor-only
<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
convert to and from PIL images.
The transformations that accept tensor images also accept batches of tensor
images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is the
number of channels and ``H`` and ``W`` are the image height and width. A batch
of Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is the
number of images in the batch.
The expected range of the values of a tensor image is implicitly defined by
the tensor dtype. Tensor images with a float dtype are expected to have
values in ``[0, 1)``. Tensor images with an integer dtype are expected to
have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
that can be represented in that dtype.
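
As a short sketch of what these ranges mean in practice, :class:`ConvertImageDtype`
rescales the values when the dtype changes (the image size below is arbitrary):

.. code:: python

    import torch
    from torchvision import transforms

    # A uint8 tensor image holds values in [0, 255] ...
    img_uint8 = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)

    # ... while a float tensor image is expected to hold values in [0, 1).
    # ConvertImageDtype rescales the values when converting between dtypes.
    img_float = transforms.ConvertImageDtype(torch.float32)(img_uint8)
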
Randomized transformations will apply the same transformation to all the
images of a given batch, but they will produce different transformations
across calls. For reproducible transformations across calls, you may use
:ref:`functional transforms <functional_transforms>`.
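
As an illustration, a random transform applied to a batched tensor samples its
parameters once per call, so the whole batch is transformed consistently, while a
second call may make a different random choice (the sizes below are arbitrary):

.. code:: python

    import torch
    from torchvision import transforms

    flip = transforms.RandomHorizontalFlip(p=0.5)
    batch = torch.rand(4, 3, 16, 16)  # a batch of 4 tensor images

    out1 = flip(batch)  # either every image in the batch is flipped, or none is
    out2 = flip(batch)  # a new call may make a different random decision
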
The following examples illustrate the use of the available transforms:
* :ref:`sphx_glr_auto_examples_plot_transforms.py`

  .. figure:: ../source/auto_examples/images/sphx_glr_plot_transforms_001.png
     :align: center
     :scale: 65%

* :ref:`sphx_glr_auto_examples_plot_scripted_tensor_transforms.py`

  .. figure:: ../source/auto_examples/images/sphx_glr_plot_scripted_tensor_transforms_001.png
     :align: center
     :scale: 30%
.. warning::

    Since v0.8.0, all random transformations use the torch default random generator
    to sample their random parameters. This is a backwards-compatibility-breaking
    change, and users should set the random state as follows:

    .. code:: python

        # Previous versions
        # import random
        # random.seed(12)

        # Now
        import torch
        torch.manual_seed(17)

    Please keep in mind that the same seed for the torch random generator and the
    Python random generator will not produce the same results.
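
Because the random parameters are now drawn from the torch generator, re-seeding it
before each call is one way to reproduce a random transform exactly. A minimal
sketch; the transform and the seed below are arbitrary choices:

.. code:: python

    import torch
    from torchvision import transforms

    transform = transforms.RandomRotation(degrees=45)
    img = torch.rand(3, 32, 32)

    torch.manual_seed(17)
    out1 = transform(img)
    torch.manual_seed(17)
    out2 = transform(img)  # the same random angle is sampled, so out1 == out2
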
Scriptable transforms
---------------------
In order to script the transformations, please use ``torch.nn.Sequential`` instead of :class:`Compose`.

.. code:: python

    transforms = torch.nn.Sequential(
        transforms.CenterCrop(10),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

Make sure to use only scriptable transformations, i.e. those that work with ``torch.Tensor`` and do not
require ``lambda`` functions or ``PIL.Image``.

For any custom transformations to be used with ``torch.jit.script``, they should be derived from ``torch.nn.Module``.
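
A minimal sketch of such a custom, scriptable transform (the class name and the
brightness factor are purely illustrative):

.. code:: python

    import torch
    import torch.nn as nn
    import torchvision.transforms.functional as F

    class FixedBrightness(nn.Module):
        """Scale image brightness by a constant factor."""

        def __init__(self, factor: float):
            super().__init__()
            self.factor = factor

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            return F.adjust_brightness(img, self.factor)

    scripted = torch.jit.script(FixedBrightness(0.8))
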
Compositions of transforms
--------------------------
.. autosummary::
:toctree: generated/
:template: class.rst
Compose
Transforms on PIL Image and torch.\*Tensor
------------------------------------------
.. autosummary::
:toctree: generated/
:template: class.rst
CenterCrop
ColorJitter
FiveCrop
Grayscale
Pad
RandomAffine
RandomApply
RandomCrop
RandomGrayscale
RandomHorizontalFlip
RandomPerspective
RandomResizedCrop
RandomRotation
RandomVerticalFlip
Resize
TenCrop
GaussianBlur
RandomInvert
RandomPosterize
RandomSolarize
RandomAdjustSharpness
RandomAutocontrast
RandomEqualize
.. _transforms_pil_only:
Transforms on PIL Image only
----------------------------
.. autosummary::
:toctree: generated/
:template: class.rst
RandomChoice
RandomOrder
.. _transforms_tensor_only:
Transforms on torch.\*Tensor only
---------------------------------
.. autosummary::
:toctree: generated/
:template: class.rst
LinearTransformation
Normalize
RandomErasing
ConvertImageDtype
.. _conversion_transforms:
Conversion Transforms
---------------------
.. autosummary::
:toctree: generated/
:template: class.rst
ToPILImage
ToTensor
PILToTensor
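
A brief, illustrative comparison of these conversions (the placeholder image is
arbitrary): ``ToTensor`` rescales a PIL image to a float tensor in ``[0, 1]``,
whereas ``PILToTensor`` keeps the original integer values:

.. code:: python

    from PIL import Image
    from torchvision import transforms

    pil_img = Image.new("RGB", (32, 32))  # placeholder PIL image

    as_float = transforms.ToTensor()(pil_img)     # float32 tensor, values in [0, 1]
    as_uint8 = transforms.PILToTensor()(pil_img)  # uint8 tensor, values in [0, 255]

    back_to_pil = transforms.ToPILImage()(as_float)
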
Generic Transforms
------------------
.. autosummary::
:toctree: generated/
:template: class.rst
Lambda
Automatic Augmentation Transforms
---------------------------------
`AutoAugment <https://arxiv.org/pdf/1805.09501.pdf>`_ is a common data augmentation technique that can improve the accuracy of image classification models.
Although the augmentation policies are directly linked to the dataset they were trained on, empirical studies show that
ImageNet policies provide significant improvements when applied to other datasets.
TorchVision implements three policies learned on the following datasets: ImageNet, CIFAR10 and SVHN.
These transforms can be used standalone or mixed-and-matched with existing transforms (see the example after the list below):
.. autosummary::
:toctree: generated/
:template: class.rst
AutoAugmentPolicy
AutoAugment
RandAugment
TrivialAugmentWide
AugMix
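
A short, illustrative sketch of mixing :class:`AutoAugment` with other transforms
(the ImageNet policy and the surrounding transforms are arbitrary choices):

.. code:: python

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
        transforms.ToTensor(),
    ])
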
.. _functional_transforms:
Functional Transforms
---------------------
.. currentmodule:: torchvision.transforms.functional
Functional transforms give you fine-grained control of the transformation pipeline.
As opposed to the transformations above, functional transforms don't contain a random number
generator for their parameters.
That means you have to specify/generate all parameters, but the functional transform will give you
reproducible results across calls.
Example:
you can apply a functional transform with the same parameters to multiple images like this:
.. code:: python

    import torchvision.transforms.functional as TF
    import random

    def my_segmentation_transforms(image, segmentation):
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        # more transforms ...
        return image, segmentation
Example:
you can use a functional transform to build transform classes with custom behavior:
.. code:: python

    import torchvision.transforms.functional as TF
    import random

    class MyRotationTransform:
        """Rotate by one of the given angles."""

        def __init__(self, angles):
            self.angles = angles

        def __call__(self, x):
            angle = random.choice(self.angles)
            return TF.rotate(x, angle)

    rotation_transform = MyRotationTransform(angles=[-30, -15, 0, 15, 30])
.. autosummary::
:toctree: generated/
:template: function.rst
adjust_brightness
adjust_contrast
adjust_gamma
adjust_hue
adjust_saturation
adjust_sharpness
affine
autocontrast
center_crop
convert_image_dtype
crop
equalize
erase
five_crop
gaussian_blur
get_dimensions
get_image_num_channels
get_image_size
hflip
invert
normalize
pad
perspective
pil_to_tensor
posterize
resize
resized_crop
rgb_to_grayscale
rotate
solarize
ten_crop
to_grayscale
to_pil_image
to_tensor
vflip