Video MViT
==========

.. currentmodule:: torchvision.models.video

The MViT model is based on the
`MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
<https://arxiv.org/abs/2112.01526>`__ and `Multiscale Vision Transformers
<https://arxiv.org/abs/2104.11227>`__ papers.


Model builders
--------------

The following model builders can be used to instantiate an MViT v1 or v2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.MViT`` base class. Please refer to the `source code
<https://github.com/pytorch/vision/blob/main/torchvision/models/video/mvit.py>`_ for
more details about this class.

.. autosummary::
    :toctree: generated/
    :template: function.rst

    mvit_v1_b
    mvit_v2_s
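
The snippet below is a minimal sketch of how one of these builders might be called.
It assumes randomly initialized weights (``weights=None``, so nothing is downloaded)
and an input clip of 16 frames at 224x224 pixels, which is assumed here to match the
default configuration of ``mvit_v1_b``. The variable names ``model``, ``clip`` and
``logits`` are illustrative only.

.. code-block:: python

    import torch
    from torchvision.models.video import mvit_v1_b

    # weights=None builds the architecture with random initialization,
    # so the predictions below are not meaningful.
    model = mvit_v1_b(weights=None)
    model.eval()

    # Assumed input layout: (batch, channels, frames, height, width).
    clip = torch.rand(1, 3, 16, 224, 224)

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        logits = model(clip)

    # One row of class scores per clip in the batch.
    print(logits.shape)

To start from pre-trained weights instead, a member of the matching weights enum
(for example ``MViT_V1_B_Weights.DEFAULT``) can be passed as ``weights``. In that case
the preprocessing transforms bundled with the weights should be applied to the input
clip rather than using random data.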