## Description

This example shows end-to-end training for SST-2 binary classification using the RoBERTa model and TorchArrow-based text
pre-processing. The main motivation for this example is to demonstrate the authoring of a text processing pipeline on
top of a TorchArrow DataFrame.

## Installation and Usage

The example depends on TorchArrow and TorchData.

#### TorchArrow

Install it from source following the instructions at https://github.com/pytorch/torcharrow#from-source. Note that some of
the natively integrated text operators (`bpe_tokenize` for tokenization, `lookup_indices` for vocabulary look-up) used
in this example depend on the torch library. By default, TorchArrow does not depend on the torch library, so
make sure to set the flag `USE_TORCH=1` during TorchArrow installation (this is also the reason why we cannot depend on
the nightly releases).

```bash
USE_TORCH=1 python setup.py install
```

#### TorchData

To install TorchData, follow the instructions at https://github.com/pytorch/data#installation.
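
Once both dependencies are installed, a quick check that the modules can be located may save a failed training run. The snippet below is only a sketch: the helper name `check_dependencies` is hypothetical and not part of this example, and it only confirms that `torch`, `torcharrow`, and `torchdata` are importable. It cannot tell whether TorchArrow was actually built with `USE_TORCH=1`.

```python
import importlib.util

# Top-level module names this example relies on.
REQUIRED_MODULES = ("torch", "torcharrow", "torchdata")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping of module name -> whether Python can locate it.

    Hypothetical helper for illustration; it does not import the modules,
    so it says nothing about how they were built.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any module is reported as missing, revisit the corresponding installation step above before running the example.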

#### Usage

To run the example from the command line, use the following command:

```bash
python roberta_sst2_training_with_torcharrow.py \
        --batch-size 16 \
        --num-epochs 1 \
        --learning-rate 1e-5
```
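
For readers unfamiliar with how such flags are typically consumed, the following is a minimal sketch of an `argparse` parser accepting the three options used above. It is not the script's actual parser: the function name `build_parser`, the help strings, and the default values (chosen to mirror the command above) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the command above.

    Illustrative only: defaults and help text are assumptions, not
    taken from roberta_sst2_training_with_torcharrow.py.
    """
    parser = argparse.ArgumentParser(description="SST-2 training options (sketch)")
    parser.add_argument("--batch-size", type=int, default=16,
                        help="number of examples per batch")
    parser.add_argument("--num-epochs", type=int, default=1,
                        help="number of passes over the training data")
    parser.add_argument("--learning-rate", type=float, default=1e-5,
                        help="optimizer learning rate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts dashes in flag names to underscores in attribute names.
    print(args.batch_size, args.num_epochs, args.learning_rate)
```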