---
title: Training Models
next: /docs/usage-frameworks
---

Thinc provides a fairly minimalistic approach to training, leaving you in
control of writing the training loop. The library provides a few utilities for
minibatching, hyperparameter scheduling, loss functions and weight
initialization, but does not provide abstractions for data loading, progress
tracking or hyperparameter optimization.

## The training loop {#training-loop}

Thinc assumes that your model will be trained using some form of **minibatched
stochastic gradient descent**. On each step of a standard training loop, you'll
loop over batches of your data and call
[`Model.begin_update`](/docs/api-model#begin_update) on the inputs of the batch,
which will return a batch of predictions and a backpropagation callback. You'll
then calculate the gradient of the loss with respect to the output, and provide
it to the backprop callback which will increment the gradients of the model
parameters as a side-effect. You can then pass an optimizer function into the
[`Model.finish_update`](/docs/api-model#finish_update) method to update the
weights.

```python
### Basic training loop
for i in range(10):
    for X, Y in train_batches:
        Yh, backprop = model.begin_update(X)
        loss, dYh = get_loss_and_gradient(Yh, Y)
        backprop(dYh)
        model.finish_update(optimizer)
```

You'll usually want to make some additions to the loop to save out model
checkpoints periodically, and to calculate and report progress statistics. Thinc
also provides ready access to **lower-level details**, making it easy to
experiment with arbitrary training variations. You can accumulate the gradients
over multiple batches before calling the optimizer, call the `backprop` callback
multiple times (or not at all if the update is small), and inject code to change
or report the gradients for particular layers. The implementation is quite
transparent, so you'll find it easy to make such modifications if you need to.
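
For example, here's a minimal sketch of gradient accumulation, assuming the
same `train_batches`, `get_loss_and_gradient` and `optimizer` as in the loop
above; `update_every` is a hypothetical setting:

```python
### Gradient accumulation (sketch)
update_every = 4  # hypothetical: accumulate gradients over 4 batches
for i, (X, Y) in enumerate(train_batches):
    Yh, backprop = model.begin_update(X)
    loss, dYh = get_loss_and_gradient(Yh, Y)
    backprop(dYh)  # increments the parameter gradients as a side-effect
    if (i + 1) % update_every == 0:
        model.finish_update(optimizer)  # apply the accumulated gradients
```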

## Batching {#batching}

<infobox>

A "minibatch" (or simply "batch" – we use the terms interchangeably) is just a
group of samples that you update or predict over together. Batching the data is
very important: most neural network models converge much faster and achieve
better accuracy when the gradients are calculated using multiple samples.

</infobox>

Thinc implements two batching helpers via the backend object
[`Ops`](/docs/api-backends#ops), typically used via `model.ops`. They should
cover the most common batching needs for training and evaluation.

1. [`minibatch`](/docs/api-backends#minibatch): Iterate slices from a sequence,
   optionally shuffled.
2. [`multibatch`](/docs/api-backends#multibatch): Minibatch one or more
   sequences and yield lists with one batch per sequence.

```python
### Example
batches = model.ops.minibatch(128, data, shuffle=True)
batches = model.ops.multibatch(128, train_X, train_Y, shuffle=True)
```

The batching methods take sequences of data and process them as a stream. They
return a [`SizedGenerator`](/docs/api-types#sizedgenerator), a simple custom
dataclass for generators that has a `__len__` and can repeatedly call the
generator function. This also means that the batching works nicely with progress
bars like [`tqdm`](https://github.com/tqdm/tqdm) and similar tools
out-of-the-box.
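
Because the length is known up front, you can check the number of batches
without consuming the stream. A small sketch, reusing the training arrays from
above:

```python
### Counting batches
batches = model.ops.multibatch(128, train_X, train_Y)
print(len(batches))  # number of batches, available before iterating
```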

```python
### With progress bar {highlight="1,4"}
from tqdm import tqdm

data = model.ops.multibatch(128, train_X, train_Y, shuffle=True)
for X, Y in tqdm(data, leave=False):
    Yh, backprop = model.begin_update(X)
```

[`SizedGenerator`](/docs/api-types#sizedgenerator) objects hold a reference to
the generator function and **call it repeatedly**, i.e. every time the sized
generator is iterated over. This also means that the sized generator is **never
consumed**. If you like, you can define it once outside your training loop, and
on each iteration, the data will be **rebatched and reshuffled**.

```python
### Option 1
for i in range(10):
    for X, Y in model.ops.multibatch(128, train_X, train_Y, shuffle=True):
        ...  # Update the model here
    for X, Y in model.ops.multibatch(128, dev_X, dev_Y):
        ...  # Evaluate the model here
```

```python
### Option 2
train_data = model.ops.multibatch(128, train_X, train_Y, shuffle=True)
dev_data = model.ops.multibatch(128, dev_X, dev_Y)
for i in range(10):
    for X, Y in train_data:
        ...  # Update the model here
    for X, Y in dev_data:
        ...  # Evaluate the model here
```

The `minibatch` and `multibatch` methods also support a `buffer` argument, which
may be useful to promote better parallelism. If you're using an engine that
supports asynchronous execution, such as PyTorch or
[JAX](https://github.com/google/jax), an unbuffered stream could cause the
engine to block unnecessarily. If you think this may be a problem, try setting a
higher buffer, e.g. `buffer=500`, and see if it helps. You could also simply
consume the entire generator by calling `list()` on it.
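
For example, a quick sketch of both options, reusing the data variables from
the examples above:

```python
### Buffered batching
# Prepare up to 500 batches ahead of consumption
batches = model.ops.minibatch(128, data, shuffle=True, buffer=500)
# Or materialize all batches up front
batches = list(model.ops.multibatch(128, train_X, train_Y, shuffle=True))
```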

Finally, `minibatch` and `multibatch` support **variable length batching**,
based on a schedule you can provide as the `batch_size` argument. Simply pass in
an iterable (such as a generator from the
[built-in schedules](/docs/api-schedules)) instead of an integer. Variable
length batching is non-standard, but we regularly use it for some of
[spaCy](https://spacy.io)'s models, especially the parser and entity recognizer.

```python
from thinc.api import compounding

batch_size = compounding(1.0, 16.0, 1.001)
train_data = model.ops.multibatch(batch_size, train_X, train_Y, shuffle=True)
```
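
To see what the schedule produces, you can draw values from it directly. Each
value is the previous one multiplied by the compound rate, capped at the stop
value:

```python
### Drawing from the schedule
batch_sizes = compounding(1.0, 16.0, 1.001)
assert next(batch_sizes) == 1.0          # starts at `start`
assert next(batch_sizes) == 1.0 * 1.001  # grows by the compound rate
# ...and keeps compounding, up to the `stop` value of 16.0
```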

![](images/schedules_custom2.svg)

<grid>

```ini
### config {small="true"}
[batch_size]
@schedules = "compounding.v1"
start = 1.0
stop = 16.0
compound = 1.001
```

```python
### Usage {small="true"}
from thinc.api import Config, registry

config = Config().from_disk("./config.cfg")
resolved = registry.resolve(config)
batch_size = resolved["batch_size"]
```

</grid>

---

## Evaluation {#evaluation}

Thinc does not provide utilities for calculating accuracy scores over either
individual samples or whole datasets. In most situations, you will loop over
batches of your inputs and targets, **calculate the accuracy** on each batch,
and then **keep a tally of the scores**.

```python
def evaluate(model, batch_size, Xs, Ys):
    correct = 0.
    total = 0.
    for X, Y in model.ops.multibatch(batch_size, Xs, Ys):
        correct += (model.predict(X).argmax(axis=1) == Y.argmax(axis=1)).sum()
        total += X.shape[0]
    return correct / total
```

During evaluation, take care to run your model **in a prediction context** (as
opposed to a training context), by using either the
[`Model.predict`](/docs/api-model#predict) method, or by passing the
`is_train=False` flag to [`Model.__call__`](/docs/api-model#call). Some layers
may behave differently during training and prediction in order to provide
regularization. Dropout layers are the most common example.
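
For example, a minimal sketch of both options, assuming a batch `X` like the
one in the evaluation loop above:

```python
### Prediction context
Yh = model.predict(X)  # prediction only: no backprop callback is returned
Yh, _ = model(X, is_train=False)  # __call__ returns the output and a callback
```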

---

## Loss calculators {#losses}

When training your Thinc models, the most important loss calculation is not a
scalar loss, but rather the **gradient of the loss with respect to your model
output**. That's the figure you have to pass into the backprop callback. You
actually don't need to calculate the scalar loss at all, although it's often
helpful as a diagnostic statistic.

Thinc provides a few [helpers for common loss functions](/docs/api-losses). Each
helper is provided as a class, so you can pass in any settings or
hyperparameters that your loss might require. The helper class can be used as a
callable object, in which case it will return both the scalar loss and the
gradient of the loss with respect to the outputs. You can also call the
`get_grad` method to just get the gradients, or the `get_loss` method to just
get the scalar loss.

<grid>

```python
### Example {small="true"}
from thinc.api import CategoricalCrossentropy
loss_calc = CategoricalCrossentropy()
grad, loss = loss_calc(guesses, truths)
```

```ini
### config.cfg {small="true"}
[loss]
@losses = "CategoricalCrossentropy.v1"
normalize = true
```

</grid>
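
If you only need one of the two figures, here's a quick sketch using the
methods mentioned above, with the same `guesses` and `truths` arrays:

```python
### get_grad / get_loss
grad = loss_calc.get_grad(guesses, truths)  # gradient w.r.t. the outputs only
loss = loss_calc.get_loss(guesses, truths)  # scalar loss only
```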

---

## Setting learning rate schedules {#schedules}

A common trick for stochastic gradient descent is to **vary the learning rate or
other hyperparameters** over the course of training. Since there are many
possible ways to vary the learning rate, Thinc lets you implement hyperparameter
schedules as simple generator functions. Thinc also provides a number of
[popular schedules](/docs/api-schedules) built-in.

You can use schedules directly, by calling `next()` on the schedule and using it
to update hyperparameters in your training loop. Since schedules are
particularly common for optimization settings, the
[`Optimizer`](/docs/api-optimizer) object accepts floats, lists and iterators
for most of its parameters. When you call
[`Optimizer.step_schedules`](/docs/api-optimizer#step_schedules), the optimizer
will draw the next value from each generator and use it to change the given
attribute. For instance, here's how to create an instance of the `Adam`
optimizer with a custom learning rate schedule:

```python
### Custom learning rate schedule
from thinc.api import Adam

def my_schedule():
    values = [0.001, 0.01, 0.1]
    while True:
        for value in values:
            yield value
        for value in reversed(values):
            yield value

optimizer = Adam(learn_rate=my_schedule())
assert optimizer.learn_rate == 0.001
optimizer.step_schedules()
assert optimizer.learn_rate == 0.01
optimizer.step_schedules()
assert optimizer.learn_rate == 0.1
```

![](images/schedules_custom1.svg)

You'll often want to describe your optimization schedules in your configuration
file. That's also very easy: you can use the
[`@thinc.registry.schedules`](/docs/api-config#registry) decorator to register
your function, and then refer to it in your config as the `learn_rate` argument
of the optimizer. Check out the
[documentation on config files](/docs/usage-config) for more examples.

<grid>

```python
### Registered function {small="true"}
import thinc

@thinc.registry.schedules("my_schedule.v1")
def my_schedule(values):
    while True:
        for value in values:
            yield value
        for value in reversed(values):
            yield value
```

```ini
### config.cfg {small="true"}
[optimizer]
@optimizers = "Adam.v1"

[optimizer.learn_rate]
@schedules = "my_schedule.v1"
values = [0.001, 0.01, 0.1]
```

</grid>
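
As with the batch size example earlier, resolving the config gives you the
ready-to-use optimizer. A sketch, assuming the config above is saved as
`config.cfg` and that `"my_schedule.v1"` has been registered as shown:

```python
### Usage
from thinc.api import Config, registry

config = Config().from_disk("./config.cfg")
optimizer = registry.resolve(config)["optimizer"]
optimizer.step_schedules()  # advances learn_rate through the schedule
```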

---

## Distributed training {#distributed}

We expect to recommend [Ray](https://ray.io/) for distributed training. Ray
offers a clean and simple API that fits well with Thinc's model design. Full
support is still under development.