File: api-initializers.md

---
title: Initializers
next: /docs/api-schedules
---

A collection of initialization functions. Parameter initialization schemes can
be very important for deep neural networks, because the initial distribution of
the weights helps determine whether activations change in mean and variance as
the signal moves through the network. If the activations are not stable, the
network will not learn effectively. The "best" initialization scheme changes
depending on the activation functions being used, which is why a variety of
initializations are necessary. You can reduce the importance of the
initialization by using normalization after your hidden layers.

### normal_init {#normal_init tag="function"}

Initialize from a normal distribution, with `scale = sqrt(1 / fan_in)`.

| Argument       | Type              | Description                                                                       |
| -------------- | ----------------- | --------------------------------------------------------------------------------- |
| `ops`          | <tt>Ops</tt>      | The backend object, e.g. `model.ops`.                                             |
| `shape`        | <tt>Shape</tt>    | The data shape.                                                                   |
| _keyword-only_ |                   |                                                                                   |
| `fan_in`       | <tt>int</tt>      | Usually the number of inputs to the layer. If `-1`, the second dimension is used. |
| **RETURNS**    | <tt>FloatsXd</tt> | The initialized array.                                                            |
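
For example, you can call the initializer directly with a backend object and a
shape (a minimal sketch; `NumpyOps` and the `(128, 64)` shape are only
illustrative):

```python
from thinc.api import NumpyOps, normal_init

ops = NumpyOps()
# fan_in defaults to -1, so the second dimension (64) is used and the values
# are drawn with scale = sqrt(1 / 64).
W = normal_init(ops, (128, 64), fan_in=-1)
print(W.shape, float(W.std()))
```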

### glorot_uniform_init {#glorot_uniform_init tag="function"}

Initialize from a uniform distribution with the `scale` parameter computed by
the method introduced by Xavier Glorot
([Glorot and Bengio, 2010](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)):
`scale = sqrt(6.0 / (data.shape[0] + data.shape[1]))`. Usually used in
[`Relu`](/docs/api-layers#relu) layers.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |
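
In practice, an initializer is usually passed to a layer rather than called
directly. A minimal sketch, assuming the `Relu` layer's `init_W` and `init_b`
arguments as documented in the [layers API](/docs/api-layers#relu), with
illustrative dimensions:

```python
from thinc.api import Relu, glorot_uniform_init, zero_init

# Explicitly set the weight and bias initializers.
model = Relu(nO=32, nI=64, init_W=glorot_uniform_init, init_b=zero_init)
model.initialize()
print(model.get_param("W").shape)  # (32, 64)
```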

### glorot_normal_init {#glorot_normal_init tag="function"}

Initialize from a normal distribution with the `scale` parameter computed by
the method introduced by Xavier Glorot
([Glorot and Bengio, 2010](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)):
`scale = sqrt(2.0 / (data.shape[0] + data.shape[1]))`.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |

### he_uniform_init {#he_uniform_init tag="function"}

Initialize from a uniform distribution with the `scale` parameter computed by
the method introduced in [He et al., 2015](https://arxiv.org/abs/1502.01852):
`scale = sqrt(6.0 / data.shape[1])`.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |

### he_normal_init {#he_normal_init tag="function"}

Initialize from a normal distribution with the `scale` parameter computed by
the method introduced in [He et al., 2015](https://arxiv.org/abs/1502.01852):
`scale = sqrt(2.0 / data.shape[1])`.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |

### lecun_uniform_init {#lecun_uniform_init tag="function"}

Initialize from a uniform distribution with the `scale` parameter computed as:
`scale = sqrt(3.0 / data.shape[1])`.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |


### lecun_normal_init {#lecun_normal_init tag="function"}

Initialize from a normal distribution with the `scale` parameter computed as:
`scale = sqrt(1.0 / data.shape[1])`.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |
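
The He and LeCun variants differ only in the constant used to compute `scale`.
As a rough sanity check (an illustrative sketch, assuming these helpers are
exported from `thinc.api` like the other initializers), the empirical standard
deviations should come out close to the documented scales:

```python
from thinc.api import NumpyOps, he_normal_init, lecun_normal_init

ops = NumpyOps()
shape = (4000, 100)  # fan_in = shape[1] = 100

he = he_normal_init(ops, shape)        # expected std ~ sqrt(2.0 / 100) ~ 0.14
lecun = lecun_normal_init(ops, shape)  # expected std ~ sqrt(1.0 / 100) = 0.1
print(float(he.std()), float(lecun.std()))
```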

### zero_init {#zero_init tag="function"}

Initialize a parameter with zero weights. This is usually used for output layers
and for bias vectors.

| Argument    | Type              | Description                           |
| ----------- | ----------------- | ------------------------------------- |
| `ops`       | <tt>Ops</tt>      | The backend object, e.g. `model.ops`. |
| `shape`     | <tt>Shape</tt>    | The data shape.                       |
| **RETURNS** | <tt>FloatsXd</tt> | The initialized array.                |
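
For example, a zero-initialized bias vector (a minimal sketch):

```python
from thinc.api import NumpyOps, zero_init

b = zero_init(NumpyOps(), (10,))
assert not b.any()  # every value starts at zero
```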

### uniform_init {#uniform_init tag="function"}

Initialize values from a uniform distribution. This is usually used for word
embedding tables.

| Argument       | Type              | Description                              |
| -------------- | ----------------- | ---------------------------------------- |
| `ops`          | <tt>Ops</tt>      | The backend object, e.g. `model.ops`.    |
| `shape`        | <tt>Shape</tt>    | The data shape.                          |
| _keyword-only_ |                   |                                          |
| `lo`           | <tt>float</tt>    | The minimum of the uniform distribution. |
| `hi`           | <tt>float</tt>    | The maximum of the uniform distribution. |
| **RETURNS**    | <tt>FloatsXd</tt> | The initialized array.                   |
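
For example, to initialize a small embedding table (the shape and bounds here
are only illustrative):

```python
from thinc.api import NumpyOps, uniform_init

# 2000 rows (vocabulary entries), 64 columns (vector width).
vectors = uniform_init(NumpyOps(), (2000, 64), lo=-0.1, hi=0.1)
print(float(vectors.min()), float(vectors.max()))
```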

---

## Usage via config and function registry {#registry}

Since the initializers need to be called with data, defining them in the
[config](/docs/usage-config) will return a **configured function**: a partial
with only the settings (the keyword arguments) applied. Within your script, you
can then pass in the data, and the configured function will be called using the
settings defined in the config.

Most commonly, the initializer is passed as an argument to a
[layer](/docs/api-layers), so it can be defined as its own config block nested
under the layer settings:

<grid>

```ini
### config.cfg {small="true"}
[model]
@layers = "Linear.v1"
nO = 10

[model.init_W]
@initializers = "normal_init.v1"
fan_in = -1
```

```python
### Usage {small="true"}
from thinc.api import registry, Config

config = Config().from_disk("./config.cfg")
resolved = registry.resolve(config)
model = resolved["model"]
```

</grid>

You can also define it as a regular config setting and then call the configured
function in your script:

<grid>

```ini
### config.cfg {small="true"}
[initializer]
@initializers = "uniform_init.v1"
lo = -0.1
hi = 0.1
```

```python
### Usage {small="true"}
from thinc.api import registry, Config, NumpyOps

config = Config().from_disk("./config.cfg")
resolved = registry.resolve(config)
initializer = resolved["initializer"]
weights = initializer(NumpyOps(), (3, 2))
```

</grid>