File: esl_vectorops.md

package info (click to toggle)
infernal 1.1.5-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 74,208 kB
  • sloc: ansic: 230,749; perl: 14,433; sh: 6,147; makefile: 3,071; python: 1,247
file content (80 lines) | stat: -rw-r--r-- 5,074 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# esl_vectorops

The `vectorops` module contains routines for simple operations on
vectors of various types, of length <n>. 

Different functions allow an operation to be performed in vectors
containing elements of different scalar types (`double`, `float`,
`int`, `int64_t`, `char`). The appropriate routine is prefixed with
`D`, `F`, `I`, `L`, or `C`. For example, `esl_vec_DSet()` is the Set
routine for a vector of doubles; `esl_vec_LSet()` is for 64-bit
("Long") integers.

## quick reference

Replace `*` with `esl_vec_[DFIL]`, sometimes `C`:

| functions                | in English                   | in math                         | notes |
|--------------------------|------------------------------|--------------------------------:|------:|
| `*Set(v, n, x)`          | Set all elements to x        | $v_i = x \quad \forall i$       |       |
| `*Scale(v, n, x)`        | Multiply all elements by x   | $v_i = x v_i \quad \forall i$   |       |
| `*Increment(v, n, x)`    | Add x to all elements        | $v_i = v_i + x \quad \forall i$ |       |
| `*Add(v, w, n)`          | Vector addition              | $v_i = v_i + w_i \quad \forall i$ |     |
| `*AddScaled(v, w, n, a)` | Vector add a scaled vector   | $v_i = v_i + a w_i \quad \forall i$ |   |
| `*Sum(v, n)`             | Return sum of elements       | (return) $\sum_i v_i$           | [1]   |
| `*Dot(v, w, n)`          | Return dot product           | (return) $\mathbf{v} \cdot \mathbf{w}$ ||
| `*Max(v, n)`             | Return maximum element       | (return) $\max_i v_i$           |       |
| `*Min(v, n)`             | Return minimum element       | (return) $\min_i v_i$           |       |
| `*ArgMax(v, n)`          | Return index of maximum element | (return) $\mathrm{argmax}_i v_i$  |  |
| `*ArgMin(v, n)`          | Return index of minimum element | (return) $\mathrm{argmin}_i v_i$  |  |
| `*Copy(v, n, w)`         | Copy a vector                | $w_i = v_i \quad \forall i$     |       |
| `*Swap(v, w, n)`         | Swap contents of two vectors | $\mathbf{w} = \mathbf{v}, \mathbf{v} = \mathbf{w}$ | |
| `*Reverse(v, w, n)`      | Reverse order of v, store in w | $w_i = v_{n-i-1}$             |       |
| `*SortIncreasing(v, n)`  | Sort from smallest to largest   | []()                              |  |
| `*SortDecreasing(v, n)`  | Sort from largest to smallest   | []()                         |       |
| `*Shuffle(rng, v, n)`    | Shuffle vector in place         | []()                         |       |
| `*Compare(v, w, n ...)`  | Compare vectors for equality | (return `eslOK` or `eslFAIL`)   | [2]   |
| `*Dump(fp, v, n, label)` | Dump contents for inspection    |                              |       |


Floating point only, `[DF]` (generally for probability or log probability vectors):

| functions                | in English                   | in math                         | 
|--------------------------|------------------------------|--------------------------------:|
| `*Norm(v, n)`            | Normalize probability vector | $v_i = \frac{v_i}{\sum_j v_j}$  |       
| `*LogNorm(v, n)`         | Normalize log probability vector | $v_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$ |
| `*Log(v, n)`             | Take log of each element     | $v_i = \log v_i$                |       
| `*Exp(v, n)`             | Exponentiate each element    | $v_i = e^{v_i}$                 |        
| `*LogSum(v, n)`          | Return log sum of log probabilities | (return) $\log \sum_i e^{v_i}$  
| `*Entropy(p, n)`         | Return Shannon entropy       | (return) $H = - \sum_i p_i \log_2 p_i$ |  
| `*RelEntropy(p, q, n)`   | Return Kullback-Leibler divergence | (return) $D_{\mathrm{KL}}(p \parallel q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$ |
| `*CDF(p, n, C)`          | Cumulative distribution of a prob vector | $C_i = \sum_{j=0}^{j=i} p_j$ |
| `*Validate(p, n, atol, errbuf)`| Validate probability vector | (return `eslOK` or `eslFAIL`) | 
| `*LogValidate(p, n, atol, errbuf)`| Validate log prob vector | (return `eslOK` or `eslFAIL`) | 


[1] Floating point `*Sum()` functions use the Kahan compensated
    summation algorithm to reduce numerical error
	[[Kahan, 1965]](https://doi.org/10.1145/363707.363723).
	
[2] Floating point `*Compare()` functions take an additional argument
    `tol`, and use `esl_*Compare(v[i], w[i], tol)` to compare each
    element with **relative** tolerance `tol`. We are phasing
    `esl_*Compare()` functions out. They are to be replaced by
    `esl_*CompareNew()` which use `atol` and `rtol` absolute and
    relative tolerances.

## notes to future 

* We could provide SIMD accelerated versions and runtime dispatchers. 

* The vector length <n> is always an `int`, so vectors can be up to a
  couple billion ($2^{31}-1$) long. If we ever need longer vectors
  (!), we'll write a `esl_vec64` module. That is, the contents (type)
  of the vector is a different issue from its length.  Don't be
  confused to see the `L` versions of the functions using `int64_t
  *vec` and `int n`, and using `int i` to index it; this is
  deliberate.