File: doc.go

/*Package fai implements FASTA index (.fai) handling, including creating,
reading, and random access to sequences.

The code for the fai data structure was copied and adapted from [1].

I wrote the code for creating and reading fai files, as well as the test code.

The code for random access to subsequences was copied from [2] and extended substantially.

References:

[1]. https://github.com/biogo/biogo/blob/master/io/seqio/fai/fai.go

[2]. https://github.com/brentp/faidx/blob/master/faidx.go

## General Usage

    import "github.com/shenwei356/bio/seqio/fai"

    file := "seq.fa"
    faidx, err := fai.New(file)
    checkErr(err)
    defer func() {
        checkErr(faidx.Close())
    }()

    // whole sequence
    seq, err := faidx.Seq("cel-mir-2")
    checkErr(err)

    // single base
    s, err := faidx.Base("cel-let-7", 1)
    checkErr(err)

    // subsequence. start and end are both 1-based
    // (seq and err are already declared above, so assign with = rather than :=)
    seq, err = faidx.SubSeq("cel-mir-2", 15, 19)
    checkErr(err)
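
The snippets above call `checkErr`, which this documentation does not define. A minimal sketch of such a helper follows; it is a hypothetical function that is not part of package fai, and it assumes that printing the error and exiting is acceptable for your program.

```go
package main

import (
	"fmt"
	"os"
)

// checkErr is a hypothetical helper (not part of package fai): it reports a
// non-nil error on stderr and exits with a non-zero status.
func checkErr(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func main() {
	checkErr(nil) // no error: returns normally
	fmt.Println("no error, still running")
}
```

In library code you would normally return the error to the caller instead of exiting.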


## Extended SubSeq


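For extended SubSeq, negative positions are allowed.

This is my custom locating strategy. Start and end are both 1-based, and a
negative position counts from the end of the sequence (-1 is the last base).
Positions beyond either end are clipped to the sequence, as in 1:12 and -12:-1 below.
To better understand the locating strategy, see the examples below: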


     1-based index    1 2 3 4 5 6 7 8 9 10
    negative index    0-9-8-7-6-5-4-3-2-1
               seq    A C G T N a c g t n
               1:1    A
               2:4      C G T
             -4:-2                c g t
             -4:-1                c g t n
             -1:-1                      n
              2:-2      C G T N a c g t
              1:-1    A C G T N a c g t n
              1:12    A C G T N a c g t n
            -12:-1    A C G T N a c g t n

Examples:

    // last 12 bases
    seq, err := faidx.SubSeq("cel-mir-2", -12, -1)
    checkErr(err)
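
The locating strategy in the table above can be sketched as a small function that maps a start/end pair to 0-based, half-open slice bounds. This is an illustration only, not the package's own code: `resolveRange` is a hypothetical name, and rejecting position 0 and empty ranges are assumptions, since the table does not cover those cases.

```go
package main

import "fmt"

// resolveRange is a hypothetical helper, not the package's own code. It turns a
// 1-based start/end pair (negative values count from the end) into 0-based,
// half-open slice bounds for a sequence of the given length.
// Assumptions: position 0 is rejected, and out-of-range positions are clipped.
func resolveRange(length, start, end int) (from, to int, ok bool) {
	if length <= 0 || start == 0 || end == 0 {
		return 0, 0, false
	}
	if start < 0 {
		start = length + start + 1 // -1 becomes the last base
	}
	if end < 0 {
		end = length + end + 1
	}
	if start < 1 {
		start = 1 // clip, as in -12:-1 on a 10-base sequence
	}
	if end > length {
		end = length // clip, as in 1:12 on a 10-base sequence
	}
	if start > end {
		return 0, 0, false
	}
	return start - 1, end, true
}

func main() {
	seq := "ACGTNacgtn"
	for _, r := range [][2]int{{2, 4}, {-4, -2}, {2, -2}, {1, 12}, {-12, -1}} {
		if from, to, ok := resolveRange(len(seq), r[0], r[1]); ok {
			fmt.Printf("%d:%d -> %s\n", r[0], r[1], seq[from:to])
		}
	}
}
```

Each pair in `main` corresponds to a row of the table, so the printed subsequences can be compared against it directly.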

## Advanced Usage

Function `fai.New(file string)` is a wrapper that simplifies creating and
reading a FASTA index. It reads `file + ".fai"` if that file exists and
creates the index otherwise. Here is what happens inside:

    func New(file string) (*Faidx, error) {
            fileFai := file + ".fai"
            var index Index
            if _, err := os.Stat(fileFai); os.IsNotExist(err) {
                    index, err = Create(file)
                    if err != nil {
                            return nil, err
                    }
            } else {
                    index, err = Read(fileFai)
                    if err != nil {
                            return nil, err
                    }
            }

            return NewWithIndex(file, index)
    }

By default, the sequence ID is used as the key in the FASTA index file.
Inside the package, a regular expression extracts the sequence ID from the
full head. The default value is `^([^\s]+)\s?`, i.e. it captures the
leading non-space characters of the head.
So you can just use `fai.Create(file string)` to create the .fai file.

If you want to use the full head instead of the sequence ID as the key,
you can use `fai.CreateWithIDRegexp(file string, idRegexp string)` to create the index.
Here, `idRegexp` should be `^(.+)$`. For convenience, you can use another function,
`CreateWithFullHead`.
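
To see what the two expressions capture, here is a small sketch that uses only the standard `regexp` package. `extractID` is a hypothetical helper and the head string is made up for illustration; neither comes from package fai.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractID is a hypothetical helper, not the package's own code. It returns
// the first capture group of idRegexp applied to a FASTA head, or "" when the
// expression does not match.
func extractID(idRegexp, head string) string {
	re := regexp.MustCompile(idRegexp)
	m := re.FindStringSubmatch(head)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	head := "cel-mir-2 example description text" // illustrative head line
	fmt.Println(extractID(`^([^\s]+)\s?`, head)) // default: sequence ID only
	fmt.Println(extractID(`^(.+)$`, head))       // full head
}
```

With the default expression the key is only the leading token, so two records whose heads start with the same token would share a key; the full-head expression avoids that at the cost of longer keys.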


## More Advanced Usage

Note that, ***by default, the whole file is mapped into shared memory***,
which is OK for files smaller than your RAM.
For very big files, you should disable memory mapping;
file seeking is then used instead.

    // change the global variable
    fai.MapWholeFile = false

    // then do other things

*/
package fai