pbcore.io
=========

The ``pbcore.io`` package provides a number of lightweight interfaces
to PacBio data files and other standard bioinformatics file formats.
Preferred usage is to import classes directly from the ``pbcore.io``
package.

The classes within ``pbcore.io`` adhere to a few conventions, in order
to provide a uniform API:

  - Each data file type is thought of as a container of a `Record`
    type; all `Reader` classes support streaming access by iterating on the
    reader object, and `IndexedBamReader` additionally provides
    random access to alignments/reads.

    For example::

      from pbcore.io import IndexedBamReader

      # ``filename`` is the path to a BAM file; ``process`` is your own function.
      with IndexedBamReader(filename) as f:
          for r in f:
              process(r)

    To make scripts more user friendly, a progress bar can be
    added using the third-party `tqdm` package::

      from pbcore.io import IndexedBamReader
      from tqdm import tqdm

      # Wrapping the reader in tqdm() reports progress while iterating.
      with IndexedBamReader(filename) as f:
          for r in tqdm(f):
              process(r)

  - The constructor argument needed to instantiate `Reader` and
    `Writer` objects can be either a filename (which can be suffixed
    by ".gz" for all file types) or an open file handle.
    Given a filename, the class opens the file itself; given a file
    handle, it reads from or writes to that handle directly.
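
The filename-or-handle convention can be pictured with a small
standard-library sketch.  The helper name ``open_text_handle`` is
hypothetical and is not part of ``pbcore.io``; it only illustrates how a
constructor might accept either a path (optionally ending in ".gz") or
an already open file object::

    import gzip
    import io


    def open_text_handle(filename_or_file, mode="r"):
        """Return a text file object for a path or an already open handle.

        Hypothetical helper for illustration; not pbcore's implementation.
        """
        if isinstance(filename_or_file, str):
            if filename_or_file.endswith(".gz"):
                # gzip.open in text mode decompresses (or compresses) on the fly.
                return gzip.open(filename_or_file, mode + "t")
            return open(filename_or_file, mode)
        # Anything else is assumed to be an open file-like object.
        return filename_or_file


    # An in-memory handle is accepted just like a path would be.
    handle = open_text_handle(io.StringIO(">seq1\nACGT\n"))
    lines = [line.rstrip("\n") for line in handle]

Accepting both forms lets the same reader be used on files, pipes and
in-memory buffers without separate code paths in the calling script.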


BAM Format
----------

The BAM format is a standard format for describing aligned and unaligned
reads.  PacBio uses the BAM format exclusively.
For basic functionality, use :class:`BamReader`; for full index
operation support, use :class:`IndexedBamReader`, which requires the
auxiliary *PacBio BAM index file* (``bam.pbi`` file).

.. autoclass:: pbcore.io.BamAlignment
    :members:
    :undoc-members:

.. autoclass:: pbcore.io.BamReader
    :members:
    :undoc-members:

.. autoclass:: pbcore.io.IndexedBamReader
    :members:
    :undoc-members:


FASTA Format
------------

FASTA is a standard format for sequence data.  We recommend using the
`FastaTable` class, which provides random access to indexed FASTA
files (using the conventional SAMtools "fai" index).
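
To illustrate why an index allows random access, here is a minimal
standard-library sketch of the arithmetic behind a SAMtools "fai" entry.
It is not how `FastaTable` is implemented; the names ``parse_fai`` and
``fetch`` and the in-memory example data are assumptions made for this
illustration.  Each "fai" line stores the sequence name, its length, the
byte offset of its first base, the number of bases per line, and the
number of bytes per line::

    import io

    # A two-record FASTA file held in memory, wrapped at 4 bases per line.
    FASTA = b">chr1\nACGT\nACGT\nAC\n>chr2\nGGGG\nTT\n"

    # Matching "fai" lines: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    FAI = "chr1\t10\t6\t4\t5\nchr2\t6\t25\t4\t5\n"


    def parse_fai(text):
        """Map each sequence name to its four numeric index fields."""
        index = {}
        for line in text.splitlines():
            name, length, offset, linebases, linewidth = line.split("\t")
            index[name] = (int(length), int(offset),
                           int(linebases), int(linewidth))
        return index


    def fetch(handle, index, name, start, end):
        """Return bases [start, end) of a sequence (0-based, end-exclusive)."""
        length, offset, linebases, linewidth = index[name]
        end = min(end, length)
        if start >= end:
            return ""
        # Byte position of a base: skip whole lines, then move within the line.
        first = offset + (start // linebases) * linewidth + start % linebases
        last = offset + ((end - 1) // linebases) * linewidth + (end - 1) % linebases
        handle.seek(first)
        chunk = handle.read(last - first + 1)
        # Drop the newline bytes that fall inside the requested span.
        return chunk.replace(b"\n", b"").decode("ascii")


    index = parse_fai(FAI)
    subseq = fetch(io.BytesIO(FASTA), index, "chr1", 2, 9)

Because the byte position is computed directly, only the requested span
is read, regardless of how large the FASTA file is.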

.. autoclass:: pbcore.io.FastaTable
    :members:

.. autoclass:: pbcore.io.FastaRecord
    :members:

.. autoclass:: pbcore.io.FastaReader
    :members:

.. autoclass:: pbcore.io.FastaWriter
    :members:


FASTQ Format
------------

FASTQ is a standard format for sequence data with associated quality scores.
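
As a reminder of what a record carries, the following standard-library
sketch parses one four-line FASTQ record and decodes its quality string.
It assumes the common Phred+33 (Sanger) encoding and unwrapped records;
the function name ``parse_fastq_record`` is hypothetical and is not part
of ``pbcore.io``::

    def parse_fastq_record(text):
        """Split one four-line FASTQ record into name, sequence and scores."""
        header, sequence, separator, quality = text.strip().split("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: each quality character encodes (ord(char) - 33).
        scores = [ord(char) - 33 for char in quality]
        return header[1:], sequence, scores


    name, sequence, scores = parse_fastq_record("@read1\nGATTA\n+\nII5!~\n")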

.. autoclass:: pbcore.io.FastqRecord
    :members:

.. autoclass:: pbcore.io.FastqReader
    :members:

.. autoclass:: pbcore.io.FastqWriter
    :members:



GFF Format (Version 3)
----------------------

The GFF format is an open and flexible standard for representing genomic features.
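
For orientation, a GFF3 feature line has nine tab-separated columns
(seqid, source, type, start, end, score, strand, phase, attributes),
with coordinates that are 1-based and inclusive.  The sketch below
parses one such line using only the standard library;
``parse_gff3_line`` and the example feature are hypothetical, and this
is not how `GffReader` is implemented::

    from urllib.parse import unquote

    COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


    def parse_gff3_line(line):
        """Parse one GFF3 feature line into a dictionary keyed by column."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError("expected 9 tab-separated columns")
        record = dict(zip(COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        # Column 9 holds ";"-separated key=value pairs with URL-style escapes.
        attributes = {}
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
        record["attributes"] = attributes
        return record


    feature = parse_gff3_line(
        "chr1\texample\tgene\t100\t250\t.\t+\t.\tID=gene1;Name=demo%20gene\n")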

.. autoclass:: pbcore.io.Gff3Record
    :members:

.. autoclass:: pbcore.io.GffReader
    :members:

.. autoclass:: pbcore.io.GffWriter
    :members: