File: api.rst

package info (click to toggle)
python-pysam 0.15.4+ds-3
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 27,992 kB
  • sloc: ansic: 140,738; python: 7,881; sh: 265; makefile: 223; perl: 41
file content (233 lines) | stat: -rw-r--r-- 6,020 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
======================================================
pysam - An interface for reading and writing SAM files
======================================================

Introduction
============

Pysam is a python module that makes it easy to read and manipulate
mapped short read sequence data stored in SAM/BAM files.  It is a
lightweight wrapper of the htslib_ C-API.

This page provides a quick introduction in using pysam followed by the
API. See :ref:`usage` for more detailed usage instructions.

To use the module to read a file in BAM format, create a
:class:`~pysam.AlignmentFile` object::

   import pysam
   samfile = pysam.AlignmentFile("ex1.bam", "rb")

Once a file is opened you can iterate over all of the read mapping to
a specified region using :meth:`~pysam.AlignmentFile.fetch`.  Each
iteration returns a :class:`~pysam.AlignedSegment` object which
represents a single read along with its fields and optional tags::

   for read in samfile.fetch('chr1', 100, 120):
	print read

   samfile.close()

To give::

    EAS56_57:6:190:289:82	0	99	<<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<;	69	CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA	0	192	1
    EAS56_57:6:190:289:82	0	99	<<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2;	137	AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC	73	64	1
    EAS51_64:3:190:727:308	0	102	<<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844	99	GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG	99	18	1
    ...

You can also write to a :class:`~pysam.AlignmentFile`::

   import pysam
   samfile = pysam.AlignmentFile("ex1.bam", "rb")
   pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile)
   for read in samfile.fetch():
	if read.is_paired:
		pairedreads.write(read)

   pairedreads.close()
   samfile.close()

An alternative way of accessing the data in a SAM file is by iterating
over each base of a specified region using the
:meth:`~pysam.AlignmentFile.pileup` method. Each iteration returns a
:class:`~pysam.PileupColumn` which represents all the reads in the SAM
file that map to a single base in the reference sequence. The list of
reads are represented as :class:`~pysam.PileupRead` objects in the
:attr:`PileupColumn.pileups <pysam.PileupColumn.pileups>` property::

    import pysam
    samfile = pysam.AlignmentFile("ex1.bam", "rb" )
    for pileupcolumn in samfile.pileup("chr1", 100, 120):
        print ("\ncoverage at base %s = %s" %
               (pileupcolumn.pos, pileupcolumn.n))
        for pileupread in pileupcolumn.pileups:
            if not pileupread.is_del and not pileupread.is_refskip:
                # query position is None if is_del or is_refskip is set.
                print ('\tbase in read %s = %s' %
                      (pileupread.alignment.query_name,
                       pileupread.alignment.query_sequence[pileupread.query_position]))

    samfile.close()

The above code outputs::

    coverage at base 99 = 1
        base in read EAS56_57:6:190:289:82 = A

    coverage at base 100 = 1
        base in read EAS56_57:6:190:289:82 = G

    coverage at base 101 = 1
        base in read EAS56_57:6:190:289:82 = G

    coverage at base 102 = 2
        base in read EAS56_57:6:190:289:82 = G
        base in read EAS51_64:3:190:727:308 = G
    ...

Commands available in :term:`csamtools` are available as simple
function calls. For example::

   pysam.sort("-o", "output.bam", "ex1.bam")

corresponds to the command line::

   samtools sort -o output.bam ex1.bam 

Analogous to :class:`~pysam.AlignmentFile`, a
:class:`~pysam.TabixFile` allows fast random access to compressed and
tabix indexed tab-separated file formats with genomic data::

   import pysam
   tabixfile = pysam.TabixFile("example.gtf.gz")

   for gtf in tabixfile.fetch("chr1", 1000, 2000):
       print (gtf.contig, gtf.start, gtf.end, gtf.gene_id)

:class:`~pysam.TabixFile` implements lazy parsing in order to iterate
over large tables efficiently.

More detailed usage instructions is at :ref:`usage`.

.. note::

   Coordinates in pysam are always 0-based (following the python
   convention). SAM text files use 1-based coordinates.

.. note::

   The above examples can be run in the :file:`tests` directory of the
    installation directory. Type 'make' before running them.

.. seealso::

   https://github.com/pysam-developers/pysam

       The pysam code repository, containing source code and download
       instructions

   http://pysam.readthedocs.org/en/latest/

       The pysam website containing documentation

API
===

SAM/BAM/CRAM files
-------------------

Objects of type :class:`~pysam.AlignmentFile` allow working with
BAM/SAM formatted files.

.. autoclass:: pysam.AlignmentFile
   :members:

.. autoclass:: pysam.AlignmentHeader
   :members:

An :class:`~pysam.AlignedSegment` represents an aligned segment within
a SAM/BAM file.

.. autoclass:: pysam.AlignedSegment
   :members:

.. autoclass:: pysam.PileupColumn
   :members:

.. autoclass:: pysam.PileupRead
   :members:

.. autoclass:: pysam.IndexedReads
   :members:


Tabix files
-----------

:class:`~pysam.TabixFile` opens tabular files that have been
indexed with tabix_.

.. autoclass:: pysam.TabixFile
   :members:

To iterate over tabix files, use :func:`~pysam.tabix_iterator`:

.. autofunction:: pysam.tabix_iterator

.. autofunction:: pysam.tabix_compress

.. autofunction:: pysam.tabix_index

.. autoclass:: pysam.asTuple
   :members:

.. autoclass:: pysam.asVCF
   :members:

.. autoclass:: pysam.asBed
   :members:

.. autoclass:: pysam.asGTF
   :members:


Fasta files
-----------

.. autoclass:: pysam.FastaFile
   :members:

Fastq files
-----------

.. autoclass:: pysam.FastxFile
   :members:


.. autoclass:: pysam.FastqProxy
   :members:


VCF files
---------

.. autoclass:: pysam.VariantFile
   :members:

.. autoclass:: pysam.VariantHeader
   :members:

.. autoclass:: pysam.VariantRecord
   :members:

.. autoclass:: pysam.VariantHeaderRecord
   :members:

HTSFile
-------

HTSFile is the base class for :class:`pysam.AlignmentFile` and
:class:`pysam.VariantFile`.

.. autoclass:: pysam.HTSFile
   :members: