1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
|
======================================================
pysam - An interface for reading and writing SAM files
======================================================
Introduction
============
Pysam is a python module that makes it easy to read and manipulate
mapped short read sequence data stored in SAM/BAM files. It is a
lightweight wrapper of the htslib_ C-API.
This page provides a quick introduction in using pysam followed by the
API. See :ref:`usage` for more detailed usage instructions.
To use the module to read a file in BAM format, create a
:class:`~pysam.AlignmentFile` object::
import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
Once a file is opened you can iterate over all of the read mapping to
a specified region using :meth:`~pysam.AlignmentFile.fetch`. Each
iteration returns a :class:`~pysam.AlignedSegment` object which
represents a single read along with its fields and optional tags::
for read in samfile.fetch('chr1', 100, 120):
print read
samfile.close()
To give::
EAS56_57:6:190:289:82 0 99 <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; 69 CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA 0 192 1
EAS56_57:6:190:289:82 0 99 <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; 137 AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC 73 64 1
EAS51_64:3:190:727:308 0 102 <<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844 99 GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG 99 18 1
...
You can also write to a :class:`~pysam.AlignmentFile`::
import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile)
for read in samfile.fetch():
if read.is_paired:
pairedreads.write(read)
pairedreads.close()
samfile.close()
An alternative way of accessing the data in a SAM file is by iterating
over each base of a specified region using the
:meth:`~pysam.AlignmentFile.pileup` method. Each iteration returns a
:class:`~pysam.PileupColumn` which represents all the reads in the SAM
file that map to a single base in the reference sequence. The list of
reads are represented as :class:`~pysam.PileupRead` objects in the
:attr:`PileupColumn.pileups <pysam.PileupColumn.pileups>` property::
import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb" )
for pileupcolumn in samfile.pileup("chr1", 100, 120):
print ("\ncoverage at base %s = %s" %
(pileupcolumn.pos, pileupcolumn.n))
for pileupread in pileupcolumn.pileups:
if not pileupread.is_del and not pileupread.is_refskip:
# query position is None if is_del or is_refskip is set.
print ('\tbase in read %s = %s' %
(pileupread.alignment.query_name,
pileupread.alignment.query_sequence[pileupread.query_position]))
samfile.close()
The above code outputs::
coverage at base 99 = 1
base in read EAS56_57:6:190:289:82 = A
coverage at base 100 = 1
base in read EAS56_57:6:190:289:82 = G
coverage at base 101 = 1
base in read EAS56_57:6:190:289:82 = G
coverage at base 102 = 2
base in read EAS56_57:6:190:289:82 = G
base in read EAS51_64:3:190:727:308 = G
...
Commands available in :term:`csamtools` are available as simple
function calls. For example::
pysam.sort("-o", "output.bam", "ex1.bam")
corresponds to the command line::
samtools sort -o output.bam ex1.bam
Analogous to :class:`~pysam.AlignmentFile`, a
:class:`~pysam.TabixFile` allows fast random access to compressed and
tabix indexed tab-separated file formats with genomic data::
import pysam
tabixfile = pysam.TabixFile("example.gtf.gz")
for gtf in tabixfile.fetch("chr1", 1000, 2000):
print (gtf.contig, gtf.start, gtf.end, gtf.gene_id)
:class:`~pysam.TabixFile` implements lazy parsing in order to iterate
over large tables efficiently.
More detailed usage instructions is at :ref:`usage`.
.. note::
Coordinates in pysam are always 0-based (following the python
convention). SAM text files use 1-based coordinates.
.. note::
The above examples can be run in the :file:`tests` directory of the
installation directory. Type 'make' before running them.
.. seealso::
https://github.com/pysam-developers/pysam
The pysam code repository, containing source code and download
instructions
http://pysam.readthedocs.org/en/latest/
The pysam website containing documentation
API
===
SAM/BAM/CRAM files
-------------------
Objects of type :class:`~pysam.AlignmentFile` allow working with
BAM/SAM formatted files.
.. autoclass:: pysam.AlignmentFile
:members:
.. autoclass:: pysam.AlignmentHeader
:members:
An :class:`~pysam.AlignedSegment` represents an aligned segment within
a SAM/BAM file.
.. autoclass:: pysam.AlignedSegment
:members:
.. autoclass:: pysam.PileupColumn
:members:
.. autoclass:: pysam.PileupRead
:members:
.. autoclass:: pysam.IndexedReads
:members:
Tabix files
-----------
:class:`~pysam.TabixFile` opens tabular files that have been
indexed with tabix_.
.. autoclass:: pysam.TabixFile
:members:
To iterate over tabix files, use :func:`~pysam.tabix_iterator`:
.. autofunction:: pysam.tabix_iterator
.. autofunction:: pysam.tabix_compress
.. autofunction:: pysam.tabix_index
.. autoclass:: pysam.asTuple
:members:
.. autoclass:: pysam.asVCF
:members:
.. autoclass:: pysam.asBed
:members:
.. autoclass:: pysam.asGTF
:members:
Fasta files
-----------
.. autoclass:: pysam.FastaFile
:members:
Fastq files
-----------
.. autoclass:: pysam.FastxFile
:members:
.. autoclass:: pysam.FastqProxy
:members:
VCF files
---------
.. autoclass:: pysam.VariantFile
:members:
.. autoclass:: pysam.VariantHeader
:members:
.. autoclass:: pysam.VariantRecord
:members:
.. autoclass:: pysam.VariantHeaderRecord
:members:
HTSFile
-------
HTSFile is the base class for :class:`pysam.AlignmentFile` and
:class:`pysam.VariantFile`.
.. autoclass:: pysam.HTSFile
:members:
|