1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
|
Overview
--------
The `BEDTools suite of programs <http://bedtools.readthedocs.org/>`_ is widely
used for genomic interval manipulation or "genome algebra". `pybedtools` wraps
and extends BEDTools and offers feature-level manipulations from within
Python.
See full online documentation, including installation instructions, at
http://daler.github.io/pybedtools/.
Why `pybedtools`?
-----------------
Here is an example to get the names of genes that are <5 kb away from
intergenic SNPs:
.. code-block:: python
from pybedtools import BedTool
snps = BedTool('snps.bed.gz') # [1]
genes = BedTool('hg19.gff') # [1]
intergenic_snps = snps.subtract(genes) # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]
for gene in nearby: # [4]
if int(gene[-1]) < 5000: # [4]
print gene.name # [4]
Useful features shown here include:
* `[1]` support for all BEDTools-supported formats (here gzipped BED and GFF)
* `[2]` wrapping of all BEDTools programs and arguments (here, `subtract` and `closest` and passing
the `-d` flag to `closest`);
* `[3]` streaming results (like Unix pipes, here specified by `stream=True`)
* `[4]` iterating over results while accessing feature data by index or by attribute
access (here `[-1]` and `.name`).
In contrast, here is the same analysis using shell scripting. Note that this
requires knowledge in Perl, bash, and awk. The run time is identical to the
`pybedtools` version above:
.. code-block:: bash
snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps
snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))
intersectBed -a $snps -b $genes -v > $intergenic_snps
closestBed -a $genes -b $intergenic_snps -d \
| awk '($'$distance_field' < 5000){print $9;}' \
| perl -ne 'm/[ID|Name|gene_id]=(.*?);/; print "$1\n"'
rm $intergenic_snps
See the `Shell script comparison <http://daler.github.io/pybedtools/sh-comparison.html>`_ in the docs
for more details on this comparison, or keep reading the full documentation at
http://daler.github.io/pybedtools.
|