1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
|
.. include:: includeme.rst
Saving :class:`BedTool` results
===============================
In general, there are three different ways of saving results from
:class:`BedTool` operations:
Use the :meth:`BedTool.saveas` method
-------------------------------------
The :meth:`BedTool.saveas` method makes a **copy** of the results, so beware
that for large files, this can be time and/or memory-consuming. However, when
working with a streaming or iterating :class:`BedTool`, this is a great way to
render the results to disk in the middle of a pipeline.
A good example of this is saving the results from a :meth:`BedTool.each` call:
.. doctest::
>>> from pybedtools.featurefuncs import TSS
>>> a = pybedtools.example_bedtool('a.bed')
>>> result = a.each(TSS, upstream=1000, downstream=0)\
... .saveas('upstream_regions.bed')
Use the :meth:`BedTool.moveto` method
-------------------------------------
The :meth:`BedTool.moveto` method does a **move** operation of the results.
This is best used when the results have been written to disk already (perhaps
to a tempfile) but you'd like to give the file a more reasonable/memorable
name.
.. doctest::
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b).moveto('intersection_of_a_and_b.bed')
Use the ``output`` keyword argument
-----------------------------------
If you know ahead of time that you want to save the output to a particular
file, use the ``output`` keyword argument to any wrapped :class:`BedTool`
method that returns another :class:`BedTool` object. This will override the
default behavior of creating a tempfile.
.. doctest::
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b, output='intersection_of_a_and_b.bed')
Working with non-interval output files
--------------------------------------
`BEDTools` commands offer lots of flexibility. This means it is possible to
return results that are not supported interval files like
BED/GFF/GTF/BAM/SAM/VCF.
Consider the following example, which uses :meth:`BedTool.groupby` to get
a 2-column file containing the number of intervals in each featuretype:
.. doctest::
>>> a = pybedtools.example_bedtool('gdc.gff')
>>> b = pybedtools.example_bedtool('gdc.bed')
>>> c = a.intersect(b, c=True)
>>> d = c.groupby(g=[3], c=10, o=['sum'])
We can read the file created by `d` looks like this:
(note: the latest version of BEDTools, v2.26.0, causes this to fail. This will
be fixed in the next BEDTools release (see
https://github.com/arq5x/bedtools2/issues/453,
https://github.com/arq5x/bedtools2/issues/450,
https://github.com/arq5x/bedtools2/issues/435,
https://github.com/arq5x/bedtools2/issues/436 for details).
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> # bedtools v2.26.0
>>> print(open(d.fn).read())
UTR 0
CDS 2
intron 4
CDS 0
UTR 1
exon 3
mRNA 7
CDS 2
exon 2
tRNA 2
gene 7
<BLANKLINE>
Trying to iterate over `d` (`[i for i in d]`) or save it (`d.saveas()`) raises
exceptions. This is because:
* `saveas()` is expected to return a `BedTool` object that can be
used with other `BEDTools` tools. We can't create a `BedTool` object out of
an unsupported file format like this
* iterating over a `BedTool` object is expected to yield `Interval` objects,
but these lines can't be converted into the supported formats
To save the output to a filename of your choosing, provide the `output`
argument instead of `saveas()`, like this:
.. doctest::
>>> # only works with bedtools != v2.26.0
>>> # d = c.groupby(g=[3], c=10, o=['sum'], output='counts.txt')
To iterate over the lines of the file, you can use standard Python
tools, e.g.:
.. doctest::
>>> # only works with bedtools != v2.26.0
>>> # for line in open(d.fn):
>>> # featuretype, count = line.strip().split()
|