.. include:: includeme.rst
.. _`Design principles`:
Design principles
-----------------
Understanding (or at least being aware of) these design principles
will help you get the most out of :mod:`pybedtools` and work
efficiently.
.. _`temp principle`:
Principle 1: Temporary files are created (and deleted) automatically
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using :class:`BedTool` instances typically has the side effect of creating
temporary files on disk. Even when using the iterator protocol of
:class:`BedTool` objects, temporary files may be created in order to run
BEDTools programs (see :ref:`BedTools as iterators` for more on this latter topic).
Let's illustrate some of the design principles behind :mod:`pybedtools` by
merging features in :file:`a.bed` that are 100 bp or less apart (`d=100`)
in a strand-specific way (`s=True`):

.. doctest::

    >>> from pybedtools import BedTool
    >>> import pybedtools
    >>> a = BedTool(pybedtools.example_filename('a.bed'))
    >>> merged_a = a.merge(d=100, s=True)

Now `merged_a` is a :class:`BedTool` instance that contains the results of the
merge.
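
To make the ``d=100`` setting concrete, here is a minimal pure-Python sketch of distance-based merging. It is an illustration only and not how BEDTools or :mod:`pybedtools` perform the merge: the helper name ``merge_within`` is hypothetical, intervals are plain ``(start, end)`` tuples assumed to lie on one chromosome, and strand is ignored.

```python
def merge_within(intervals, d=0):
    """Merge (start, end) intervals whose gap is at most d bases.

    Hypothetical helper for illustration only; chromosome and strand
    are ignored.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= d:
            # Close enough to the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


features = [(1, 100), (150, 300), (500, 600)]
print(merge_within(features, d=100))  # [(1, 300), (500, 600)]
```

The first two features are 50 bp apart and are merged; the third is 200 bp away from the merged interval and stays separate.
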
:class:`BedTool` objects must always point to a file on disk. So in the
example above, `merged_a` is a :class:`BedTool`, but what file does it
point to? You can always check the :attr:`BedTool.fn` attribute to find
out::

    >>> # what file does `merged_a` point to?
    >>> merged_a.fn
    '/tmp/pybedtools.MPPp5f.tmp'

Note that the specific filename will be different for you since it is a
randomly chosen name (handled by Python's :mod:`tempfile` module). This
shows one important aspect of :mod:`pybedtools`: every operation results in
a new temporary file. Temporary files are stored in :file:`/tmp` by
default, and have the form :file:`/tmp/pybedtools.*.tmp`.
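
As a rough sketch of where such names come from, the standard-library :mod:`tempfile` module can produce files of this form. The prefix and suffix below are assumptions chosen to match the pattern described above, not a statement of how :mod:`pybedtools` calls :mod:`tempfile` internally.

```python
import os
import tempfile

# Assumption: prefix and suffix mirror the naming pattern described above.
tmp = tempfile.NamedTemporaryFile(prefix="pybedtools.", suffix=".tmp", delete=False)
tmp.close()

name = os.path.basename(tmp.name)
print(tmp.name)  # a randomly chosen name, different on every run

# Remove the file again so the sketch leaves nothing behind.
os.remove(tmp.name)
```
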
By default, at exit all temp files created during the session will be deleted.
However, if Python does not exit cleanly (e.g., from a bug in client code),
then the temp files will not be deleted.
If this happens, you can remove the leftover files from the command line::

    rm /tmp/pybedtools.*.tmp

In the middle of a session, you can force deletion of all temp files created so far::

    >>> # Don't do this yet if you're following the tutorial!
    >>> pybedtools.cleanup()

Alternatively, in this session or another session you can use::

    >>> pybedtools.cleanup(remove_all=True)

to remove all files that match the pattern
:file:`<tempdir>/pybedtools.*.tmp` where `<tempdir>` is the current value
of `pybedtools.get_tempdir()`.
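
Pattern-based removal of this kind can be sketched with the standard library alone. The helper ``remove_matching`` is hypothetical and not part of the :mod:`pybedtools` API; the demonstration runs in a throwaway directory so that no real temp files are touched.

```python
import glob
import os
import tempfile


def remove_matching(tempdir, pattern="pybedtools.*.tmp"):
    """Delete files in tempdir that match pattern; return the removed paths.

    Hypothetical helper, not part of the pybedtools API.
    """
    removed = []
    for path in glob.glob(os.path.join(tempdir, pattern)):
        os.remove(path)
        removed.append(path)
    return sorted(removed)


# Demonstrate in a scratch directory with two matching files and one other file.
with tempfile.TemporaryDirectory() as scratch:
    for fname in ("pybedtools.abc123.tmp", "pybedtools.xyz789.tmp", "keep.bed"):
        open(os.path.join(scratch, fname), "w").close()
    removed = [os.path.basename(p) for p in remove_matching(scratch)]
    remaining = sorted(os.listdir(scratch))

print(removed)    # ['pybedtools.abc123.tmp', 'pybedtools.xyz789.tmp']
print(remaining)  # ['keep.bed']
```
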
If you need to use a different directory than the default chosen by
Python's :mod:`tempfile` module, you can set it with::

    >>> pybedtools.set_tempdir('/scratch')

You'll need write permissions to this directory, and it needs to already
exist. All temp files will then be written to that directory, until the
tempdir is changed again.
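
If you want to verify both requirements before changing the tempdir, a check along these lines may help. ``check_tempdir`` is a hypothetical helper written for this sketch; it is not something :mod:`pybedtools` provides.

```python
import os
import tempfile


def check_tempdir(path):
    """Raise ValueError unless path is an existing, writable directory.

    Hypothetical helper for illustration only.
    """
    if not os.path.isdir(path):
        raise ValueError(f"tempdir does not exist: {path}")
    if not os.access(path, os.W_OK):
        raise ValueError(f"tempdir is not writable: {path}")
    return path


with tempfile.TemporaryDirectory() as scratch:
    ok = check_tempdir(scratch)  # passes: exists and is writable
    try:
        check_tempdir(os.path.join(scratch, "no_such_dir"))
        error = None
    except ValueError as exc:
        error = str(exc)

print(error)
```
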
.. _`similarity principle`:
Principle 2: Names and arguments are as similar as possible to BEDTools_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As much as possible, BEDTools programs and :class:`BedTool` methods share
the same names and arguments.
Returning again to this example::

    >>> merged_a = a.merge(d=100, s=True)

This demonstrates that the :class:`BedTool` methods that wrap BEDTools_
programs do the same thing and take the exact same arguments as the
BEDTools_ program. Here we can pass `d=100` and `s=True` only because the
underlying BEDTools_ program, `mergeBed`, can accept these arguments.
Need to know what arguments `mergeBed` can take? See the docs for
:meth:`BedTool.merge`; for more on this see :ref:`good docs principle`.
In general, remove the "Bed" from the end of the BEDTools_ program to get
the corresponding :class:`BedTool` method. So there's a
:meth:`BedTool.subtract` method for `subtractBed`, a
:meth:`BedTool.intersect` method for `intersectBed`, and so on.
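
The naming rule of thumb can be written down in a few lines. ``method_name`` is a hypothetical helper that only encodes the rule stated above; it is not a lookup that :mod:`pybedtools` exposes, and a program that does not follow the convention is returned unchanged.

```python
def method_name(program):
    """Map a BEDTools program name to the likely BedTool method name.

    Hypothetical helper; it only encodes the rule of thumb stated above.
    """
    suffix = "Bed"
    if program.endswith(suffix):
        return program[: -len(suffix)]
    return program


for program in ("mergeBed", "subtractBed", "intersectBed"):
    print(program, "->", method_name(program))
```
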
.. _`version principle`:
Principle 3: Indifference to BEDTools version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since :class:`BedTool` methods just wrap BEDTools_ programs, they are as up-to-date as
the version of BEDTools_ you have installed on disk. If you are using a
cutting-edge version of BEDTools_ that has some hypothetical argument
`-z` for `intersectBed`, then you can use `a.intersect(z=True)`.
:mod:`pybedtools` will also raise an exception if you try to use a method
that relies on a more recent version of BEDTools than you have installed.
.. _`default args principle`:
Principle 4: Sensible default args
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If we were running the ``mergeBed`` program from the command line, we
would have to specify the input file with the :option:`mergeBed -i` option.
:mod:`pybedtools` assumes that if we're calling the :meth:`merge` method on
the :class:`BedTool`, `a`, we want to operate on the bed file that `a`
points to.
In general, for BEDTools_ programs that accept a single BED file as input
(by convention typically specified with the :option:`-i` option), the
default behavior for :mod:`pybedtools` is to use the :class:`BedTool`'s
file (indicated in the :attr:`BedTool.fn` attribute) as input.
We can still pass a file using the `i` keyword argument if we wanted to be
absolutely explicit. In fact, the following two versions produce the same
output:

.. doctest::

    >>> # The default is to use existing file for input -- no need
    >>> # to specify "i" . . .
    >>> result1 = a.merge(d=100, s=True)

    >>> # . . . but you can always be explicit if you'd like
    >>> result2 = a.merge(i=a.fn, d=100, s=True)

    >>> # Confirm that the output is identical
    >>> result1 == result2
    True

Methods that have this type of default behavior are indicated by the following text in their docstring::

    .. note::

        For convenience, the file this BedTool object points to is passed as "-i"

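
The defaulting rule can be sketched with a toy class. ``ToyTool`` and ``merge_kwargs`` are hypothetical names used only for this illustration; they are not the real :class:`BedTool` implementation.

```python
class ToyTool:
    """Toy stand-in for an object that points to a file; NOT the real BedTool."""

    def __init__(self, fn):
        self.fn = fn

    def merge_kwargs(self, **kwargs):
        # Use this object's file as "-i" unless the caller supplied one.
        kwargs.setdefault("i", self.fn)
        return kwargs


a = ToyTool("a.bed")
implicit = a.merge_kwargs(d=100, s=True)
explicit = a.merge_kwargs(i=a.fn, d=100, s=True)
print(implicit == explicit)  # True
```

An explicitly passed ``i`` always wins, which is why the implicit and explicit calls end up with the same arguments.
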
There are some BEDTools_ programs that accept two BED files as input, like
``intersectBed``, where the first file is specified with `-a` and the
second file with `-b`. The default behavior for :mod:`pybedtools` is to
consider the :class:`BedTool`'s file as `-a` and the first non-keyword
argument to the method as `-b`, like this:

.. doctest::

    >>> b = pybedtools.example_bedtool('b.bed')
    >>> result3 = a.intersect(b)

This is exactly the same as passing the `a` and `b` arguments explicitly:

.. doctest::

    >>> result4 = a.intersect(a=a.fn, b=b.fn)
    >>> result3 == result4
    True

Furthermore, the first non-keyword argument used as `-b` can either be a
filename *or* another :class:`BedTool` object; that is, these commands also
do the same thing:

.. doctest::

    >>> result5 = a.intersect(b=b.fn)
    >>> result6 = a.intersect(b=b)
    >>> str(result5) == str(result6)
    True

Methods that accept either a filename or another :class:`BedTool` instance
as their first non-keyword argument are indicated by the following text in
their docstring::

    .. note::

        This method accepts either a BedTool or a file name as the first
        unnamed argument

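
One simple way to accept either kind of argument is to look for a ``fn`` attribute and fall back to the value itself. This is a sketch under that assumption; ``FileBacked`` and ``as_filename`` are hypothetical names, and the real library may resolve the argument differently.

```python
class FileBacked:
    """Toy stand-in for any object exposing a filename via .fn."""

    def __init__(self, fn):
        self.fn = fn


def as_filename(obj):
    """Return obj.fn for file-backed objects, or obj itself for a plain string."""
    return getattr(obj, "fn", obj)


b = FileBacked("b.bed")
print(as_filename(b) == as_filename("b.bed"))  # True
```
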
.. _`non defaults principle`:
Principle 5: Other arguments have no defaults
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Only the BEDTools_ arguments that refer to BED (or other interval) files have
defaults. In the current version of BEDTools_, this means only the `-i`,
`-a`, and `-b` arguments have defaults. All others have no defaults
specified by :mod:`pybedtools`; they pass the buck to BEDTools programs. This
means if you do not specify the `d` kwarg when calling :meth:`BedTool.merge`,
then it will use whatever the installed version of BEDTools_ uses for `-d`
(currently, `mergeBed`'s default for `-d` is 0).
`-d` is an option to BEDTools_ `mergeBed` that accepts a value, while
`-s` is an option that acts as a switch. In :mod:`pybedtools`, simply
pass a value (integer, float, whatever) for value-type options like `-d`,
and boolean values (`True` or `False`) for the switch-type options like
`-s`.
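
The distinction between value-type and switch-type options can be illustrated by translating keyword arguments into a command-line argument list. ``kwargs_to_args`` is a hypothetical helper for this sketch, not the :mod:`pybedtools` implementation.

```python
def kwargs_to_args(**kwargs):
    """Translate keyword arguments into a list of command-line arguments.

    Hypothetical helper for illustration; not the pybedtools implementation.
    """
    args = []
    for key, value in kwargs.items():
        if value is True:
            # Switch turned on: emit the flag alone.
            args.append(f"-{key}")
        elif value is False:
            # Switch turned off: omit it entirely.
            continue
        else:
            # Value-type option: emit the flag followed by its value.
            args.extend([f"-{key}", str(value)])
    return args


print(kwargs_to_args(d=100, s=True))   # ['-d', '100', '-s']
print(kwargs_to_args(d=100, s=False))  # ['-d', '100']
```

Options that are not passed at all never appear in the list, so the underlying program's own defaults apply.
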
Here's another example using both types of keyword arguments; the
:class:`BedTool` object `b` (or it could be a string filename too) is
implicitly passed to `intersectBed` as `-b` (see :ref:`default args
principle` above)::

    >>> a.intersect(b, v=True, f=0.5)

Again, any option that can be passed to a BEDTools_ program can be passed
to the corresponding :class:`BedTool` method.
.. _`chaining principle`:
Principle 6: Chaining together commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most methods return new :class:`BedTool` objects, allowing you to chain
things together just like piping commands together on the command line. To
give you a flavor of this, here is how you would get the merged regions of
features shared between :file:`a.bed` (as referred to by the
:class:`BedTool` `a` we made previously) and :file:`b.bed` (as referred to
by the :class:`BedTool` `b`):

.. doctest::

    >>> a.intersect(b).merge().saveas('shared_merged.bed')
    <BedTool(shared_merged.bed)>

This is equivalent to the following BEDTools_ commands::

    intersectBed -a a.bed -b b.bed | mergeBed -i stdin > shared_merged.bed

Methods that return a new :class:`BedTool` instance are indicated with the following text in their docstring::

    .. note::

        This method returns a new BedTool instance

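
Chaining works because each method hands back a new object instead of modifying the one it was called on. The toy class below shows only that pattern; ``ChainableTool`` is a hypothetical name, it runs no BEDTools programs, and it is not the real :class:`BedTool` class.

```python
class ChainableTool:
    """Toy stand-in showing the chaining pattern; NOT the real BedTool class."""

    def __init__(self, fn, history=()):
        self.fn = fn
        self.history = tuple(history)

    def _new(self, step, fn=None):
        # Never modify self: always hand back a fresh object.
        return ChainableTool(fn or f"<{step} output>", self.history + (step,))

    def intersect(self, other):
        return self._new("intersect")

    def merge(self):
        return self._new("merge")

    def saveas(self, fn):
        return self._new("saveas", fn)


a = ChainableTool("a.bed")
b = ChainableTool("b.bed")
result = a.intersect(b).merge().saveas("shared_merged.bed")
print(result.fn, result.history)
```

Note that ``a`` itself is unchanged after the chain, so it can be reused for further operations.
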
.. _`good docs principle`:
Principle 7: Check the help
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you're unsure of whether a method uses a default, or if you want to read
about what options an underlying BEDTools_ program accepts, check the help.
Each :class:`BedTool` method that wraps a BEDTools_ program also wraps
the BEDTools_ program help string. There are often examples of how to use
a method in the docstring as well. The documentation is also run through
doctests, so the code you read here is guaranteed to work and be
up-to-date.
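
One way a wrapper can carry a program's help text is to append it to the method's docstring. The decorator below is a sketch of that general idea, not the mechanism :mod:`pybedtools` uses; ``with_program_help``, ``example_method``, and the help text itself are all invented for the illustration.

```python
def with_program_help(help_text):
    """Decorator that appends a program's help text to a function's docstring.

    Hypothetical helper; help_text would normally come from the program itself.
    """
    def decorator(func):
        original = func.__doc__ or ""
        func.__doc__ = f"{original}\n\nOriginal program help:\n\n{help_text}"
        return func
    return decorator


# Placeholder help text, invented for this sketch.
FAKE_HELP = "Usage: exampleTool -i <input> [-d <distance>]"


@with_program_help(FAKE_HELP)
def example_method():
    """Wraps the (made-up) exampleTool program."""


print(example_method.__doc__)
```

With this pattern, calling ``help(example_method)`` in an interactive session would show both the wrapper's own description and the program's help text.
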