File: filtering.rst

package info (click to toggle)
python-pybedtools 0.10.0-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 16,620 kB
  • sloc: python: 10,030; cpp: 899; makefile: 142; sh: 57
file content (93 lines) | stat: -rw-r--r-- 2,881 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93

.. include:: includeme.rst

.. _filtering:

Filtering
~~~~~~~~~
The :meth:`BedTool.filter` method lets you pass in a function that accepts an
:class:`Interval` as its first argument and returns True for False.  This
allows you to perform "grep"-like operations on :class:`BedTool` objects.  For
example, here's how to get a new :class:`BedTool` containing features from `a`
that are more than 100 bp long:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> a = pybedtools.example_bedtool('a.bed')
    >>> b = a.filter(lambda x: len(x) > 100)
    >>> print(b)
    chr1	150	500	feature3	0	-
    <BLANKLINE>

The :meth:`filter` method will pass its `*args` and `**kwargs` to the function
provided.  So here is a more generic case, where the function is defined once
and different arguments are passed in for filtering on different lengths:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> def len_filter(feature, L):
    ...     "Returns True if feature is longer than L"
    ...     return len(feature) > L

Now we can pass different lengths without defining a new function for each
length of interest, like this:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> a = pybedtools.example_bedtool('a.bed')

    >>> print(a.filter(len_filter, L=10))
    chr1	1	100	feature1	0	+
    chr1	100	200	feature2	0	+
    chr1	150	500	feature3	0	-
    chr1	900	950	feature4	0	+
    <BLANKLINE>

    >>> print(a.filter(len_filter, L=99))
    chr1	100	200	feature2	0	+
    chr1	150	500	feature3	0	-
    <BLANKLINE>

    >>> print(a.filter(len_filter, L=200))
    chr1	150	500	feature3	0	-
    <BLANKLINE>


See :ref:`BedTools as iterators` for more advanced and space-efficient usage
of :meth:`filter` using iterators.

Note that we could have used the built-in Python function, `filter()`, but that
would have returned an iterator that we would have to construct a new
:class:`pybedtools.BedTool` out of.  The :meth:`BedTool.filter` method returns
a ready-to-use :class:`BedTool` object, which allows embedding of
:meth:`BedTool.filter` calls in a chain of commands, e.g.::

    >>> a.intersect(b).filter(lambda x: len(x) < 100).merge()

Fast filtering functions in Cython
----------------------------------

The :mod:`featurefuncs` module contains some ready-made functions written
in Cython that will be faster than pure Python equivalents.  For example,
there are :func:`greater_than` and :func:`less_than` functions, which are
about 70% faster.  In IPython::

    >>> from pybedtools.featurefuncs import greater_than

    >>> len(a)
    310456

    >>> def L(x,width=100):
    ...     return len(x) > 100

    >>> # The %timeit command is from IPython, and won't work
    >>> # in a regular Python script:
    >>> %timeit a.filter(greater_than, 100)
    1 loops, best of 3: 1.74 s per loop

    >>> %timeit a.filter(L, 100)
    1 loops, best of 3: 2.96 s per loop