.. include:: includeme.rst

Each
====
Similar to :meth:`BedTool.filter`, which applies a function to return True
or False given an :class:`Interval`, the :meth:`BedTool.each` method applies a
function to return a new, possibly modified :class:`Interval`.

The :meth:`BedTool.each` method applies a function to every feature.  Like
:meth:`BedTool.filter`, you can use your own function or one of the pre-defined
ones in the :mod:`featurefuncs` module.  Also like :meth:`BedTool.filter`, any
``*args`` and ``**kwargs`` are passed on to the function.
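
The forwarding of extra arguments can be pictured with a pure-Python stand-in.
This is only a sketch under stated assumptions, not the pybedtools
implementation: ``apply_each`` and ``shift`` are hypothetical names, and plain
lists stand in for :class:`Interval` objects.

```python
def apply_each(features, func, *args, **kwargs):
    """Hypothetical stand-in for the idea behind BedTool.each.

    Calls func(feature, *args, **kwargs) on every feature and yields
    the result.  Not the pybedtools implementation.
    """
    for feature in features:
        yield func(feature, *args, **kwargs)


def shift(feature, offset, name_suffix=""):
    """Hypothetical per-feature function: move a feature by `offset` bases."""
    chrom, start, end, name = feature
    return [chrom, start + offset, end + offset, name + name_suffix]


# Plain lists stand in for Interval objects (chrom, start, end, name).
features = [["chr1", 1, 100, "feature1"], ["chr1", 100, 200, "feature2"]]

# 10 is forwarded as `offset`, name_suffix="_s" as a keyword argument.
shifted = list(apply_each(features, shift, 10, name_suffix="_s"))
print(shifted)
```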

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> a = pybedtools.example_bedtool('a.bed')
    >>> b = pybedtools.example_bedtool('b.bed')

    >>> # An "intersect" with c=True returns features with an additional
    >>> # field holding the number of overlapping features in b.
    >>> with_counts = a.intersect(b, c=True)

Let's define a function that takes the count in each feature, as calculated
above, and divides it by the number of bases in that feature.  We can also
supply an optional scalar, like 0.001, that multiplies the feature length so
the result comes out in "number of intersections per kb".  We then insert that
value into the score field of the feature.  Here's the function:

.. doctest::

    >>> def normalize_count(feature, scalar=0.001):
    ...     """
    ...     assume feature's last field is the count
    ...     """
    ...     counts = float(feature[-1])
    ...     normalized = round(counts / (len(feature) * scalar), 2)
    ...
    ...     # need to convert back to string to insert into feature
    ...     feature.score = str(normalized)
    ...     return feature
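
The arithmetic can be checked in isolation, without pybedtools.  The sketch
below uses a hypothetical ``per_kb`` helper and illustrative
``(start, end, count)`` triples; it is an assumption-labeled illustration of
the formula ``count / (length * scalar)``, not part of the library.

```python
def per_kb(count, length, scalar=0.001):
    """Hypothetical helper mirroring the arithmetic in normalize_count."""
    return round(count / (length * scalar), 2)


# (start, end, count) triples; values chosen for illustration only.
examples = [(100, 200, 1), (150, 500, 1), (900, 950, 1), (1, 100, 0)]

# Feature length is end - start, as for a BED interval.
rates = [per_kb(count, end - start) for start, end, count in examples]
print(rates)  # [10.0, 2.86, 20.0, 0.0]

# With scalar=1.0 the result is "intersections per base" instead.
print(per_kb(1, 100, scalar=1.0))  # 0.01
```

``normalize_count`` computes the same kind of value, converts it to a string,
and writes it into the score field.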

And we apply it like this:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> normalized = with_counts.each(normalize_count)
    >>> print(normalized)
    chr1	1	100	feature1	0.0	+	0
    chr1	100	200	feature2	10.0	+	1
    chr1	150	500	feature3	2.86	-	1
    chr1	900	950	feature4	20.0	+	1
    <BLANKLINE>

Similar to :meth:`BedTool.filter`, we could have used the Python built-in
function ``map`` to apply a function to each :class:`Interval`.  In fact, this can
still be useful if you don't want a :class:`BedTool` object as a result.  Note
that in Python 3 ``map`` returns a lazy iterator, so wrap it in ``list`` to get
the values.  For example::

    >>> feature_lengths = list(map(len, a))

However, the :meth:`BedTool.each` method returns a :class:`BedTool` object,
which can be used in a chain of commands, e.g., ::

    >>> a.intersect(b, c=True).each(normalize_count).filter(lambda x: float(x[4]) < 1e-5)
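
Chaining works because each step returns the same kind of object, whereas
``map`` in Python 3 returns a plain lazy iterator with no ``each`` or
``filter`` methods.  A minimal sketch of that design follows; ``MiniBedTool``
and ``add_length`` are hypothetical names, and plain tuples stand in for
:class:`Interval` objects, so this is not the pybedtools implementation.

```python
class MiniBedTool:
    """Hypothetical toy container; not the pybedtools BedTool class."""

    def __init__(self, features):
        self.features = list(features)

    def each(self, func, *args, **kwargs):
        # Return a new MiniBedTool so that further methods can be chained.
        return MiniBedTool(func(f, *args, **kwargs) for f in self.features)

    def filter(self, func, *args, **kwargs):
        # Keep only features for which func returns a true value.
        return MiniBedTool(f for f in self.features if func(f, *args, **kwargs))

    def __len__(self):
        return len(self.features)


def add_length(feature):
    """Append the feature length (end - start) as an extra field."""
    chrom, start, end = feature[:3]
    return feature + (end - start,)


toy = MiniBedTool([("chr1", 1, 100), ("chr1", 100, 200), ("chr1", 900, 950)])

# Chaining works because each() hands back another MiniBedTool.
long_features = toy.each(add_length).filter(lambda f: f[-1] >= 99)
print(long_features.features)

# map() yields a lazy iterator, which has no each() or filter() methods.
lengths = map(lambda f: f[2] - f[1], toy.features)
print(hasattr(lengths, "filter"), list(lengths))
```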