.. _cluster:

###############
*cluster*
###############

|

.. image:: ../images/tool-glyphs/cluster-glyph.png
    :width: 600pt

|


Similar to :doc:`../tools/merge`, ``cluster`` reports each set of overlapping or
"book-ended" features in an interval file.  In contrast to ``merge``,
``cluster`` does not flatten the cluster of intervals into a new meta-interval;
instead, it assigns a unique cluster ID to each record in each cluster.  This
is useful for having fine control over how sets of overlapping intervals in
a single interval file are combined.
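
As a minimal sketch of that kind of fine control, the Python below groups
``cluster`` output by its final column (the cluster ID) and keeps one record
per cluster.  The helper name ``best_per_cluster`` and the rule of keeping the
record with the highest value in the fifth (score) column are illustrative
assumptions, not part of bedtools; the sample lines are six-column BED records
followed by a cluster ID.

.. code-block:: python

  def best_per_cluster(lines):
      """Keep the highest-scoring record from each cluster.

      Each line is tab-delimited: six BED columns plus a trailing
      cluster ID, as in the examples on this page.
      """
      best = {}  # cluster ID -> (score, fields); dicts keep insertion order
      for line in lines:
          fields = line.rstrip("\n").split("\t")
          cluster_id = fields[-1]   # cluster ID is the last column here
          score = float(fields[4])  # assumption: column 5 is a numeric score
          if cluster_id not in best or score > best[cluster_id][0]:
              best[cluster_id] = (score, fields)
      return [fields for _, fields in best.values()]


  clustered = [
      "chr1\t100\t200\ta1\t1\t+\t1",
      "chr1\t180\t250\ta2\t2\t+\t1",
      "chr1\t501\t1000\ta4\t4\t+\t2",
      "chr1\t250\t500\ta3\t3\t-\t3",
  ]
  for fields in best_per_cluster(clustered):
      print("\t".join(fields))

Any other per-cluster rule (longest feature, first record, a custom summary)
could be swapped in at the comparison step.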

.. note::

    ``bedtools cluster`` requires that you presort your data by chromosome and
    then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
    for BED files).
    
.. seealso::

    :doc:`../tools/merge`
    

==========================================================================
Usage and option summary
==========================================================================
**Usage**:
::

  bedtools cluster [OPTIONS] -i <BED/GFF/VCF> 

**(or)**:
::

  clusterBed [OPTIONS] -i <BED/GFF/VCF>


  
===========================      ===============================================================================================================================================================================================================
Option                           Description
===========================      ===============================================================================================================================================================================================================
**-s**                           Force strandedness. That is, only cluster features that are on the same strand. *By default, this is disabled*.
**-d**                           Maximum distance between features allowed for features to be clustered. *Default is 0. That is, overlapping and/or book-ended features are clustered*.
===========================      ===============================================================================================================================================================================================================





==========================================================================
Default behavior
==========================================================================
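By default, ``bedtools cluster`` collects overlapping (by at least 1 bp) and/or
book-ended intervals into distinct clusters.  In the example below, the cluster
ID is appended as the 4th column.  The last interval neither overlaps nor abuts
the one before it, so it receives its own cluster ID.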
  
.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  180  250
  chr1  250  500
  chr1  501  1000

  $ bedtools cluster -i A.bed
  chr1	100	200	1
  chr1	180	250	1
  chr1	250	500	1
  chr1	501	1000	2
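
The grouping rule can be sketched in a few lines of Python.  This is only an
illustration of the behavior described above, not the bedtools implementation:
``cluster_sorted`` is a hypothetical helper, it assumes a single chromosome
with input already sorted by start, and it treats a feature as joining the
current cluster when its start is at or before the largest end seen so far.

.. code-block:: python

  def cluster_sorted(intervals):
      """Assign cluster IDs to (start, end) pairs sorted by start."""
      ids = []
      cluster_id = 0
      cluster_end = None  # largest end coordinate in the current cluster
      for start, end in intervals:
          if cluster_end is None or start > cluster_end:
              # Gap after the current cluster: open a new one.
              cluster_id += 1
              cluster_end = end
          else:
              # Overlapping or book-ended: extend the current cluster.
              cluster_end = max(cluster_end, end)
          ids.append(cluster_id)
      return ids


  intervals = [(100, 200), (180, 250), (250, 500), (501, 1000)]
  for (start, end), cid in zip(intervals, cluster_sorted(intervals)):
      print("chr1", start, end, cid, sep="\t")

For the ``A.bed`` intervals above, the sketch yields the IDs 1, 1, 1, 2,
matching the output shown.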


==========================================================================
``-s`` Enforcing "strandedness" 
==========================================================================
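With the ``-s`` option, intervals are clustered only if they are
overlapping/book-ended *and* are on the same strand.  In the example below, the
records are reported grouped by strand, so the ``-`` strand record appears
last.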

.. code-block:: bash

  $ cat A.bed
  chr1  100  200   a1  1 +
  chr1  180  250   a2  2 +
  chr1  250  500   a3  3 - 
  chr1  501  1000  a4  4 +

  $ bedtools cluster -i A.bed -s
  chr1	100	200	a1	1	+	1
  chr1	180	250	a2	2	+	1
  chr1	501	1000	a4	4	+	2
  chr1	250	500	a3	3	-	3
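
A strand-aware version of the same idea might look like the following.  It is
a sketch under stated assumptions rather than the bedtools algorithm:
``cluster_by_strand`` is a hypothetical helper, records are tuples for one
chromosome sorted by start, ``+`` records are handled before ``-`` records
(mirroring the order of the example output above), and records with any other
strand value are simply skipped.

.. code-block:: python

  def cluster_by_strand(records):
      """Cluster (chrom, start, end, name, score, strand) tuples per strand.

      Returns the records with a cluster ID appended as a final element.
      """
      out = []
      cluster_id = 0
      for strand in ("+", "-"):  # assumption: "+" is reported before "-"
          cluster_end = None     # each strand starts with no open cluster
          for rec in records:
              if rec[5] != strand:
                  continue
              start, end = rec[1], rec[2]
              if cluster_end is None or start > cluster_end:
                  cluster_id += 1  # nothing to join on this strand
                  cluster_end = end
              else:
                  cluster_end = max(cluster_end, end)
              out.append(rec + (cluster_id,))
      return out


  records = [
      ("chr1", 100, 200, "a1", 1, "+"),
      ("chr1", 180, 250, "a2", 2, "+"),
      ("chr1", 250, 500, "a3", 3, "-"),
      ("chr1", 501, 1000, "a4", 4, "+"),
  ]
  for rec in cluster_by_strand(records):
      print(*rec, sep="\t")

Although ``a3`` is book-ended with ``a2``, it sits on the opposite strand and
therefore ends up in a cluster of its own.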


==========================================================================
``-d`` Controlling how close two features must be in order to cluster 
==========================================================================
By default, only overlapping or book-ended features are placed in the same
cluster. However, one can force ``cluster`` to group more distant features
with the ``-d`` option. For example, with ``-d`` set to 1000, any features
that overlap or are within 1000 base pairs of one another will be clustered.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  501  1000
  
  $ bedtools cluster -i A.bed
  chr1  100  200    1
  chr1  501  1000   2

  $ bedtools cluster -i A.bed -d 1000
  chr1  100  200    1
  chr1  501  1000   1
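
When choosing a value for ``-d``, it can help to look at the gaps between
consecutive features first.  The sketch below is one way to do that in Python;
``gaps_between`` is a hypothetical helper, it assumes one chromosome sorted by
start, and it measures each gap as the feature's start minus the largest end
seen so far.  How bedtools treats a gap exactly equal to ``-d`` is not covered
by this sketch.

.. code-block:: python

  def gaps_between(intervals):
      """Return the gap in bp before each feature after the first.

      intervals: (start, end) pairs sorted by start.  A gap of 0 or less
      means the feature overlaps or is book-ended with an earlier one.
      """
      gaps = []
      furthest_end = intervals[0][1]
      for start, end in intervals[1:]:
          gaps.append(start - furthest_end)
          furthest_end = max(furthest_end, end)
      return gaps


  print(gaps_between([(100, 200), (501, 1000)]))

For the two features in ``A.bed`` the gap is 301 bp, well under 1000, which is
consistent with both records sharing cluster 1 in the ``-d 1000`` run above.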