.. _merge:

###############
*merge*
###############

|

.. image:: ../images/tool-glyphs/merge-glyph.png
    :width: 600pt

|



``bedtools merge`` combines overlapping or "book-ended" features in an interval 
file into a single feature which spans all of the combined features.

.. note::

    ``bedtools merge`` requires that you presort your data by chromosome and
    then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
    for BED files).
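
The ordering that the note asks for can be sketched in Python. This is only an
illustration and is not part of bedtools: the helper names ``sort_bed_records``
and ``is_presorted`` are hypothetical, and chromosome names are compared as
plain strings (so ``chr10`` sorts before ``chr2``), which is assumed to match
``sort -k1,1`` in a byte-wise locale.

.. code-block:: python

  def sort_bed_records(records):
      """Order (chrom, start, end, ...) tuples by chromosome, then numeric start."""
      return sorted(records, key=lambda rec: (rec[0], int(rec[1])))


  def is_presorted(records):
      """Return True if records are already in chromosome/start order."""
      keys = [(rec[0], int(rec[1])) for rec in records]
      return all(a <= b for a, b in zip(keys, keys[1:]))


  unsorted = [("chr2", 5, 10), ("chr1", 300, 400), ("chr1", 100, 200)]
  print(is_presorted(unsorted))                    # False
  print(is_presorted(sort_bed_records(unsorted)))  # True

A check like ``is_presorted`` can be useful before merging, because a merge
that sweeps the file once relies on this order.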
    
.. seealso::

    :doc:`../tools/cluster`
    :doc:`../tools/complement`
    

==========================================================================
Usage and option summary
==========================================================================
**Usage**:
::

  bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM> 

**(or)**:
::

  mergeBed [OPTIONS] -i <BED/GFF/VCF/BAM>


  
===========================      ===============================================================================================================================================================================================================
Option                           Description
===========================      ===============================================================================================================================================================================================================
**-s**                           Force strandedness. That is, only merge features that are on the same strand. *By default, this is disabled*.
**-S**                           Force merge for one specific strand only. Follow with + or - to force merge from only the forward or reverse strand, respectively. *By default, merging is done without respect to strand*.
**-d**                           Maximum distance between features allowed for features to be merged. *Default is 0. That is, overlapping and/or book-ended features are merged*.
**-c**                           Specify columns from the input file to operate upon (see -o option, below). Multiple columns can be specified in a comma-delimited list.
**-o**                           | Specify the operation that should be applied to ``-c``.
                                 | Valid operations:
                                 |     sum, min, max, absmin, absmax,
                                 |     mean, median,
                                 |     collapse (i.e., print a delimited list (duplicates allowed)),
                                 |     distinct (i.e., print a delimited list (NO duplicates allowed)),
                                 |     count,
                                 |     count_distinct (i.e., a count of the unique values in the column)
                                 | **Default:** sum
                                 | Multiple operations can be specified in a comma-delimited list. 
                                 | If there is only one column, but multiple operations, all operations will be
                                 | applied to that column. Likewise, if there is only one operation, but
                                 | multiple columns, that operation will be applied to all columns.
                                 | Otherwise, the number of columns must match the number of operations,
                                 | and the operations will be applied in respective order.
                                 |
                                 | E.g., ``-c 5,4,6 -o sum,mean,count`` will give the sum of column 5,
                                 | the mean of column 4, and the count of column 6.
                                 | The order of output columns will match the ordering given in the command.
**-header**                      | Print the header from the input file prior to results.

**-delim**                       | Specify a custom delimiter for the ``collapse`` and ``distinct`` operations.
                                 | Example: ``-delim "|"``
                                 | ``Default: ","``
===========================      ===============================================================================================================================================================================================================





==========================================================================
Default behavior
==========================================================================
By default, ``bedtools merge`` combines overlapping (by at least 1 bp) and/or
bookended intervals into a single, "flattened" or "merged" interval.
  
.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  180  250
  chr1  250  500
  chr1  501  1000

  $ bedtools merge -i A.bed
  chr1  100  500
  chr1  501  1000
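
The default behavior can be approximated with a single pass over sorted
intervals. The following Python sketch is an illustration of that idea, not
the bedtools implementation; the function name ``merge_intervals`` is
hypothetical, and the input is assumed to be already sorted by chromosome and
start.

.. code-block:: python

  def merge_intervals(intervals):
      """Merge sorted (chrom, start, end) tuples that overlap or are book-ended."""
      merged = []
      for chrom, start, end in intervals:
          if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
              # Overlapping or book-ended: extend the current merged interval.
              prev_chrom, prev_start, prev_end = merged[-1]
              merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
          else:
              # A gap (or a new chromosome) starts a new merged interval.
              merged.append((chrom, start, end))
      return merged


  a_bed = [
      ("chr1", 100, 200),
      ("chr1", 180, 250),
      ("chr1", 250, 500),
      ("chr1", 501, 1000),
  ]
  for chrom, start, end in merge_intervals(a_bed):
      print(chrom, start, end, sep="\t")

The comparison ``start <= end`` is what lets book-ended intervals such as
``180-250`` and ``250-500`` merge, while ``501-1000`` stays separate.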


==========================================================================
``-s`` Enforcing "strandedness" 
==========================================================================
The ``-s`` option will only merge intervals that are overlapping/bookended
*and* are on the same strand.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200   a1  1 +
  chr1  180  250   a2  2 +
  chr1  250  500   a3  3 - 
  chr1  501  1000  a4  4 +

  $ bedtools merge -i A.bed -s
  chr1  100  250
  chr1  501  1000
  chr1  250  500

To also report the strand, you could use the ``-c`` and ``-o`` operators (see below for more details):

.. code-block:: bash

  $ bedtools merge -i A.bed -s -c 6 -o distinct
  chr1  100  250   +
  chr1  501  1000  +
  chr1  250  500   -
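
One way to picture strand-aware merging is to keep a separate "open" interval
for each chromosome and strand. The Python sketch below is a hedged
illustration of that idea, not the bedtools implementation; the name
``merge_by_strand`` is hypothetical, the input is assumed to be sorted, and
the output follows input order, which may differ from the order the tool
prints.

.. code-block:: python

  def merge_by_strand(records):
      """Merge sorted (chrom, start, end, strand) tuples, never across strands."""
      open_index = {}  # (chrom, strand) -> position of the open interval in `merged`
      merged = []
      for chrom, start, end, strand in records:
          key = (chrom, strand)
          idx = open_index.get(key)
          if idx is not None and start <= merged[idx][2]:
              # Same strand and overlapping/book-ended: extend that interval.
              c, s, e, st = merged[idx]
              merged[idx] = (c, s, max(e, end), st)
          else:
              # First interval on this strand, or a gap: open a new one.
              open_index[key] = len(merged)
              merged.append((chrom, start, end, strand))
      return merged


  a_bed = [
      ("chr1", 100, 200, "+"),
      ("chr1", 180, 250, "+"),
      ("chr1", 250, 500, "-"),
      ("chr1", 501, 1000, "+"),
  ]
  for row in merge_by_strand(a_bed):
      print(*row, sep="\t")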

==========================================================================
``-S`` Reporting merged intervals on a specific strand. 
==========================================================================
The ``-S`` option will only merge intervals for a specific strand. For example,
to only report merged intervals on the "+" strand:

.. code-block:: bash

  $ cat A.bed
  chr1  100  200   a1  1 +
  chr1  180  250   a2  2 +
  chr1  250  500   a3  3 - 
  chr1  501  1000  a4  4 +

  $ bedtools merge -i A.bed -S + 
  chr1  100 250
  chr1  501 1000

To also report the strand, you could use the ``-c`` and ``-o`` operators (see below for more details):

.. code-block:: bash

  $ bedtools merge -i A.bed -S + -c 6 -o distinct
  chr1  100 250 +
  chr1  501 1000  +


==========================================================================
``-d`` Controlling how close two features must be in order to merge 
==========================================================================
By default, only overlapping or book-ended features are combined into a new 
feature. However, one can force ``merge`` to combine more distant features 
with the ``-d`` option. For example, were one to set ``-d`` to 1000, any 
features that overlap or are within 1000 base pairs of one another will be 
combined.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  501  1000
  
  $ bedtools merge -i A.bed
  chr1  100  200
  chr1  501  1000

  $ bedtools merge -i A.bed -d 1000
  chr1  100  1000
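
The effect of ``-d`` can be sketched by allowing a gap between the end of one
interval and the start of the next. The Python below is an illustration under
assumptions, not the bedtools implementation: the name
``merge_with_distance`` and its ``max_gap`` parameter are hypothetical, and
the gap is assumed to be measured as ``start - previous_end`` on sorted input.

.. code-block:: python

  def merge_with_distance(intervals, max_gap=0):
      """Merge sorted (chrom, start, end) tuples separated by at most max_gap."""
      merged = []
      for chrom, start, end in intervals:
          if (merged and merged[-1][0] == chrom
                  and start - merged[-1][2] <= max_gap):
              # Close enough to the current interval: extend it.
              prev_chrom, prev_start, prev_end = merged[-1]
              merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
          else:
              merged.append((chrom, start, end))
      return merged


  a_bed = [("chr1", 100, 200), ("chr1", 501, 1000)]
  print(merge_with_distance(a_bed))                # the two intervals stay separate
  print(merge_with_distance(a_bed, max_gap=1000))  # one interval spanning 100-1000

With ``max_gap=0`` the sketch reduces to the default behavior, where only
overlapping or book-ended intervals are merged.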

==========================================================================
``-c`` and ``-o`` Applying operations to columns from merged intervals. 
==========================================================================
When merging intervals, we often want to summarize or keep track of the 
values observed in specific columns (e.g., the feature name or score) from
the original, unmerged intervals. When used together, the ``-c`` and ``-o``
options allow one to select specific columns (``-c``) and apply an operation 
(``-o``) to each column. The result will be appended to the default, merged
interval output. For example, one could use the following to report the
count of intervals that were merged into each resulting interval (this replaces
the ``-n`` option that existed prior to version ``2.20.0``).

.. code-block:: bash

  $ cat A.bed
  chr1  100  200
  chr1  180  250
  chr1  250  500
  chr1  501  1000
  
  $ bedtools merge -i A.bed -c 1 -o count
  chr1  100  500  3
  chr1  501  1000 1

We could also use these options to report the mean of the score (#5) field:

.. code-block:: bash

  $ cat A.bed
  chr1  100  200   a1  1 +
  chr1  180  250   a2  2 +
  chr1  250  500   a3  3 - 
  chr1  501  1000  a4  4 +

  $ bedtools merge -i A.bed -c 5 -o mean
  chr1  100 500 2
  chr1  501 1000  4

Let's get fancy and report the mean, min, and max of the score column:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 5 -o mean,min,max
  chr1  100 500 2 1 3
  chr1  501 1000  4 4 4

Let's also report a comma-separated list of the strands:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 5,5,5,6 -o mean,min,max,collapse
  chr1  100 500 2 1 3 +,+,-
  chr1  501 1000  4 4 4 +
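
The pairing of columns and operations can be sketched in Python as follows.
This is a hedged illustration rather than the bedtools implementation: the
names ``OPERATIONS`` and ``merge_and_summarize`` are hypothetical, only a
subset of the operations is covered, column numbers are 1-based as with
``-c``, and the input is assumed to be sorted.

.. code-block:: python

  from statistics import mean

  # A subset of the -o operations, each applied to a list of column values.
  OPERATIONS = {
      "sum": sum,
      "min": min,
      "max": max,
      "mean": mean,
      "count": len,
      "collapse": lambda values: ",".join(str(v) for v in values),
  }


  def merge_and_summarize(records, columns, operations):
      """Merge sorted BED-like records and summarize the chosen columns."""
      columns, operations = list(columns), list(operations)
      # One column with many operations, or one operation with many columns.
      if len(columns) == 1:
          columns = columns * len(operations)
      if len(operations) == 1:
          operations = operations * len(columns)
      if len(columns) != len(operations):
          raise ValueError("number of columns and operations must match")

      groups = []  # each entry: [chrom, start, end, member_records]
      for rec in records:
          chrom, start, end = rec[0], rec[1], rec[2]
          if groups and groups[-1][0] == chrom and start <= groups[-1][2]:
              groups[-1][2] = max(groups[-1][2], end)
              groups[-1][3].append(rec)
          else:
              groups.append([chrom, start, end, [rec]])

      rows = []
      for chrom, start, end, members in groups:
          summaries = [OPERATIONS[op]([m[col - 1] for m in members])
                       for col, op in zip(columns, operations)]
          rows.append((chrom, start, end, *summaries))
      return rows


  a_bed = [
      ("chr1", 100, 200, "a1", 1, "+"),
      ("chr1", 180, 250, "a2", 2, "+"),
      ("chr1", 250, 500, "a3", 3, "-"),
      ("chr1", 501, 1000, "a4", 4, "+"),
  ]
  for row in merge_and_summarize(a_bed, [5, 5, 5, 6], ["mean", "min", "max", "collapse"]):
      print(*row, sep="\t")

Passing a single column with several operations, such as
``merge_and_summarize(a_bed, [5], ["mean", "min", "max"])``, applies every
operation to that column.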

Hopefully this provides a clear picture of what can be done.


==========================================================================
``-n`` Reporting the number of features that were merged 
==========================================================================
.. deprecated:: 2.20.0
    See the ``-c`` and ``-o`` operators.

==========================================================================
``-nms`` Reporting the names of the features that were merged 
==========================================================================
.. deprecated:: 2.20.0
    See the ``-c`` and ``-o`` operators.
  
==========================================================================
``-scores`` Reporting the scores of the features that were merged 
==========================================================================
.. deprecated:: 2.20.0
    See the ``-c`` and ``-o`` operators.
  
  
==========================================================================
``-delim`` Change the delimiter for ``-c`` and ``-o``
==========================================================================
One can replace the default comma delimiter used by the ``-o collapse`` and
``-o distinct`` operations with the ``-delim`` option.

.. code-block:: bash

  $ cat A.bed
  chr1  100  200  A1
  chr1  150  300  A2
  chr1  250  500  A3

Compare:
 
.. code-block:: bash

  $ bedtools merge -i A.bed -c 4 -o collapse
  chr1  100  500  A1,A2,A3
  
to:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 4 -o collapse -delim "|"
  chr1  100  500  A1|A2|A3