File: bamtobed.rst

package info (click to toggle)
bedtools 2.26.0%2Bdfsg-3
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 55,328 kB
  • sloc: cpp: 37,989; sh: 6,930; makefile: 2,225; python: 163
file content (159 lines) | stat: -rwxr-xr-x 7,681 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
.. _bamtobed:

###############
*bamtobed*
###############
``bedtools bamtobed`` is a conversion utility that converts sequence alignments 
in BAM format into BED, BED12, and/or BEDPE records. 

==========================================================================
Usage and option summary
==========================================================================
**Usage**:
::

  bedtools bamtobed [OPTIONS] -i <BAM>

**(or)**:
::

    bamToBed [OPTIONS] -i <BAM>



.. tabularcolumns:: |p{4.5cm}|p{8.5cm}|

=============   ================================================================
Option          Description
=============   ================================================================
**-bedpe**      Write BAM alignments in BEDPE format. Only one alignment from 
                paired-end reads will be reported. Specifically, it each mate 
                is aligned to the same chromosome, the BAM alignment reported 
                will be the one where the BAM insert size is greater than zero. 
                When the mate alignments are interchromosomal, the 
                lexicographically lower chromosome will be reported first. 
                Lastly, when an end is unmapped, the chromosome and strand will 
                be set to "." and the start and end coordinates will be set 
                to -1. *By default, this is disabled and the output will be 
                reported in BED format*.
**-mate1**      When writing BEDPE (-bedpe) format,
                always report mate one as the first BEDPE "block".		 
**-bed12**      Write "blocked" BED (a.k.a. BED12) format. This will convert 
                "spliced" BAM alignments (denoted by the "N" CIGAR operation) 
                to BED12. `Forces -split`.
**-split**      Report each portion of a "split" BAM (i.e., having an "N" CIGAR 
                operation) alignment as a distinct BED intervals.
**-splitD**     Report each portion of a "split" BAM while obeying noth "N" CIGAR 
                and "D" operation. Forces `-split`.
**-ed**         Use the "edit distance" tag (NM) for the BED score field. 
                Default for BED is to use mapping quality. Default for BEDPE is 
                to use the *minimum* of the two mapping qualities for the pair. 
                When -ed is used with -bedpe, the total edit distance from the 
                two mates is reported.                                            
**-tag**        Use other *numeric* BAM alignment tag for BED score. Default 
                for BED is to use mapping quality. Disallowed with BEDPE output.
**-color**      An R,G,B string for the color used with BED12 format. Default 
                is (255,0,0).
**-cigar**      Add the CIGAR string to the BED entry as a 7th column.
=============   ================================================================


==========================================================================
Default behavior
==========================================================================
By default, each alignment in the BAM file is converted to a 6 column BED. The 
BED "name" field is comprised of the RNAME field in the BAM alignment. If mate 
information is available, the mate (e.g., "/1" or "/2") field will be appended 
to the name.

.. code-block:: bash

  $ bedtools bamtobed -i reads.bam | head -3
  chr7   118970079   118970129   TUPAC_0001:3:1:0:1452#0/1   37   -
  chr7   118965072   118965122   TUPAC_0001:3:1:0:1452#0/2   37   +
  chr11  46769934    46769984    TUPAC_0001:3:1:0:1472#0/1   37   -


==========================================================================
``-tag`` Set the score field based on BAM tags
==========================================================================
One can override the choice of the BAM `MAPQ` as the result BED record's `score`
field by using the ``-tag`` option.  In the example below, we use the ``-tag``
option to select the BAM edit distance (the `NM` tag) as the score 
column in the resulting BED records.

.. code-block:: bash

  $ bedtools bamtobed -i reads.bam -tag NM | head -3
  chr7   118970079   118970129   TUPAC_0001:3:1:0:1452#0/1   1    -
  chr7   118965072   118965122   TUPAC_0001:3:1:0:1452#0/2   3    +
  chr11  46769934    46769984    TUPAC_0001:3:1:0:1472#0/1   1    -


==========================================================================
``-bedpe`` Set the score field based on BAM tags
==========================================================================
The ``-bedpe`` option converts BAM alignments to BEDPE format, thus allowing
the two ends of a paired-end alignment to be reported on a single text line. 
Specifically, it each mate is aligned to the same chromosome, 
the BAM alignment reported will be the one where the BAM insert size is greater 
than zero. When the mate alignments are interchromosomal, the lexicographically 
lower chromosome will be reported first. Lastly, when an end is unmapped, the 
chromosome and strand will be set to "." and the start and end coordinates will 
be set to -1. 

.. note::

    When using this option, it is required that the BAM 
    file is sorted/grouped by the read name. This allows bamToBed 
    to extract correct alignment coordinates for each end based on 
    their respective CIGAR strings. It also assumes that the 
    alignments for a given pair come in groups of twos. There is 
    not yet a standard method for reporting multiple alignments 
    using BAM. bamToBed will fail if an aligner does not report 
    alignments in pairs.		

.. code-block:: bash
 
  $ bedtools bamtobed -i reads.ba -bedpe | head -3
  chr7   118965072   118965122   chr7   118970079   118970129 TUPAC_0001:3:1:0:1452#0 37     +     -
  chr11  46765606    46765656    chr11  46769934    46769984 TUPAC_0001:3:1:0:1472#0 37     +     -
  chr20  54704674    54704724    chr20  54708987    54709037 TUPAC_0001:3:1:1:1833#0 37     +    

		 
One can easily use samtools and bamToBed together as part of a UNIX pipe. In 
this example, we will only convert properly-paired (``FLAG == 0x2``) reads to 
BED format.

.. code-block:: bash

  $ samtools view -bf 0x2 reads.bam | bedtools bamtobed -i stdin | head
  chr7   118970079   118970129   TUPAC_0001:3:1:0:1452#0/1   37   -
  chr7   118965072   118965122   TUPAC_0001:3:1:0:1452#0/2   37   +
  chr11  46769934    46769984    TUPAC_0001:3:1:0:1472#0/1   37   -
  chr11  46765606    46765656    TUPAC_0001:3:1:0:1472#0/2   37   +
  chr20  54704674    54704724    TUPAC_0001:3:1:1:1833#0/1   37   +
  chr20  54708987    54709037    TUPAC_0001:3:1:1:1833#0/2   37   -
  chrX   9380413     9380463     TUPAC_0001:3:1:1:285#0/1    0    -
  chrX   9375861     9375911     TUPAC_0001:3:1:1:285#0/2    0    +
  chrX   131756978   131757028   TUPAC_0001:3:1:2:523#0/1    37   +
  chrX   131761790   131761840   TUPAC_0001:3:1:2:523#0/2    37   -

  
==================================================================
``-split`` Creating BED12 features from "spliced" BAM entries. 
==================================================================
``bedtools bamtobed`` will, by default, create a BED6 feature that represents 
the entire span of a spliced/split BAM alignment. However, when using the 
``-split`` command, a BED12 feature is reported where BED blocks will be 
created for each aligned portion of the sequencing read.

::
  Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             
  Exons       ***************                                    **********
  
  BED/BAM A      ^^^^^^^^^^^^....................................^^^^
  
  Result      ===============                                    ====