1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
|
.. _window:
###############
*window*
###############
|
.. image:: ../images/tool-glyphs/window-glyph.png
:width: 600pt
|
Similar to ``bedtools intersect``, ``window`` searches for overlapping features
in A and B. However, ``window`` adds a specified number (1000, by default) of
base pairs upstream and downstream of each feature in A. In effect, this allows
features in B that are "near" features in A to be detected.
===============================
Usage and option summary
===============================
**Usage**:
::
bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
**(or)**:
::
bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
=========================== =========================================================================================================================================================
Option Description
=========================== =========================================================================================================================================================
**-abam** BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe: For example: samtools view -b <BAM> | bedtools window -abam stdin -b genes.bed
**-ubam** Write uncompressed BAM output. The default is write compressed BAM output.
**-bed** When using BAM input (-abam), write output as BED. The default is to write output in BAM when using -abam. For example: bedtools window -abam reads.bam -b genes.bed -bed
**-w** Base pairs added upstream and downstream of each entry in A when searching for overlaps in B. *Default is 1000 bp*.
**-l** Base pairs added upstream (left of) of each entry in A when searching for overlaps in B. *Allows one to create assymetrical "windows". Default is 1000bp*.
**-r** Base pairs added downstream (right of) of each entry in A when searching for overlaps in B. *Allows one to create assymetrical "windows". Default is 1000bp*.
**-sw** Define -l and -r based on strand. For example if used, -l 500 for a negative-stranded feature will add 500 bp downstream. *By default, this is disabled*.
**-sm** Only report hits in B that overlap A on the same strand. *By default, overlaps are reported without respect to strand*.
**-Sm** Only report hits in B that overlap A on the opposite strand. *By default, overlaps are reported without respect to strand*.
**-u** Write original A entry once if any overlaps found in B. In other words, just report the fact at least one overlap was found in B.
**-c** For each entry in A, report the number of hits in B while restricting to -f. Reports 0 for A entries that have no overlap with B.
**-v** Only report those entries in A that have *no overlaps* with B.
**-header** Print the header from the A file prior to results.
=========================== =========================================================================================================================================================
==========================================================================
Default behavior
==========================================================================
By default, ``bedtools window`` adds 1000 bp upstream and downstream of each A
feature and searches for features in B that overlap this "window". If an overlap
is found in B, both the *original* A feature and the *original* B feature are
reported.
.. code-block:: bash
$ cat A.bed
chr1 100 200
$ cat B.bed
chr1 500 1000
chr1 1300 2000
$ bedtools window -a A.bed -b B.bed
chr1 100 200 chr1 500 1000
==========================================================================
``-w`` Defining a custom window size
==========================================================================
Instead of using the default window size of 1000bp, one can define a custom,
*symmetric* window around each feature in A using the **-w** option. One should
specify the window size in base pairs. For example, a window of 5kb should be
defined as ``-w 5000``.
For example (note that in contrast to the default behavior,
the second B entry is reported):
.. code-block:: bash
$ cat A.bed
chr1 100 200
$ cat B.bed
chr1 500 1000
chr1 1300 2000
$ bedtools window -a A.bed -b B.bed -w 5000
chr1 100 200 chr1 500 1000
chr1 100 200 chr1 1300 2000
==========================================================================
``-l and -r`` Defining *assymteric* windows
==========================================================================
One can also define asymmetric windows where a differing number of bases are
added upstream and downstream of each feature using the ``-l`` (upstream)
and ``-r`` (downstream)** options.
.. note::
By default, the ``-l`` and ``-r`` options ignore strand. If you want to define
*upstream* and *downstream* based on strand, use the ``-sw`` option (below)
with the ``-l`` and ``-r`` options.
For example (note the difference between -l 200 and -l 300):
.. code-block:: bash
$ cat A.bed
chr1 1000 2000
$ cat B.bed
chr1 500 800
chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 200 -r 20000
chr1 1000 2000 chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 300 -r 20000
chr1 1000 2000 chr1 500 800
chr1 1000 2000 chr1 10000 20000
==========================================================================
``-sw`` Defining assymteric windows based on strand
==========================================================================
Especially when dealing with gene annotations or RNA-seq experiments, you may
want to define asymmetric windows based on "strand". For example, you may want
to screen for overlaps that occur within 5000 bp upstream of a gene (e.g. a
promoter region) while screening only 1000 bp downstream of the gene.
By enabling the ``-sw`` ("stranded" windows) option, the windows are added
upstream or downstream according to strand. For example, imagine one
specifies ``-l 5000``, ``-r 1000`` as well as the ``-sw`` option. In this case,
forward stranded ("+") features will screen 5000 bp to the *left* (that is,
*lower* genomic coordinates) and 1000 bp to the *right* (that is, *higher*
genomic coordinates). By contrast, reverse stranded ("-") features will screen
5000 bp to the *right* (that is, *higher* genomic coordinates) and 1000 bp to
the *left* (that is, *lower* genomic coordinates).
For example (note the difference between ``-l 200`` and ``-l 300``):
.. code-block:: bash
$ cat A.bed
chr1 10000 20000 A.forward 1 +
chr1 10000 20000 A.reverse 1 -
$ cat B.bed
chr1 1000 8000 B1
chr1 24000 32000 B2
$ bedtools window -a A.bed -b B.bed -l 5000 -r 1000 -sw
chr1 10000 20000 A.forward 1 + chr1 1000 8000 B1
chr1 10000 20000 A.reverse 1 - chr1 24000 32000 B2
==========================================================================
``-sm`` Enforcing matches with the *same* "strandedness"
==========================================================================
This option behaves the same as the ``-s`` option for ``bedtools intersect``
while scanning for overlaps within the "window" surrounding A. That is, overlaps
in B will only be included if the B interval is on the *same* strand as the A
interval.
==========================================================================
``-Sm`` Enforcing matches with the *opposite* "strandedness"
==========================================================================
This option behaves the same as the ``-S`` option for ``bedtools intersect`` while
scanning for overlaps within the "window" surrounding A. That is, overlaps in
B will only be included if the B interval is on the *opposite* strand as the A
interval.
==========================================================================
``-u`` Reporting the presence/absence of at least one overlapping feature
==========================================================================
This option behaves the same as for ``bedtools intersect``. That is, even if
multiple overlaps exist, each A interval will only be reported once.
==========================================================================
``-c`` Reporting the number of overlapping features
==========================================================================
This option behaves the same as for ``bedtools intersect``. That is, it will
report the *count* of intervals in B that overlap each A interval.
==========================================================================
``-v`` Reporting the absence of any overlapping features
==========================================================================
This option behaves the same as for ``bedtools intersect``. That is, it will
only report those intervals in A that have have *zero* overlaps in B.
==========================================================================
``-header`` Print the header for the A file before reporting results.
==========================================================================
By default, if your A file has a header, it is ignored when reporting results.
This option will instead tell bedtools to first print the header for the
A file prior to reporting results.
|