.. _maskfasta:

###############
*maskfasta*
###############


|

.. image:: ../images/tool-glyphs/maskfasta-glyph.png 
    :width: 600pt 


``bedtools maskfasta`` masks sequences in a FASTA file based on intervals defined in a feature file. The
headers in the input FASTA file must exactly match the chromosome column in the feature file. This
may be useful for creating your own masked genome file based on custom annotations or for masking all
but your target regions when aligning sequence data from a targeted capture experiment.
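
Because a mismatch between FASTA headers and feature-file chromosome names is easy to overlook,
a quick check before masking can help. The following is a minimal sketch using only standard
shell tools, not a bedtools feature. The file names ``check.fa`` and ``check.bed`` are hypothetical,
and the sketch assumes a plain tab-delimited BED file without track, browser, or comment lines.

.. code-block:: bash

  # Hypothetical example files (these names are assumptions for illustration).
  printf '>chr1\nACGTACGT\n>chr2\nGGGGCCCC\n' > check.fa
  printf 'chr1\t2\t4\nchr3\t0\t2\n' > check.bed

  # Sequence names in the FASTA: header lines with the leading ">" removed.
  grep '^>' check.fa | sed 's/^>//' | sort -u > fasta_names.txt

  # Chromosome names in the BED file: column 1 of each record.
  cut -f 1 check.bed | sort -u > bed_names.txt

  # Names that appear in the BED file but have no matching FASTA header.
  missing=$(comm -13 fasta_names.txt bed_names.txt)
  echo "BED chromosomes without a FASTA header: ${missing:-none}"

In this example the check reports ``chr3``, which is named in the BED file but absent from the FASTA headers.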


==========================================================================
Usage and option summary
==========================================================================
**Usage**

.. code-block:: bash

  $ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
  
**(or):**

.. code-block:: bash

  $ maskFastaFromBed [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>


.. note::

    The input (``-fi``) and output (``-fo``) FASTA files must be different.

.. seealso::

    :doc:`../tools/getfasta`


===========================      ==========================================================================================================================================
 Option                           Description
===========================      ==========================================================================================================================================
**-soft**                        Soft-mask (that is, convert to lower-case bases) the FASTA sequence. *By default, hard-masking (that is, conversion to Ns) is performed*.
**-mc**                          Use a different masking character. That is, instead of masking with Ns, mask with the specified character.
===========================      ==========================================================================================================================================



==========================================================================
Default behavior
==========================================================================
By default, **bedtools maskfasta** hard-masks the input: every base that falls within
an interval in the BED file is replaced with ``N``, and the masked sequence is written
to the output FASTA file.

.. code-block:: bash

  $ cat test.fa
  >chr1
  AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

  $ cat test.bed
  chr1 5 10

  $ bedtools maskfasta -fi test.fa -bed test.bed -fo test.fa.out
  
  $ cat test.fa.out
  >chr1
  AAAAANNNNNCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
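
The interval ``chr1 5 10`` uses BED coordinates: the start is zero-based and the end is
exclusive, so the five masked bases are the 6th through 10th of the sequence. The sketch
below is not how bedtools is implemented. It only imitates the effect with ``awk`` on a
shorter, hypothetical sequence to make the coordinate arithmetic explicit.

.. code-block:: bash

  sequence="AAAAAAAACCCCCC"   # hypothetical 14-base sequence (8 A, 6 C)
  start=5                     # BED start (0-based, inclusive)
  end=10                      # BED end (exclusive)

  masked=$(awk -v s="$sequence" -v b="$start" -v e="$end" 'BEGIN {
      n = e - b                              # number of bases to mask
      mask = ""
      for (i = 0; i < n; i++) mask = mask "N"
      # awk substr() is 1-based: keep the first b bases, replace the
      # next n bases, then resume at 1-based position e + 1.
      print substr(s, 1, b) mask substr(s, e + 1)
  }')
  echo "$masked"   # prints AAAAANNNNNCCCC
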


==========================================================================
``-soft`` Soft-masking the FASTA file.
==========================================================================
With the **-soft** option, masked bases are converted to lower case ("soft-masked")
instead of being replaced with Ns.

.. code-block:: bash

  $ cat test.fa
  >chr1
  AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

  $ cat test.bed
  chr1 5 10

  $ bedtools maskfasta -fi test.fa -bed test.bed -fo test.fa.out -soft

  $ cat test.fa.out
  >chr1
  AAAAAaaaccCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
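
Soft-masking keeps the identity of each base, so the original sequence can be recovered
by upper-casing the sequence lines, which is not possible after hard-masking. A minimal
sketch with ``awk``, assuming the original FASTA contained only upper-case bases; the
file name ``soft.fa`` and its contents are hypothetical:

.. code-block:: bash

  # Hypothetical soft-masked FASTA, standing in for the output of -soft.
  printf '>chr1\nAAAAAaaaccCCCC\n' > soft.fa

  # Leave header lines untouched; upper-case every sequence line.
  restored=$(awk '/^>/ { print; next } { print toupper($0) }' soft.fa)
  echo "$restored"
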

==========================================================================
``-mc`` Specify a masking character.
==========================================================================
With the **-mc** option, one can choose the character that replaces each
base masked by the BED file, instead of the default ``N``.

.. code-block:: bash

  $ cat test.fa
  >chr1
  AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

  $ cat test.bed
  chr1 5 10

  $ bedtools maskfasta -fi test.fa -bed test.bed -fo test.fa.out -mc X

  $ cat test.fa.out
  >chr1
  AAAAAXXXXXCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
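
One way to sanity-check a masked file is to compare the number of masking characters
with the total length of the BED intervals. The sketch below uses only standard shell
tools and assumes non-overlapping intervals, a tab-delimited BED file, and a masking
character that does not otherwise occur in the sequence. The file names ``mc.bed`` and
``mc.fa.out`` and their contents are hypothetical.

.. code-block:: bash

  # Hypothetical inputs standing in for a BED file and a masked FASTA.
  printf 'chr1\t5\t10\n' > mc.bed
  printf '>chr1\nAAAAAXXXXXCCCC\n' > mc.fa.out

  # Total interval length: sum of (end - start) over all BED records.
  expected=$(awk -F '\t' '{ total += $3 - $2 } END { print total + 0 }' mc.bed)

  # Count occurrences of the masking character on non-header lines.
  observed=$(grep -v '^>' mc.fa.out | tr -cd 'X' | wc -c | tr -d ' ')

  echo "expected=$expected observed=$observed"

If the two numbers differ, likely causes include overlapping intervals, intervals that
extend past the end of a sequence, or chromosome names that do not match the FASTA headers.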