:py:mod:`ecoPrimers`: new barcode markers and primers
=====================================================

Authors: Eric Coissac <eric.coissac@metabarcoding.org> and Tiayyba Riaz <tiayyba.riaz@metabarcoding.org>

:py:mod:`ecoPrimers` designs the most efficient barcode markers and primers, based
on a set of reference sequence records, and according to specified parameters.

Reference
---------

Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E (2011) ecoPrimers: inference of new DNA
barcode markers from whole genome sequence analysis. Nucleic Acids Research, 39, e145.

:py:mod:`ecoPrimers` specific options
-------------------------------------

.. cmdoption:: -d <filename>

    Filename containing the reference sequence records used for designing the barcode
    markers and primers (see :doc:`obiconvert <./obiconvert>` for a description
    of the database format).

    .. WARNING:: This option is compulsory.

.. cmdoption:: -e <INTEGER>

    Maximum number of errors (mismatches) allowed per primer (default: 0).

.. cmdoption:: -l <INTEGER>

    Minimum length of the barcode, excluding primers.

.. cmdoption:: -L <INTEGER>

    Maximum length of the barcode, excluding primers.

.. cmdoption:: -r <TAXID>

    Defines the example sequence records (example dataset). Only the sequences of the corresponding
    taxonomic group identified by its ``TAXID`` are taken into account for designing the barcodes and
    the primers. The ``TAXID`` is an integer that can be found either in the NCBI taxonomic database,
    or using the :doc:`ecofind <ecofind>` program.

.. cmdoption:: -i <TAXID>

    Defines the counterexample sequence records (counterexample dataset). The barcodes and primers
    will be selected in order to avoid the counterexample taxonomic group identified by its ``TAXID``.

.. cmdoption:: -E <TAXID>

    Defines a counterexample taxonomic group (identified by its ``TAXID``) within the example
    dataset.

.. cmdoption:: -c

    Considers that the sequences of the database are circular (e.g. mitochondrial
    or chloroplast DNA).
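
To illustrate what circularity changes, the sketch below shows one common way of reading a
region that crosses the end of a circular sequence: concatenate the sequence with itself and
take an ordinary substring. The helper name ``circular_slice`` is hypothetical, and this is
not a description of how :py:mod:`ecoPrimers` handles ``-c`` internally.

.. code-block:: bash

    #!/usr/bin/env bash
    # circular_slice SEQ START LEN (hypothetical helper, not part of ecoPrimers):
    # prints LEN nucleotides of SEQ starting at 0-based offset START, wrapping
    # from the end of the sequence back to its beginning.
    circular_slice() {
      local seq=$1 start=$2 len=$3
      # Reject offsets outside the sequence and regions longer than the sequence.
      if (( start < 0 || start >= ${#seq} || len > ${#seq} )); then
        return 1
      fi
      local doubled="$seq$seq"   # a doubled copy contains every wrap-around region
      printf '%s\n' "${doubled:start:len}"
    }

For example, ``circular_slice ACGTTTGA 6 4`` prints ``GAAC``: the last two nucleotides of the
sequence followed by its first two.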

.. cmdoption:: -3 <INTEGER>

    Defines the number of nucleotides on the 3' end of the primers that must have a strict match
    with their target sequences.
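
Taken together, ``-e`` and ``-3`` describe when a primer is considered to match a target
site. The sketch below is a simplified reading of that rule, assuming the primer and the site
are written 5' to 3' in the same orientation and have the same length: at most ``-e``
mismatches are tolerated, and none may fall within the last ``-3`` nucleotides. The function
``primer_matches`` is hypothetical; it ignores insertions and deletions and is not the search
algorithm used by :py:mod:`ecoPrimers`.

.. code-block:: bash

    #!/usr/bin/env bash
    # primer_matches PRIMER SITE MAX_ERRORS STRICT3 (hypothetical helper):
    # succeeds when SITE differs from PRIMER by at most MAX_ERRORS mismatches
    # and none of them falls within the STRICT3 nucleotides at the 3' end.
    # Assumes both strings are written 5' to 3' and compares them position by
    # position (substitutions only, no insertions or deletions).
    primer_matches() {
      local primer=$1 site=$2 max_errors=$3 strict3=$4
      local len=${#primer} errors=0 i
      (( ${#site} == len )) || return 1
      for (( i = 0; i < len; i++ )); do
        if [[ "${primer:i:1}" != "${site:i:1}" ]]; then
          # A mismatch within the last STRICT3 positions rejects the site outright.
          if (( i >= len - strict3 )); then
            return 1
          fi
          errors=$(( errors + 1 ))
          if (( errors > max_errors )); then
            return 1
          fi
        fi
      done
      return 0
    }

For instance, with ``MAX_ERRORS=3`` and ``STRICT3=2``, ``primer_matches ACGTACGTAC ACGTACGAAC 3 2``
succeeds (a single mismatch, outside the 3' end), whereas ``primer_matches ACGTACGTAC ACGTACGTAG 3 2``
fails (the mismatch is on the last nucleotide).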

.. cmdoption:: -q <FLOAT>

    Defines the strict matching quorum, i.e. the proportion of the sequence records in which a
    strict match between the primers and their targets occurs (default: 0.7).

.. cmdoption:: -s <FLOAT>

    Defines the sensitivity quorum, i.e. the proportion of the example sequence records that
    must fulfill the specified parameters for designing the barcodes and the primers.

.. cmdoption:: -x <FLOAT>

    Defines the false positive quorum, i.e. the maximum proportion of the counterexample
    sequence records that fulfill the specified parameters for designing the barcodes and
    the primers.
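
As a rough illustration of the two quorums, the sketch below keeps a candidate primer pair only
if the proportion of example records it amplifies reaches the sensitivity quorum (``-s``) and the
proportion of counterexample records it amplifies does not exceed the false positive quorum
(``-x``). The function name ``keep_pair`` and its arguments are hypothetical, and the use of
inclusive comparisons is an assumption, not taken from :py:mod:`ecoPrimers`.

.. code-block:: bash

    #!/usr/bin/env bash
    # keep_pair EX_OK EX_TOTAL CX_OK CX_TOTAL S X (hypothetical helper):
    # succeeds when EX_OK/EX_TOTAL >= S (sensitivity quorum, -s) and
    # CX_OK/CX_TOTAL <= X (false positive quorum, -x). Inclusive bounds are
    # an assumption. An empty counterexample dataset counts as 0 false positives.
    keep_pair() {
      awk -v eo="$1" -v et="$2" -v co="$3" -v ct="$4" -v s="$5" -v x="$6" 'BEGIN {
        if (et <= 0) exit 1              # no example records: nothing to evaluate
        sens = eo / et                   # proportion of example records amplified
        fp = (ct > 0) ? co / ct : 0      # proportion of counterexample records amplified
        exit !(sens >= s && fp <= x)     # exit status 0 means "keep the pair"
      }'
    }

For example, a pair amplifying 140 of 200 example records and 5 of 100 counterexample records
passes ``keep_pair 140 200 5 100 0.7 0.1``, since 140/200 = 0.7 and 5/100 = 0.05.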

.. cmdoption:: -t <TAXONOMIC_LEVEL>

    Defines the taxonomic level that is considered for evaluating the barcodes and primers in
    the output of :py:mod:`ecoPrimers`. The default taxonomic level is the species level. When
    using a taxonomic database built from :doc:`NCBI taxonomy dump files <../taxdump>`, the
    other possible taxonomic levels are genus, family, order, class, phylum, kingdom, and
    superkingdom.

.. cmdoption:: -D

    Sets the double strand mode.

.. cmdoption:: -S

    Sets the single strand mode.

.. cmdoption:: -O <INTEGER>

    Sets the primer length (default: 18).

.. cmdoption:: -m <1|2>

    Defines the method used for estimating the *Tm* (melting temperature) between
    the primers and their corresponding target sequences (default: 1):

    - ``1``: SantaLucia method (SantaLucia J (1998) A unified view of polymer, dumbbell, and
      oligonucleotide DNA nearest-neighbor thermodynamics. PNAS, 95, 1460-1465).
    - ``2``: Owczarzy method (Owczarzy R, Vallone PM, Gallo FJ *et al.* (1997) Predicting
      sequence-dependent melting stability of short duplex DNA oligomers. Biopolymers, 44,
      217-239).

.. cmdoption:: -a <FLOAT>

    Salt concentration used for estimating the *Tm* (default: 0.05).

.. cmdoption:: -U

    Forbids multiple matches of a primer on the same sequence record.

.. cmdoption:: -R <TEXT>

    Defines the reference sequence by indicating its identifier in the database.

.. cmdoption:: -A

    Prints the list of all identifiers of sequence records in the database.

.. cmdoption:: -f

    Removes the data mining step during strict primer identification.

.. cmdoption:: -v

    Stores a statistics file about memory usage during strict primer identification.

.. cmdoption:: -h

    Prints help.

Output file
-----------

The output file contains several columns, separated by ``|``, that describe
the characteristics of each barcode and its associated primers:

======  ======================================================================
Column  Description
======  ======================================================================
1       serial number
2       sequence of primer 1
3       sequence of primer 2
4       *Tm* (melting temperature) of primer 1, without mismatch
5       lowest *Tm* of primer 1 against example sequence records
6       *Tm* of primer 2, without mismatch
7       lowest *Tm* of primer 2 against example sequence records
8       number of C or G in primer 1
9       number of C or G in primer 2
10      ``GG`` (*Good-Good*) means that both primers are specific to the example
        dataset; ``GB`` or ``BG`` (*Good-Bad* or *Bad-Good*) means that only one
        of the two primers is specific to the example dataset
11      number of sequence records of the example dataset that are properly
        amplified according to the specified parameters
12      proportion of sequence records of the example dataset that are properly
        amplified according to the specified parameters
13      yule-like output
14      number of taxa of the example dataset that are properly amplified
        according to the specified parameters
15      number of taxa of the counterexample dataset that are properly amplified
        according to the specified parameters
16      proportion of taxa of the example dataset that are properly amplified
        according to the specified parameters (*Bc* index)
17      number of taxa of the example dataset that are properly identified
18      proportion of taxa of the example dataset that are properly identified
        (*Bs* index)
19      minimum length of the barcode in base pairs for the example sequence
        records (excluding primers)
20      maximum length of the barcode in base pairs for the example sequence
        records (excluding primers)
21      average length of the barcode in base pairs for the example sequence
        records (excluding primers)
======  ======================================================================
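
Because the output is plain delimited text, standard tools can post-process it. The sketch below
ranks primer pairs by the *Bs* index (column 18), then by the *Bc* index (column 16). It assumes
that fields are separated by ``|`` (tabs are also accepted, to tolerate either delimiter), that
any header or comment lines start with ``#``, and that each result line has at least 21 columns;
``rank_by_bs`` is a hypothetical helper name.

.. code-block:: bash

    #!/usr/bin/env bash
    # rank_by_bs FILE (hypothetical helper): prints one line per primer pair as
    #   Bs  Bc  primer1  primer2  average_barcode_length
    # sorted by decreasing Bs (column 18), then decreasing Bc (column 16).
    # Assumes '|'-separated fields (tabs accepted too) and '#' comment lines.
    rank_by_bs() {
      awk -F'[|\t]' '
        /^#/ || NF < 21 { next }        # skip comment/header lines and short rows
        {
          gsub(/ /, "", $2)             # drop any padding around primer 1
          gsub(/ /, "", $3)             # drop any padding around primer 2
          print $18 + 0, $16 + 0, $2, $3, $21 + 0
        }
      ' "$1" | LC_ALL=C sort -k1,1nr -k2,2nr
    }

For instance, ``rank_by_bs mybarcodes.ecoprimers | head -n 5`` would list the five primer pairs
with the highest *Bs* index.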

Examples
--------

*Example 1:*

.. code-block:: bash

    > ecoPrimers -d mydatabase -e 3 -l 50 \
        -L 800 -r 2759 -3 2 > mybarcodes.ecoprimers

Launches a search for barcodes and corresponding primers on ``mydatabase`` (see
:doc:`obiconvert <./obiconvert>` for a description of the database format), with a maximum
of three mismatches for each primer. The minimum and maximum barcode lengths (excluding
primers) are 50 bp and 800 bp, respectively. The search is restricted to the taxonomic
group identified by its *taxid* (2759 corresponds to the Eukaryota). The last two
nucleotides on the 3' end of the primers must have a perfect match with their target sequences.
The results are saved in the ``mybarcodes.ecoprimers`` file.

*Example 2:*

.. code-block:: bash

    > ecoPrimers -d mydatabase -e 2 -l 30 -L 120 \
        -r 7742 -i 2 -E 9604 -3 2 > mybarcodes.ecoprimers

Launches a search for barcodes and corresponding primers on ``mydatabase`` (see
:doc:`obiconvert <./obiconvert>` for a description of the database format), with a maximum
of two mismatches for each primer. The minimum and maximum barcode lengths (excluding
primers) are 30 bp and 120 bp, respectively. The example dataset is restricted to the
Vertebrates (``-r 7742``), the Bacteria form the counterexample dataset (``-i 2``), and the
Hominidae are treated as a counterexample group within the example dataset (``-E 9604``);
7742, 2, and 9604 are the ``TAXID`` of Vertebrates, Bacteria, and Hominidae, respectively.
The last two nucleotides on the 3' end of the primers must have a perfect match with their
target sequences. The results are saved in the ``mybarcodes.ecoprimers`` file.