|
Sequence record selection options
---------------------------------
.. cmdoption:: -s <REGULAR_PATTERN>, --sequence=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
sequence itself. The pattern is case insensitive.
*Examples:*
.. code-block:: bash
> obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
Selects only the sequence records that contain an *EcoRI* restriction site.
.. code-block:: bash
> obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
Selects only the sequence records that contain a stretch of at least 10 consecutive ``A`` nucleotides.
.. code-block:: bash
> obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
Selects only the sequence records that do not contain ambiguous nucleotides.
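The three patterns above can be tried outside ``obigrep`` with Python's ``re`` module. This is only a minimal sketch of case-insensitive pattern searching, not the code used by ``obigrep``; the ``matches`` helper and the sample sequences are hypothetical.

```python
import re


def matches(pattern, sequence):
    """Return True if `pattern` is found anywhere in `sequence`, ignoring case."""
    return re.search(pattern, sequence, re.IGNORECASE) is not None


# Hypothetical sequences, for illustration only.
with_ecori = "ccgaattcgg"              # lower-case EcoRI site
with_poly_a = "CG" + "A" * 10 + "TG"   # run of exactly ten A
with_ambiguity = "ACGTNACGT"           # contains the ambiguous symbol N

print(matches("GAATTC", with_ecori))
print(matches("A{10,}", with_poly_a))
print(matches("^[ACGT]+$", with_ambiguity))
```

The anchors ``^`` and ``$`` in the last pattern force the whole sequence to consist of ``A``, ``C``, ``G`` and ``T``, which is why a single ``N`` is enough to reject the record.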
.. cmdoption:: -D <REGULAR_PATTERN>, --definition=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
definition of the sequence record. The pattern is case
sensitive.
*Example:*
.. code-block:: bash
> obigrep -D '[Cc]hloroplast' seq1.fasta > seq2.fasta
Selects only the sequence records whose definition contains ``chloroplast`` or
``Chloroplast``.
.. cmdoption:: -I <REGULAR_PATTERN>, --identifier=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
identifier of the sequence record. The pattern is case
sensitive.
*Example:*
.. code-block:: bash
> obigrep -I '^GH' seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier begins with ``GH``.
.. cmdoption:: --id-list=<FILENAME>
``<FILENAME>`` points to a text file containing the list of sequence
record identifiers to be selected.
The file contains one identifier per line.
*Example:*
.. code-block:: bash
> obigrep --id-list=my_id_list.txt seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier is present in the
``my_id_list.txt`` file.
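The selection by identifier list can be pictured as a set membership test. The sketch below is an assumption about the general idea, not the ``obigrep`` implementation: the function names are hypothetical, records are reduced to ``(identifier, sequence)`` pairs, and blank lines in the list are assumed to be ignored.

```python
def read_id_list(lines):
    """Collect identifiers from an iterable of lines, one identifier per line.

    Blank lines are skipped (an assumption made for this sketch).
    """
    return {line.strip() for line in lines if line.strip()}


def select_by_id(records, wanted):
    """Keep the (identifier, sequence) pairs whose identifier is in `wanted`."""
    return [(ident, seq) for ident, seq in records if ident in wanted]


# Hypothetical content of an identifier file and of a sequence file.
id_lines = ["GH0001\n", "GH0003\n", "\n"]
records = [("GH0001", "ACGT"), ("GH0002", "TTGA"), ("GH0003", "GGCC")]

selected = select_by_id(records, read_id_list(id_lines))
print([ident for ident, _ in selected])
```

With a real file, ``read_id_list(open("my_id_list.txt"))`` would produce the same kind of set.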
.. cmdoption:: -a <KEY>:<REGULAR_PATTERN>, --attribute=<KEY>:<REGULAR_PATTERN>
Regular expression pattern matched against the
:doc:`attributes of the sequence record <../fasta>`. The option value
has the form ``<KEY>:<REGULAR_PATTERN>``. The
pattern is case sensitive. Several ``-a`` options can be
used on the same command line; in that case,
only the sequence records matching all constraints are selected.
*Example:*
.. code-block:: bash
> obigrep -a 'family_name:Asteraceae' seq1.fasta > seq2.fasta
Selects only the sequence records that have an attribute whose key is ``family_name`` and whose value
matches ``Asteraceae``.
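How several ``<KEY>:<REGULAR_PATTERN>`` constraints combine can be sketched in Python. This is an illustration under stated assumptions, not the ``obigrep`` source: attributes are modelled as a plain dictionary, the helper names are hypothetical, and the pattern is assumed to be searched anywhere in the attribute value.

```python
import re


def parse_constraint(option_value):
    """Split 'key:pattern' at the first colon and compile the pattern."""
    key, pattern = option_value.split(":", 1)
    return key, re.compile(pattern)  # no IGNORECASE flag: case sensitive


def record_matches(attributes, constraints):
    """Return True only if every constraint matches (logical AND)."""
    return all(
        key in attributes and regex.search(str(attributes[key])) is not None
        for key, regex in constraints
    )


# Hypothetical record attributes, for illustration only.
attributes = {"family_name": "Asteraceae", "rank": "species"}

both = [parse_constraint("family_name:Asteraceae"), parse_constraint("rank:^species$")]
wrong_case = [parse_constraint("family_name:asteraceae")]

print(record_matches(attributes, both))
print(record_matches(attributes, wrong_case))
```

A record lacking the requested key fails the constraint in this sketch, since there is no value to match against.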
.. cmdoption:: -A <KEY>, --has-attribute=<KEY>
Selects sequence records having an attribute whose key is ``<KEY>``.
*Example:*
.. code-block:: bash
> obigrep -A taxid seq1.fasta > seq2.fasta
Selects only the sequence records having a *taxid* attribute defined.
.. cmdoption:: -p <PYTHON_EXPRESSION>, --predicat=<PYTHON_EXPRESSION>
Python boolean expression to be evaluated for each
sequence record. The attribute keys defined for each sequence record
can be used in the expression as variable names.
An extra variable named ``sequence`` refers to the
sequence record itself.
Several ``-p`` options can be used on the same command
line; in that case,
only the sequence records matching all constraints are selected.
*Example:*
.. code-block:: bash
> obigrep -p '(forward_error<2) and (reverse_error<2)' \
seq1.fasta > seq2.fasta
Selects only the sequence records whose ``forward_error`` and ``reverse_error``
attributes have a value smaller than two.
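The idea of exposing attribute keys as variable names can be sketched with Python's built-in ``eval``. This is a simplified illustration, not the way ``obigrep`` necessarily does it: the ``evaluate_predicate`` helper is hypothetical, attributes are a plain dictionary, and ``sequence`` is reduced to a string.

```python
def evaluate_predicate(expression, attributes, sequence):
    """Evaluate `expression` with attribute keys and `sequence` as variables."""
    namespace = dict(attributes)       # each attribute key becomes a variable name
    namespace["sequence"] = sequence   # extra variable for the record itself
    return bool(eval(expression, {}, namespace))


# Hypothetical record, for illustration only.
attributes = {"forward_error": 1, "reverse_error": 3}
sequence = "ACGTACGT"

print(evaluate_predicate("(forward_error<2) and (reverse_error<2)", attributes, sequence))
print(evaluate_predicate("forward_error<2", attributes, sequence))
print(evaluate_predicate("len(sequence) >= 8", attributes, sequence))
```

In this sketch, an expression that names an attribute missing from the record raises ``NameError``. Because ``eval`` executes arbitrary Python code, such expressions should only come from a trusted source.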
.. cmdoption:: -L <##>, --lmax=<##>
Selects sequence records whose sequence length is
shorter than or equal to ``lmax``.
*Example:*
.. code-block:: bash
> obigrep -L 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence
length shorter than or equal to 100 bp.
.. cmdoption:: -l <##>, --lmin=<##>
Selects sequence records whose sequence length is
longer than or equal to ``lmin``.
*Example:*
.. code-block:: bash
> obigrep -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence length
longer than or equal to 100 bp.
.. cmdoption:: -v, --inverse-match
Inverts the sequence record selection.
*Example:*
.. code-block:: bash
> obigrep -v -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence length shorter than 100 bp.
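The interplay between the length bounds and the inversion can be sketched as follows. The ``select`` function is hypothetical, and the sketch assumes that the inversion is applied to the combined result of all other criteria, which is not confirmed here for ``obigrep``.

```python
def select(records, lmin=None, lmax=None, invert=False):
    """Filter (identifier, sequence) pairs by length, optionally inverting."""
    selected = []
    for ident, seq in records:
        keep = True
        if lmin is not None and len(seq) < lmin:
            keep = False   # shorter than the lower bound
        if lmax is not None and len(seq) > lmax:
            keep = False   # longer than the upper bound
        if invert:
            keep = not keep
        if keep:
            selected.append((ident, seq))
    return selected


# Hypothetical records of length 99, 100 and 101.
records = [("s99", "A" * 99), ("s100", "A" * 100), ("s101", "A" * 101)]

print([ident for ident, _ in select(records, lmin=100)])
print([ident for ident, _ in select(records, lmin=100, invert=True)])
```

Both bounds are inclusive, so a 100 bp sequence passes ``lmin=100`` and is therefore excluded once the selection is inverted.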
|