
|
Sequence record selection options
---------------------------------
.. cmdoption:: -s <REGULAR_PATTERN>, --sequence=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
sequence itself. The pattern is case insensitive.
*Examples:*
.. code-block:: bash
> obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
Selects only the sequence records that contain an *EcoRI* restriction site.
.. code-block:: bash
> obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
Selects only the sequence records that contain a stretch of at least 10 ``A``.
.. code-block:: bash
> obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
Selects only the sequence records that do not contain ambiguous nucleotides.
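
The three patterns above can be tried outside ``obigrep`` with Python's ``re`` module. This is only a minimal sketch, assuming the pattern is searched anywhere in the sequence with case ignored; the helper ``matches_sequence`` is hypothetical and is not part of OBITools.

.. code-block:: python

    import re

    def matches_sequence(pattern, sequence):
        """Return True if the pattern is found anywhere in the sequence, ignoring case."""
        return re.search(pattern, sequence, re.IGNORECASE) is not None

    # EcoRI site, written in lower case in the sequence
    has_ecori = matches_sequence('GAATTC', 'ttgaattcag')
    # stretch of at least 10 A
    has_poly_a = matches_sequence('A{10,}', 'cc' + 'a' * 10 + 'gg')
    # the 'n' is an ambiguous nucleotide, so the anchored pattern fails
    only_acgt = matches_sequence('^[ACGT]+$', 'acgtnacgt')

    print(has_ecori, has_poly_a, only_acgt)

Anchoring the last pattern with ``^`` and ``$`` is what turns a "contains" test into a "consists only of" test.
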
.. cmdoption:: -D <REGULAR_PATTERN>, --definition=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
definition of the sequence record. The pattern is case
sensitive.
*Example:*
.. code-block:: bash
> obigrep -D '[Cc]hloroplast' seq1.fasta > seq2.fasta
Selects only the sequence records whose definition contains ``chloroplast`` or
``Chloroplast``.
.. cmdoption:: -I <REGULAR_PATTERN>, --identifier=<REGULAR_PATTERN>
Regular expression pattern to be tested against the
identifier of the sequence record. The pattern is case
sensitive.
*Example:*
.. code-block:: bash
> obigrep -I '^GH' seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier begins with ``GH``.
.. cmdoption:: --id-list=<FILENAME>
``<FILENAME>`` points to a text file containing the list of sequence
record identifiers to be selected.
The file contains one identifier per line.
*Example:*
.. code-block:: bash
> obigrep --id-list=my_id_list.txt seq1.fasta > seq2.fasta
Selects only the sequence records whose identifier is present in the
``my_id_list.txt`` file.
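
The selection by identifier list can be pictured as a set lookup. The sketch below is an illustration only: ``read_id_list`` and ``select_by_id`` are hypothetical names, records are reduced to ``(identifier, sequence)`` pairs, and skipping empty lines is an assumption rather than documented ``obigrep`` behavior.

.. code-block:: python

    def read_id_list(lines):
        """Collect one identifier per line (assumption: empty lines are skipped)."""
        return {line.strip() for line in lines if line.strip()}

    def select_by_id(records, wanted):
        """Keep the (identifier, sequence) pairs whose identifier is in wanted."""
        return [(ident, seq) for ident, seq in records if ident in wanted]

    # A list of lines stands in for the opened identifier file
    id_file = ["GH001\n", "GH007\n"]
    records = [("GH001", "acgt"), ("XY002", "ttga"), ("GH007", "ggcc")]

    kept = select_by_id(records, read_id_list(id_file))
    print([ident for ident, _ in kept])
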
.. cmdoption:: -a <KEY>:<REGULAR_PATTERN>, --attribute=<KEY>:<REGULAR_PATTERN>
Regular expression pattern matched against the
:doc:`attributes of the sequence record <../fasta>`. The option value
has the form ``<KEY>:<REGULAR_PATTERN>``. The
pattern is case sensitive. Several ``-a`` options can be
used on the same command line; in that case,
the selected sequence records match all constraints.
*Example:*
.. code-block:: bash
> obigrep -a 'family_name:Asteraceae' seq1.fasta > seq2.fasta
Selects the sequence records containing an attribute whose key is ``family_name`` and whose value
matches ``Asteraceae``.
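
How several ``-a`` constraints combine can be sketched as a logical AND over ``key:pattern`` pairs. This is a hedged illustration, not the OBITools implementation: ``matches_attributes`` is a hypothetical helper, attributes are held in a plain dictionary, and the pattern is assumed to be searched anywhere in the attribute value.

.. code-block:: python

    import re

    def matches_attributes(attributes, constraints):
        """Return True only if every 'key:pattern' constraint is satisfied."""
        for constraint in constraints:
            # Split on the first colon only, so the pattern may contain colons
            key, pattern = constraint.split(':', 1)
            if key not in attributes:
                return False
            # Case-sensitive search in the attribute value
            if re.search(pattern, str(attributes[key])) is None:
                return False
        return True

    attributes = {'family_name': 'Asteraceae', 'rank': 'species'}

    both = matches_attributes(attributes, ['family_name:Asteraceae', 'rank:species'])
    wrong_case = matches_attributes(attributes, ['family_name:asteraceae'])
    print(both, wrong_case)
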
.. cmdoption:: -A <KEY>, --has-attribute=<KEY>
Selects sequence records having an attribute whose key is ``<KEY>``.
*Example:*
.. code-block:: bash
> obigrep -A taxid seq1.fasta > seq2.fasta
Selects only the sequence records having a *taxid* attribute defined.
.. cmdoption:: -p <PYTHON_EXPRESSION>, --predicat=<PYTHON_EXPRESSION>
Python boolean expression to be evaluated for each
sequence record. The attribute keys defined for each sequence record
can be used in the expression as variable names.
An extra variable named ``sequence`` refers to the
sequence record itself.
Several ``-p`` options can be used on the same command
line; in that case,
the selected sequence records match all constraints.
*Example:*
.. code-block:: bash
> obigrep -p '(forward_error<2) and (reverse_error<2)' \
seq1.fasta > seq2.fasta
Selects only the sequence records whose ``forward_error`` and ``reverse_error``
attributes have a value smaller than two.
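
The way attribute keys become variable names in a predicate can be illustrated with Python's ``eval``. The sketch below rests on stated assumptions: ``passes_predicates`` is a hypothetical helper, attributes are held in a plain dictionary, and a plain string stands in for the sequence record bound to ``sequence``. It does not claim to reproduce how ``obigrep`` evaluates the expression internally.

.. code-block:: python

    def passes_predicates(attributes, sequence, predicates):
        """Evaluate every predicate with the attribute keys exposed as variables."""
        namespace = dict(attributes)
        # Extra variable giving access to the record (here: a plain string)
        namespace['sequence'] = sequence
        # All predicates must hold, as with several -p options
        return all(eval(predicate, {}, namespace) for predicate in predicates)

    predicate = '(forward_error<2) and (reverse_error<2)'

    good = passes_predicates({'forward_error': 1, 'reverse_error': 0}, 'acgt', [predicate])
    bad = passes_predicates({'forward_error': 1, 'reverse_error': 3}, 'acgt', [predicate])
    print(good, bad)

In this sketch, a predicate that names an attribute missing from the record raises a ``NameError``.
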
.. cmdoption:: -L <##>, --lmax=<##>
Selects sequence records whose sequence length is
shorter than or equal to ``lmax``.
*Example:*
.. code-block:: bash
> obigrep -L 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence
length shorter than or equal to 100 bp.
.. cmdoption:: -l <##>, --lmin=<##>
Selects sequence records whose sequence length is
longer than or equal to ``lmin``.
*Example:*
.. code-block:: bash
> obigrep -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence length
longer than or equal to 100 bp.
.. cmdoption:: -v, --inverse-match
Inverts the sequence record selection.
*Example:*
.. code-block:: bash
> obigrep -v -l 100 seq1.fasta > seq2.fasta
Selects only the sequence records that have a sequence length shorter than 100 bp.
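
The interplay between the length bounds and ``-v`` can be sketched as follows. ``keep_record`` is a hypothetical function, and the sketch assumes that the inversion applies to the combined result of the other criteria.

.. code-block:: python

    def keep_record(sequence, lmin=None, lmax=None, inverse=False):
        """Apply inclusive length bounds, then optionally invert the decision."""
        selected = True
        if lmin is not None and len(sequence) < lmin:
            selected = False
        if lmax is not None and len(sequence) > lmax:
            selected = False
        # With inverse=True, records that would be selected are rejected
        return (not selected) if inverse else selected

    exactly_100 = 'a' * 100
    only_99 = 'a' * 99

    print(keep_record(exactly_100, lmin=100))                # bound is inclusive
    print(keep_record(exactly_100, lmin=100, inverse=True))  # rejected when inverted
    print(keep_record(only_99, lmin=100, inverse=True))      # shorter than 100, kept

Because both bounds are inclusive, inverting ``lmin=100`` keeps only the sequences strictly shorter than 100 bp.
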
|