# Copyright 2024 by Samuel Prince. All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.
"""Bio.SearchIO support for Infernal output formats.

This module adds support for parsing Infernal outputs. Infernal is a
suite of programs for searching DNA sequence databases for RNA structure
and sequence similarities using covariance models (CMs).

Bio.SearchIO.InfernalIO was tested on the following Infernal versions and flavors:

- Infernal (1.0.0+): cmscan and cmsearch

More information on Infernal is available through these links:

- Web page: http://eddylab.org/infernal/
- User guide: http://eddylab.org/infernal/Userguide.pdf

Supported formats
=================

Bio.SearchIO.InfernalIO supports the following Infernal output formats:

- Plain text - 'infernal-text' - parsing, indexing
- Tabular - 'infernal-tab' - parsing, indexing

For all output formats, Infernal uses 'mdl' for 'query' and 'seq' for 'hit'.
InfernalIO is aware of this different naming scheme, and will use 'query'
and 'hit' to fit SearchIO's object model.
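
As a rough illustration of this renaming, the sketch below maps a few Infernal
column labels to the SearchIO attribute names listed in the attribute tables of
this docstring. The lookup table and the helper ``to_searchio_names`` are
hypothetical names made up for this example; they are not part of InfernalIO.

```python
# Hypothetical lookup table: Infernal column label -> SearchIO attribute name.
# The pairs follow the attribute tables in this docstring.
INFERNAL_TO_SEARCHIO = {
    "mdl from": "query_start",
    "mdl to": "query_end",
    "seq from": "hit_start",
    "seq to": "hit_end",
}


def to_searchio_names(raw_fields):
    # Rename known labels and leave every other key untouched.
    return {
        INFERNAL_TO_SEARCHIO.get(label, label): value
        for label, value in raw_fields.items()
    }


renamed = to_searchio_names({"mdl from": 1, "mdl to": 71, "seq from": 101, "score": 60.2})
print(sorted(renamed))
```

Labels without an entry in the table, such as 'score' here, pass through
unchanged.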

Infernal sometimes reports 'local ends' (i.e., a large insertion or deletion
in the optimal alignment), which are represented by a number in brackets in
the alignment (e.g., AUUAC*[88]*GUAGU). In InfernalIO, such an alignment is
split into fragments of the same HSP.
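
The splitting can be pictured with the standalone sketch below. It assumes the
marker always has the form '*[N]*' (optionally padded with spaces inside the
brackets), as in the example above. ``split_at_local_ends`` is a hypothetical
helper written for this illustration and is not the code InfernalIO uses
internally.

```python
import re

# Assumed marker syntax, based on the example above: '*[N]*', where N is the
# number printed for the local end. Padding spaces around N are tolerated.
LOCAL_END = re.compile(re.escape("*[") + " *([0-9]+) *" + re.escape("]*"))


def split_at_local_ends(aligned):
    # re.split() with one capture group alternates sequence pieces and the
    # captured numbers: [piece, N, piece, N, piece, ...].
    parts = LOCAL_END.split(aligned)
    fragments = parts[0::2]
    skipped = [int(number) for number in parts[1::2]]
    return fragments, skipped


fragments, skipped = split_at_local_ends("AUUAC*[88]*GUAGU")
print(fragments, skipped)
```

For the example string this yields the two pieces 'AUUAC' and 'GUAGU' together
with the number 88; a string without any marker comes back as a single piece.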

infernal-text
=============

The Infernal plain text parser supports output files with alignment
blocks (default) or without (with the '--noali' flag). If the alignment blocks
are present, it can parse files with variable alignment width (using the
'--notextw' or '--textw' flag). Output from both CM and HMM searches (the
latter run with the '--hmmonly' flag) is supported. The parser only supports
non-verbose output formats.

The following SearchIO object attributes are provided:

+-----------------+-------------------------+----------------------------------+
| Object          | Attribute               | Value                            |
+=================+=========================+==================================+
| QueryResult     | accession               | query accession                  |
|                 +-------------------------+----------------------------------+
|                 | description             | query sequence description       |
|                 +-------------------------+----------------------------------+
|                 | id                      | query sequence ID                |
|                 +-------------------------+----------------------------------+
|                 | program                 | Infernal flavor                  |
|                 +-------------------------+----------------------------------+
|                 | seq_len                 | full length of query sequence    |
|                 +-------------------------+----------------------------------+
|                 | target                  | target search database           |
|                 +-------------------------+----------------------------------+
|                 | version                 | Infernal version                 |
+-----------------+-------------------------+----------------------------------+
| Hit             | description             | hit sequence description         |
|                 +-------------------------+----------------------------------+
|                 | id                      | hit sequence ID                  |
+-----------------+-------------------------+----------------------------------+
| HSP             | evalue                  | hsp evalue                       |
|                 +-------------------------+----------------------------------+
|                 | bias                    | hsp bias                         |
|                 +-------------------------+----------------------------------+
|                 | bitscore                | hsp score                        |
|                 +-------------------------+----------------------------------+
|                 | gc                      | gc fraction                      |
|                 +-------------------------+----------------------------------+
|                 | is_included             | boolean, whether the hit of the  |
|                 |                         | hsp is in the inclusion          |
|                 |                         | threshold                        |
|                 +-------------------------+----------------------------------+
|                 | query_start             | query start position             |
|                 +-------------------------+----------------------------------+
|                 | query_end               | query end position               |
|                 +-------------------------+----------------------------------+
|                 | query_endtype           | query sequence end types (e.g.,  |
|                 |                         | '[]', '..', '[.', '.]', etc.)    |
|                 +-------------------------+----------------------------------+
|                 | hit_start               | hit start position               |
|                 +-------------------------+----------------------------------+
|                 | hit_end                 | hit end position                 |
|                 +-------------------------+----------------------------------+
|                 | hit_endtype             | hit sequence end types           |
|                 +-------------------------+----------------------------------+
|                 | acc_avg                 | expected accuracy per alignment  |
|                 |                         | residue (acc column)             |
|                 +-------------------------+----------------------------------+
|                 | model                   | type of model used (cm or hmm)   |
|                 +-------------------------+----------------------------------+
|                 | truncated               | indicate if the hit is truncated |
|                 |                         | (5', 3' or both) or not          |
+-----------------+-------------------------+----------------------------------+
| HSPFragment     | aln_annotation          | alignment similarity string and  |
|                 |                         | other annotations (PP, CS,       |
|                 |                         | similarity and NC (except for    |
|                 |                         | --hmmonly))                      |
|                 +-------------------------+----------------------------------+
|                 | aln_span                | length of alignment fragment     |
|                 +-------------------------+----------------------------------+
|                 | hit                     | hit sequence                     |
|                 +-------------------------+----------------------------------+
|                 | hit_start               | local alignment sequence start   |
|                 |                         | coordinate (seq from)            |
|                 +-------------------------+----------------------------------+
|                 | hit_end                 | local alignment sequence end     |
|                 |                         | coordinate (seq to)              |
|                 +-------------------------+----------------------------------+
|                 | hit_strand              | hit sequence strand              |
|                 +-------------------------+----------------------------------+
|                 | query                   | query sequence                   |
|                 +-------------------------+----------------------------------+
|                 | query_start             | local model alignment start      |
|                 |                         | coordinate (mdl from)            |
|                 +-------------------------+----------------------------------+
|                 | query_end               | local model alignment end        |
|                 |                         | coordinate (mdl to)              |
+-----------------+-------------------------+----------------------------------+

infernal-tab
============

The Infernal tabular parser supports the standard cmsearch tabular output and
cmscan tabular output file formats 1, 2 and 3 (inferred automatically from the
header).

Rows marked with '*' denote attributes not available in the default format.

+-----------------+-------------------------+----------------------------------+
| Object          | Attribute               | Value                            |
+=================+=========================+==================================+
| QueryResult     | accession               | query accession                  |
|                 +-------------------------+----------------------------------+
|                 | id                      | query sequence ID                |
|                 +-------------------------+----------------------------------+
|                 | clan*                   | Rfam clan                        |
|                 +-------------------------+----------------------------------+
|                 | seq_len*                | query sequence length            |
+-----------------+-------------------------+----------------------------------+
| Hit             | description             | hit sequence description         |
|                 +-------------------------+----------------------------------+
|                 | id                      | hit sequence ID                  |
|                 +-------------------------+----------------------------------+
|                 | accession               | hit accession                    |
|                 +-------------------------+----------------------------------+
|                 | seq_len*                | hit sequence length              |
+-----------------+-------------------------+----------------------------------+
| HSP             | evalue                  | hsp evalue                       |
|                 +-------------------------+----------------------------------+
|                 | bias                    | hsp bias                         |
|                 +-------------------------+----------------------------------+
|                 | bitscore                | hsp score                        |
|                 +-------------------------+----------------------------------+
|                 | gc                      | gc fraction                      |
|                 +-------------------------+----------------------------------+
|                 | is_included             | boolean, whether the hit of the  |
|                 |                         | hsp is in the inclusion          |
|                 |                         | threshold                        |
|                 +-------------------------+----------------------------------+
|                 | model                   | type of model used (cm or hmm)   |
|                 +-------------------------+----------------------------------+
|                 | truncated               | indicate if the hit is truncated |
|                 |                         | (5', 3' or both) or not          |
|                 +-------------------------+----------------------------------+
|                 | pipeline_pass           | pipeline pass at which the hit   |
|                 |                         | was identified                   |
|                 +-------------------------+----------------------------------+
|                 | olp*                    | overlap status of this hit ('*', |
|                 |                         | '^', '$' or '=')                 |
|                 +-------------------------+----------------------------------+
|                 | anyidx*                 | index of the best scoring        |
|                 |                         | overlapping hit (or none if      |
|                 |                         | there is no overlap)             |
|                 +-------------------------+----------------------------------+
|                 | afrct1*                 | fraction of this hit that        |
|                 |                         | overlaps with anyidx hit (or     |
|                 |                         | none if there is no overlap)     |
|                 +-------------------------+----------------------------------+
|                 | afrct2*                 | fraction of anyidx hit that      |
|                 |                         | overlaps with this hit (or none  |
|                 |                         | if there is no overlap)          |
|                 +-------------------------+----------------------------------+
|                 | winidx*                 | index of the best scoring hit    |
|                 |                         | that overlaps with this hit that |
|                 |                         | is marked as '^' (or none if     |
|                 |                         | there is no overlap)             |
|                 +-------------------------+----------------------------------+
|                 | wfrct1*                 | fraction of this hit that        |
|                 |                         | overlaps with winidx hit (or     |
|                 |                         | none if there is no overlap)     |
|                 +-------------------------+----------------------------------+
|                 | wfrct2*                 | fraction of winidx hit that      |
|                 |                         | overlaps with this hit (or none  |
|                 |                         | if there is no overlap)          |
+-----------------+-------------------------+----------------------------------+
| HSPFragment     | hit_start               | local alignment sequence start   |
| (also via HSP)  |                         | coordinate (seq from)            |
|                 +-------------------------+----------------------------------+
|                 | hit_end                 | local alignment sequence end     |
|                 |                         | coordinate (seq to)              |
|                 +-------------------------+----------------------------------+
|                 | hit_strand              | hit sequence strand              |
|                 +-------------------------+----------------------------------+
|                 | query_start             | local model alignment start      |
|                 |                         | coordinate (mdl from)            |
|                 +-------------------------+----------------------------------+
|                 | query_end               | local model alignment end        |
|                 |                         | coordinate (mdl to)              |
+-----------------+-------------------------+----------------------------------+
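
To make the mapping above more concrete, here is a minimal sketch that turns
one row of the default tabular format into a dictionary keyed by the attribute
names from the table. It assumes the 18-column cmsearch layout (target name,
accession, query name, accession, mdl, mdl from, mdl to, seq from, seq to,
strand, trunc, pass, gc, bias, score, E-value, inc, description of target) and
that the 'inc' column holds '!' for included hits. ``parse_default_row`` and
the sample row are invented for illustration; InfernalTabParser may work
differently, and coordinates are kept exactly as printed.

```python
# Assumed column order of the default cmsearch tabular format (18 fields),
# expressed with the SearchIO attribute names from the table above.
COLUMNS = (
    "hit_id", "hit_accession", "query_id", "query_accession", "model",
    "query_start", "query_end", "hit_start", "hit_end", "hit_strand",
    "truncated", "pipeline_pass", "gc", "bias", "bitscore", "evalue",
    "is_included", "hit_description",
)
INT_FIELDS = ("query_start", "query_end", "hit_start", "hit_end", "pipeline_pass")
FLOAT_FIELDS = ("gc", "bias", "bitscore", "evalue")


def parse_default_row(line):
    # Split on runs of whitespace, but keep the free-text description whole.
    values = line.split(None, len(COLUMNS) - 1)
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, found %d" % (len(COLUMNS), len(values)))
    row = dict(zip(COLUMNS, values))
    for name in INT_FIELDS:
        row[name] = int(row[name])
    for name in FLOAT_FIELDS:
        row[name] = float(row[name])
    # Assumption: '+' and '-' are the only strand symbols; map them to 1 and -1.
    row["hit_strand"] = 1 if row["hit_strand"] == "+" else -1
    # Assumption: '!' marks a hit that passes the inclusion threshold.
    row["is_included"] = row["is_included"] == "!"
    row["hit_description"] = row["hit_description"].strip()
    return row


# Synthetic row, made up for this example (not real search output).
SAMPLE = (
    "seq1 - tRNA RF00005 cm 1 71 101 171 + no 1 0.50 0.0 60.2 1.5e-10 ! "
    "synthetic example sequence"
)
row = parse_default_row(SAMPLE)
print(row["hit_id"], row["hit_start"], row["hit_end"], row["is_included"])
```

Limiting the number of splits keeps a multi-word description in a single
field, which is why the description has to be the last column for this
approach to work.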
"""
from .infernal_tab import InfernalTabParser
from .infernal_tab import InfernalTabIndexer
from .infernal_text import InfernalTextParser
from .infernal_text import InfernalTextIndexer
# if not used as a module, run the doctest
if __name__ == "__main__":
    from Bio._utils import run_doctest

    run_doctest()