File: __init__.py

# Copyright 2018 by Adhemar Zerlotini. All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.
"""Bio.SearchIO support for InterProScan output formats.

This module adds support for parsing InterProScan XML output.
InterProScan is available as a command-line program or through EMBL-EBI's
web interface. Bio.SearchIO.InterproscanIO has been tested with the following
version:

- version: 5.26-65.0 (interproscan-model-2.1.xsd)

More information about InterProScan is available through these links:

- Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
- Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
- Documentation: https://github.com/ebi-pf-team/interproscan/wiki


Supported format
================

Bio.SearchIO.InterproscanIO supports the following format:

- XML   - 'interproscan-xml' - parsing


interproscan-xml
================

The interproscan-xml parser follows the InterProScan XML format described here:
https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

+--------------+--------------------+------------------------------------------+
| Object       | Attribute          | XML Element                              |
+==============+====================+==========================================+
| QueryResult  | target             | ``InterPro``                             |
|              +--------------------+------------------------------------------+
|              | program            | ``InterProScan``                         |
|              +--------------------+------------------------------------------+
|              | version            | ``protein-matches.interproscan-version`` |
+--------------+--------------------+------------------------------------------+
| Hit          | accession          | ``signature.name``                       |
|              +--------------------+------------------------------------------+
|              | id                 | ``signature.ac``                         |
|              +--------------------+------------------------------------------+
|              | description        | ``signature.desc``                       |
|              +--------------------+------------------------------------------+
|              | dbxrefs            | ``IPR:entry.ac``                         |
|              |                    | ``go-xref.id``                           |
|              |                    | ``pathway-xref.db:pathway-xref.id``      |
|              +--------------------+------------------------------------------+
|              | attributes         |                                          |
|              | ['Hit type']       | ``*-match`` / ``*-location``             |
|              | ['Target']         | ``signature-library-release.library``    |
|              | ['Target version'] | ``signature-library-release.version``    |
+--------------+--------------------+------------------------------------------+
| HSP          | bitscore           | ``*-location.score``                     |
|              +--------------------+------------------------------------------+
|              | evalue             | ``*-location.evalue``                    |
+--------------+--------------------+------------------------------------------+
| HSPFragment  | query_start        | ``*-location.start``                     |
| (also via    +--------------------+------------------------------------------+
| HSP)         | query_end          | ``*-location.end``                       |
|              +--------------------+------------------------------------------+
|              | hit_start          | ``*-location.hmm-start``                 |
|              +--------------------+------------------------------------------+
|              | hit_end            | ``*-location.hmm-end``                   |
|              +--------------------+------------------------------------------+
|              | query              | ``sequence``                             |
+--------------+--------------------+------------------------------------------+
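
The Hit rows of the table can be illustrated with a small standard-library
sketch. It is not how this module is implemented: the sample XML, its
placeholder accessions, and the helper ``hit_fields`` are hypothetical, and the
input omits the default XML namespace that real InterProScan output declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample modelled on the elements named in the
# table; real InterProScan output declares a default XML namespace.
SAMPLE = '''<protein-matches interproscan-version="5.26-65.0">
  <protein>
    <sequence>MKTAYIAKQRQISFVKSHFSRQ</sequence>
    <matches>
      <hmmer3-match>
        <signature ac="SIG001" name="example_sig" desc="Example signature">
          <entry ac="IPR999999">
            <go-xref id="GO:0000001"/>
            <pathway-xref db="ExampleDB" id="P1"/>
          </entry>
          <signature-library-release library="EXAMPLE-LIB" version="1.0"/>
        </signature>
        <locations>
          <hmmer3-location start="3" end="10" score="25.5" evalue="0.001"/>
        </locations>
      </hmmer3-match>
    </matches>
  </protein>
</protein-matches>'''


def hit_fields(match):
    # Map one <*-match> element to the Hit attributes listed in the table.
    sig = match.find("signature")
    dbxrefs = []
    entry = sig.find("entry")
    if entry is not None:
        dbxrefs.append("IPR:" + entry.get("ac"))
        # GO cross-references: the id is used as-is.
        dbxrefs.extend(x.get("id") for x in entry.findall("go-xref"))
        # Pathway cross-references: stored as "db:id".
        dbxrefs.extend(
            x.get("db") + ":" + x.get("id") for x in entry.findall("pathway-xref")
        )
    release = sig.find("signature-library-release")
    return {
        "id": sig.get("ac"),
        "accession": sig.get("name"),
        "description": sig.get("desc"),
        "dbxrefs": dbxrefs,
        "Target": release.get("library"),
        "Target version": release.get("version"),
    }


root = ET.fromstring(SAMPLE)
print(root.get("interproscan-version"))  # corresponds to QueryResult.version
for match in root.find("protein/matches"):
    print(hit_fields(match))
```

For the sample, ``dbxrefs`` follows the table's order: the InterPro entry
first, then GO, then pathway cross-references.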

InterProScan XML files may contain a single match with multiple locations, or
multiple matches to the same protein, each with a single location. In both
cases, each match is stored as one Hit object and each of its locations as an
HSP object.
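
A rough sketch of this grouping, using hypothetical namespace-free sample XML
and a made-up helper ``group_locations``: each ``*-match`` element yields one
entry (a Hit) and each of its ``*-location`` children one span inside it (an
HSP).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified <matches> block: the first match has two
# locations, the second match has a single location.
MATCHES = '''<matches>
  <hmmer3-match>
    <signature ac="SIG001"/>
    <locations>
      <hmmer3-location start="3" end="10"/>
      <hmmer3-location start="20" end="31"/>
    </locations>
  </hmmer3-match>
  <coils-match>
    <signature ac="Coil"/>
    <locations>
      <coils-location start="40" end="60"/>
    </locations>
  </coils-match>
</matches>'''


def group_locations(matches):
    # One (hit_id, spans) pair per match element, i.e. per Hit; each
    # (start, end) tuple inside it stands for one HSP.
    hits = []
    for match in matches:
        hit_id = match.find("signature").get("ac")
        spans = [
            (int(loc.get("start")), int(loc.get("end")))
            for loc in match.find("locations")
        ]
        hits.append((hit_id, spans))
    return hits


for hit_id, spans in group_locations(ET.fromstring(MATCHES)):
    print(hit_id, len(spans), "HSP(s)")
```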

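``HSP.query_start == start - 1`` and ``HSP.hit_start == hmm-start - 1``
(Biopython uses 0-based start positions, while InterProScan reports 1-based ones)

``HSP.aln_span == HSP.query_end - HSP.query_start`` (that is, ``end - start + 1``
in the 1-based XML coordinates)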
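
A short worked example of these coordinate rules, assuming InterProScan's
1-based, inclusive positions; ``to_biopython_coords`` is a hypothetical helper,
not part of Biopython:

```python
def to_biopython_coords(start, end):
    # InterProScan positions are 1-based and inclusive; Biopython stores a
    # 0-based start and keeps the end as-is, giving a half-open interval.
    query_start = start - 1
    query_end = end
    aln_span = query_end - query_start
    return query_start, query_end, aln_span


# A location with start="3" end="10" covers residues 3 to 10 (8 residues).
query_start, query_end, aln_span = to_biopython_coords(3, 10)
print(query_start, query_end, aln_span)  # 2 10 8

# The half-open interval slices the same residues from a Python string.
sequence = "MKTAYIAKQRQ"
print(sequence[query_start:query_end])  # TAYIAKQR
```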

The type of each match or location (e.g. hmmer3-match, hmmer3-location,
coils-match, panther-location) is stored in ``hit.attributes['Hit type']``.
Every 'phobius-match', for instance, has a corresponding 'phobius-location',
so the stored value is the element name without the '-match' or '-location'
suffix ('phobius' in this example).
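
Deriving the hit type from an element name might look like the sketch below;
the ``hit_type`` helper and the ``{urn:example}`` namespace are illustrative
assumptions, not this module's own code:

```python
def hit_type(tag):
    # ElementTree prefixes namespaced tags with "{uri}"; keep the local name.
    local = tag.rsplit("}", 1)[-1]
    # Strip the "-match" or "-location" suffix to get the shared type name.
    for suffix in ("-match", "-location"):
        if local.endswith(suffix):
            return local[: -len(suffix)]
    return local


for tag in ("phobius-match", "phobius-location", "{urn:example}hmmer3-match"):
    print(tag, "->", hit_type(tag))
```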
"""

from .interproscan_xml import InterproscanXmlParser

# if not used as a module, run the doctest
if __name__ == "__main__":
    from Bio._utils import run_doctest

    run_doctest()