File: __init__.py

package info (click to toggle)
python-biopython 1.73%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 57,852 kB
  • sloc: python: 169,977; xml: 97,539; ansic: 15,653; sql: 1,208; makefile: 159; sh: 63
file content (94 lines) | stat: -rw-r--r-- 5,073 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# Copyright 2018 by Adhemar Zerlotini. All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.
"""Bio.SearchIO support for InterProScan output formats.

This module adds support for parsing InterProScan XML output.
The InterProScan is available as a command line program or on
EMBL-EBI's web page.
Bio.SearchIO.InterproscanIO was tested on the following version:

- versions: 5.26-65.0 (interproscan-model-2.1.xsd)

More information about InterProScan are available through these links:
- Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
- Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
- Documentation: https://github.com/ebi-pf-team/interproscan/wiki


Supported format
================

Bio.SearchIO.InterproscanIO supports the following format:

- XML   - 'interproscan-xml' - parsing


interproscan-xml
================

The interproscan-xml parser follows the InterProScan XML described here:
https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

+--------------+--------------------+----------------------------------------+
| Object       | Attribute          | XML Element                            |
+==============+====================+========================================+
| QueryResult  | target             | `InterPro`                             |
|              +--------------------+----------------------------------------+
|              | program            | `InterProScan`                         |
|              +--------------------+----------------------------------------+
|              | version            | `protein-matches.interproscan-version` |
+--------------+--------------------+----------------------------------------+
| Hit          | accession          | `signature.name`                       |
|              +--------------------+----------------------------------------+
|              | id                 | `signature.ac`                         |
|              +--------------------+----------------------------------------+
|              | description        | `signature.desc`                       |
|              +--------------------+----------------------------------------+
|              | dbxrefs            | `IPR:entry.ac`                         |
|              |                    | `go-xref.id`                           |
|              |                    | `pathway-xref.db:pathway-xref.id`      |
|              +--------------------+----------------------------------------+
|              | attributes         |                                        |
|              | ['Target']         | `*-match` / `*-location`               |
|              | ['Target version'] | `signature-library-release.library`    |
|              | ['Hit type']       | `signature-library-release.version`    |
+--------------+--------------------+----------------------------------------+
| HSP          | bitscore           | `*-location.score`                     |
|              +--------------------+----------------------------------------+
|              | evalue             | `*-location.evalue`                    |
+--------------+--------------------+----------------------------------------+
| HSPFragment  | query_start        | `*-location.start`                     |
| (also via    +--------------------+----------------------------------------+
| HSP)         | query_end          | `*-location.end`                       |
|              +--------------------+----------------------------------------+
|              | hit_start          | `*-location.hmm-start`                 |
|              +--------------------+----------------------------------------+
|              | hit_end            | `*-location.hmm-end`                   |
|              +--------------------+----------------------------------------+
|              | query              | `sequence`                             |
+--------------+--------------------+----------------------------------------+

InterProScan XML files may contain a match with multiple locations or multiple
matches to the same protein with a single location. In both cases, the match
is uniquely stored as a HIT object and the locations as HSP objects.

`HSP.*start == *start - 1` (Since every start position is 0-based in Biopython)
`HSP.aln_span ==  query-end - query-start`

The types of matches or locations (eg. hmmer3-match, hmmer3-location,
coils-match, panther-location) are stored in hit.attributes['Hit type'].
For instance, for every 'phobious-match', there will be a 'phobious-location'.
Therefore, Hit.type will store the string excluding '-match' or '-location'
('phobious', in this example).
"""

from .interproscan_xml import InterproscanXmlParser

# if not used as a module, run the doctest
if __name__ == "__main__":
    from Bio._utils import run_doctest
    run_doctest()