# Copyright 2018 by Adhemar Zerlotini. All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.
"""Bio.SearchIO support for InterProScan output formats.

This module adds support for parsing InterProScan XML output.

InterProScan is available as a command-line program or through
EMBL-EBI's web page.

Bio.SearchIO.InterproscanIO was tested on the following version:

- versions: 5.26-65.0 (interproscan-model-2.1.xsd)

More information about InterProScan is available through these links:

- Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
- Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
- Documentation: https://github.com/ebi-pf-team/interproscan/wiki

Supported format
================

Bio.SearchIO.InterproscanIO supports the following format:

- XML - 'interproscan-xml' - parsing


interproscan-xml
================

The interproscan-xml parser follows the InterProScan XML format described
here: https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

+--------------+--------------------+----------------------------------------+
| Object       | Attribute          | XML Element                            |
+==============+====================+========================================+
| QueryResult  | target             | `InterPro`                             |
|              +--------------------+----------------------------------------+
|              | program            | `InterProScan`                         |
|              +--------------------+----------------------------------------+
|              | version            | `protein-matches.interproscan-version` |
+--------------+--------------------+----------------------------------------+
| Hit          | accession          | `signature.name`                       |
|              +--------------------+----------------------------------------+
|              | id                 | `signature.ac`                         |
|              +--------------------+----------------------------------------+
|              | description        | `signature.desc`                       |
|              +--------------------+----------------------------------------+
|              | dbxrefs            | `IPR:entry.ac`                         |
|              |                    | `go-xref.id`                           |
|              |                    | `pathway-xref.db:pathway-xref.id`      |
|              +--------------------+----------------------------------------+
|              | attributes         |                                        |
|              | ['Target']         | `signature-library-release.library`    |
|              | ['Target version'] | `signature-library-release.version`    |
|              | ['Hit type']       | `*-match` / `*-location`               |
+--------------+--------------------+----------------------------------------+
| HSP          | bitscore           | `*-location.score`                     |
|              +--------------------+----------------------------------------+
|              | evalue             | `*-location.evalue`                    |
+--------------+--------------------+----------------------------------------+
| HSPFragment  | query_start        | `*-location.start`                     |
| (also via    +--------------------+----------------------------------------+
| HSP)         | query_end          | `*-location.end`                       |
|              +--------------------+----------------------------------------+
|              | hit_start          | `*-location.hmm-start`                 |
|              +--------------------+----------------------------------------+
|              | hit_end            | `*-location.hmm-end`                   |
|              +--------------------+----------------------------------------+
|              | query              | `sequence`                             |
+--------------+--------------------+----------------------------------------+
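
The sketch below illustrates the Hit rows of the table. It reads a hypothetical, heavily simplified match element with the standard-library xml.etree.ElementTree module and assumes there is no XML namespace. It keeps only the signature, entry and signature-library-release elements. The FRAGMENT values and the hit_fields helper are made up for this example and are not the parser's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: real InterProScan XML declares a
# namespace and carries many more elements (locations, GO xrefs, ...).
FRAGMENT = (
    '<hmmer3-match>'
    '<signature ac="PF00069" name="Pkinase" desc="Protein kinase domain">'
    '<entry ac="IPR000719"/>'
    '<signature-library-release library="PFAM" version="31.0"/>'
    '</signature>'
    '</hmmer3-match>'
)


def hit_fields(match_elem):
    # Collect some Hit-level values from the table for one *-match element.
    signature = match_elem.find("signature")
    entry = signature.find("entry")
    release = signature.find("signature-library-release")
    return {
        "id": signature.get("ac"),
        "accession": signature.get("name"),
        "description": signature.get("desc"),
        # Only the InterPro entry cross-reference; GO and pathway xrefs omitted.
        "dbxrefs": ["IPR:" + entry.get("ac")] if entry is not None else [],
        "Target": release.get("library"),
        "Target version": release.get("version"),
    }


fields = hit_fields(ET.fromstring(FRAGMENT))
print(fields["id"], fields["accession"], fields["Target"])
```

Real InterProScan XML declares a default namespace, so lookups there would need namespace-qualified tag names.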

InterProScan XML files may contain a single match with multiple locations, or
multiple matches to the same protein, each with a single location. In both
cases, each match is stored as one Hit object and each of its locations as an
HSP object.

Biopython uses 0-based start coordinates. Every start attribute of an HSP is
therefore the 1-based XML value minus one (`HSP.*start == *start - 1`), and end
coordinates are kept as-is. The alignment span is
`HSP.aln_span == HSP.query_end - HSP.query_start`.
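
The coordinate arithmetic can be sketched in plain Python. The HspCoords class and the locations_to_hsps function are hypothetical stand-ins, not Bio.SearchIO objects. The sketch assumes that the XML start and end values are 1-based and inclusive, and it mirrors only the one-match-to-one-hit, one-location-to-one-HSP mapping and the start shift described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HspCoords:
    # Hypothetical stand-in for the query coordinates an HSP would carry.
    query_start: int  # 0-based, inclusive
    query_end: int  # 0-based, exclusive (same number as the 1-based inclusive end)

    @property
    def aln_span(self) -> int:
        return self.query_end - self.query_start


def locations_to_hsps(locations: List[Tuple[int, int]]) -> List[HspCoords]:
    # One match -> one hit; each (start, end) location -> one HSP.
    # Only the start is shifted when converting to 0-based coordinates.
    return [HspCoords(start - 1, end) for start, end in locations]


# Hypothetical match with two locations on the query sequence.
hsps = locations_to_hsps([(10, 50), (120, 180)])
for hsp in hsps:
    print(hsp.query_start, hsp.query_end, hsp.aln_span)
```

Residues 10 to 50 (1-based, inclusive) become query_start 9 and query_end 50, which gives an alignment span of 41 residues.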

The type of each match or location (e.g. hmmer3-match, hmmer3-location,
coils-match, panther-location) is stored in `hit.attributes['Hit type']`.
Match and location elements come in pairs: for instance, every
'phobius-match' has a corresponding 'phobius-location'. Therefore,
`hit.attributes['Hit type']` stores the element name without the '-match' or
'-location' suffix ('phobius' in this example).
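
The suffix stripping can be sketched as a simple check on the element name. The hit_type helper is hypothetical and assumes that any XML namespace prefix has already been removed from the tag. It is not taken from the parser.

```python
def hit_type(tag):
    # Hypothetical helper: strip the "-match" or "-location" suffix from an
    # InterProScan element name, e.g. "phobius-match" -> "phobius".
    for suffix in ("-match", "-location"):
        if tag.endswith(suffix):
            return tag[: -len(suffix)]
    # Leave names without either suffix unchanged.
    return tag


for tag in ("hmmer3-match", "hmmer3-location", "coils-match", "phobius-location"):
    print(tag, "->", hit_type(tag))
```

Because both members of a pair share the same prefix, 'phobius-match' and 'phobius-location' map to the same type.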
"""

from .interproscan_xml import InterproscanXmlParser


# if not used as a module, run the doctest
if __name__ == "__main__":
    from Bio._utils import run_doctest

    run_doctest()