.. _advanced:

Advanced
========

.. _avoiding_timeouterror:

Avoiding TimeoutError
---------------------

If a request matches too many results, it can fail with a ``TimeoutError``. How to avoid this depends on the type of
request you are making.

When retrieving full compound or substance records, first request a list of cids or sids for your input, and then
request the full records for those identifiers individually or in small groups. For example::

    from pubchempy import Substance, get_sids

    sids = get_sids('Aspirin', 'name')
    for sid in sids:
        s = Substance.from_sid(sid)
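
The "small groups" approach can be sketched with a generic batching helper. This is a minimal sketch, not part of PubChemPy: ``chunked`` and ``fetch_in_batches`` are hypothetical names, and the ``fetch`` callable is supplied by the caller so that the helper itself makes no network requests.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ``ids`` holding at most ``size`` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_in_batches(ids: Sequence[int],
                     fetch: Callable[[List[int]], list],
                     size: int = 50) -> list:
    """Call ``fetch`` once per batch and concatenate the results."""
    records = []
    for batch in chunked(ids, size):
        records.extend(fetch(batch))
    return records


# Hypothetical usage with PubChemPy (requires network access):
#   import pubchempy as pcp
#   substances = fetch_in_batches(sids, pcp.get_substances, size=50)
```

The batch size of 50 is an arbitrary starting point; a smaller value trades more requests for a lower chance of any single request timing out.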

When using the ``formula`` namespace or a ``searchtype``, you can instead use the ``listkey_count`` and
``listkey_start`` keyword arguments to paginate the results. The ``listkey_count`` value specifies the number of
results to return per request, and the ``listkey_start`` value specifies the zero-based index of the first result
to return. For example::

    from pubchempy import get, get_compounds

    get_compounds('CC', 'smiles', searchtype='substructure', listkey_count=5)
    get('C10H21N', 'formula', listkey_count=3, listkey_start=6)
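
To walk through a whole result set, the two arguments can be advanced in a loop. The following is a rough sketch under stated assumptions: ``iter_pages`` is a hypothetical helper, ``fetch_page`` is a caller-supplied function, and ``listkey_start`` is treated as a zero-based offset into the result list, as PUG REST describes it.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List],
               page_size: int) -> Iterator[List]:
    """Yield successive pages until a short or empty page signals the end.

    ``fetch_page(start, count)`` is supplied by the caller; ``start`` is
    assumed to be a zero-based offset into the full result list.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        start += page_size


# Hypothetical usage with PubChemPy (requires network access):
#   import pubchempy as pcp
#   def fetch(start, count):
#       return pcp.get_cids('CC', 'smiles', searchtype='substructure',
#                           listkey_start=start, listkey_count=count)
#   for page in iter_pages(fetch, 100):
#       print(len(page))
```

Note that each call in this usage repeats the underlying search; it keeps individual responses small rather than reusing a single ``listkey``.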


Logging
-------

PubChemPy can generate logging statements if required. Just set the desired logging level::

    import logging
    logging.basicConfig(level=logging.DEBUG)

The logger is named ``pubchempy``. There is more information on logging in the `Python logging documentation`_.
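
Because the logger has a fixed name, its level can be raised without making every other library verbose. A minimal sketch using only the standard ``logging`` module; the format string is an arbitrary choice:

```python
import logging

# Send log records to stderr, keeping other loggers at WARNING.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Raise verbosity only for the logger named 'pubchempy'.
pubchempy_log = logging.getLogger("pubchempy")
pubchempy_log.setLevel(logging.DEBUG)
```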

Using behind a proxy
--------------------

When using PubChemPy behind a proxy, you may receive a ``URLError``::

    URLError: <urlopen error [Errno 65] No route to host>

A simple fix is to specify the proxy information via urllib. For Python 3::

    import urllib.request
    proxy_support = urllib.request.ProxyHandler({
        'http': 'http://<proxy.address>:<port>',
        'https': 'https://<proxy.address>:<port>'
    })
    opener = urllib.request.build_opener(proxy_support)
    urllib.request.install_opener(opener)

For Python 2::

    import urllib2
    proxy_support = urllib2.ProxyHandler({
        'http': 'http://<proxy.address>:<port>',
        'https': 'https://<proxy.address>:<port>'
    })
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)

Custom requests
---------------

If you wish to perform more complicated requests, you can use the ``request`` function. This is an extremely simple
wrapper around the REST API that allows you to construct any sort of request from a few parameters. The
`PUG REST Specification`_ has all the information you will need to formulate your requests.
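
As a rough illustration of how those parameters map onto a PUG REST URL, the sketch below joins them into the ``<domain>/<namespace>/<identifier>/<operation>/<output>`` path that the specification describes. ``build_url`` is a hypothetical helper, not a PubChemPy function, and it only suits simple identifiers; structures such as SMILES or InChI strings containing ``/`` generally need to be sent as POST data instead.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(domain, namespace, identifier, operation,
              output="JSON", **options):
    """Join the request parts into a PUG REST URL plus optional query string."""
    if isinstance(identifier, (list, tuple)):
        # Multiple identifiers are written as a comma-separated list.
        identifier = ",".join(str(i) for i in identifier)
    parts = [domain, namespace, quote(str(identifier), safe=","),
             operation, output]
    url = API_BASE + "/" + "/".join(p for p in parts if p)
    if options:
        url += "?" + urlencode(options)
    return url


print(build_url("compound", "cid", [2244, 962],
                "property/MolecularFormula", "CSV"))
```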

The ``request`` function simply returns the exact response from the PubChem server as a string. This can be parsed in
different ways depending on the output format you choose. See the Python `json`_, `xml`_ and `csv`_ packages for more
information. Additionally, cheminformatics toolkits such as `Open Babel`_ and `RDKit`_ offer tools for handling SDF
files in Python.
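
For JSON output, parsing usually comes down to one ``json.loads`` call followed by dictionary lookups. A minimal sketch, assuming a response to a ``cids`` operation; ``parse_cids`` is a hypothetical helper and the sample payload is illustrative rather than a captured server response:

```python
import json


def parse_cids(raw):
    """Extract the CID list from a PUG REST JSON response body."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    payload = json.loads(raw)
    # A missing key yields an empty list instead of raising KeyError.
    return payload.get("IdentifierList", {}).get("CID", [])


# Illustrative payload in the shape used for a ``cids`` operation.
sample = b'{"IdentifierList": {"CID": [2244]}}'
print(parse_cids(sample))  # [2244]
```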

The ``get`` function is very similar to the ``request`` function, except that it handles ``listkey`` type responses
automatically for you. This makes things simpler, but it means you can't reuse the same ``listkey`` repeatedly to
obtain different types of information. See the `PUG REST specification`_ for more information on how ``listkey``
responses work.

Summary of possible inputs
~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    <identifier> = list of cid, sid, aid, source, inchikey, listkey; string of name, smiles, xref, inchi, sdf;
    <domain> = substance | compound | assay

    compound domain
    <namespace> = cid | name | smiles | inchi | sdf | inchikey | <structure search> | <xref> | listkey | formula
    <operation> = record | property/[comma-separated list of property tags] | synonyms | sids | cids | aids | assaysummary | classification

    substance domain
    <namespace> = sid | sourceid/<source name> | sourceall/<source name> | name | <xref> | listkey
    <operation> = record | synonyms | sids | cids | aids | assaysummary | classification

    assay domain
    <namespace> = aid | listkey | type/<assay type> | sourceall/<source name>
    <assay type> = all | confirmatory | doseresponse | onhold | panel | rnai | screening | summary
    <operation> = record | aids | sids | cids | description | targets/{ProteinGI, ProteinName, GeneID, GeneSymbol} | doseresponse/sid

    <structure search> = {substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}
    <xref> = xref/{RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}
    <output> = XML | ASNT | ASNB | JSON | JSONP [ ?callback=<callback name> ] | SDF | CSV | PNG | TXT


.. _`Python logging documentation`: http://docs.python.org/2/howto/logging.html
.. _`json`: http://docs.python.org/2/library/json.html
.. _`xml`: http://docs.python.org/2/library/xml.etree.elementtree.html
.. _`csv`: http://docs.python.org/2/library/csv.html
.. _`PUG REST Specification`: https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
.. _`Open Babel`: http://openbabel.org/docs/current/UseTheLibrary/Python.html
.. _`RDKit`: http://www.rdkit.org