PycURL Quick Start
==================
Retrieving A Network Resource
-----------------------------
Once PycURL is installed we can perform network operations. The simplest
one is retrieving a resource by its URL. To issue a network request with
PycURL, the following steps are required:
1. Create a ``pycurl.Curl`` instance.
2. Use ``setopt`` to set options.
3. Call ``perform`` to perform the operation.
Here is how we can retrieve a network resource in Python 3::

    import pycurl
    import certifi
    from io import BytesIO

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.io/')
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.CAINFO, certifi.where())
    c.perform()
    c.close()

    body = buffer.getvalue()
    # Body is a byte string.
    # We have to know the encoding in order to print it to a text file
    # such as standard output.
    print(body.decode('iso-8859-1'))

This code is available as ``examples/quickstart/get_python3.py``.
For a Python 2 only example, see ``examples/quickstart/get_python2.py``.
For an example targeting Python 2 and 3, see ``examples/quickstart/get.py``.
PycURL does not provide storage for the network response; that is the
application's job. Therefore we must set up a buffer (in the form of a
``BytesIO`` object) and instruct PycURL to write to that buffer.
Most of the existing PycURL code uses WRITEFUNCTION instead of WRITEDATA
as follows::

    c.setopt(c.WRITEFUNCTION, buffer.write)

While the WRITEFUNCTION idiom continues to work, it is now unnecessary.
As of PycURL 7.19.3 WRITEDATA accepts any Python object with a ``write``
method.
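
Because ``WRITEDATA`` only needs an object with a ``write`` method, the
response does not have to be kept in memory at all. The following is a
minimal sketch rather than one of the PycURL examples: ``HashingWriter`` and
``fetch_digest`` are hypothetical names, and the class computes a SHA-256
digest of the body as chunks arrive instead of storing them.

```python
import hashlib


class HashingWriter:
    """File-like object that hashes the bytes it receives instead of storing them."""

    def __init__(self):
        self._sha256 = hashlib.sha256()
        self.size = 0

    def write(self, data):
        # PycURL hands each chunk of the response body to write() as bytes.
        self._sha256.update(data)
        self.size += len(data)

    def hexdigest(self):
        return self._sha256.hexdigest()


def fetch_digest(url):
    """Hypothetical helper: return (size, sha256 hex digest) of the body at url."""
    # Imported here so that HashingWriter can be exercised without PycURL.
    import pycurl

    writer = HashingWriter()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, writer)
    try:
        c.perform()
    finally:
        c.close()
    return writer.size, writer.hexdigest()
```

Calling ``fetch_digest('http://pycurl.io/')`` would perform the request and
return the body size together with its digest.
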
Working With HTTPS
------------------
Most web sites today use HTTPS, which is HTTP over TLS/SSL. To take
advantage of the security that HTTPS provides, PycURL needs a
*certificate bundle*. Because certificates change over time, PycURL does not
provide such a bundle; one may be supplied by your operating system, but
if not, consider using the `certifi`_ Python package::

    import pycurl
    import certifi
    from io import BytesIO

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'https://python.org/')
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.CAINFO, certifi.where())
    c.perform()
    c.close()

    body = buffer.getvalue()
    # Body is a byte string.
    # We have to know the encoding in order to print it to a text file
    # such as standard output.
    print(body.decode('iso-8859-1'))

This code is available as ``examples/quickstart/get_python3_https.py``.
For a Python 2 example, see ``examples/quickstart/get_python2_https.py``.
Troubleshooting
---------------
When things don't work as expected, use libcurl's ``VERBOSE`` option to
receive detailed debugging output pertaining to the request::

    c.setopt(c.VERBOSE, True)

It is often helpful to compare the verbose output from the program using
PycURL with that of the ``curl`` command line tool when the latter is invoked
with the ``-v`` option::

    curl -v http://pycurl.io/

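
Verbose output normally goes to standard error. If it is more convenient to
inspect it from within the program, libcurl's ``DEBUGFUNCTION`` callback can
receive the same messages. The sketch below is an illustration under stated
assumptions: ``DebugLog`` and ``fetch_with_debug_log`` are hypothetical names,
and the numeric keys are assumed to match libcurl's ``curl_infotype`` values.

```python
from io import BytesIO

# Assumed to mirror libcurl's curl_infotype enum values.
DEBUG_TYPE_NAMES = {
    0: 'text',
    1: 'header-in',
    2: 'header-out',
    3: 'data-in',
    4: 'data-out',
}


class DebugLog:
    """Collects debug messages so they can be inspected after a request."""

    def __init__(self):
        self.entries = []

    def record(self, debug_type, debug_msg):
        # On Python 3 the message arrives as bytes; decode it for display.
        if isinstance(debug_msg, bytes):
            debug_msg = debug_msg.decode('iso-8859-1')
        kind = DEBUG_TYPE_NAMES.get(debug_type, 'other')
        self.entries.append((kind, debug_msg.rstrip('\r\n')))


def fetch_with_debug_log(url):
    """Hypothetical helper: perform a GET and return the collected debug entries."""
    # Imported here so that DebugLog can be exercised without PycURL.
    import pycurl

    log = DebugLog()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, BytesIO())
    # The debug callback is only invoked when VERBOSE is enabled.
    c.setopt(c.VERBOSE, True)
    c.setopt(c.DEBUGFUNCTION, log.record)
    try:
        c.perform()
    finally:
        c.close()
    return log.entries
```

Filtering the returned entries by kind, for example keeping only
``'header-out'``, makes it easier to line the request up against what
``curl -v`` prints.
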
Examining Response Headers
--------------------------
In reality we want to decode the response using the encoding specified by
the server rather than assuming an encoding. To do this we need to
examine the response headers::

    import pycurl
    import re
    try:
        from io import BytesIO
    except ImportError:
        from StringIO import StringIO as BytesIO

    headers = {}
    def header_function(header_line):
        # HTTP standard specifies that headers are encoded in iso-8859-1.
        # On Python 2, decoding step can be skipped.
        # On Python 3, decoding step is required.
        header_line = header_line.decode('iso-8859-1')

        # Header lines include the first status line (HTTP/1.x ...).
        # We are going to ignore all lines that don't have a colon in them.
        # This will botch headers that are split on multiple lines...
        if ':' not in header_line:
            return

        # Break the header line into header name and value.
        name, value = header_line.split(':', 1)

        # Remove whitespace that may be present.
        # Header lines include the trailing newline, and there may be whitespace
        # around the colon.
        name = name.strip()
        value = value.strip()

        # Header names are case insensitive.
        # Lowercase name here.
        name = name.lower()

        # Now we can actually record the header name and value.
        # Note: this only works when headers are not duplicated, see below.
        headers[name] = value

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.io')
    c.setopt(c.WRITEFUNCTION, buffer.write)
    # Set our header function.
    c.setopt(c.HEADERFUNCTION, header_function)
    c.perform()
    c.close()

    # Figure out what encoding was sent with the response, if any.
    # Check against lowercased header name.
    encoding = None
    if 'content-type' in headers:
        content_type = headers['content-type'].lower()
        # Raw string so that \S reaches the regex engine unchanged.
        match = re.search(r'charset=(\S+)', content_type)
        if match:
            encoding = match.group(1)
            print('Decoding using %s' % encoding)
    if encoding is None:
        # Default encoding for HTML is iso-8859-1.
        # Other content types may have different default encoding,
        # or in case of binary data, may have no encoding at all.
        encoding = 'iso-8859-1'
        print('Assuming encoding is %s' % encoding)

    body = buffer.getvalue()
    # Decode using the encoding we figured out.
    print(body.decode(encoding))

This code is available as ``examples/quickstart/response_headers.py``.
That was a lot of code for something very straightforward. Unfortunately,
as libcurl refrains from allocating memory for response data, it falls to our
application to perform this grunt work.
One caveat with the above code is that if there are multiple headers
with the same name, such as Set-Cookie, only the last header value will be
stored. To record all values of multi-valued headers as a list, the following
code can be used instead of the ``headers[name] = value`` line::

    if name in headers:
        if isinstance(headers[name], list):
            headers[name].append(value)
        else:
            headers[name] = [headers[name], value]
    else:
        headers[name] = value

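
With the snippet above, a header value is either a string or a list depending
on how often the header occurred, so callers have to check the type. One
possible alternative, sketched here under the assumption that always storing
lists is acceptable, wraps the parsing in a small class. ``HeaderCollector``
is a hypothetical name and not part of PycURL.

```python
import re


class HeaderCollector:
    """Parses raw header lines and stores every value in a list per header name."""

    def __init__(self):
        self.headers = {}

    def add_line(self, header_line):
        # PycURL passes bytes on Python 3; HTTP headers are iso-8859-1.
        if isinstance(header_line, bytes):
            header_line = header_line.decode('iso-8859-1')
        # Skip the status line and the blank line that ends the headers.
        if ':' not in header_line:
            return
        name, value = header_line.split(':', 1)
        # Header names are case insensitive, so store them lowercased.
        self.headers.setdefault(name.strip().lower(), []).append(value.strip())

    def charset(self, default='iso-8859-1'):
        # Return the charset named in Content-Type, or the default if absent.
        for value in self.headers.get('content-type', []):
            match = re.search(r'charset=(\S+)', value.lower())
            if match:
                return match.group(1)
        return default
```

An instance would be hooked up with
``c.setopt(c.HEADERFUNCTION, collector.add_line)``, after which
``collector.headers['set-cookie']`` holds every cookie header and
``collector.charset()`` gives the encoding to decode the body with.
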
Writing To A File
-----------------
Suppose we want to save the response body to a file. This is actually easy
for a change::

    import pycurl

    # As long as the file is opened in binary mode, both Python 2 and Python 3
    # can write response body to it without decoding.
    with open('out.html', 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, 'http://pycurl.io/')
        c.setopt(c.WRITEDATA, f)
        c.perform()
        c.close()

This code is available as ``examples/quickstart/write_file.py``.
The important part is opening the file in binary mode: the response body
can then be written bytewise without decoding or encoding steps.
Following Redirects
-------------------
By default libcurl, and PycURL, do not follow redirects. Changing this
behavior involves using ``setopt`` like so::

    import pycurl

    c = pycurl.Curl()
    # Redirects to https://www.python.org/.
    c.setopt(c.URL, 'http://www.python.org/')
    # Follow redirect.
    c.setopt(c.FOLLOWLOCATION, True)
    c.perform()
    c.close()

This code is available as ``examples/quickstart/follow_redirect.py``.
As we did not set a write callback, the default libcurl and PycURL behavior
of writing the response body to standard output takes effect.
Setting Options
---------------
Following redirects is one option that libcurl provides. There are many more
such options, and they are documented on the `curl_easy_setopt`_ page.
With very few exceptions, PycURL option names are derived from libcurl
option names by removing the ``CURLOPT_`` prefix. Thus, ``CURLOPT_URL``
becomes simply ``URL``.
.. _curl_easy_setopt: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
Examining Response
------------------
We already covered examining response headers. Other response information is
accessible via the ``getinfo`` call as follows::

    import pycurl
    try:
        from io import BytesIO
    except ImportError:
        from StringIO import StringIO as BytesIO

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.io/')
    c.setopt(c.WRITEDATA, buffer)
    c.perform()

    # HTTP response code, e.g. 200.
    print('Status: %d' % c.getinfo(c.RESPONSE_CODE))
    # Elapsed time for the transfer.
    print('Time: %f' % c.getinfo(c.TOTAL_TIME))

    # getinfo must be called before close.
    c.close()

This code is available as ``examples/quickstart/response_info.py``.
Here we write the body to a buffer to avoid printing uninteresting output
to standard out.
Response information that libcurl exposes is documented on the
`curl_easy_getinfo`_ page. With very few exceptions, PycURL constants
are derived from libcurl constants by removing the ``CURLINFO_`` prefix.
Thus, ``CURLINFO_RESPONSE_CODE`` becomes simply ``RESPONSE_CODE``.
.. _curl_easy_getinfo: https://curl.haxx.se/libcurl/c/curl_easy_getinfo.html
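
The naming rule for both ``setopt`` options and ``getinfo`` constants can be
written down as a small helper, which may be handy when translating names
found in the libcurl documentation. This is only a sketch: ``pycurl_name``
and ``lookup_constant`` are hypothetical functions, and they do not handle
the few exceptions to the rule mentioned above.

```python
def pycurl_name(libcurl_name):
    """Strip the CURLOPT_ or CURLINFO_ prefix from a libcurl constant name."""
    for prefix in ('CURLOPT_', 'CURLINFO_'):
        if libcurl_name.startswith(prefix):
            return libcurl_name[len(prefix):]
    raise ValueError('not a CURLOPT_ or CURLINFO_ name: %r' % libcurl_name)


def lookup_constant(namespace, libcurl_name):
    """Fetch the constant from a namespace such as the pycurl module."""
    # Raises AttributeError if the namespace has no constant of that name.
    return getattr(namespace, pycurl_name(libcurl_name))
```

With PycURL imported, ``lookup_constant(pycurl, 'CURLOPT_FOLLOWLOCATION')``
would be expected to return the same value as ``pycurl.FOLLOWLOCATION``.
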
Sending Form Data
-----------------
To send form data, use the ``POSTFIELDS`` option. Form data must be
URL-encoded beforehand::

    import pycurl
    try:
        # python 3
        from urllib.parse import urlencode
    except ImportError:
        # python 2
        from urllib import urlencode

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/post')

    post_data = {'field': 'value'}
    # Form data must be provided already urlencoded.
    postfields = urlencode(post_data)
    # Sets request method to POST,
    # Content-Type header to application/x-www-form-urlencoded
    # and data to send in request body.
    c.setopt(c.POSTFIELDS, postfields)

    c.perform()
    c.close()

This code is available as ``examples/quickstart/form_post.py``.
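
To make the URL-encoding step more concrete, the following sketch shows what
``urlencode`` produces for a value containing reserved characters and for a
field that occurs more than once. The field names and values are made up for
illustration.

```python
try:
    # python 3
    from urllib.parse import urlencode
except ImportError:
    # python 2
    from urllib import urlencode

# Spaces become '+' and reserved characters such as '&' are percent-encoded.
single = urlencode({'field': 'value with spaces & symbols'})

# A list of pairs keeps the order and allows a field name to repeat.
repeated = urlencode([('tag', 'a'), ('tag', 'b')])

# doseq=True expands a sequence value into one pair per element.
expanded = urlencode({'tag': ['a', 'b']}, doseq=True)

print(single)    # field=value+with+spaces+%26+symbols
print(repeated)  # tag=a&tag=b
print(expanded)  # tag=a&tag=b
```

Any of these strings can be passed to ``POSTFIELDS`` as in the example above.
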
``POSTFIELDS`` automatically sets the HTTP request method to POST. Other
request methods can be specified via the ``CUSTOMREQUEST`` option::

    c.setopt(c.CUSTOMREQUEST, 'PATCH')

File Upload - Multipart POST
----------------------------
To replicate the behavior of file upload in an HTML form (specifically,
a multipart form), use the ``HTTPPOST`` option. Such an upload is performed
with a ``POST`` request. See the next example for how to upload a file with
a ``PUT`` request.
If the data to be uploaded is located in a physical file,
use ``FORM_FILE``::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/post')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            # upload the contents of this file
            c.FORM_FILE, __file__,
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_real.py``.
``libcurl`` provides a number of options to tweak file uploads and multipart
form submissions in general. These are documented on the `curl_formadd page`_.
For example, to set a different filename and content type::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/post')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            # upload the contents of this file
            c.FORM_FILE, __file__,
            # specify a different file name for the upload
            c.FORM_FILENAME, 'helloworld.py',
            # specify a different content type
            c.FORM_CONTENTTYPE, 'application/x-python',
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_real_fancy.py``.
If the file data is in memory, use ``FORM_BUFFER``/``FORM_BUFFERPTR`` as
follows::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/post')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            c.FORM_BUFFER, 'readme.txt',
            c.FORM_BUFFERPTR, 'This is a fancy readme file',
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_buffer.py``.
File Upload - PUT
-----------------
A file can also be uploaded in the request body, via a ``PUT`` request.
Here is how this can be arranged with a physical file::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/put')

    c.setopt(c.UPLOAD, 1)
    # Open in binary mode so that the file contents are sent unmodified.
    file = open('body.json', 'rb')
    c.setopt(c.READDATA, file)

    c.perform()
    c.close()
    # File must be kept open while Curl object is using it
    file.close()

This code is available as ``examples/quickstart/put_file.py``.
And if the data is stored in a buffer::

    import pycurl
    try:
        from io import BytesIO
    except ImportError:
        from StringIO import StringIO as BytesIO

    c = pycurl.Curl()
    c.setopt(c.URL, 'https://httpbin.org/put')

    c.setopt(c.UPLOAD, 1)
    data = '{"json":true}'
    # READDATA requires an IO-like object; a string is not accepted
    # encode() is necessary for Python 3
    buffer = BytesIO(data.encode('utf-8'))
    c.setopt(c.READDATA, buffer)

    c.perform()
    c.close()

This code is available as ``examples/quickstart/put_buffer.py``.
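
When the body is built from Python data rather than a literal string, the
encoding step can be folded into a helper. The sketch below is one possible
arrangement and not part of the PycURL examples: ``json_upload_body`` and
``put_json`` are hypothetical names, and setting ``INFILESIZE`` and a
``Content-Type`` header is an assumption about what a typical server expects.

```python
import json
from io import BytesIO


def json_upload_body(obj):
    """Serialize obj to JSON and return (IO-like body, size in bytes)."""
    data = json.dumps(obj).encode('utf-8')
    return BytesIO(data), len(data)


def put_json(url, obj):
    """Hypothetical helper: PUT obj as JSON, return (status, response body)."""
    # Imported here so that json_upload_body can be exercised without PycURL.
    import pycurl

    body, size = json_upload_body(obj)
    response = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.UPLOAD, 1)
    c.setopt(c.READDATA, body)
    # Assumption: announce the upload size and content type to the server.
    c.setopt(c.INFILESIZE, size)
    c.setopt(c.HTTPHEADER, ['Content-Type: application/json'])
    c.setopt(c.WRITEDATA, response)
    try:
        c.perform()
        status = c.getinfo(c.RESPONSE_CODE)
    finally:
        c.close()
    return status, response.getvalue()
```

For example, ``put_json('https://httpbin.org/put', {'json': True})`` would
upload the same kind of document as the buffer example above.
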
.. _curl_formadd page: https://curl.haxx.se/libcurl/c/curl_formadd.html
.. _certifi: https://pypi.org/project/certifi/