File: README.md

package info (click to toggle)
geventhttpclient 2.3.5-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 1,456 kB
  • sloc: ansic: 16,557; python: 3,823; makefile: 24
file content (185 lines) | stat: -rw-r--r-- 6,798 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
[![GitHub Workflow CI Status](https://img.shields.io/github/actions/workflow/status/geventhttpclient/geventhttpclient/test.yml?branch=master&logo=github&style=flat)](https://github.com/geventhttpclient/geventhttpclient/actions)
[![PyPI](https://img.shields.io/pypi/v/geventhttpclient.svg?style=flat)](https://pypi.org/project/geventhttpclient/)
![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fgeventhttpclient%2Fgeventhttpclient%2Fmaster%2Fpyproject.toml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/geventhttpclient)

# geventhttpclient

A [high performance](https://github.com/geventhttpclient/geventhttpclient/blob/master/README.md#Benchmarks),
concurrent HTTP client library for python using
[gevent](http://gevent.org).

`gevent.httplib` support for patching `http.client` was removed in
[gevent 1.0](https://github.com/surfly/gevent/commit/b45b83b1bc4de14e3c4859362825044b8e3df7d6),
`geventhttpclient` now provides that missing functionality.

`geventhttpclient` uses a fast [http parser](https://github.com/nodejs/llhttp),
written in C.

`geventhttpclient` has been specifically designed for high concurrency,
streaming and support HTTP 1.1 persistent connections. More generally it is
designed for efficiently pulling from REST APIs and streaming APIs
like Twitter's.

Safe SSL support is provided by default. `geventhttpclient` depends on
the certifi CA Bundle. This is the same CA Bundle which ships with the
Requests codebase, and is derived from Mozilla Firefox's canonical set.

Since version 2.3, `geventhttpclient` features a largely `requests`
compatible interface. It covers basic HTTP usage including cookie
management, form data encoding or decoding of compressed data,
but otherwise isn't as feature rich as the original `requests`. For
simple use-cases, it can serve as a drop-in replacement.

```python
import geventhttpclient as requests
requests.get("https://github.com").text
requests.post("http://httpbingo.org/post", data="asdfasd").json()

from geventhttpclient import Session
s = Session()
s.get("http://httpbingo.org/headers").json()
s.get("https://github.com").content
```

This interface builds on top of the lower level `HTTPClient`.

```python
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

url = URL("http://gevent.org/")
client = HTTPClient(url.host)
response = client.get(url.request_uri)
response.status_code
body = response.read()
client.close()
```

## httplib compatibility and monkey patch

`geventhttpclient.httplib` module contains classes for drop in
replacement of `http.client` connection and response objects.
If you use http.client directly you can replace the `httplib` imports
by `geventhttpclient.httplib`.

```python
# from http.client import HTTPConnection
from geventhttpclient.httplib import HTTPConnection
```

If you use `httplib2`, `urllib` or `urllib2`; you can patch `httplib` to
use the wrappers from `geventhttpclient`. For `httplib2`, make sure you
patch before you import or the `super()` calls will fail.

```python
import geventhttpclient.httplib
geventhttpclient.httplib.patch()

import httplib2
```

## High Concurrency

`HTTPClient` has a connection pool built in and is greenlet safe by design.
You can use the same instance among several greenlets. It is the low
level building block of this library.

```python
import gevent.pool
import json

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL


# go to http://developers.facebook.com/tools/explorer and copy the access token
TOKEN = "<MY_DEV_TOKEN>"

url = URL("https://graph.facebook.com/me/friends", params={"access_token": TOKEN})

# setting the concurrency to 10 allow to create 10 connections and
# reuse them.
client = HTTPClient.from_url(url, concurrency=10)

response = client.get(url.request_uri)
assert response.status_code == 200

# response comply to the read protocol. It passes the stream to
# the json parser as it's being read.
data = json.load(response)["data"]

def print_friend_username(client, friend_id):
    friend_url = URL(f"/{friend_id}", params={"access_token": TOKEN})
    # the greenlet will block until a connection is available
    response = client.get(friend_url.request_uri)
    assert response.status_code == 200
    friend = json.load(response)
    if "username" in friend:
        print(f"{friend['username']}: {friend['name']}")
    else:
        print(f"{friend['name']} has no username.")

# allow to run 20 greenlet at a time, this is more than concurrency
# of the http client but isn't a problem since the client has its own
# connection pool.
pool = gevent.pool.Pool(20)
for item in data:
    friend_id = item["id"]
    pool.spawn(print_friend_username, client, friend_id)

pool.join()
client.close()
```

## Streaming

`geventhttpclient` supports streaming. Response objects have a `read(n)` and
`readline()` method that read the stream incrementally.
See [examples/twitter_streaming.py](https://github.com/geventhttpclient/geventhttpclient/blob/master/examples/twitter_streaming.py)
for pulling twitter stream API.

Here is an example on how to download a big file chunk by chunk to save memory:

```python
from geventhttpclient import HTTPClient, URL

url = URL("http://127.0.0.1:80/100.dat")
client = HTTPClient.from_url(url)
response = client.get(url.query_string)
assert response.status_code == 200

CHUNK_SIZE = 1024 * 16 # 16KB
with open("/tmp/100.dat", "w") as f:
    data = response.read(CHUNK_SIZE)
    while data:
        f.write(data)
        data = response.read(CHUNK_SIZE)
```

## Benchmarks

The benchmark runs 10000 `GET` requests against a local nginx server in the default
configuration with a concurrency of 10. See `benchmarks` folder. The requests per
second for a couple of popular clients is given in the table below. Please read
[benchmarks/README.md](https://github.com/geventhttpclient/geventhttpclient/blob/master/benchmarks/README.md)
for more details. Also note, [HTTPX](https://www.python-httpx.org/) is better be
used with `asyncio`, not `gevent`.

| HTTP Client        | RPS    |
|--------------------|--------|
| GeventHTTPClient   | 7268.9 |
| Httplib2 (patched) | 2323.9 |
| Urllib3            | 2242.5 |
| Requests           | 1046.1 |
| Httpx              | 770.3  |

*Linux(x86_64), Python 3.11.6 @ Intel i7-7560U*

## License

This package is distributed under the [MIT license](https://github.com/geventhttpclient/geventhttpclient/blob/master/LICENSE-MIT).
Previous versions of geventhttpclient used `http_parser.c`, which in turn was
based on `http/ngx_http_parse.c` from [NGINX](https://nginx.org), copyright Igor
Sysoev, Joyent, Inc., and other Node contributors. For more information, see
http://github.com/joyent/http-parser